cnx-network-clan

Author	SHA1	Message	Date
Berwn	415a050f6a	Scrape web01 node_exporter into VictoriaMetrics. Add web01 to the mesh map and the node scrape job so it appears on the uptime/host dashboards alongside the other hosts.	2026-06-21 03:08:56 +07:00
Berwn	3f3f4118c1	Use Singapore time (UTC+8) for mx1 and web01. Both hosts are in the Singapore region, not UTC+3.	2026-06-21 03:07:57 +07:00
Berwn	dfdeb84ab8	Set time.timeZone on mx1 and web01. Both had NTP (timesyncd) enabled but no timezone, unlike control/ns1/ns2. Default to Etc/GMT-3 to match the majority of hosts.	2026-06-21 03:07:31 +07:00
Berwn	48bf7fb250	Add web01 public reverse proxy with DNS-01 wildcard TLS. web01 terminates TLS for grafana.cnx.network and proxies to Grafana on control over the mesh. Caddy serves a *.cnx.network wildcard cert obtained via ACME DNS-01, using a dedicated acme_web01 TSIG key scoped on ns1 to _acme-challenge on the cnx.network zone only. Ports 80/443 are the only public exposure (80 just redirects); admin and the backend ride ZeroTier. Also reload Caddy on cert renewal for both web01 and mx1, since both reference the cert via explicit tls file paths and would otherwise keep serving a stale cert after a silent renewal.	2026-06-21 03:05:54 +07:00
Berwn	86a2928825	update(inventory.json): Installed web01	2026-06-21 02:28:43 +07:00
Berwn	f6da01ba18	Add web01 to secret vars/shared/dns-acme-web01-secret/secret	2026-06-21 02:26:44 +07:00
Berwn	eeed40bcb5	Update vars via generator dns-acme-web01-rfc2136 for machine web01	2026-06-21 02:26:44 +07:00
Berwn	aac8f9d8e6	Update vars via generator dns-acme-web01-knot for machine ns1	2026-06-21 02:26:43 +07:00
Berwn	f5874bc337	Update vars via generator zerotier for machine web01	2026-06-21 02:26:33 +07:00
Berwn	2481d4bf92	Update vars via generator tor_tor for machine web01	2026-06-21 02:26:32 +07:00
Berwn	2d8096ee57	Update vars via generator state-version for machine web01	2026-06-21 02:26:30 +07:00
Berwn	1a4a749d78	Update vars via generator root-password for machine web01	2026-06-21 02:26:30 +07:00
Berwn	1c779d8013	Update vars via generator openssh for machine web01	2026-06-21 02:26:30 +07:00
Berwn	9c4e036b09	Update vars via generator emergency-access for machine web01	2026-06-21 02:26:30 +07:00
Berwn	8139b91fbc	Add machine web01 to secrets	2026-06-21 02:26:30 +07:00
Berwn	c436389619	Update secret web01-age.key	2026-06-21 02:26:29 +07:00
Berwn	9fc97e65b2	Update vars via generator dns-acme-web01-secret for machine ns1	2026-06-21 02:26:29 +07:00
Berwn	bd84bf7c85	Set disk schema of machine: web01 to single-disk	2026-06-21 02:25:24 +07:00
Berwn	848dc0dff7	machines/web01/facter.json: update hardware configuration	2026-06-21 02:23:00 +07:00
Berwn	95aff44f86	Add machine web01	2026-06-21 01:58:59 +07:00
Berwn	f42569e992	Add provisioned Grafana uptime dashboard for all hosts	2026-06-21 01:57:08 +07:00
Berwn	1dd3aadb97	Add mail.cnx.email client alias as a cert SAN. A mail.cnx.email CNAME (-> mx1.cnx.email) lets clients (Thunderbird etc.) use a friendly hostname for submission/IMAP. To avoid a TLS name mismatch the cert now carries mail.cnx.email as a SAN, so the acme_mx1 key is authorized to write _acme-challenge.mail too. The MX still points at mx1.cnx.email and --reuse-key keeps the DANE TLSA digest valid across the re-issue.	2026-06-18 15:01:03 +07:00
Berwn	dc21348727	Format drifted files to satisfy the treefmt flake-check gate. Pure formatting (nixfmt/prettier/yamlfmt); no behavior change. These files predate the current treefmt config and were failing nix flake check; reformatting them makes the gate green again.	2026-06-18 14:49:48 +07:00
Berwn	1cb6f39ea2	Add declarative SNM mail stack on mx1 with DNS-01, DANE, MTA-STS. mx1 runs Simple NixOS Mailserver (Postfix/Dovecot/Rspamd/OpenDKIM) for cnx.email. The TLS cert is obtained via ACME DNS-01 using a dedicated, scoped TSIG key (acme_mx1) that ns1 authorizes for only _acme-challenge.mx1 and _acme-challenge.mta-sts on the cnx.email zone, so the credential can write nothing else. Mailbox passwords are auto-minted by a clan vars generator (four-word passphrase + number). DANE TLSA (3 1 1) is published for _25._tcp.mx1; --reuse-key keeps the key digest stable across renewals. MTA-STS is enforced via a Caddy vhost serving the policy on :443 from the same cert (mta-sts SAN). Firewall opens 25/587/465/143/993/443; 80 stays closed.	2026-06-18 14:47:20 +07:00
Berwn	026a26dd53	Add ns1 to secret vars/shared/dns-acme-mx1-secret/secret	2026-06-18 14:11:40 +07:00
Berwn	7e5d50b260	Update vars via generator dns-acme-mx1-knot for machine ns1	2026-06-18 14:11:40 +07:00
Berwn	312de984c1	Update vars via generator dns-acme-rfc2136 for machine mx1	2026-06-18 14:11:40 +07:00
Berwn	d76aa8cc8d	Update vars via generator mail-passwd-postmaster-at-cnx-email for machine mx1	2026-06-18 14:11:36 +07:00
Berwn	0a78cad06e	Update vars via generator dns-acme-mx1-secret for machine mx1	2026-06-18 14:11:36 +07:00
Berwn	d1b24017aa	Use no-store for docs: epoch mtimes make revalidation serve stale	2026-06-18 12:24:38 +07:00
Berwn	77a18df257	Stop browsers serving stale docs by forcing revalidation	2026-06-18 12:19:42 +07:00
Berwn	a4fe2a7b3a	Document how to pull registrar DS records from Knot on ns1. Explain that key material is auto-managed in the KASP keystore under /var/lib/knot, and that the registrar DS is generated per zone with `sudo -u knot keymgr <zone> ds`.	2026-06-18 12:12:10 +07:00
Berwn	6e4178df04	Onboard mx1 mail host and factor out per-host public IPs - Register mx1 in the inventory and as a direct-SSH `internet` host; give it a static public IPv6 (2a01:4ff:2f0:1963::1). - Point the cnx.email MX (plus SPF/DMARC) at mx1 and add its A record. - Bring mx1 into monitoring: import exporters, add it to the mesh map and the node scrape job so its host metrics and journald reach control. - Add a clan-mx1 Hetzner firewall: inbound SMTP + ZeroTier + ICMP, no public SSH (admin rides the mesh like the other hosts). 587/465/993 held for now. - Extract per-host public IPv4/IPv6 into modules/hosts.nix, consumed by clan.nix's internet hosts and each machine's cnx.staticIPv6, so each address is declared once instead of being duplicated across configs. - docs: add mx1 to the machines table.	2026-06-18 11:53:14 +07:00
Berwn	2c89ab913c	update(inventory.json): Installed mx1	2026-06-18 11:35:22 +07:00
Berwn	84c3eece58	Update vars via generator zerotier for machine mx1	2026-06-18 11:33:06 +07:00
Berwn	7f5227d2e2	Update vars via generator tor_tor for machine mx1	2026-06-18 11:33:06 +07:00
Berwn	ebf4efe5c9	Update vars via generator state-version for machine mx1	2026-06-18 11:33:04 +07:00
Berwn	64b7eb1934	Update vars via generator root-password for machine mx1	2026-06-18 11:33:04 +07:00
Berwn	e763d76ae9	Update vars via generator openssh for machine mx1	2026-06-18 11:33:03 +07:00
Berwn	b65f526ea2	Update vars via generator emergency-access for machine mx1	2026-06-18 11:33:03 +07:00
Berwn	3a0bc2dba4	Add machine mx1 to secrets	2026-06-18 11:33:03 +07:00
Berwn	6098fe9a3b	Update secret mx1-age.key	2026-06-18 11:33:03 +07:00
Berwn	8d9981ee5a	Set disk schema of machine: mx1 to single-disk	2026-06-18 11:32:33 +07:00
Berwn	afc2e997c0	machines/mx1/facter.json: update hardware configuration	2026-06-18 11:32:22 +07:00
Berwn	faaa7b66c0	Add machine mx1	2026-06-18 11:21:27 +07:00
Berwn	9c8a2abf3f	Bind VictoriaLogs on IPv6 so the mesh can ship journald to it. VictoriaLogs, like the VM scraper, is IPv4-only by default: ":9428" binds 0.0.0.0 only, so ns1/ns2 pushing journald over the IPv6 mesh got "connection refused" while control's own loopback (v4) upload worked. Add -enableTCP6 so it binds [::] (dual-stack), matching the flag already used for the scraper. Also simplify the systemd-journal-upload override to just startLimitIntervalSec=0 (retry forever / self-heal) and drop the SuccessExitStatus masking: a persistent sink failure should stay loud rather than be hidden behind a green deploy.	2026-06-17 17:27:56 +07:00
Berwn	0eb883061b	Keep systemd-journal-upload retrying instead of failing a deploy. The uploader exits when VictoriaLogs is unreachable. Upstream already sets Restart=always/RestartSec=3sec, but the default start-rate limit lets the unit give up permanently and trip switch-to-configuration when the sink is briefly down. Disable the limit (startLimitIntervalSec=0) so logging stays best-effort and never wedges a host or a deploy.	2026-06-17 17:09:30 +07:00
Berwn	d4a171640b	Add VictoriaLogs for centralized journald across all hosts. control runs VictoriaLogs (:9428, 30d, mesh-scoped) with a matching Grafana datasource. Each host ships journald via systemd's own journald.upload to the /insert/journald endpoint -- no extra agent. control uploads over loopback so its logs survive a mesh outage; ns1 and ns2 push over the mesh.	2026-06-17 16:53:52 +07:00
Berwn	c7b0f206c8	Alert on and chart blackbox DNS probe failures. DNSResolutionProbeFailed and DNSSECProbeFailed fire when an SOA or DNSKEY probe to a public nameserver address stays down for 5m. The CNX DNS dashboard gains a "DNS probes (outside-in)" row: per-zone/server status table, probe success, and probe latency.	2026-06-17 15:42:13 +07:00
Berwn	54f607d063	Add blackbox exporter for outside-in DNS probes. control runs blackbox_exporter on loopback, probing each nameserver's public v4+v6 address for every zone: SOA (zone served) and DNSKEY (still signed, since blackbox has no DO-bit option). Probe definitions are shared between the exporter config and the VictoriaMetrics scrape jobs so they can't drift. Verified live against ns1/ns2 over v4 and v6.	2026-06-17 15:37:45 +07:00


127 Commits