web01 terminates TLS for grafana.cnx.network and proxies to Grafana on
control over the mesh. Caddy serves a *.cnx.network wildcard cert obtained
via ACME DNS-01, using a dedicated acme_web01 TSIG key scoped on ns1 to
_acme-challenge on the cnx.network zone only. Ports 80/443 are the only
public exposure (80 just redirects); admin and the backend ride ZeroTier.
Also reload Caddy on cert renewal for both web01 and mx1, since both
reference the cert via explicit tls file paths and would otherwise keep
serving a stale cert after a silent renewal.
A mail.cnx.email CNAME (-> mx1.cnx.email) lets clients (Thunderbird etc.)
use a friendly hostname for submission/IMAP. To avoid a TLS name
mismatch the cert now carries mail.cnx.email as a SAN, so the acme_mx1
key is authorized to write _acme-challenge.mail too. The MX still points
at mx1.cnx.email and --reuse-key keeps the DANE TLSA digest valid across
the re-issue.
Pure formatting (nixfmt/prettier/yamlfmt); no behavior change. These
files predate the current treefmt config and were failing nix flake
check; reformatting them makes the gate green again.
mx1 runs Simple NixOS Mailserver (Postfix/Dovecot/Rspamd/OpenDKIM) for
cnx.email. The TLS cert is obtained via ACME DNS-01 using a dedicated,
scoped TSIG key (acme_mx1) that ns1 authorizes for only
_acme-challenge.mx1 and _acme-challenge.mta-sts on the cnx.email zone, so
the credential can write nothing else. Mailbox passwords are auto-minted
by a clan vars generator (four-word passphrase + number).
DANE TLSA (3 1 1) is published for _25._tcp.mx1; --reuse-key keeps the
key digest stable across renewals. MTA-STS is enforced via a Caddy vhost
serving the policy on :443 from the same cert (mta-sts SAN). Firewall
opens 25/587/465/143/993/443; 80 stays closed.
Explain that key material is auto-managed in the KASP keystore under
/var/lib/knot, and that the registrar DS is generated per zone with
`sudo -u knot keymgr <zone> ds`.
- Register mx1 in the inventory and as a direct-SSH `internet` host; give it
a static public IPv6 (2a01:4ff:2f0:1963::1).
- Point the cnx.email MX (plus SPF/DMARC) at mx1 and add its A record.
- Bring mx1 into monitoring: import exporters, add it to the mesh map and the
node scrape job so its host metrics and journald reach control.
- Add a clan-mx1 Hetzner firewall: inbound SMTP + ZeroTier + ICMP, no public
SSH (admin rides the mesh like the other hosts). 587/465/993 held for now.
- Extract per-host public IPv4/IPv6 into modules/hosts.nix, consumed by
clan.nix's internet hosts and each machine's cnx.staticIPv6, so each address
is declared once instead of being duplicated across configs.
- docs: add mx1 to the machines table.
VictoriaLogs, like the VM scraper, is IPv4-only by default: ":9428" binds
0.0.0.0 only, so ns1/ns2 pushing journald over the IPv6 mesh got "connection
refused" while control's own loopback (v4) upload worked. Add -enableTCP6 so it
binds [::] (dual-stack), matching the flag already used for the scraper.
Also simplify the systemd-journal-upload override to just startLimitIntervalSec=0
(retry forever / self-heal) and drop the SuccessExitStatus masking: a persistent
sink failure should stay loud rather than be hidden behind a green deploy.
The uploader exits when VictoriaLogs is unreachable. Upstream already sets
Restart=always/RestartSec=3sec, but the default start-rate limit lets the unit
give up permanently and trip switch-to-configuration when the sink is briefly
down. Disable the limit (startLimitIntervalSec=0) so logging stays best-effort
and never wedges a host or a deploy.
control runs VictoriaLogs (:9428, 30d, mesh-scoped) with a matching
Grafana datasource. Each host ships journald via systemd's own
journald.upload to the /insert/journald endpoint -- no extra agent.
control uploads over loopback so its logs survive a mesh outage; ns1
and ns2 push over the mesh.
DNSResolutionProbeFailed and DNSSECProbeFailed fire when an SOA or
DNSKEY probe to a public nameserver address stays down for 5m. The CNX
DNS dashboard gains a "DNS probes (outside-in)" row: per-zone/server
status table, probe success, and probe latency.
control runs blackbox_exporter on loopback, probing each nameserver's
public v4+v6 address for every zone: SOA (zone served) and DNSKEY (still
signed, since blackbox has no DO-bit option). Probe definitions are
shared between the exporter config and the VictoriaMetrics scrape jobs
so they can't drift. Verified live against ns1/ns2 over v4 and v6.