A mail.cnx.email CNAME (-> mx1.cnx.email) lets clients (Thunderbird etc.)
use a friendly hostname for submission/IMAP. To avoid a TLS name
mismatch the cert now carries mail.cnx.email as a SAN, so the acme_mx1
key is authorized to write _acme-challenge.mail too. The MX still points
at mx1.cnx.email and --reuse-key keeps the DANE TLSA digest valid across
the re-issue.
Pure formatting (nixfmt/prettier/yamlfmt); no behavior change. These
files predate the current treefmt config and were failing nix flake
check; reformatting them makes the gate green again.
mx1 runs Simple NixOS Mailserver (Postfix/Dovecot/Rspamd/OpenDKIM) for
cnx.email. The TLS cert is obtained via ACME DNS-01 using a dedicated,
scoped TSIG key (acme_mx1) that ns1 authorizes for only
_acme-challenge.mx1 and _acme-challenge.mta-sts on the cnx.email zone, so
the credential can write nothing else. Mailbox passwords are auto-minted
by a clan vars generator (four-word passphrase + number).
DANE TLSA (3 1 1) is published for _25._tcp.mx1; --reuse-key keeps the
key digest stable across renewals. MTA-STS is enforced via a Caddy vhost
serving the policy on :443 from the same cert (mta-sts SAN). Firewall
opens 25/587/465/143/993/443; 80 stays closed.
Explain that key material is auto-managed in the KASP keystore under
/var/lib/knot, and that the registrar DS is generated per zone with
`sudo -u knot keymgr <zone> ds`.
- Register mx1 in the inventory and as a direct-SSH `internet` host; give it
a static public IPv6 (2a01:4ff:2f0:1963::1).
- Point the cnx.email MX (plus SPF/DMARC) at mx1 and add its A record.
- Bring mx1 into monitoring: import exporters, add it to the mesh map and the
node scrape job so its host metrics and journald reach control.
- Add a clan-mx1 Hetzner firewall: inbound SMTP + ZeroTier + ICMP, no public
SSH (admin rides the mesh like the other hosts). 587/465/993 held for now.
- Extract per-host public IPv4/IPv6 into modules/hosts.nix, consumed by
clan.nix's internet hosts and each machine's cnx.staticIPv6, so each address
is declared once instead of being duplicated across configs.
- docs: add mx1 to the machines table.
VictoriaLogs, like the VM scraper, is IPv4-only by default: ":9428" binds
0.0.0.0 only, so ns1/ns2 pushing journald over the IPv6 mesh got "connection
refused" while control's own loopback (v4) upload worked. Add -enableTCP6 so it
binds [::] (dual-stack), matching the flag already used for the scraper.
Also simplify the systemd-journal-upload override to just startLimitIntervalSec=0
(retry forever / self-heal) and drop the SuccessExitStatus masking: a persistent
sink failure should stay loud rather than be hidden behind a green deploy.
The uploader exits when VictoriaLogs is unreachable. Upstream already sets
Restart=always/RestartSec=3sec, but the default start-rate limit lets the unit
give up permanently and trip switch-to-configuration when the sink is briefly
down. Disable the limit (startLimitIntervalSec=0) so logging stays best-effort
and never wedges a host or a deploy.
control runs VictoriaLogs (:9428, 30d, mesh-scoped) with a matching
Grafana datasource. Each host ships journald via systemd's own
journald.upload to the /insert/journald endpoint -- no extra agent.
control uploads over loopback so its logs survive a mesh outage; ns1
and ns2 push over the mesh.
DNSResolutionProbeFailed and DNSSECProbeFailed fire when an SOA or
DNSKEY probe to a public nameserver address stays down for 5m. The CNX
DNS dashboard gains a "DNS probes (outside-in)" row: per-zone/server
status table, probe success, and probe latency.
control runs blackbox_exporter on loopback, probing each nameserver's
public v4+v6 address for every zone: SOA (zone served) and DNSKEY (still
signed, since blackbox has no DO-bit option). Probe definitions are
shared between the exporter config and the VictoriaMetrics scrape jobs
so they can't drift. Verified live against ns1/ns2 over v4 and v6.
BackupJobFailed fires when a borgbackup job enters the systemd failed
state; BackupStale fires when the daily timer has not run in over 26h
(or has never run). Both read the node_exporter systemd collector on
the backup client, matching the CNX Backups dashboard.
Grafana dashboard (auto-provisioned from the dashboards dir) tracks
borgbackup job health, time since last run, and per-job systemd state
from the node_exporter systemd collector on the client. New docs page
covers the ns1 -> control topology, secrets flow, and restore commands.
clan borgbackup instance: control serves repos, ns1 backs up its
clan.core.state (the KASP keystore at /var/lib/knot) nightly over the
mesh with repokey encryption. ns1 maps the control machine name to its
ZeroTier address so the borg@control repo resolves.
Run `clan vars generate ns1` before deploy to mint the borg keypair.