Deliver cnx.email DMARC aggregate/forensic reports to a dedicated dmarc@cnx.email
mailbox on mx1 and analyze them with parsedmarc on control, storing parsed
reports in a local loopback Elasticsearch and visualizing via the auto-provisioned
Grafana dashboard. parsedmarc fetches the mailbox over IMAPS across the mesh
(mx1.cnx.email pinned to its mesh address so TLS still validates), using a shared
mail-dmarc-cred clan var so mx1's mailserver and control see the same password.
Reflect web01 in the machines table and monitoring scrape list, note Grafana is
now also published publicly via web01's reverse proxy, add the CNX Uptime
dashboard, and document the dedicated acme_mx1/acme_web01 DNS-01 keys.
Pure formatting (nixfmt/prettier/yamlfmt); no behavior change. These
files predate the current treefmt config and were failing nix flake
check; reformatting them makes the gate green again.
Explain that key material is auto-managed in the KASP keystore under
/var/lib/knot, and that the registrar DS is generated per zone with
`sudo -u knot keymgr <zone> ds`.
- Register mx1 in the inventory and as a direct-SSH `internet` host; give it
a static public IPv6 (2a01:4ff:2f0:1963::1).
- Point the cnx.email MX (plus SPF/DMARC) at mx1 and add its A record.
- Bring mx1 into monitoring: import exporters, add it to the mesh map and the
node scrape job so its host metrics and journald reach control.
- Add a clan-mx1 Hetzner firewall: inbound SMTP + ZeroTier + ICMP, no public
SSH (admin rides the mesh like the other hosts). 587/465/993 held for now.
- Extract per-host public IPv4/IPv6 into modules/hosts.nix, consumed by
clan.nix's internet hosts and each machine's cnx.staticIPv6, so each address
is declared once instead of being duplicated across configs.
- docs: add mx1 to the machines table.
VictoriaLogs, like the VM scraper, is IPv4-only by default: ":9428" binds
0.0.0.0 only, so ns1/ns2 pushing journald over the IPv6 mesh got "connection
refused" while control's own loopback (v4) upload worked. Add -enableTCP6 so it
binds [::] (dual-stack), matching the flag already used for the scraper.
Also simplify the systemd-journal-upload override to just startLimitIntervalSec=0
(retry forever / self-heal) and drop the SuccessExitStatus masking: a persistent
sink failure should stay loud rather than be hidden behind a green deploy.
control runs VictoriaLogs (:9428, 30d, mesh-scoped) with a matching
Grafana datasource. Each host ships journald via systemd's own
journald.upload to the /insert/journald endpoint -- no extra agent.
control uploads over loopback so its logs survive a mesh outage; ns1
and ns2 push over the mesh.
control runs blackbox_exporter on loopback, probing each nameserver's
public v4+v6 address for every zone: SOA (zone served) and DNSKEY (still
signed, since blackbox has no DO-bit option). Probe definitions are
shared between the exporter config and the VictoriaMetrics scrape jobs
so they can't drift. Verified live against ns1/ns2 over v4 and v6.
Grafana dashboard (auto-provisioned from the dashboards dir) tracks
borgbackup job health, time since last run, and per-job systemd state
from the node_exporter systemd collector on the client. New docs page
covers the ns1 -> control topology, secrets flow, and restore commands.
Docs live in docs/ (DNS, ZeroTier mesh, monitoring), built at Nix-build time and
served as static files over the ZeroTier mesh on control:8080. Commit-to-edit:
change the markdown and redeploy to publish.