Commit Graph

29 Commits

Author SHA1 Message Date
Berwn 1cb6f39ea2 Add declarative SNM mail stack on mx1 with DNS-01, DANE, MTA-STS
mx1 runs Simple NixOS Mailserver (Postfix/Dovecot/Rspamd/OpenDKIM) for
cnx.email. The TLS cert is obtained via ACME DNS-01 using a dedicated,
scoped TSIG key (acme_mx1) that ns1 authorizes for only
_acme-challenge.mx1 and _acme-challenge.mta-sts on the cnx.email zone, so
the credential can write nothing else. Mailbox passwords are auto-minted
by a clan vars generator (four-word passphrase + number).

DANE TLSA (3 1 1) is published for _25._tcp.mx1; --reuse-key keeps the
key digest stable across renewals. MTA-STS is enforced via a Caddy vhost
serving the policy on :443 from the same cert (mta-sts SAN). Firewall
opens 25/587/465/143/993/443; 80 stays closed.
2026-06-18 14:47:20 +07:00
Berwn 6e4178df04 Onboard mx1 mail host and factor out per-host public IPs
- Register mx1 in the inventory and as a direct-SSH `internet` host; give it
  a static public IPv6 (2a01:4ff:2f0:1963::1).
- Point the cnx.email MX (plus SPF/DMARC) at mx1 and add its A record.
- Bring mx1 into monitoring: import exporters, add it to the mesh map and the
  node scrape job so its host metrics and journald reach control.
- Add a clan-mx1 Hetzner firewall: inbound SMTP + ZeroTier + ICMP, no public
  SSH (admin rides the mesh like the other hosts). 587/465/993 held for now.
- Extract per-host public IPv4/IPv6 into modules/hosts.nix, consumed by
  clan.nix's internet hosts and each machine's cnx.staticIPv6, so each address
  is declared once instead of being duplicated across configs.
- docs: add mx1 to the machines table.
2026-06-18 11:53:14 +07:00
Berwn 8d9981ee5a Set disk schema of machine: mx1 to single-disk 2026-06-18 11:32:33 +07:00
Berwn afc2e997c0 machines/mx1/facter.json: update hardware configuration 2026-06-18 11:32:22 +07:00
Berwn faaa7b66c0 Add machine mx1 2026-06-18 11:21:27 +07:00
Berwn 54f607d063 Add blackbox exporter for outside-in DNS probes
control runs blackbox_exporter on loopback, probing each nameserver's
public v4+v6 address for every zone: SOA (zone served) and DNSKEY (still
signed, since blackbox has no DO-bit option). Probe definitions are
shared between the exporter config and the VictoriaMetrics scrape jobs
so they can't drift. Verified live against ns1/ns2 over v4 and v6.
2026-06-17 15:37:45 +07:00
Berwn 044891927b Back up Knot DNSSEC keystore from ns1 to control via borgbackup
clan borgbackup instance: control serves repos, ns1 backs up its
clan.core.state (the KASP keystore at /var/lib/knot) nightly over the
mesh with repokey encryption. ns1 maps the control machine name to its
ZeroTier address so the borg@control repo resolves.

Run `clan vars generate ns1` before deploy to mint the borg keypair.
2026-06-17 15:06:58 +07:00
Berwn 4c7c74836d Add vmalert alerting rules for DNS and host health
vmalert on control evaluates rules (declared in git) against VictoriaMetrics and
remote-writes alert state back, so firing alerts show as the ALERTS series in
Grafana. Covers SOA divergence between ns1/ns2, secondary zone expiry, scrape
target down, and root disk full. No notifier yet (notifier.blackhole). Also adds
TODO.md roadmap.
2026-06-17 14:49:32 +07:00
Berwn a7d4c0e567 Add mdBook infra runbook served by Caddy on control
Docs live in docs/ (DNS, ZeroTier mesh, monitoring), built at Nix-build time and
served as static files over the ZeroTier mesh on control:8080. Commit-to-edit:
change the markdown and redeploy to publish.
2026-06-17 14:26:21 +07:00
Berwn 33ac7e106b Add VictoriaMetrics + Grafana DNS monitoring over the mesh
control runs VictoriaMetrics (loopback) and Grafana; every machine exports
node metrics and the nameservers export Knot stats (mod-stats + knot-exporter).
Scraping and the Grafana UI ride the ZeroTier mesh only, scoped by nftables to
the mesh /88; the public side stays closed by the Hetzner cloud firewall. The
provisioned DNS dashboard includes a per-zone SOA serial table to catch
primary/secondary drift. ZeroTier ULAs are centralised in mesh-hosts.nix.
2026-06-17 10:17:27 +07:00
Berwn aa604bda9a Switch ns1 zone serial-policy to unixtime
dateserial (YYYYMMDDnn) only has a 2-digit same-day counter held in Knot's
journal; a journal reset restarted the counter and let ns1 mint a serial ns2
had already seen with older content, so ns2 never retransferred. unixtime is
strictly monotonic per reload, eliminating the shared-serial collision.
2026-06-16 18:59:45 +07:00
Berwn e795960dcf Configure static public IPv6 on control, ns1, ns2 2026-06-16 18:04:33 +07:00
Berwn de7d950596 Format tree with treefmt 2026-06-16 16:53:00 +07:00
kurogeek 3302b70485 clan.core.sops.defaultGroups to all machines 2026-06-16 16:46:55 +07:00
Berwn a3482face5 Allow ACME DNS-01 dynamic updates on ns1
Add a dedicated acme_ddns TSIG key (scoped to ns1 only) and an acl_acme rule
that limits it to TXT updates at or under _acme-challenge.<zone>. An external
ACME client can now write challenge records via RFC 2136; Knot signs them and
transfers to ns2, which never holds the key.
2026-06-14 17:12:17 +07:00
Berwn dc51cfbdb5 Enable DNSSEC and automatic SOA serials on the DNS zones
ns1 (primary) now signs every zone with an ECDSA P-256/SHA-256 policy and
manages the SOA serial itself: zonefile-load = difference-no-serial (with
journal-content = all) plus serial-policy = dateserial let records be edited
without bumping the serial by hand. ns2 needs no change; it transfers the
already-signed zone.

Also point the ns1/ns2 AAAA glue at the public Hetzner IPv6 addresses; they
previously pointed at unroutable ZeroTier mesh ULAs.
2026-06-14 16:27:30 +07:00
Berwn 5864054b00 Move Hetzner firewall rules into a separate data file
Extract the per-firewall rule data out of control's configuration into
modules/hetzner-firewall-rules.nix, imported like the DNS domains list.
The evaluated rules are unchanged.
2026-06-14 15:49:00 +07:00
Berwn 344f432640 Add Hetzner Cloud firewall auto-sync from clan config
control runs a oneshot on each deploy that creates each firewall if
missing and replaces its rules via the Hetzner API set_rules action,
using a Read/Write token stored as a clan secret. Public SSH is not
exposed; admin access rides the ZeroTier mesh, with emergency-access as
the console fallback.
2026-06-14 15:40:05 +07:00
Berwn 306a2cf61e Set per-machine timezones and enable NTP
control and ns2 use UTC+3 (Etc/GMT-3), ns1 uses UTC+1 (Etc/GMT-1) —
fixed offsets, no DST. Make systemd-timesyncd explicit on all three.
2026-06-14 15:02:34 +07:00
Berwn 807785cdab Add authoritative DNS on ns1/ns2 and finalize clan config
- Knot authoritative DNS: ns1 primary, ns2 secondary serving cnx.network,
  buildfor.life and cnx.email over TSIG-secured zone transfer (modules/dns)
- Knot listens publicly + over ZeroTier; firewall opens port 53
- Complete clan inventory: name/domain, admin SSH key, control as the
  zerotier controller, tor on all nixos machines
- Enable age yubikey/fido2-hmac secret plugins
2026-06-14 13:24:23 +07:00
Berwn a40c4d1800 Set disk schema of machine: ns2 to single-disk 2026-06-14 13:19:56 +07:00
Berwn 2a0bdc4a4b Set disk schema of machine: ns1 to single-disk 2026-06-14 13:19:44 +07:00
Berwn 840b3ca407 machines/ns2/facter.json: update hardware configuration 2026-06-14 13:18:41 +07:00
Berwn d757dc3c52 machines/ns1/facter.json: update hardware configuration 2026-06-14 13:16:11 +07:00
Berwn bf65146a62 Set disk schema of machine: control to single-disk 2026-06-14 12:29:39 +07:00
Berwn 8938637c28 machines/control/facter.json: update hardware configuration 2026-06-14 12:27:20 +07:00
Berwn 7d02499c0e Add machine ns2 2026-06-14 12:14:12 +07:00
Berwn bda1854376 Add machine ns1 2026-06-14 12:14:10 +07:00
Berwn a86525d37c Add machine control 2026-06-14 12:14:07 +07:00