Commit Graph

37 Commits

Author SHA1 Message Date
Berwn 3f3f4118c1 Use Singapore time (UTC+8) for mx1 and web01
Both hosts are in the Singapore region, not UTC+3.
2026-06-21 03:07:57 +07:00
Berwn dfdeb84ab8 Set time.timeZone on mx1 and web01
Both had NTP (timesyncd) enabled but no timezone, unlike control/ns1/ns2.
Default to Etc/GMT-3 to match the majority of hosts.
2026-06-21 03:07:31 +07:00
Berwn 48bf7fb250 Add web01 public reverse proxy with DNS-01 wildcard TLS
web01 terminates TLS for grafana.cnx.network and proxies to Grafana on
control over the mesh. Caddy serves a *.cnx.network wildcard cert obtained
via ACME DNS-01, using a dedicated acme_web01 TSIG key scoped on ns1 to
_acme-challenge on the cnx.network zone only. Ports 80/443 are the only
public exposure (80 just redirects); admin and the backend ride ZeroTier.

Also reload Caddy on cert renewal for both web01 and mx1, since both
reference the cert via explicit tls file paths and would otherwise keep
serving a stale cert after a silent renewal.
2026-06-21 03:05:54 +07:00
Berwn bd84bf7c85 Set disk schema of machine: web01 to single-disk 2026-06-21 02:25:24 +07:00
Berwn 848dc0dff7 machines/web01/facter.json: update hardware configuration 2026-06-21 02:23:00 +07:00
Berwn 95aff44f86 Add machine web01 2026-06-21 01:58:59 +07:00
Berwn 1dd3aadb97 Add mail.cnx.email client alias as a cert SAN
A mail.cnx.email CNAME (-> mx1.cnx.email) lets clients (Thunderbird etc.)
use a friendly hostname for submission/IMAP. To avoid a TLS name
mismatch the cert now carries mail.cnx.email as a SAN, so the acme_mx1
key is authorized to write _acme-challenge.mail too. The MX still points
at mx1.cnx.email and --reuse-key keeps the DANE TLSA digest valid across
the re-issue.
2026-06-18 15:01:03 +07:00
Berwn dc21348727 Format drifted files to satisfy the treefmt flake-check gate
Pure formatting (nixfmt/prettier/yamlfmt); no behavior change. These
files predate the current treefmt config and were failing nix flake
check; reformatting them makes the gate green again.
2026-06-18 14:49:48 +07:00
Berwn 1cb6f39ea2 Add declarative SNM mail stack on mx1 with DNS-01, DANE, MTA-STS
mx1 runs Simple NixOS Mailserver (Postfix/Dovecot/Rspamd/OpenDKIM) for
cnx.email. The TLS cert is obtained via ACME DNS-01 using a dedicated,
scoped TSIG key (acme_mx1) that ns1 authorizes for only
_acme-challenge.mx1 and _acme-challenge.mta-sts on the cnx.email zone, so
the credential can write nothing else. Mailbox passwords are auto-minted
by a clan vars generator (four-word passphrase + number).

DANE TLSA (3 1 1) is published for _25._tcp.mx1; --reuse-key keeps the
key digest stable across renewals. MTA-STS is enforced via a Caddy vhost
serving the policy on :443 from the same cert (mta-sts SAN). Firewall
opens 25/587/465/143/993/443; 80 stays closed.
2026-06-18 14:47:20 +07:00
Berwn 6e4178df04 Onboard mx1 mail host and factor out per-host public IPs
- Register mx1 in the inventory and as a direct-SSH `internet` host; give it
  a static public IPv6 (2a01:4ff:2f0:1963::1).
- Point the cnx.email MX (plus SPF/DMARC) at mx1 and add its A record.
- Bring mx1 into monitoring: import exporters, add it to the mesh map and the
  node scrape job so its host metrics and journald reach control.
- Add a clan-mx1 Hetzner firewall: inbound SMTP + ZeroTier + ICMP, no public
  SSH (admin rides the mesh like the other hosts). 587/465/993 held for now.
- Extract per-host public IPv4/IPv6 into modules/hosts.nix, consumed by
  clan.nix's internet hosts and each machine's cnx.staticIPv6, so each address
  is declared once instead of being duplicated across configs.
- docs: add mx1 to the machines table.
2026-06-18 11:53:14 +07:00
Berwn 8d9981ee5a Set disk schema of machine: mx1 to single-disk 2026-06-18 11:32:33 +07:00
Berwn afc2e997c0 machines/mx1/facter.json: update hardware configuration 2026-06-18 11:32:22 +07:00
Berwn faaa7b66c0 Add machine mx1 2026-06-18 11:21:27 +07:00
Berwn 54f607d063 Add blackbox exporter for outside-in DNS probes
control runs blackbox_exporter on loopback, probing each nameserver's
public v4+v6 address for every zone: SOA (zone served) and DNSKEY (still
signed, since blackbox has no DO-bit option). Probe definitions are
shared between the exporter config and the VictoriaMetrics scrape jobs
so they can't drift. Verified live against ns1/ns2 over v4 and v6.
2026-06-17 15:37:45 +07:00
Berwn 044891927b Back up Knot DNSSEC keystore from ns1 to control via borgbackup
clan borgbackup instance: control serves repos, ns1 backs up its
clan.core.state (the KASP keystore at /var/lib/knot) nightly over the
mesh with repokey encryption. ns1 maps the control machine name to its
ZeroTier address so the borg@control repo resolves.

Run `clan vars generate ns1` before deploy to mint the borg keypair.
2026-06-17 15:06:58 +07:00
Berwn 4c7c74836d Add vmalert alerting rules for DNS and host health
vmalert on control evaluates rules (declared in git) against VictoriaMetrics and
remote-writes alert state back, so firing alerts show as the ALERTS series in
Grafana. Covers SOA divergence between ns1/ns2, secondary zone expiry, scrape
target down, and root disk full. No notifier yet (notifier.blackhole). Also adds
TODO.md roadmap.
2026-06-17 14:49:32 +07:00
Berwn a7d4c0e567 Add mdBook infra runbook served by Caddy on control
Docs live in docs/ (DNS, ZeroTier mesh, monitoring), built at Nix-build time and
served as static files over the ZeroTier mesh on control:8080. Commit-to-edit:
change the markdown and redeploy to publish.
2026-06-17 14:26:21 +07:00
Berwn 33ac7e106b Add VictoriaMetrics + Grafana DNS monitoring over the mesh
control runs VictoriaMetrics (loopback) and Grafana; every machine exports
node metrics and the nameservers export Knot stats (mod-stats + knot-exporter).
Scraping and the Grafana UI ride the ZeroTier mesh only, scoped by nftables to
the mesh /88; the public side stays closed by the Hetzner cloud firewall. The
provisioned DNS dashboard includes a per-zone SOA serial table to catch
primary/secondary drift. ZeroTier ULAs are centralised in mesh-hosts.nix.
2026-06-17 10:17:27 +07:00
Berwn aa604bda9a Switch ns1 zone serial-policy to unixtime
dateserial (YYYYMMDDnn) only has a 2-digit same-day counter held in Knot's
journal; a journal reset restarted the counter and let ns1 mint a serial ns2
had already seen with older content, so ns2 never retransferred. unixtime is
strictly monotonic per reload, eliminating the shared-serial collision.
2026-06-16 18:59:45 +07:00
Berwn e795960dcf Configure static public IPv6 on control, ns1, ns2 2026-06-16 18:04:33 +07:00
Berwn de7d950596 Format tree with treefmt 2026-06-16 16:53:00 +07:00
kurogeek 3302b70485 clan.core.sops.defaultGroups to all machines 2026-06-16 16:46:55 +07:00
Berwn a3482face5 Allow ACME DNS-01 dynamic updates on ns1
Add a dedicated acme_ddns TSIG key (scoped to ns1 only) and an acl_acme rule
that limits it to TXT updates at or under _acme-challenge.<zone>. An external
ACME client can now write challenge records via RFC 2136; Knot signs them and
transfers to ns2, which never holds the key.
2026-06-14 17:12:17 +07:00
Berwn dc51cfbdb5 Enable DNSSEC and automatic SOA serials on the DNS zones
ns1 (primary) now signs every zone with an ECDSA P-256/SHA-256 policy and
manages the SOA serial itself: zonefile-load = difference-no-serial (with
journal-content = all) plus serial-policy = dateserial let records be edited
without bumping the serial by hand. ns2 needs no change; it transfers the
already-signed zone.

Also point the ns1/ns2 AAAA glue at the public Hetzner IPv6 addresses; they
previously pointed at unroutable ZeroTier mesh ULAs.
2026-06-14 16:27:30 +07:00
Berwn 5864054b00 Move Hetzner firewall rules into a separate data file
Extract the per-firewall rule data out of control's configuration into
modules/hetzner-firewall-rules.nix, imported like the DNS domains list.
The evaluated rules are unchanged.
2026-06-14 15:49:00 +07:00
Berwn 344f432640 Add Hetzner Cloud firewall auto-sync from clan config
control runs a oneshot on each deploy that creates each firewall if
missing and replaces its rules via the Hetzner API set_rules action,
using a Read/Write token stored as a clan secret. Public SSH is not
exposed; admin access rides the ZeroTier mesh, with emergency-access as
the console fallback.
2026-06-14 15:40:05 +07:00
Berwn 306a2cf61e Set per-machine timezones and enable NTP
control and ns2 use UTC+3 (Etc/GMT-3), ns1 uses UTC+1 (Etc/GMT-1) —
fixed offsets, no DST. Make systemd-timesyncd explicit on all three.
2026-06-14 15:02:34 +07:00
Berwn 807785cdab Add authoritative DNS on ns1/ns2 and finalize clan config
- Knot authoritative DNS: ns1 primary, ns2 secondary serving cnx.network,
  buildfor.life and cnx.email over TSIG-secured zone transfer (modules/dns)
- Knot listens publicly + over ZeroTier; firewall opens port 53
- Complete clan inventory: name/domain, admin SSH key, control as the
  zerotier controller, tor on all nixos machines
- Enable age yubikey/fido2-hmac secret plugins
2026-06-14 13:24:23 +07:00
Berwn a40c4d1800 Set disk schema of machine: ns2 to single-disk 2026-06-14 13:19:56 +07:00
Berwn 2a0bdc4a4b Set disk schema of machine: ns1 to single-disk 2026-06-14 13:19:44 +07:00
Berwn 840b3ca407 machines/ns2/facter.json: update hardware configuration 2026-06-14 13:18:41 +07:00
Berwn d757dc3c52 machines/ns1/facter.json: update hardware configuration 2026-06-14 13:16:11 +07:00
Berwn bf65146a62 Set disk schema of machine: control to single-disk 2026-06-14 12:29:39 +07:00
Berwn 8938637c28 machines/control/facter.json: update hardware configuration 2026-06-14 12:27:20 +07:00
Berwn 7d02499c0e Add machine ns2 2026-06-14 12:14:12 +07:00
Berwn bda1854376 Add machine ns1 2026-06-14 12:14:10 +07:00
Berwn a86525d37c Add machine control 2026-06-14 12:14:07 +07:00