122 Commits

Author SHA1 Message Date
Berwn 4c7c74836d Add vmalert alerting rules for DNS and host health
vmalert on control evaluates rules (declared in git) against VictoriaMetrics and
remote-writes alert state back, so firing alerts show as the ALERTS series in
Grafana. Covers SOA divergence between ns1/ns2, secondary zone expiry, scrape
target down, and root disk full. No notifier yet (notifier.blackhole). Also adds
TODO.md roadmap.
2026-06-17 14:49:32 +07:00
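Note on 4c7c74836d: the SOA divergence rule boils down to "the same zone reports different serials on ns1 and ns2". The real rules are vmalert expressions evaluated against VictoriaMetrics; the Python below is only a minimal sketch of that comparison, and the input shape (`{zone: {server: serial}}`) and the function name `diverging_zones` are assumptions made for illustration.

```python
from typing import Dict, List


def diverging_zones(serials: Dict[str, Dict[str, int]]) -> List[str]:
    """Return the zones whose SOA serial is not identical on every server.

    `serials` maps zone name -> {server name -> SOA serial}. This is a
    hypothetical input shape, not the layout of the exported metrics.
    """
    return sorted(
        zone
        for zone, by_server in serials.items()
        if len(set(by_server.values())) > 1
    )


# Illustrative values only; the serials are made up.
sample = {
    "cnx.network": {"ns1": 1781600000, "ns2": 1781590000},
    "buildfor.life": {"ns1": 1781600000, "ns2": 1781600000},
}
drifted = diverging_zones(sample)  # only cnx.network differs between servers
```

A real alert would normally also require the divergence to persist for some minutes, since a secondary is briefly behind after every legitimate change.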
Berwn a7d4c0e567 Add mdBook infra runbook served by Caddy on control
Docs live in docs/ (DNS, ZeroTier mesh, monitoring), built at Nix-build time and
served as static files over the ZeroTier mesh on control:8080. Commit-to-edit:
change the markdown and redeploy to publish.
2026-06-17 14:26:21 +07:00
Berwn 3a8fe660a5 Swap ZeroTier external members: drop Alex/Alex-gateway, add alex-nixos 2026-06-17 12:15:26 +07:00
Berwn 9aa83d70a2 Admit external ZeroTier members to the mesh by node id
clan.nix gains an allowedIps list for the zerotier controller, fed via a
ztMemberIp helper that derives each member's IPv6 on this network from its
10-char node id + the zerotier-network-id var. Lets us list external devices
(admin laptops) by their stable node id, which this clan-core's allowedIps
interface consumes as --member-ip on control.
2026-06-17 12:13:47 +07:00
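Note on 9aa83d70a2 and 848c4ec47d: the derivation described here matches ZeroTier's RFC 4193 addressing, where an address is `fd`, then the 64-bit network id, then the fixed bytes `99 93`, then the 40-bit node id, which is also why the mesh subnet is a /88. The repository's ztMemberIp helper is written in Nix and is not shown in this log; the Python below is a sketch of the same arithmetic under that assumption, and the network id and node id used are made-up example values.

```python
import ipaddress


def zt_member_ip(network_id: str, node_id: str) -> ipaddress.IPv6Address:
    """Build a ZeroTier RFC 4193 address from hex ids.

    Layout (16 bytes): fd | 8-byte network id | 99 93 | 5-byte node id.
    """
    if len(network_id) != 16 or len(node_id) != 10:
        raise ValueError("expected a 16-char network id and a 10-char node id")
    packed = bytes.fromhex("fd" + network_id + "9993" + node_id)
    return ipaddress.IPv6Address(packed)


def zt_subnet(network_id: str) -> ipaddress.IPv6Network:
    """The /88 shared by every member: 8 + 64 + 16 prefix bits."""
    if len(network_id) != 16:
        raise ValueError("expected a 16-char network id")
    prefix = bytes.fromhex("fd" + network_id + "9993") + bytes(5)
    return ipaddress.IPv6Network((ipaddress.IPv6Address(prefix), 88))


# Hypothetical ids, for illustration only.
example_ip = zt_member_ip("8056c2e21c000001", "abcdef0123")
example_net = zt_subnet("8056c2e21c000001")
```

Because the address is a pure function of the two ids, an external device can be admitted by node id alone, without first asking it for its mesh address.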
Berwn 848c4ec47d Read mesh host map from clan zerotier vars instead of hardcoding
The control/ns1/ns2 mesh IPs and the /88 subnet were duplicated literals in
mesh-hosts.nix. clan-core's zerotier generator already writes each machine's IP
as a public var (vars/per-machine/<m>/zerotier/zerotier-ip), so read from there
and derive the subnet from zerotier-network-id. Pure refactor: the rendered
values are identical and the system derivation hash is unchanged.
2026-06-17 11:53:56 +07:00
Berwn 8ac96b2d10 Enable IPv6 dialing for VictoriaMetrics scrapes
The scraper defaults to IPv4-only, so the ns1/ns2 mesh ULA targets were
dropped with 'no suitable address found'. -enableTCP6 lets VM scrape them.
2026-06-17 10:51:31 +07:00
Berwn 1405605eac Remove key(s) for user berwn from secrets 2026-06-17 10:29:23 +07:00
Berwn ad0c47e046 Add key(s) for user berwn to secrets 2026-06-17 10:26:55 +07:00
Berwn fb7b269f68 Update vars via generator grafana-admin for machine control 2026-06-17 10:17:45 +07:00
Berwn 33ac7e106b Add VictoriaMetrics + Grafana DNS monitoring over the mesh
control runs VictoriaMetrics (loopback) and Grafana; every machine exports
node metrics and the nameservers export Knot stats (mod-stats + knot-exporter).
Scraping and the Grafana UI ride the ZeroTier mesh only, scoped by nftables to
the mesh /88; the public side stays closed by the Hetzner cloud firewall. The
provisioned DNS dashboard includes a per-zone SOA serial table to catch
primary/secondary drift. ZeroTier ULAs are centralised in mesh-hosts.nix.
2026-06-17 10:17:27 +07:00
Berwn 63446173bc monitor.cnx.network DNS test 2026-06-16 19:03:49 +07:00
Berwn aa604bda9a Switch ns1 zone serial-policy to unixtime
dateserial (YYYYMMDDnn) only has a 2-digit same-day counter held in Knot's
journal; a journal reset restarted the counter and let ns1 mint a serial ns2
had already seen with older content, so ns2 never retransferred. unixtime is
strictly monotonic per reload, eliminating the shared-serial collision.
2026-06-16 18:59:45 +07:00
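Note on aa604bda9a: the failure mode is easier to see with numbers. The Python below is a simplified model of the two serial policies, not Knot's implementation: it ignores RFC 1982 serial wraparound, and it assumes a secondary only retransfers when the primary's serial is greater than its own. The function names are illustrative.

```python
from datetime import date, datetime, timedelta, timezone


def dateserial(day: date, counter: int) -> int:
    """YYYYMMDDnn: the date followed by a 2-digit same-day counter."""
    if not 0 <= counter <= 99:
        raise ValueError("the same-day counter only has two digits")
    return int(day.strftime("%Y%m%d")) * 100 + counter


def unixtime_serial(moment: datetime) -> int:
    """Seconds since the epoch at the time of the reload."""
    return int(moment.timestamp())


def secondary_would_transfer(primary: int, secondary: int) -> bool:
    """Simplified refresh check: transfer only if the primary is ahead."""
    return primary > secondary


day = date(2026, 6, 16)

# ns2 has already transferred the third change of the day.
seen_by_ns2 = dateserial(day, 3)

# A journal reset restarts the counter; after new edits ns1 mints nn=03
# again, this time for different zone content.
reminted_by_ns1 = dateserial(day, 3)
stuck = not secondary_would_transfer(reminted_by_ns1, seen_by_ns2)

# With unixtime, a later reload always yields a larger serial.
first_reload = datetime(2026, 6, 16, 11, 59, 45, tzinfo=timezone.utc)
later_reload = first_reload + timedelta(seconds=1)
recovers = secondary_would_transfer(
    unixtime_serial(later_reload), unixtime_serial(first_reload)
)
```

In this model `stuck` is true because the two serials are equal even though the content differs, while `recovers` is true because the counter state no longer lives in the journal at all.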
Berwn e795960dcf Configure static public IPv6 on control, ns1, ns2 2026-06-16 18:04:33 +07:00
Berwn 6783ad7c17 Add internet networking service for direct SSH to public IPs 2026-06-16 18:04:29 +07:00
Berwn a49aea3c7a vars fix 2026-06-16 16:59:54 +07:00
Berwn de7d950596 Format tree with treefmt 2026-06-16 16:53:00 +07:00
Berwn cf0d796bee Add treefmt formatter (nix fmt + flake check gate) 2026-06-16 16:53:00 +07:00
kurogeek 3302b70485 clan.core.sops.defaultGroups to all machines 2026-06-16 16:46:55 +07:00
kurogeek c85da6b8fc Add user berwn to group admins 2026-06-16 16:44:32 +07:00
kurogeek d50603743e Add user kurogeek to group admins 2026-06-16 16:44:25 +07:00
Berwn 95b9375324 Grant kurogeek admin SSH access on all machines 2026-06-16 16:30:18 +07:00
Berwn 70cbfe84b1 Add user kurogeek to secrets 2026-06-16 16:24:23 +07:00
Berwn a3482face5 Allow ACME DNS-01 dynamic updates on ns1
Add a dedicated acme_ddns TSIG key (scoped to ns1 only) and an acl_acme rule
that limits it to TXT updates at or under _acme-challenge.<zone>. An external
ACME client can now write challenge records via RFC 2136; Knot signs them and
transfers to ns2, which never holds the key.
2026-06-14 17:12:17 +07:00
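Note on a3482face5: the scoping rule is "TXT only, and only at or under _acme-challenge.<zone>". The Python below sketches that predicate to make the boundary cases explicit. It is an illustration of the rule as described in the commit message, not Knot's ACL matching code, and `update_allowed` is a hypothetical name.

```python
def _normalize(name: str) -> str:
    """Lower-case a DNS name and drop the trailing root dot."""
    return name.rstrip(".").lower()


def update_allowed(owner: str, rtype: str, zone: str) -> bool:
    """True if a dynamic update for (owner, rtype) fits the ACME rule.

    Allowed: TXT records whose owner name is `_acme-challenge.<zone>`
    itself or any name below it.
    """
    if rtype.upper() != "TXT":
        return False
    base = "_acme-challenge." + _normalize(zone)
    owner = _normalize(owner)
    # Matching on "." + base keeps look-alikes such as
    # "x_acme-challenge.<zone>" from passing as subdomains.
    return owner == base or owner.endswith("." + base)
```

Note that under this reading a challenge name for a host inside the zone, such as `_acme-challenge.www.cnx.network`, is not "at or under" `_acme-challenge.cnx.network` and would be refused; whether the deployed ACL behaves the same way depends on its actual configuration, which this log does not show.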
Berwn 8330eaa8ce Update vars via generator dns-acme-tsig for machine ns1 2026-06-14 17:07:17 +07:00
Berwn dc51cfbdb5 Enable DNSSEC and automatic SOA serials on the DNS zones
ns1 (primary) now signs every zone with an ECDSA P-256/SHA-256 policy and
manages the SOA serial itself: zonefile-load = difference-no-serial (with
journal-content = all) plus serial-policy = dateserial let records be edited
without bumping the serial by hand. ns2 needs no change; it transfers the
already-signed zone.

Also point the ns1/ns2 AAAA glue at the public Hetzner IPv6 addresses; they
previously pointed at unroutable ZeroTier mesh ULAs.
2026-06-14 16:27:30 +07:00
Berwn 5864054b00 Move Hetzner firewall rules into a separate data file
Extract the per-firewall rule data out of control's configuration into
modules/hetzner-firewall-rules.nix, imported like the DNS domains list.
The evaluated rules are unchanged.
2026-06-14 15:49:00 +07:00
Berwn 344f432640 Add Hetzner Cloud firewall auto-sync from clan config
control runs a oneshot on each deploy that creates each firewall if
missing and replaces its rules via the Hetzner API set_rules action,
using a Read/Write token stored as a clan secret. Public SSH is not
exposed; admin access rides the ZeroTier mesh, with emergency-access as
the console fallback.
2026-06-14 15:40:05 +07:00
Berwn dbb67dbd9c Update vars via generator hetzner-firewall for machine control 2026-06-14 15:37:25 +07:00
Berwn 2506b21ffa Enable emergency-access recovery service
Add the clan-core emergency-access service on all nixos machines; it
sets a per-machine recovery root password for console login when a
machine fails to boot.
2026-06-14 15:02:34 +07:00
Berwn 306a2cf61e Set per-machine timezones and enable NTP
control and ns2 use UTC+3 (Etc/GMT-3), ns1 uses UTC+1 (Etc/GMT-1) —
fixed offsets, no DST. Make systemd-timesyncd explicit on all three.
2026-06-14 15:02:34 +07:00
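Note on 306a2cf61e: the `Etc/GMT±N` zone names use the POSIX sign convention, which is the reverse of the everyday one, so `Etc/GMT-3` really is UTC+3. The Python below is a small sketch that converts such a name to its UTC offset by parsing the name alone, without consulting a timezone database; `etc_gmt_utc_offset` is a name chosen for illustration.

```python
import re
from datetime import timedelta

_ETC_GMT = re.compile(r"^Etc/GMT([+-])(\d{1,2})$")


def etc_gmt_utc_offset(name: str) -> timedelta:
    """Return the UTC offset for a fixed-offset `Etc/GMT+N` / `Etc/GMT-N` name.

    The sign in the name is inverted: "-" means east of Greenwich
    (ahead of UTC), "+" means west (behind UTC).
    """
    match = _ETC_GMT.match(name)
    if match is None:
        raise ValueError(f"not an Etc/GMT offset zone: {name!r}")
    sign, hours = match.group(1), int(match.group(2))
    return timedelta(hours=hours if sign == "-" else -hours)


control_offset = etc_gmt_utc_offset("Etc/GMT-3")  # control and ns2: UTC+3
ns1_offset = etc_gmt_utc_offset("Etc/GMT-1")      # ns1: UTC+1
```

These zones carry no DST rules, which is what makes them suitable when a fixed offset is wanted.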
Berwn 91578a2b43 Update vars via generator emergency-access for machine ns2 2026-06-14 15:00:25 +07:00
Berwn ab8288aef9 Update vars via generator emergency-access for machine ns1 2026-06-14 15:00:24 +07:00
Berwn 7b292b8279 Update vars via generator emergency-access for machine control 2026-06-14 15:00:24 +07:00
Berwn 56f0af3153 Fix knot startup on ns1/ns2: TSIG key perms and port 53 conflict
knotd runs as the "knot" user, so the shared TSIG key file needs
owner/group knot — it was root-only and knot couldn't read it.

systemd-resolved's stub listener was holding port 53, so knot's
0.0.0.0@53 / ::@53 TCP bind failed. Disable the stub (resolution
still works via nss-resolve) to free the port.
2026-06-14 14:49:10 +07:00
Berwn 9de95b4fb5 update(inventory.json): Installed ns2 2026-06-14 13:34:17 +07:00
Berwn 099383ccfa update(inventory.json): Installed ns1 2026-06-14 13:29:53 +07:00
Berwn 807785cdab Add authoritative DNS on ns1/ns2 and finalize clan config
- Knot authoritative DNS: ns1 primary, ns2 secondary serving cnx.network,
  buildfor.life and cnx.email over TSIG-secured zone transfer (modules/dns)
- Knot listens publicly + over ZeroTier; firewall opens port 53
- Complete clan inventory: name/domain, admin SSH key, control as the
  zerotier controller, tor on all nixos machines
- Enable age yubikey/fido2-hmac secret plugins
2026-06-14 13:24:23 +07:00
Berwn 9f1a2861ce Add ns2 to secret vars/shared/dns-tsig/tsig.conf 2026-06-14 13:22:43 +07:00
Berwn 2798e8e8f0 Update vars via generator dns-tsig for machine ns1 2026-06-14 13:22:39 +07:00
Berwn a40c4d1800 Set disk schema of machine: ns2 to single-disk 2026-06-14 13:19:56 +07:00
Berwn 2a0bdc4a4b Set disk schema of machine: ns1 to single-disk 2026-06-14 13:19:44 +07:00
Berwn 840b3ca407 machines/ns2/facter.json: update hardware configuration 2026-06-14 13:18:41 +07:00
Berwn d757dc3c52 machines/ns1/facter.json: update hardware configuration 2026-06-14 13:16:11 +07:00
Berwn 80a9761878 Update vars via generator zerotier for machine ns2 2026-06-14 12:36:52 +07:00
Berwn 6aa68a0e4d Update vars via generator zerotier for machine ns1 2026-06-14 12:36:51 +07:00
Berwn 67e60910be update(inventory.json): Installed control 2026-06-14 12:36:24 +07:00
Berwn bf65146a62 Set disk schema of machine: control to single-disk 2026-06-14 12:29:39 +07:00
Berwn 8938637c28 machines/control/facter.json: update hardware configuration 2026-06-14 12:27:20 +07:00
Berwn ede5478952 Update vars via generator tor_tor for machine ns2 2026-06-14 12:20:30 +07:00
Berwn e142ea93c4 Update vars via generator state-version for machine ns2 2026-06-14 12:20:28 +07:00