Commit Graph

74 Commits

Author SHA1 Message Date
Berwn 044891927b Back up Knot DNSSEC keystore from ns1 to control via borgbackup
clan borgbackup instance: control serves repos, ns1 backs up its
clan.core.state (the KASP keystore at /var/lib/knot) nightly over the
mesh with repokey encryption. ns1 maps the control machine name to its
ZeroTier address so the borg@control repo resolves.

Run `clan vars generate ns1` before deploy to mint the borg keypair.
2026-06-17 15:06:58 +07:00
Berwn 7ae3221b83 Add Active alerts panel to the top of the CNX DNS dashboard
Surfaces vmalert's firing ALERTS series as a table at the top of the dashboard,
so the minimal-delivery alerts are visible at a glance. Existing panels shift
down by one row.
2026-06-17 14:51:33 +07:00
Berwn 4c7c74836d Add vmalert alerting rules for DNS and host health
vmalert on control evaluates rules (declared in git) against VictoriaMetrics and
remote-writes alert state back, so firing alerts show as the ALERTS series in
Grafana. Covers SOA divergence between ns1/ns2, secondary zone expiry, scrape
target down, and root disk full. No notifier yet (notifier.blackhole). Also adds
TODO.md roadmap.
2026-06-17 14:49:32 +07:00
Berwn a7d4c0e567 Add mdBook infra runbook served by Caddy on control
Docs live in docs/ (DNS, ZeroTier mesh, monitoring), built at Nix-build time and
served as static files over the ZeroTier mesh on control:8080. Commit-to-edit:
change the markdown and redeploy to publish.
2026-06-17 14:26:21 +07:00
Berwn 3a8fe660a5 Swap ZeroTier external members: drop Alex/Alex-gateway, add alex-nixos 2026-06-17 12:15:26 +07:00
Berwn 9aa83d70a2 Admit external ZeroTier members to the mesh by node id
clan.nix gains an allowedIps list for the zerotier controller, fed via a
ztMemberIp helper that derives each member's IPv6 on this network from its
10-char node id + the zerotier-network-id var. Lets us list external devices
(admin laptops) by their stable node id, which this clan-core's allowedIps
interface consumes as --member-ip on control.
2026-06-17 12:13:47 +07:00
Berwn 848c4ec47d Read mesh host map from clan zerotier vars instead of hardcoding
The control/ns1/ns2 mesh IPs and the /88 subnet were duplicated literals in
mesh-hosts.nix. clan-core's zerotier generator already writes each machine's IP
as a public var (vars/per-machine/<m>/zerotier/zerotier-ip), so read from there
and derive the subnet from zerotier-network-id. Pure refactor: the rendered
values are identical and the system derivation hash is unchanged.
2026-06-17 11:53:56 +07:00
Berwn 8ac96b2d10 Enable IPv6 dialing for VictoriaMetrics scrapes
The scraper defaults to IPv4-only, so the ns1/ns2 mesh ULA targets were
dropped with 'no suitable address found'. -enableTCP6 lets VM scrape them.
2026-06-17 10:51:31 +07:00
Berwn 1405605eac Remove key(s) for user berwn from secrets 2026-06-17 10:29:23 +07:00
Berwn ad0c47e046 Add key(s) for user berwn to secrets 2026-06-17 10:26:55 +07:00
Berwn fb7b269f68 Update vars via generator grafana-admin for machine control 2026-06-17 10:17:45 +07:00
Berwn 33ac7e106b Add VictoriaMetrics + Grafana DNS monitoring over the mesh
control runs VictoriaMetrics (loopback) and Grafana; every machine exports
node metrics and the nameservers export Knot stats (mod-stats + knot-exporter).
Scraping and the Grafana UI ride the ZeroTier mesh only, scoped by nftables to
the mesh /88; the public side stays closed by the Hetzner cloud firewall. The
provisioned DNS dashboard includes a per-zone SOA serial table to catch
primary/secondary drift. ZeroTier ULAs are centralised in mesh-hosts.nix.
2026-06-17 10:17:27 +07:00
Berwn 63446173bc monitor.cnx.network DNS test 2026-06-16 19:03:49 +07:00
Berwn aa604bda9a Switch ns1 zone serial-policy to unixtime
dateserial (YYYYMMDDnn) only has a 2-digit same-day counter held in Knot's
journal; a journal reset restarted the counter and let ns1 mint a serial ns2
had already seen with older content, so ns2 never retransferred. unixtime is
strictly monotonic per reload, eliminating the shared-serial collision.
2026-06-16 18:59:45 +07:00
Berwn e795960dcf Configure static public IPv6 on control, ns1, ns2 2026-06-16 18:04:33 +07:00
Berwn 6783ad7c17 Add internet networking service for direct SSH to public IPs 2026-06-16 18:04:29 +07:00
Berwn a49aea3c7a vars fix 2026-06-16 16:59:54 +07:00
Berwn de7d950596 Format tree with treefmt 2026-06-16 16:53:00 +07:00
Berwn cf0d796bee Add treefmt formatter (nix fmt + flake check gate) 2026-06-16 16:53:00 +07:00
kurogeek 3302b70485 clan.core.sops.defaultGroups to all machines 2026-06-16 16:46:55 +07:00
kurogeek c85da6b8fc Add user berwn to group admins 2026-06-16 16:44:32 +07:00
kurogeek d50603743e Add user kurogeek to group admins 2026-06-16 16:44:25 +07:00
Berwn 95b9375324 Grant kurogeek admin SSH access on all machines 2026-06-16 16:30:18 +07:00
Berwn 70cbfe84b1 Add user kurogeek to secrets 2026-06-16 16:24:23 +07:00
Berwn a3482face5 Allow ACME DNS-01 dynamic updates on ns1
Add a dedicated acme_ddns TSIG key (scoped to ns1 only) and an acl_acme rule
that limits it to TXT updates at or under _acme-challenge.<zone>. An external
ACME client can now write challenge records via RFC 2136; Knot signs them and
transfers to ns2, which never holds the key.
2026-06-14 17:12:17 +07:00
Berwn 8330eaa8ce Update vars via generator dns-acme-tsig for machine ns1 2026-06-14 17:07:17 +07:00
Berwn dc51cfbdb5 Enable DNSSEC and automatic SOA serials on the DNS zones
ns1 (primary) now signs every zone with an ECDSA P-256/SHA-256 policy and
manages the SOA serial itself: zonefile-load = difference-no-serial (with
journal-content = all) plus serial-policy = dateserial let records be edited
without bumping the serial by hand. ns2 needs no change; it transfers the
already-signed zone.

Also point the ns1/ns2 AAAA glue at the public Hetzner IPv6 addresses; they
previously pointed at unroutable ZeroTier mesh ULAs.
2026-06-14 16:27:30 +07:00
Berwn 5864054b00 Move Hetzner firewall rules into a separate data file
Extract the per-firewall rule data out of control's configuration into
modules/hetzner-firewall-rules.nix, imported like the DNS domains list.
The evaluated rules are unchanged.
2026-06-14 15:49:00 +07:00
Berwn 344f432640 Add Hetzner Cloud firewall auto-sync from clan config
control runs a oneshot on each deploy that creates each firewall if
missing and replaces its rules via the Hetzner API set_rules action,
using a Read/Write token stored as a clan secret. Public SSH is not
exposed; admin access rides the ZeroTier mesh, with emergency-access as
the console fallback.
2026-06-14 15:40:05 +07:00
Berwn dbb67dbd9c Update vars via generator hetzner-firewall for machine control 2026-06-14 15:37:25 +07:00
Berwn 2506b21ffa Enable emergency-access recovery service
Add the clan-core emergency-access service on all nixos machines; it
sets a per-machine recovery root password for console login when a
machine fails to boot.
2026-06-14 15:02:34 +07:00
Berwn 306a2cf61e Set per-machine timezones and enable NTP
control and ns2 use UTC+3 (Etc/GMT-3), ns1 uses UTC+1 (Etc/GMT-1) —
fixed offsets, no DST. Make systemd-timesyncd explicit on all three.
2026-06-14 15:02:34 +07:00
Berwn 91578a2b43 Update vars via generator emergency-access for machine ns2 2026-06-14 15:00:25 +07:00
Berwn ab8288aef9 Update vars via generator emergency-access for machine ns1 2026-06-14 15:00:24 +07:00
Berwn 7b292b8279 Update vars via generator emergency-access for machine control 2026-06-14 15:00:24 +07:00
Berwn 56f0af3153 Fix knot startup on ns1/ns2: TSIG key perms and port 53 conflict
knotd runs as the "knot" user, so the shared TSIG key file needs
owner/group knot — it was root-only and knot couldn't read it.

systemd-resolved's stub listener was holding port 53, so knot's
0.0.0.0@53 / ::@53 TCP bind failed. Disable the stub (resolution
still works via nss-resolve) to free the port.
2026-06-14 14:49:10 +07:00
Berwn 9de95b4fb5 update(inventory.json): Installed ns2 2026-06-14 13:34:17 +07:00
Berwn 099383ccfa update(inventory.json): Installed ns1 2026-06-14 13:29:53 +07:00
Berwn 807785cdab Add authoritative DNS on ns1/ns2 and finalize clan config
- Knot authoritative DNS: ns1 primary, ns2 secondary serving cnx.network,
  buildfor.life and cnx.email over TSIG-secured zone transfer (modules/dns)
- Knot listens publicly + over ZeroTier; firewall opens port 53
- Complete clan inventory: name/domain, admin SSH key, control as the
  zerotier controller, tor on all nixos machines
- Enable age yubikey/fido2-hmac secret plugins
2026-06-14 13:24:23 +07:00
Berwn 9f1a2861ce Add ns2 to secret vars/shared/dns-tsig/tsig.conf 2026-06-14 13:22:43 +07:00
Berwn 2798e8e8f0 Update vars via generator dns-tsig for machine ns1 2026-06-14 13:22:39 +07:00
Berwn a40c4d1800 Set disk schema of machine: ns2 to single-disk 2026-06-14 13:19:56 +07:00
Berwn 2a0bdc4a4b Set disk schema of machine: ns1 to single-disk 2026-06-14 13:19:44 +07:00
Berwn 840b3ca407 machines/ns2/facter.json: update hardware configuration 2026-06-14 13:18:41 +07:00
Berwn d757dc3c52 machines/ns1/facter.json: update hardware configuration 2026-06-14 13:16:11 +07:00
Berwn 80a9761878 Update vars via generator zerotier for machine ns2 2026-06-14 12:36:52 +07:00
Berwn 6aa68a0e4d Update vars via generator zerotier for machine ns1 2026-06-14 12:36:51 +07:00
Berwn 67e60910be update(inventory.json): Installed control 2026-06-14 12:36:24 +07:00
Berwn bf65146a62 Set disk schema of machine: control to single-disk 2026-06-14 12:29:39 +07:00
Berwn 8938637c28 machines/control/facter.json: update hardware configuration 2026-06-14 12:27:20 +07:00