Bind VictoriaLogs on IPv6 so the mesh can ship journald to it

VictoriaLogs, like the VM scraper, is IPv4-only by default: ":9428" binds
0.0.0.0 only, so ns1/ns2 pushing journald over the IPv6 mesh got "connection
refused" while control's own loopback (v4) upload worked. Add -enableTCP6 so it
binds [::] (dual-stack), matching the flag already used for the scraper.
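
A minimal sketch of how the bind could be checked on control, assuming `ss -tln` output is piped in. The helper name `classify_bind` is hypothetical and not part of this commit. It also accepts `*:9428`, on the assumption that some `ss` versions print a dual-stack wildcard listener that way instead of `[::]:9428`.

```shell
#!/bin/sh
# classify_bind PORT (hypothetical helper): reads `ss -tln` output on stdin
# and reports how PORT is bound.
#   ipv4-only      -> only 0.0.0.0:PORT is listening (mesh/IPv6 pushes refused)
#   ipv6-wildcard  -> [::]:PORT or *:PORT is listening (assumed dual-stack)
#   not-listening  -> no wildcard listener found for PORT
classify_bind() {
    awk -v port="$1" '
        $1 == "LISTEN" {
            addr = $4                       # "Local Address:Port" column
            if (addr == ("0.0.0.0:" port)) {
                v4 = 1
            } else if (addr == ("[::]:" port) || addr == ("*:" port)) {
                v6 = 1
            }
        }
        END {
            if (v6)      print "ipv6-wildcard"
            else if (v4) print "ipv4-only"
            else         print "not-listening"
        }
    '
}

# Usage on control (not run here):
#   ss -tln | classify_bind 9428
```

Before this change the expected answer would be `ipv4-only`; with `-enableTCP6` it should become `ipv6-wildcard`.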

Also simplify the systemd-journal-upload override to just startLimitIntervalSec=0
(retry forever / self-heal) and drop the SuccessExitStatus masking: a persistent
sink failure should stay loud rather than be hidden behind a green deploy.
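
A rough sketch of confirming that the start-rate limit is really off after a deploy. It assumes `systemctl show` reports the setting as the `StartLimitIntervalUSec` property and prints `0` when the limit is disabled. The helper name `start_limit_disabled` is hypothetical.

```shell
#!/bin/sh
# start_limit_disabled (hypothetical helper): reads `systemctl show` output
# on stdin. Exit status:
#   0 -> StartLimitIntervalUSec is 0 (rate limit disabled, retries forever)
#   1 -> property present with a non-zero interval
#   2 -> property not found in the input
start_limit_disabled() {
    while IFS='=' read -r key value; do
        if [ "$key" = "StartLimitIntervalUSec" ]; then
            if [ "$value" = "0" ]; then
                return 0
            fi
            return 1
        fi
    done
    return 2
}

# Usage on any host running the uploader (not run here):
#   systemctl show systemd-journal-upload -p StartLimitIntervalUSec \
#       | start_limit_disabled && echo "start limit disabled"
```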
Berwn
2026-06-17 17:27:56 +07:00
parent 0eb883061b
commit 9c8a2abf3f
3 changed files with 22 additions and 6 deletions
+8
@@ -56,6 +56,14 @@ systemd's own `services.journald.upload` → the `/insert/journald` endpoint
 loopback so its logs survive a mesh outage, `ns1`/`ns2` push over the mesh, and
 9428 is firewall-scoped to the mesh like everything else.
+> Same IPv4-only default as the scraper: VictoriaLogs binds `0.0.0.0:9428` for a
+> bare `:9428`, so mesh (IPv6) pushes from ns1/ns2 are refused until you pass
+> `extraOptions = [ "-enableTCP6" ]` (binds `[::]`). Verify the bind on `control`:
+>
+> ```
+> ss -tlnp | grep 9428 # want [::]:9428, not 0.0.0.0:9428
+> ```
 Query logs from Grafana via the provisioned **VictoriaLogs** datasource (Explore
 view, LogsQL), or directly in the built-in UI at `http://[control]:9428/select/vmui`.
 Logs are tagged with `_HOSTNAME` and `_SYSTEMD_UNIT`, so to follow one service
+6 -5
@@ -103,11 +103,12 @@ in
 "http://${dest}/insert/journald";
 };
-# systemd-journal-upload exits if the sink is unreachable. The upstream module
-# already sets Restart=always/RestartSec=3sec, but the default start-rate limit
-# (5 tries / 10s) still lets the unit give up permanently and fail a deploy when
-# VictoriaLogs is briefly down. Logging is best-effort: disable the limit so it
-# retries forever instead of wedging the host (or switch-to-configuration).
+# systemd-journal-upload exits if the sink is unreachable. Upstream already
+# restarts it (Restart=always/RestartSec=3sec), but the default start-rate limit
+# (5 tries / 10s) lets it give up permanently — so a transient VictoriaLogs
+# outage leaves the uploader dead until the next deploy. Disable the limit so it
+# retries forever and self-heals once the sink returns. (A persistent failure
+# still surfaces loudly in a deploy, which is what we want.)
 systemd.services.systemd-journal-upload.startLimitIntervalSec = 0;
 # Scrape ports reachable only from the ZeroTier mesh.
+8 -1
@@ -69,7 +69,14 @@ in
 services.victorialogs = {
 enable = true;
 listenAddress = ":${toString logsPort}";
-extraOptions = [ "-retentionPeriod=30d" ];
+# -enableTCP6: like the scraper above, VictoriaLogs is IPv4-only by default
+# for *listening* too — ":9428" binds 0.0.0.0 only, so ns1/ns2 pushing over
+# the IPv6 mesh get "connection refused". This makes it bind [::] (dual-stack)
+# so the mesh can reach it. Retention has no dedicated NixOS option.
+extraOptions = [
+"-retentionPeriod=30d"
+"-enableTCP6"
+];
 };
 # Admin password generated once and stored as a clan secret. Retrieve with: