author Paul Buetow <paul@buetow.org> 2026-05-10 10:28:08 +0300
committer Paul Buetow <paul@buetow.org> 2026-05-10 10:28:08 +0300
commit d6b8e0fab3777d887e0abc7b152580a169579785
tree ffd0e374cb63967608a5866a1f566cbc9ef7b1c0 /frontends/scripts
parent 425c5fa03b5d2cb44470c70a8d976ca253d662e3
nfs-mount-monitor: add write-probe to detect 'reads OK, writes hang' state

Stunnel-wrapped NFSv4 can enter a half-broken state where mountpoint(1)
returns true and stat(1) completes from cache, but ALL writes hang
indefinitely. This was observed on r2 on 2026-05-10, causing navidrome to be
unschedulable. The existing two probes passed while writes were dead.

Add a third probe (write-probe) after the stat probe: write the shell's PID
to a per-host .healthcheck.<hostname> file and immediately remove it, wrapped
in a 5-second timeout. The per-host filename prevents r0/r1/r2 from racing on
the same file. 5s gives one full NFS retransmit window (timeo=10 deciseconds
= 1s, retrans=2) plus margin without making the 10-second timer run too long.

Deployed to r0/r1/r2 via rex nfs_mount_monitor; all three nodes confirmed
running the new script (journalctl shows clean exits).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
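
For illustration, the write-probe described in this message could be sketched roughly as follows. This is not the committed script: the function name `write_probe`, its mount-point argument, and the use of `uname -n` to obtain the hostname are assumptions made for the sketch. It relies on timeout(1) from GNU coreutils.

```shell
#!/bin/sh
# Hypothetical sketch of the write-probe; the real nfs-mount-monitor
# script may differ in names, paths, and error handling.

write_probe() {
    # $1: mount point to test (assumed argument, not taken from the commit)
    probe_file="$1/.healthcheck.$(uname -n)"   # per-host name avoids cross-node races

    # Run the write and the remove in a child shell so that timeout(1) can
    # kill it if either call blocks. Inside the single quotes, $$ expands
    # to the child shell's PID and $1 to the probe file path.
    # GNU timeout exits with status 124 when the 5-second limit is hit.
    timeout 5 sh -c 'echo "$$" > "$1" && rm -f "$1"' sh "$probe_file"
}
```

A caller might run `write_probe /some/mount || echo "write probe failed"` and treat any non-zero status, including 124 from the timeout, as an unhealthy mount. One caveat: a process stuck in uninterruptible I/O on a hard-mounted NFS share may not die promptly when signalled, so the sketch assumes the mount options allow the blocked write to be interrupted.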
Diffstat (limited to 'frontends/scripts')
0 files changed, 0 insertions, 0 deletions