| author | Paul Buetow <paul@buetow.org> | 2026-05-10 10:30:55 +0300 |
|---|---|---|
| committer | Paul Buetow <paul@buetow.org> | 2026-05-10 10:30:55 +0300 |
| commit | 3964965c8ad5eeee16d3338ded718bbd34e1c69d | |
| tree | a1d2e6ef050b8d132cf127800851492a98da14cc /frontends/scripts | |
| parent | d6b8e0fab3777d887e0abc7b152580a169579785 | |
nfs-mount-monitor: strengthen fix_mount recovery sequence
Add lazy umount fallback, D-state process killer, stunnel restart, and
60-second hard deadline to prevent fix_mount from looping forever when
processes are stuck in D state on a stale NFSv4-over-stunnel mount.
Recovery sequence is now:
1. mount -o remount -f (cheap, no disruption)
2. kill_pinning_processes (SIGKILL D-state procs with nfs_ wchan)
3. umount -f (force unmount)
4. umount -l (lazy detach VFS node if -f failed)
5. systemctl restart stunnel + 2s sleep (refresh TLS transport)
6. mount (fresh mount)
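
The six steps above can be sketched roughly as follows. This is a minimal illustration and not the committed script: `MOUNT_POINT`, `DRY_RUN`, `DRY_RUN_STATUS`, `run` and `filter_pinning_pids` are hypothetical names, the `ps` columns are an assumption, and the commands use the flags exactly as listed in the message. It also assumes the sequence stops early when the remount succeeds and that the mount point has an fstab entry.

```bash
#!/usr/bin/env bash
# Sketch only. MOUNT_POINT, DRY_RUN, DRY_RUN_STATUS, run and
# filter_pinning_pids are illustrative assumptions, not the committed script.
MOUNT_POINT="${MOUNT_POINT:-/mnt/nfs}"   # hypothetical mount point
DRY_RUN="${DRY_RUN:-1}"                  # 1 = print commands, run nothing
DRY_RUN_STATUS="${DRY_RUN_STATUS:-1}"    # pretended exit status in dry-run

# Execute a command, or in dry-run mode print it and pretend it exited
# with DRY_RUN_STATUS (default 1, so every fallback step is shown).
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
        return "$DRY_RUN_STATUS"
    fi
    "$@"
}

# Read "PID STAT WCHAN" lines and print the PIDs that are in D state
# and whose wait channel starts with nfs_.
filter_pinning_pids() {
    awk '$2 ~ /^D/ && $3 ~ /^nfs_/ { print $1 }'
}

# Step 2: SIGKILL the processes selected by the filter above.
kill_pinning_processes() {
    local pid
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ kill_pinning_processes"
        return 0
    fi
    # procps-style ps; the column choice is an assumption of this sketch.
    ps -eo pid,stat,wchan:32 --no-headers | filter_pinning_pids |
        while read -r pid; do kill -KILL "$pid"; done
}

fix_mount() {
    # 1. Cheap attempt first: stop here if the remount works.
    run mount -o remount -f "$MOUNT_POINT" && return 0
    # 2. Kill processes pinning the mount.
    kill_pinning_processes
    # 3 + 4. Force unmount, lazy detach only if that fails.
    run umount -f "$MOUNT_POINT" || run umount -l "$MOUNT_POINT"
    # 5. Refresh the TLS transport and give it time to come up.
    run systemctl restart stunnel
    run sleep 2
    # 6. Fresh mount; its exit status is the result of fix_mount.
    run mount "$MOUNT_POINT"
}
```

With the default `DRY_RUN=1`, calling `fix_mount` only prints the commands it would run, which makes the ordering easy to review before trying it on a real node.
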
The 60s deadline uses bash $SECONDS so fix_mount can never outlast its
own 10-second timer interval by an unbounded amount. Deployed to all
three r-nodes (r0/r1/r2) via rex nfs_mount_monitor.
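
A deadline based on `$SECONDS` could look roughly like the sketch below. The commit only states that a 60-second deadline is enforced with bash's `SECONDS` variable; `DEADLINE_SECS`, `deadline_reached` and `run_steps_with_deadline` are hypothetical names, and the step-by-step loop is an assumption about how the check might be placed.

```bash
#!/usr/bin/env bash
# Sketch only: DEADLINE_SECS, deadline_reached and run_steps_with_deadline
# are hypothetical names, not taken from the committed script.
DEADLINE_SECS="${DEADLINE_SECS:-60}"

# True once at least DEADLINE_SECS seconds have passed since start time $1.
# SECONDS is maintained by bash and counts whole seconds.
deadline_reached() {
    local start="$1"
    (( SECONDS - start >= DEADLINE_SECS ))
}

# Run each argument as a command (one recovery step per argument) until
# one succeeds, all have been tried, or the deadline passes.
# Returns 0 on success, 1 if every step failed, 2 if the deadline was hit.
run_steps_with_deadline() {
    local start=$SECONDS step
    for step in "$@"; do
        if deadline_reached "$start"; then
            echo "deadline of ${DEADLINE_SECS}s reached before $step" >&2
            return 2
        fi
        "$step" && return 0
    done
    return 1
}
```

Because the check runs only between steps, a single command that blocks can still overrun the deadline in this sketch. Bounding each step separately would need something extra, for example wrapping it in `timeout`.
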
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Diffstat (limited to 'frontends/scripts')
0 files changed, 0 insertions, 0 deletions
