| author | Paul Buetow <paul@buetow.org> | 2026-05-24 13:48:42 +0300 |
|---|---|---|
| committer | Paul Buetow <paul@buetow.org> | 2026-05-24 13:48:42 +0300 |
| commit | f16f4b753b3bf317e6da79f479ff5f506ed34b47 (patch) | |
| tree | e2c71514677aac0cd7cd85bfc28032d37e9bd55d /logo.svg | |
| parent | 24c7bfa60448c74dff6e21010ac0b98c19be7c04 (diff) | |

feat(watch): retry SSH connection failures with exponential backoff

Remove the vm_api_reachable? filter from run_watch so that VMs which are
currently booting are not silently dropped from the dashboard.

Add exponential-backoff retry logic (up to 4 attempts, sleeping
2s, 4s, 8s, 16s) to VllmWatcher#fetch_vm_stats for transient
SSH/WireGuard errors such as connection refused, host unreachable,
and ssh exit status 255. This lets watch recover automatically while
a VM is still starting up instead of failing immediately.
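
The commit message does not include the retry code itself. The following is a minimal Ruby sketch of the pattern it describes, not the actual VllmWatcher implementation. Every name here (SshCommandError, with_backoff, the sleeper parameter, the stderr patterns) is hypothetical. The sketch assumes all four listed delays are used, meaning one initial try followed by up to four retries. It also treats exit status 255 as transient, because OpenSSH's ssh exits with 255 when it hits an error of its own rather than reporting the remote command's status.

```ruby
# Hypothetical error for a failed SSH command; carries exit status and stderr.
class SshCommandError < StandardError
  # Assumed message patterns for connection-level (transient) failures.
  TRANSIENT_PATTERNS = [/connection refused/i, /no route to host/i, /unreachable/i].freeze

  attr_reader :exit_status, :stderr

  def initialize(exit_status, stderr)
    @exit_status = exit_status
    @stderr = stderr
    super("ssh failed with exit status #{exit_status}: #{stderr}")
  end

  # ssh exits with 255 on its own errors; also match known transient messages.
  def transient?
    exit_status == 255 || TRANSIENT_PATTERNS.any? { |re| re.match?(stderr) }
  end
end

# Runs the block, retrying transient SshCommandErrors with exponential backoff.
# Delay before retry n (0-based) is base_delay * 2**n: 2, 4, 8, 16 s by default.
# `sleeper` is injectable so callers (e.g. tests) can avoid real sleeping.
def with_backoff(max_retries: 4, base_delay: 2, sleeper: ->(s) { sleep(s) })
  retries = 0
  begin
    yield
  rescue SshCommandError => e
    # Re-raise non-transient errors at once, and transient ones once retries run out.
    raise unless e.transient? && retries < max_retries

    sleeper.call(base_delay * (2**retries))
    retries += 1
    retry # re-runs the begin block
  end
end
```

Suppose a call fails twice with "Connection refused" and then succeeds. In that case with_backoff returns the call's value after sleeping 2s and then 4s. A non-transient failure, such as an authentication error with a non-255 exit status, is re-raised immediately. Passing a recording lambda as sleeper lets the schedule be checked without waiting.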
Diffstat (limited to 'logo.svg')
0 files changed, 0 insertions, 0 deletions
