summaryrefslogtreecommitdiff
path: root/lib/hyperstack/cli.rb
AgeCommit message (Collapse)Author
14 daysfix(watch): auto-recover when default VM is dead or replacedPaul Buetow
- Add per-VM 10s fetch timeout so one dead VM cannot stall the dashboard - Make fallback logic check VM state (public_ip + ACTIVE status) instead of just file existence, so a stale/deleted VM1 state does not block watch - Auto-replace cached SSH host keys when a VM is recreated instead of failing - Suppress Ruby thread exception noise on killed SSH threads Fixes 'just watch' showing blank screen when VM1 is deleted but has a stale state file, and SSH host-key mismatch on VM recreation.
2026-05-24fix(cli): watch/status/test auto-detect active VMs when default VM1 is not ↵Paul Buetow
provisioned
2026-05-24feat(watch): retry SSH connection failures with exponential backoffPaul Buetow
Remove the vm_api_reachable? filter from run_watch so VMs that are currently booting are not silently dropped from the dashboard. Add exponential-backoff retry logic (up to 4 attempts, sleeping 2s, 4s, 8s, 16s) inside VllmWatcher#fetch_vm_stats for transient SSH/WireGuard errors such as connection refused, host unreachable, and exit 255. This lets watch automatically recover while a VM is still starting up instead of failing immediately.
2026-05-24feat(cli): replace --config with --vm 1|2|both, remove create-both/delete-bothPaul Buetow
- Drop single-VM default hyperstack-vm.toml and @config_path/@config_explicit machinery - Add global --vm flag (default: 1) mapping to hyperstack-vm1.toml and/or hyperstack-vm2.toml - Fold create-both and delete-both into create/delete --vm both - Teach status, watch, test, model to accept --vm (default: 1) - Update help text and README/AGENTS/fish abbreviations accordingly
2026-05-24cleanup: remove ComfyUI and photo-related code from lib/hyperstackPaul Buetow
2026-05-24fix(cli): avoid false VM2 abort when VM1 fails after WG step succeededPaul Buetow
In run_create_both, VM1's thread rescue unconditionally set vm1_wg_state[:error], even when the WireGuard step had already signaled success (vm1_wg_state[:done] = true). If VM2 was waiting on the condition variable at that moment, it would raise 'VM1 WireGuard setup failed' and abort needlessly. Now the rescue only sets :error when :done is still false, so a downstream VM1 failure (e.g. vLLM install) no longer leaks to VM2. Resolves agent task ic.
2026-05-24fix(cli): synchronize access to errors hash in run_create_bothPaul Buetow
2026-04-11Rename VM1 configs: default hyperstack-vm1.toml, Nemotron in -nemotronPaul Buetow
Move the former hyperstack-vm1-coder.toml to hyperstack-vm1.toml as the standard VM1 profile (Qwen3-Coder-Next on single GPU). Preserve the dual-H100 Nemotron-3-Super stack as hyperstack-vm1-nemotron.toml. Point create-both at hyperstack-vm1.toml and refresh README for current defaults. Made-with: Cursor
2026-04-06hyperstack: switch to Gemma 4 31B on VM2, Qwen3-Coder-Next on VM1Paul Buetow
VM1 (hyperstack-vm1-coder.toml, renamed from hyperstack-vm1-gptoss.toml): - Default model switched from gpt-oss-120b to qwen3-coder-next - Config file renamed to reflect actual default model VM2 (hyperstack-vm2.toml): - Default model switched from qwen3-coder-next to Gemma 4 31B AWQ - Uses vLLM nightly image + transformers==5.5.0 workaround: Gemma 4 architecture is registered in transformers 5.x but vLLM stable pins <5 - max_model_len=131072 (128K context); KV cache fills ~95% of H100-80GB VRAM - Added gemma4-31b preset watcher.rb: - Add loading_status field to VmSnapshot to show live model-load progress (last relevant log line during startup instead of generic "loading" message) - fetch_vm_stats now captures both Engine 0 stats and loading-phase log lines in a single SSH call using a shell variable to avoid two docker log invocations - clean_log_line() strips vLLM PID/timestamp prefix for readable display cli.rb: update all hardcoded hyperstack-vm1-gptoss.toml references to hyperstack-vm1-coder.toml hypr.fish: replace pi-hyperstack-nemotron with pi-hyperstack-coder (VM1), add pi-hyperstack-gemma4 (VM2) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26hyperstack: watch only polls VMs whose API port is currently reachablePaul Buetow
Add watch_config_loaders that filters status_config_loaders results with a TCP probe on each VM's WireGuard inference port. VMs with stale state files (deleted from the console without `hyperstack.rb delete`) are excluded from the watch loop. Falls back to all tracked loaders when none are reachable so the watcher can still render error output when WireGuard is down. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26hyperstack: fix TOML paths, add live provisioning progress, and auto ↵Paul Buetow
end-to-end test on create - cli: introduce REPO_ROOT constant so create-both/delete-both/watch find TOML configs at the repo root instead of lib/hyperstack/ - manager: with_polling prints a heartbeat every 30s so silent waits (SSH, VM ready, etc.) are visibly alive - provisioning: bootstrap_guest streams SSH output in real time so apt-lock waits and setup steps are visible as they happen - provisioning: vLLM wait loop reads docker logs to show the current startup stage (shard loading %, torch.compile, CUDA graphs, API up) instead of a plain "not ready yet" counter - manager: create automatically runs the end-to-end inference test after provisioning completes, removing the manual 'test' step Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-25hyperstack: split 3335-line monolith into lib/hyperstack/ modulesPaul Buetow
Extracts all classes from hyperstack.rb into focused library files: - lib/hyperstack/config.rb — ConfigLoader + Config (TOML loading, validation) - lib/hyperstack/state.rb — StateStore + PrefixedOutput (JSON state, threaded output) - lib/hyperstack/client.rb — HyperstackClient (REST API + retry logic) - lib/hyperstack/wireguard.rb — LocalWireGuard (wg1.conf peer management, /etc/hosts) - lib/hyperstack/provisioning.rb — ProvisioningScripts + RemoteProvisioner (SSH bootstrap) - lib/hyperstack/manager.rb — Manager (VM lifecycle orchestration) - lib/hyperstack/watcher.rb — VllmWatcher (Prometheus + GPU dashboard) - lib/hyperstack/cli.rb — CLI (OptionParser command dispatch) hyperstack.rb becomes a 46-line entry point with require_relative calls. All files pass `ruby -c` syntax check and `--help` runs correctly. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>