- Add per-VM 10s fetch timeout so one dead VM cannot stall the dashboard
- Make fallback logic check VM state (public_ip + ACTIVE status) instead of
  just file existence, so a stale/deleted VM1 state does not block watch
- Auto-replace cached SSH host keys when a VM is recreated instead of failing
- Suppress Ruby thread exception noise on killed SSH threads
Fixes 'just watch' showing a blank screen when VM1 has been deleted but its
state file is stale, and fixes the SSH host-key mismatch on VM recreation.
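
A minimal sketch of a per-VM fetch timeout, assuming each VM's stats are fetched in its own thread. `fetch_all` and the `{ error: ... }` result shape are hypothetical names, not the watcher's actual API. All threads share one deadline so every VM gets the same 10 s budget. A thread that misses it is killed, and errors raised inside threads are collected instead of being printed.

```ruby
# Hypothetical: fetch every VM concurrently; stop waiting on any one VM after
# `timeout` seconds so a dead VM cannot stall the whole dashboard refresh.
def fetch_all(vm_names, timeout: 10.0, &fetch)
  threads = vm_names.to_h do |name|
    thread = Thread.new do
      # Do not print backtraces for failing fetches; errors are collected below.
      Thread.current.report_on_exception = false
      fetch.call(name)
    end
    [name, thread]
  end

  # All threads start together, so one shared deadline gives each VM the same budget.
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  threads.to_h do |name, thread|
    remaining = [deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC), 0].max
    begin
      if thread.join(remaining)
        [name, thread.value]
      else
        thread.kill # give up on this VM only; the others are unaffected
        [name, { error: "timed out after #{timeout}s" }]
      end
    rescue StandardError => e
      # join re-raises the thread's exception; report it as this VM's error.
      [name, { error: e.message }]
    end
  end
end
```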
When create timed out during vLLM readiness polling (common for large
models like Qwen3.6-27B-FP8), rerunning create would stop and restart
the already-running container, restarting the whole startup sequence.
Now the vLLM install script checks if the container is already running
and serving the correct model before touching it. If it detects a
healthy container, it skips the stop/pull/start cycle entirely.
Also increases the readiness timeout from 20 min (240x5s) to 30 min
(360x5s) to accommodate cold starts with model download and CUDA graph
capture on large models.
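
A sketch of the "already healthy?" decision, written in Ruby for illustration. The real check lives in the install script. This sketch assumes health is judged from vLLM's OpenAI-compatible `/v1/models` response, which lists served models under `data[].id`. `serving_model?` and `skip_restart?` are hypothetical names. The constants restate the new polling budget: 360 × 5 s = 1800 s = 30 min.

```ruby
require "json"

# 360 polls x 5 s = 1800 s = 30 min (previously 240 x 5 s = 1200 s = 20 min).
READINESS_ATTEMPTS = 360
READINESS_INTERVAL_S = 5

# True when a /v1/models response body lists the expected model id.
def serving_model?(models_json, expected_model)
  data = JSON.parse(models_json)
  return false unless data.is_a?(Hash)

  Array(data["data"]).any? { |m| m.is_a?(Hash) && m["id"] == expected_model }
rescue JSON::ParserError
  false
end

# Hypothetical: skip stop/pull/start only when the container is up *and*
# already serving the configured model.
def skip_restart?(container_running:, models_json:, expected_model:)
  container_running && !models_json.nil? && serving_model?(models_json, expected_model)
end
```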
provisioned
Remove the vm_api_reachable? filter from run_watch so VMs that are
currently booting are not silently dropped from the dashboard.
Add exponential-backoff retry logic (up to 4 attempts, sleeping
2s, 4s, 8s, 16s) inside VllmWatcher#fetch_vm_stats for transient
SSH/WireGuard errors such as connection refused, host unreachable,
and exit 255. This lets watch automatically recover while a VM
is still starting up instead of failing immediately.
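
A minimal sketch of the retry loop. The four listed sleeps (2, 4, 8, 16 s) are read here as up to four *retries* after the first try. If "4 attempts" means four tries in total, only the first three sleeps are ever used. The error patterns and the `with_backoff` / `transient?` names are assumptions, not the watcher's actual code. `sleeper` is injectable so the delays can be checked without really sleeping.

```ruby
# Assumed wording of transient ssh/WireGuard errors while a VM is booting.
TRANSIENT_PATTERNS = [
  /connection refused/i,
  /no route to host/i,
  /host is unreachable/i,
  /exit(?:ed)?(?: status)? 255/i
].freeze

def transient?(error)
  TRANSIENT_PATTERNS.any? { |re| re.match?(error.message) }
end

# Retry the block on transient errors only, sleeping base * 2**n seconds
# (2, 4, 8, 16 with the defaults) before each retry.
def with_backoff(retries: 4, base: 2, sleeper: ->(s) { sleep(s) })
  attempt = 0
  begin
    yield
  rescue StandardError => e
    raise unless transient?(e) && attempt < retries
    sleeper.call(base * 2**attempt)
    attempt += 1
    retry
  end
end
```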
- Drop single-VM default hyperstack-vm.toml and @config_path/@config_explicit machinery
- Add global --vm flag (default: 1) mapping to hyperstack-vm1.toml and/or hyperstack-vm2.toml
- Fold create-both and delete-both into create/delete --vm both
- Teach status, watch, test, model to accept --vm (default: 1)
- Update help text and README/AGENTS/fish abbreviations accordingly
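
A sketch of the global `--vm` flag using Ruby's stdlib OptionParser. Passing an Array to `opts.on` restricts the flag to those values and raises `OptionParser::InvalidArgument` otherwise. The `VM_CONFIGS` table and the `parse_vm_flag` name are illustrative assumptions. The real CLI may resolve paths against the repo root and may not accept `both` for every command.

```ruby
require "optparse"

# Assumed mapping from --vm values to config files.
VM_CONFIGS = {
  "1" => ["hyperstack-vm1.toml"],
  "2" => ["hyperstack-vm2.toml"],
  "both" => ["hyperstack-vm1.toml", "hyperstack-vm2.toml"]
}.freeze

# Parses --vm (default "1") out of argv; returns the remaining arguments and
# the config files the command should operate on.
def parse_vm_flag(argv)
  vm = "1"
  OptionParser.new do |opts|
    opts.on("--vm VM", VM_CONFIGS.keys, "Target VM: 1, 2, or both (default: 1)") do |value|
      vm = value
    end
  end.parse!(argv)
  [argv, VM_CONFIGS.fetch(vm)]
end
```

With this shape, `create --vm both` replaces the old `create-both` and yields both config files.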
output_dir

The comfyui_install_script previously ran

    chmod -R 0777 File.dirname(models_dir)

which chmods the *parent* directory (e.g. /ephemeral). If models_dir
is configured directly under /ephemeral, that gives world-write access to
all sibling directories (the vLLM Hugging Face cache, Ollama models, etc.).

Now chmod only the two directories that actually need it: models_dir
and output_dir.
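
A sketch of building the narrower command. `comfyui_chmod_command` is a hypothetical helper, and the paths are examples. The old bug follows from `File.dirname("/ephemeral/comfyui_models")` returning `"/ephemeral"`.

```ruby
require "shellwords"

# Hypothetical: chmod only the two directories ComfyUI writes to, never
# their shared parent. Paths are shell-escaped for the remote command.
def comfyui_chmod_command(models_dir, output_dir)
  dirs = [models_dir, output_dir].map { |dir| Shellwords.escape(dir) }
  "chmod -R 0777 #{dirs.join(' ')}"
end
```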
In prune_host_line, body.split(/\s+/) on a line with leading whitespace
produced a token list starting with an empty string, which was then
shifted off as the IP address (''). This caused the rewritten /etc/hosts
entry to lose its IP silently.
Fix by stripping the body before splitting: body.strip.split(/\s+/).
Refs: hc
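
The pitfall is easy to reproduce. A regex split keeps an empty leading field, while stripping first does not. `split_hosts_body` is a hypothetical helper that mirrors the fix, not prune_host_line itself.

```ruby
# Regex split keeps the empty leading field on indented lines:
"  10.0.0.2 vm2".split(/\s+/)   # => ["", "10.0.0.2", "vm2"]

# Hypothetical helper mirroring the fix: strip, split, then shift the IP.
def split_hosts_body(body)
  tokens = body.strip.split(/\s+/)
  ip = tokens.shift
  [ip, tokens]
end
```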
In run_create_both, VM1's thread rescue unconditionally set
vm1_wg_state[:error], even when the WireGuard step had already
signaled success (vm1_wg_state[:done] = true). If VM2 was
waiting on the condition variable at that moment, it would raise
'VM1 WireGuard setup failed' and abort needlessly.
Now the rescue only sets :error when :done is still false, so a
downstream VM1 failure (e.g. vLLM install) no longer leaks to VM2.
Resolves agent task ic.
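
A reduced sketch of the hand-off, assuming a Mutex plus ConditionVariable as described. `run_create_pair`, `wg_setup` and `after_wg` are hypothetical stand-ins for the real provisioning steps. The key line is the `unless wg[:done]` guard in VM1's rescue, which keeps a post-WireGuard failure from reaching VM2.

```ruby
# Hypothetical reduction of run_create_both: VM2 waits until VM1 reports
# WireGuard success (:done) or failure (:error).
def run_create_pair(wg_setup:, after_wg:)
  wg = { done: false, error: nil }
  lock = Mutex.new
  cond = ConditionVariable.new

  vm1 = Thread.new do
    Thread.current.report_on_exception = false
    begin
      wg_setup.call
      lock.synchronize do
        wg[:done] = true
        cond.broadcast
      end
      after_wg.call # later VM1 steps (e.g. vLLM install) may still fail
    rescue StandardError => e
      lock.synchronize do
        # Only a failure before WireGuard finished should abort VM2.
        unless wg[:done]
          wg[:error] = e.message
          cond.broadcast
        end
      end
      raise e
    end
  end

  vm2 = Thread.new do
    Thread.current.report_on_exception = false
    lock.synchronize do
      cond.wait(lock) until wg[:done] || wg[:error]
      raise "VM1 WireGuard setup failed: #{wg[:error]}" if wg[:error]
    end
    :vm2_provisioned # stand-in for VM2's own provisioning
  end

  [vm1, vm2]
end
```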
Replace Timeout.timeout(15) around Open3.capture3 with SSH-level
keepalive options (ServerAliveInterval=5, ServerAliveCountMax=3).
Ruby's Timeout interrupts the calling thread from a background timer
thread but leaves the ssh child process running; SSH's own keepalive
timeouts make the process exit cleanly on its own.
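
A sketch of the ssh invocation. The two `-o` options are from this change. The `ubuntu` login user and the helper names are assumptions. ssh probes after 5 s without server data and disconnects after 3 unanswered probes, so a dead link ends the command after roughly 5 × 3 = 15 s. That matches the old Timeout budget.

```ruby
require "open3"

# Keepalive after 5 s of silence; disconnect after 3 unanswered probes (~15 s).
SSH_KEEPALIVE_OPTS = %w[-o ServerAliveInterval=5 -o ServerAliveCountMax=3].freeze

# Hypothetical helpers; the "ubuntu" login user is an assumption.
def ssh_argv(host, remote_cmd, user: "ubuntu")
  ["ssh", *SSH_KEEPALIVE_OPTS, "#{user}@#{host}", remote_cmd]
end

# No Timeout.timeout wrapper: if the link dies, ssh exits by itself and
# capture3 returns, so no orphaned ssh process is left behind.
def ssh_capture(host, remote_cmd, **opts)
  Open3.capture3(*ssh_argv(host, remote_cmd, **opts))
end
```

ServerAlive* only applies once the session is established. The initial connection attempt is governed separately by ssh's ConnectTimeout.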
When all public IP probes fail (network down, DNS broken), detect_public_operator_cidr
raises HyperstackVM::Error. The old code did not cache this failure, so every
call to resolved_allowed_cidrs re-ran all probes, compounding slowness.
Add a rescue block in detected_operator_cidr that stores the exception in
@detected_operator_cidr_error and re-raises it. On subsequent calls the cached
error is re-raised immediately, preventing redundant probe retries.
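
A sketch of the memoization pattern: cache the CIDR on success and the exception on failure, so failing probes run only once. `OperatorCidrDetector` and its injected `probe` block are hypothetical. Only the `@detected_operator_cidr_error` ivar and `HyperstackVM::Error` come from this change.

```ruby
module HyperstackVM
  class Error < StandardError; end
end

# Hypothetical wrapper; the probe block stands in for detect_public_operator_cidr.
class OperatorCidrDetector
  def initialize(&probe)
    @probe = probe
  end

  def detected_operator_cidr
    raise @detected_operator_cidr_error if @detected_operator_cidr_error

    @detected_operator_cidr ||= @probe.call
  rescue HyperstackVM::Error => e
    @detected_operator_cidr_error = e # remember the failure, then re-raise
    raise
  end
end
```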
Ensure Manager#delete does not wipe the state file on generic/transient API failures. The rescue now checks whether the error message indicates the VM is already gone (404, not_found, does not exist) before removing state. This prevents orphaned billable VMs after exhausted retries or transient network errors.
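
A sketch of the guarded cleanup. The API call and state removal are passed in as lambdas so the logic stands alone. `delete_vm` and `GONE_PATTERN` are hypothetical, and the exact error wording the Hyperstack API returns is an assumption.

```ruby
# Assumed "already gone" markers from the source: 404, not_found, does not exist.
GONE_PATTERN = /\b404\b|not[_ ]found|does not exist/i

def vm_already_gone?(error)
  GONE_PATTERN.match?(error.message)
end

# Hypothetical: clear local state only after a successful delete, or when the
# API says the VM no longer exists; any other failure keeps the state file.
def delete_vm(api_delete:, clear_state:)
  api_delete.call
  clear_state.call
rescue StandardError => e
  raise unless vm_already_gone?(e)
  clear_state.call
end
```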
Move the former hyperstack-vm1-coder.toml to hyperstack-vm1.toml as the
standard VM1 profile (Qwen3-Coder-Next on single GPU). Preserve the
dual-H100 Nemotron-3-Super stack as hyperstack-vm1-nemotron.toml. Point
create-both at hyperstack-vm1.toml and refresh README for current defaults.
Made-with: Cursor
Adds docker_image and pre_start_cmd config fields to config.rb and
provisioning.rb so the Gemma 4 31B workarounds are baked in:
- docker_image = "vllm/vllm-openai:nightly" (stable lacks Gemma 4 support)
- pre_start_cmd = "pip install -q transformers==5.5.0" (stable pins <5)
- extra_docker_env = ["CUDA_VISIBLE_DEVICES=0"] (required with --entrypoint bash)
When pre_start_cmd is set, the provisioner switches to --entrypoint bash and
chains the patch command before launching vLLM, so create-both works end-to-end
without manual container replacement.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
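
A sketch of the argv switch when `pre_start_cmd` is set. It assumes the image's default entrypoint is `vllm serve`, so the same arguments work after `exec vllm serve`. `vllm_docker_argv` and the model id are hypothetical. Port mappings and volumes are omitted.

```ruby
require "shellwords"

# Hypothetical argv builder for `docker run`; the default-entrypoint path
# assumes the image's entrypoint is `vllm serve`.
def vllm_docker_argv(image:, model:, vllm_args: [], pre_start_cmd: nil, env: [])
  argv = ["docker", "run", "-d", "--gpus", "all"]
  env.each { |pair| argv.push("-e", pair) }
  if pre_start_cmd
    # Swap the entrypoint for bash so the patch runs first; `exec` replaces
    # bash so vLLM becomes the container's main process.
    serve = ["vllm", "serve", model, *vllm_args].shelljoin
    argv.push("--entrypoint", "bash", image, "-c", "#{pre_start_cmd} && exec #{serve}")
  else
    argv.push(image, model, *vllm_args)
  end
  argv
end
```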
VM1 (hyperstack-vm1-coder.toml, renamed from hyperstack-vm1-gptoss.toml):
- Default model switched from gpt-oss-120b to qwen3-coder-next
- Config file renamed to reflect actual default model
VM2 (hyperstack-vm2.toml):
- Default model switched from qwen3-coder-next to Gemma 4 31B AWQ
- Uses vLLM nightly image + transformers==5.5.0 workaround: Gemma 4
architecture is registered in transformers 5.x but vLLM stable pins <5
- max_model_len=131072 (128K context); KV cache fills ~95% of H100-80GB VRAM
- Added gemma4-31b preset
watcher.rb:
- Add loading_status field to VmSnapshot to show live model-load progress
(last relevant log line during startup instead of generic "loading" message)
- fetch_vm_stats now captures both Engine 0 stats and loading-phase log lines
in a single SSH call using a shell variable to avoid two docker log invocations
- clean_log_line() strips vLLM PID/timestamp prefix for readable display
cli.rb: update all hardcoded hyperstack-vm1-gptoss.toml references to
hyperstack-vm1-coder.toml
hypr.fish: replace pi-hyperstack-nemotron with pi-hyperstack-coder (VM1),
add pi-hyperstack-gemma4 (VM2)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
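
A sketch of clean_log_line. The prefix shape below, an optional `(Process pid=N)` tag, then level, date, time and `[file:line]`, is an assumption about vLLM's `docker logs` output and should be checked against real lines. Lines that do not match are returned unchanged apart from surrounding whitespace.

```ruby
# Assumed prefix, e.g.
#   "(APIServer pid=1) INFO 04-07 10:12:03 [api_server.py:1880] message"
LOG_PREFIX = /\A(?:\([\w.-]+ pid=\d+\)\s+)?(?:DEBUG|INFO|WARNING|ERROR|CRITICAL)\s+\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}(?:\.\d+)?\s+(?:\[[^\]]*\]\s+)?/

# Strip the PID/level/timestamp/source prefix for compact dashboard display.
def clean_log_line(line)
  line.sub(LOG_PREFIX, "").strip
end
```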
Add watch_config_loaders that filters status_config_loaders results with
a TCP probe on each VM's WireGuard inference port. VMs with stale state
files (deleted from the console without `hyperstack.rb delete`) are
excluded from the watch loop. Falls back to all tracked loaders when
none are reachable so the watcher can still render error output when
WireGuard is down.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
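
A sketch of the reachability filter with its fallback. It uses stdlib `Socket.tcp` with `connect_timeout`. `tcp_reachable?`, `watch_targets` and the `{ wg_ip:, port: }` hash shape are hypothetical names, not the actual watch_config_loaders code.

```ruby
require "socket"

# Hypothetical probe: can a TCP connection to host:port open within timeout?
def tcp_reachable?(host, port, timeout: 1.0)
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError, IOError
  false
end

# Keep VMs whose inference port answers over WireGuard; if none do, keep
# them all so the watcher can still render per-VM errors.
def watch_targets(vms, timeout: 1.0)
  reachable = vms.select { |vm| tcp_reachable?(vm[:wg_ip], vm[:port], timeout: timeout) }
  reachable.empty? ? vms : reachable
end
```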
end-to-end test on create
- cli: introduce REPO_ROOT constant so create-both/delete-both/watch
find TOML configs at the repo root instead of lib/hyperstack/
- manager: with_polling prints a heartbeat every 30s so silent waits
(SSH, VM ready, etc.) are visibly alive
- provisioning: bootstrap_guest streams SSH output in real time so
apt-lock waits and setup steps are visible as they happen
- provisioning: vLLM wait loop reads docker logs to show the current
startup stage (shard loading %, torch.compile, CUDA graphs, API up)
instead of a plain "not ready yet" counter
- manager: create automatically runs the end-to-end inference test
after provisioning completes, removing the manual 'test' step
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
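
A sketch of a polling loop with a 30 s heartbeat. `poll_with_heartbeat` and its keyword parameters are hypothetical, not with_polling's actual signature. The clock, sleeper and output are injectable, so the loop can be tested without waiting in real time.

```ruby
# Hypothetical: yield until the block returns a truthy value, printing a
# heartbeat every `heartbeat` seconds and raising after `timeout` seconds.
def poll_with_heartbeat(label, interval: 5, timeout: 600, heartbeat: 30,
                        clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) },
                        sleeper: ->(s) { sleep(s) }, out: $stdout)
  start = clock.call
  last_beat = start
  loop do
    result = yield
    return result if result

    now = clock.call
    raise "#{label}: timed out after #{timeout}s" if now - start >= timeout

    if now - last_beat >= heartbeat
      out.puts "#{label}: still waiting (#{(now - start).round}s elapsed)"
      last_beat = now
    end
    sleeper.call(interval)
  end
end
```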
Extracts all classes from hyperstack.rb into focused library files:
- lib/hyperstack/config.rb — ConfigLoader + Config (TOML loading, validation)
- lib/hyperstack/state.rb — StateStore + PrefixedOutput (JSON state, threaded output)
- lib/hyperstack/client.rb — HyperstackClient (REST API + retry logic)
- lib/hyperstack/wireguard.rb — LocalWireGuard (wg1.conf peer management, /etc/hosts)
- lib/hyperstack/provisioning.rb — ProvisioningScripts + RemoteProvisioner (SSH bootstrap)
- lib/hyperstack/manager.rb — Manager (VM lifecycle orchestration)
- lib/hyperstack/watcher.rb — VllmWatcher (Prometheus + GPU dashboard)
- lib/hyperstack/cli.rb — CLI (OptionParser command dispatch)
hyperstack.rb becomes a 46-line entry point with require_relative calls.
All files pass `ruby -c` syntax check and `--help` runs correctly.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>