hypr - My "local" LLM setup with Hyperstack.

Age	Commit message	Author
14 days	fix(watch): auto-recover when default VM is dead or replaced	Paul Buetow
	- Add per-VM 10s fetch timeout so one dead VM cannot stall the dashboard
	- Make fallback logic check VM state (public_ip + ACTIVE status) instead of just file existence, so a stale/deleted VM1 state does not block watch
	- Auto-replace cached SSH host keys when a VM is recreated instead of failing
	- Suppress Ruby thread exception noise on killed SSH threads
	Fixes 'just watch' showing blank screen when VM1 is deleted but has a stale state file, and SSH host-key mismatch on VM recreation.
14 days	update hyperstack2 VM state and config after recreation	Paul Buetow

2026-05-24	fix(provisioning): recover from vLLM readiness timeout and increase poll window	Paul Buetow
	When create timed out during vLLM readiness polling (common for large models like Qwen3.6-27B-FP8), rerunning create would stop and restart the already-running container, restarting the whole startup sequence. Now the vLLM install script checks if the container is already running and serving the correct model before touching it. If it detects a healthy container, it skips the stop/pull/start cycle entirely. Also increases the readiness timeout from 20 min (240x5s) to 30 min (360x5s) to accommodate cold starts with model download and CUDA graph capture on large models.
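The readiness window and the "skip if already healthy" check described above can be sketched as follows. This is a minimal illustration, not the repository's install script: `wait_until_ready`, `skip_restart?`, and their parameters are hypothetical names, and the health probe is left to the caller's block.

```ruby
# Hypothetical sketch: poll up to max_polls times, interval seconds apart.
# 360 polls x 5 s = 1800 s = 30 min (the old window was 240 x 5 s = 20 min).
def wait_until_ready(max_polls: 360, interval: 5, sleeper: ->(s) { sleep(s) })
  max_polls.times do |i|
    return i + 1 if yield          # block returns true once the API answers
    sleeper.call(interval)
  end
  nil                              # timed out
end

# Hypothetical predicate: leave the container alone only when it is running
# and already serves the model we want.
def skip_restart?(running:, served_model:, wanted_model:)
  running && served_model == wanted_model
end
```

The `sleeper` parameter exists only so the loop can be exercised without real waiting.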
2026-05-24	fix(cli): watch/status/test auto-detect active VMs when default VM1 is not provisioned	Paul Buetow
2026-05-24	chore(config): remove gpt-oss-120b references since qwen3.6 is better	Paul Buetow

2026-05-24	fix(watcher): show actionable error when VM not provisioned or SSH fails	Paul Buetow

2026-05-24	replace qwen3-coder-next with qwen3.6-27b across configs, docs, and tooling	Paul Buetow

2026-05-24	feat(watch): retry SSH connection failures with exponential backoff	Paul Buetow
	Remove the vm_api_reachable? filter from run_watch so VMs that are currently booting are not silently dropped from the dashboard. Add exponential-backoff retry logic (up to 4 attempts, sleeping 2s, 4s, 8s, 16s) inside VllmWatcher#fetch_vm_stats for transient SSH/WireGuard errors such as connection refused, host unreachable, and exit 255. This lets watch automatically recover while a VM is still starting up instead of failing immediately.
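A minimal sketch of the retry pattern described above. It reads "up to 4 attempts, sleeping 2s, 4s, 8s, 16s" as four retries after the first try; that reading, the `with_backoff` helper, and the `TransientSSHError` class are assumptions for illustration, not the code inside VllmWatcher#fetch_vm_stats.

```ruby
# Hypothetical error class standing in for "connection refused",
# "host unreachable", exit 255 and similar transient failures.
class TransientSSHError < StandardError; end

# Retry the block with exponentially growing sleeps: 2, 4, 8, 16 seconds.
def with_backoff(attempts: 4, base_delay: 2, sleeper: ->(s) { sleep(s) })
  tries = 0
  begin
    tries += 1
    yield tries
  rescue TransientSSHError
    raise if tries > attempts                      # retries exhausted
    sleeper.call(base_delay * (2**(tries - 1)))    # 2, 4, 8, 16
    retry
  end
end
```

Only the transient error class is retried; any other exception propagates immediately, so configuration mistakes still fail fast.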
2026-05-24	feat(cli): replace --config with --vm 1\|2\|both, remove create-both/delete-both	Paul Buetow
	- Drop single-VM default hyperstack-vm.toml and @config_path/@config_explicit machinery
	- Add global --vm flag (default: 1) mapping to hyperstack-vm1.toml and/or hyperstack-vm2.toml
	- Fold create-both and delete-both into create/delete --vm both
	- Teach status, watch, test, model to accept --vm (default: 1)
	- Update help text and README/AGENTS/fish abbreviations accordingly
2026-05-24	cleanup: remove ComfyUI and photo-related code from lib/hyperstack	Paul Buetow

2026-05-24	fix(provisioning): narrow ComfyUI install chmod to models_dir and output_dir	Paul Buetow
	The comfyui_install_script previously ran
	    chmod -R 0777 File.dirname(models_dir)
	which chmods the parent directory (e.g. /ephemeral). If models_dir is configured directly under /ephemeral that gives world-write access to all sibling directories (vLLM hug cache, Ollama models, etc.).
	Now chmod only the two directories that actually need it: models_dir and output_dir.
2026-05-24	fix(wireguard): handle leading whitespace in /etc/hosts lines	Paul Buetow
	In prune_host_line, body.split(/\s+/) on a line with leading whitespace produced a token list starting with an empty string, and that empty string was then shifted off as the IP field (''). This caused the rewritten /etc/hosts entry to lose its IP silently. Fix by stripping the body before splitting: body.strip.split(/\s+/). Refs: hc
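The difference between the two split calls is easy to reproduce in isolation. The helper names and the sample hosts line below are made up for illustration; only the two split expressions come from the commit message.

```ruby
# Hypothetical stand-ins for the tokenizing step in prune_host_line.
def host_tokens_buggy(body)
  body.split(/\s+/)          # a regexp split keeps the leading empty field
end

def host_tokens_fixed(body)
  body.strip.split(/\s+/)    # strip first, so the first token is the IP
end

line = "  10.0.0.2 vm1.example"   # made-up /etc/hosts entry, leading spaces
host_tokens_buggy(line)  # => ["", "10.0.0.2", "vm1.example"]
host_tokens_fixed(line)  # => ["10.0.0.2", "vm1.example"]
```

With the buggy variant, shifting the first token yields `""` instead of the address, which matches the silent IP loss described above.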
2026-05-24	fix(cli): avoid false VM2 abort when VM1 fails after WG step succeeded	Paul Buetow
	In run_create_both, VM1's thread rescue unconditionally set vm1_wg_state[:error], even when the WireGuard step had already signaled success (vm1_wg_state[:done] = true). If VM2 was waiting on the condition variable at that moment, it would raise 'VM1 WireGuard setup failed' and abort needlessly. Now the rescue only sets :error when :done is still false, so a downstream VM1 failure (e.g. vLLM install) no longer leaks to VM2. Resolves agent task ic.
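The guard can be sketched with a shared state hash, a mutex, and a condition variable. The method names and the layout of the state hash are assumptions for illustration; the actual run_create_both code is not shown in this log.

```ruby
# Hypothetical sketch: called from VM1's rescue handler.
# Only signals failure to VM2 if the WireGuard step has not completed yet.
def record_vm1_failure(state, mutex, cond, error)
  mutex.synchronize do
    return if state[:done]     # WireGuard already succeeded; VM2 may proceed
    state[:error] = error
    cond.broadcast
  end
end

# Hypothetical sketch: VM2 blocks here until VM1's WireGuard step finishes
# or fails.
def wait_for_vm1_wireguard(state, mutex, cond)
  mutex.synchronize do
    cond.wait(mutex) until state[:done] || state[:error]
    raise "VM1 WireGuard setup failed" if state[:error]
  end
end
```

A later VM1 failure, such as a vLLM install error, then stays local to VM1's thread because `:error` is never written once `:done` is set.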
2026-05-24	fix(watcher): remove Timeout.timeout to prevent orphaning SSH child processes	Paul Buetow
	Replace Timeout.timeout(15) around Open3.capture3 with SSH-level keepalive options (ServerAliveInterval=5, ServerAliveCountMax=3). Ruby's Timeout raises in a background thread but leaves the ssh process running; SSH's own timeouts self-terminate cleanly.
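A minimal sketch of the replacement approach, assuming a helper that builds the ssh argument list. `ssh_argv` and `run_ssh` are hypothetical names; the two `-o` options are the ones named in the commit message.

```ruby
require "open3"

# Hypothetical helper: host and remote_cmd are placeholders supplied by the caller.
def ssh_argv(host, remote_cmd)
  ["ssh",
   "-o", "ServerAliveInterval=5",   # send a keepalive probe every 5 s
   "-o", "ServerAliveCountMax=3",   # give up after 3 unanswered probes
   host, remote_cmd]
end

# No Timeout.timeout wrapper: ssh exits on its own once the keepalives fail,
# so capture3 returns and no child process is left behind.
def run_ssh(host, remote_cmd)
  Open3.capture3(*ssh_argv(host, remote_cmd))
end
```

With these values an unresponsive server is dropped after roughly 15 seconds (5 s x 3), which is close to the old `Timeout.timeout(15)` budget.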
2026-05-24	fix(cli): synchronize access to errors hash in run_create_both	Paul Buetow

2026-05-24	fix(provisioning): chown models_dir itself, not its parent	Paul Buetow

2026-05-24	fix(config): memoize detected_operator_cidr failure to avoid repeated probes	Paul Buetow
	When all public IP probes fail (network down, DNS broken), detect_public_operator_cidr raises HyperstackVM::Error. The old code did not cache this failure, so every call to resolved_allowed_cidrs re-ran all probes, compounding slowness. Add a rescue block in detected_operator_cidr that stores the exception in @detected_operator_cidr_error and re-raises it. On subsequent calls the cached error is re-raised immediately, preventing redundant probe retries.
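The failure-memoization idea can be sketched as below. The `CidrDetector` class, its injected probe block, and the `probe_calls` counter are hypothetical; only the instance variable name and the method name come from the commit message.

```ruby
# Hypothetical sketch of caching both the success and the failure of a probe.
class CidrDetector
  class Error < StandardError; end

  attr_reader :probe_calls

  def initialize(&probe)
    @probe = probe          # stands in for detect_public_operator_cidr
    @probe_calls = 0
  end

  def detected_operator_cidr
    # Re-raise the cached failure immediately instead of probing again.
    raise @detected_operator_cidr_error if @detected_operator_cidr_error

    @detected_operator_cidr ||= begin
      @probe_calls += 1
      @probe.call
    end
  rescue Error => e
    @detected_operator_cidr_error = e
    raise
  end
end
```

The trade-off is that a failure sticks for the lifetime of the object, which suits a short-lived CLI run but would need an explicit reset in a long-running process.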
2026-05-24	fix(manager): only delete state file when VM deletion is confirmed	Paul Buetow
	Ensure Manager#delete does not wipe the state file on generic/transient API failures. The rescue now checks whether the error message indicates the VM is already gone (404, not_found, does not exist) before removing state. This prevents orphaned billable VMs after exhausted retries or transient network errors.
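A minimal sketch of the rule described above, assuming the API call and the state-file removal are passed in as callables. `GONE_PATTERN`, `vm_already_gone?`, and `delete_vm` are hypothetical names; the three matched phrases are the ones listed in the commit message.

```ruby
# Hypothetical pattern for "the VM no longer exists" error messages.
GONE_PATTERN = /404|not_found|does not exist/i

def vm_already_gone?(error)
  error.message.match?(GONE_PATTERN)
end

# delete_call: performs the API deletion (may raise).
# remove_state: removes the local state file.
def delete_vm(delete_call:, remove_state:)
  delete_call.call
  remove_state.call
  :deleted
rescue StandardError => e
  raise unless vm_already_gone?(e)   # transient failure: keep the state file
  remove_state.call                  # VM confirmed gone: safe to clean up
  :already_gone
end
```

Keeping the state file on a transient error means a later `delete` can still find and remove the VM instead of leaving it running and billable.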
2026-04-11	Rename VM1 configs: default hyperstack-vm1.toml, Nemotron in -nemotron	Paul Buetow
	Move the former hyperstack-vm1-coder.toml to hyperstack-vm1.toml as the standard VM1 profile (Qwen3-Coder-Next on single GPU). Preserve the dual-H100 Nemotron-3-Super stack as hyperstack-vm1-nemotron.toml. Point create-both at hyperstack-vm1.toml and refresh README for current defaults. Made-with: Cursor
2026-04-06	provisioner: support docker_image and pre_start_cmd for Gemma 4 startup	Paul Buetow
	Adds docker_image and pre_start_cmd config fields to config.rb and provisioning.rb so the Gemma 4 31B workarounds are baked in:
	- docker_image = "vllm/vllm-openai:nightly" (stable lacks Gemma 4 support)
	- pre_start_cmd = "pip install -q transformers==5.5.0" (stable pins <5)
	- extra_docker_env = ["CUDA_VISIBLE_DEVICES=0"] (required with --entrypoint bash)
	When pre_start_cmd is set, the provisioner switches to --entrypoint bash and chains the patch command before launching vLLM, so create-both works end-to-end without manual container replacement.
	Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-06	hyperstack: switch to Gemma 4 31B on VM2, Qwen3-Coder-Next on VM1	Paul Buetow
	VM1 (hyperstack-vm1-coder.toml, renamed from hyperstack-vm1-gptoss.toml):
	- Default model switched from gpt-oss-120b to qwen3-coder-next
	- Config file renamed to reflect actual default model
	VM2 (hyperstack-vm2.toml):
	- Default model switched from qwen3-coder-next to Gemma 4 31B AWQ
	- Uses vLLM nightly image + transformers==5.5.0 workaround: Gemma 4 architecture is registered in transformers 5.x but vLLM stable pins <5
	- max_model_len=131072 (128K context); KV cache fills ~95% of H100-80GB VRAM
	- Added gemma4-31b preset
	watcher.rb:
	- Add loading_status field to VmSnapshot to show live model-load progress (last relevant log line during startup instead of generic "loading" message)
	- fetch_vm_stats now captures both Engine 0 stats and loading-phase log lines in a single SSH call using a shell variable to avoid two docker log invocations
	- clean_log_line() strips vLLM PID/timestamp prefix for readable display
	cli.rb: update all hardcoded hyperstack-vm1-gptoss.toml references to hyperstack-vm1-coder.toml
	hypr.fish: replace pi-hyperstack-nemotron with pi-hyperstack-coder (VM1), add pi-hyperstack-gemma4 (VM2)
	Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26	hyperstack: watch only polls VMs whose API port is currently reachable	Paul Buetow
	Add watch_config_loaders that filters status_config_loaders results with a TCP probe on each VM's WireGuard inference port. VMs with stale state files (deleted from the console without `hyperstack.rb delete`) are excluded from the watch loop. Falls back to all tracked loaders when none are reachable so the watcher can still render error output when WireGuard is down. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
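The probe-then-fall-back behaviour can be sketched as follows. `port_reachable?` and `watchable_loaders` are hypothetical helpers, and the 2-second connect timeout is an assumed value; the actual watch_config_loaders implementation is not shown in this log.

```ruby
require "socket"

# Hypothetical TCP probe: true if a connection to host:port opens in time.
def port_reachable?(host, port, timeout: 2)
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError, IOError
  false
end

# Keep only loaders whose VM answers the probe; if none do, fall back to the
# full list so the watcher can still render error output.
def watchable_loaders(loaders, &reachable)
  live = loaders.select(&reachable)
  live.empty? ? loaders : live
end
```

A caller would pass a block that maps each loader to its WireGuard host and inference port and hands them to `port_reachable?`.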
2026-03-26	hyperstack: fix TOML paths, add live provisioning progress, and auto end-to-end test on create	Paul Buetow
	- cli: introduce REPO_ROOT constant so create-both/delete-both/watch find TOML configs at the repo root instead of lib/hyperstack/
	- manager: with_polling prints a heartbeat every 30s so silent waits (SSH, VM ready, etc.) are visibly alive
	- provisioning: bootstrap_guest streams SSH output in real time so apt-lock waits and setup steps are visible as they happen
	- provisioning: vLLM wait loop reads docker logs to show the current startup stage (shard loading %, torch.compile, CUDA graphs, API up) instead of a plain "not ready yet" counter
	- manager: create automatically runs the end-to-end inference test after provisioning completes, removing the manual 'test' step
	Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-25	hyperstack: split 3335-line monolith into lib/hyperstack/ modules	Paul Buetow
	Extracts all classes from hyperstack.rb into focused library files:
	- lib/hyperstack/config.rb: ConfigLoader + Config (TOML loading, validation)
	- lib/hyperstack/state.rb: StateStore + PrefixedOutput (JSON state, threaded output)
	- lib/hyperstack/client.rb: HyperstackClient (REST API + retry logic)
	- lib/hyperstack/wireguard.rb: LocalWireGuard (wg1.conf peer management, /etc/hosts)
	- lib/hyperstack/provisioning.rb: ProvisioningScripts + RemoteProvisioner (SSH bootstrap)
	- lib/hyperstack/manager.rb: Manager (VM lifecycle orchestration)
	- lib/hyperstack/watcher.rb: VllmWatcher (Prometheus + GPU dashboard)
	- lib/hyperstack/cli.rb: CLI (OptionParser command dispatch)
	hyperstack.rb becomes a 46-line entry point with require_relative calls. All files pass `ruby -c` syntax check and `--help` runs correctly.
	Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>