| Age | Commit message | Author |
|
When create timed out during vLLM readiness polling (common for large
models like Qwen3.6-27B-FP8), rerunning create would stop and restart
the already-running container, restarting the whole startup sequence.
Now the vLLM install script checks if the container is already running
and serving the correct model before touching it. If it detects a
healthy container, it skips the stop/pull/start cycle entirely.
Also increases the readiness timeout from 20 min (240x5s) to 30 min
(360x5s) to accommodate cold starts with model download and CUDA graph
capture on large models.
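The skip-if-healthy check and the longer readiness window described above can be sketched in Ruby roughly as follows. This is an illustration only: the method name `ensure_vllm`, the constants, and the injected probe lambdas are assumptions, and the real check lives in the generated install script rather than in code shaped like this.

```ruby
# Hypothetical sketch; none of these names come from the project itself.
READINESS_ATTEMPTS = 360 # 360 polls x 5 s = 30 min
READINESS_INTERVAL = 5   # seconds between polls

# served_model: lambda returning the model name served by a healthy,
#               running container, or nil when there is none.
# restart:      lambda performing the stop/pull/start cycle.
# ready:        lambda returning true once the API answers.
def ensure_vllm(expected_model:, served_model:, restart:, ready:,
                sleeper: ->(seconds) { sleep(seconds) })
  # A healthy container serving the right model is left untouched.
  return :skipped if served_model.call == expected_model

  restart.call
  READINESS_ATTEMPTS.times do
    return :ready if ready.call
    sleeper.call(READINESS_INTERVAL)
  end
  :timeout
end
```

Injecting the probes keeps the decision logic separate from Docker and HTTP, so a rerun of create after a timeout only has to answer one question: is the expected model already being served?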
|
|
|
|
|
|
output_dir

The comfyui_install_script previously ran

    chmod -R 0777 File.dirname(models_dir)

which chmods the *parent* directory (e.g. /ephemeral). If models_dir
is configured directly under /ephemeral, that gives world-write access to
all sibling directories (vLLM Hugging Face cache, Ollama models, etc.).

Now chmod only the two directories that actually need it: models_dir
and output_dir.
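A minimal sketch of the narrowed permission change, assuming the install script is assembled from Ruby strings. The helper name `chmod_commands` and the example paths are hypothetical.

```ruby
require "shellwords"

# Hypothetical helper: emits one chmod line per directory that needs it,
# instead of a single chmod on the shared parent directory.
def chmod_commands(models_dir, output_dir)
  [models_dir, output_dir].map do |dir|
    "chmod -R 0777 #{Shellwords.escape(dir)}"
  end
end

# Example paths are illustrative only.
puts chmod_commands("/ephemeral/models", "/ephemeral/output")
```

For comparison, `File.dirname("/ephemeral/models")` evaluates to `"/ephemeral"`, which is why the earlier command reached every sibling directory.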
|
|
|
|
Adds docker_image and pre_start_cmd config fields to config.rb and
provisioning.rb so the Gemma 4 31B workarounds are baked in:
- docker_image = "vllm/vllm-openai:nightly" (stable lacks Gemma 4 support)
- pre_start_cmd = "pip install -q transformers==5.5.0" (stable pins <5)
- extra_docker_env = ["CUDA_VISIBLE_DEVICES=0"] (required with --entrypoint bash)
When pre_start_cmd is set, the provisioner switches to --entrypoint bash and
chains the patch command before launching vLLM, so create-both works end-to-end
without manual container replacement.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
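The entrypoint switch described in this commit might look roughly like the sketch below. The method `docker_run_args` and its argument layout are assumptions, and so is the idea that the image's default entrypoint can be reproduced as `python3 -m vllm.entrypoints.openai.api_server`; the actual provisioner may build the command differently.

```ruby
require "shellwords"

# Assumption: this command mirrors what the image runs by default.
VLLM_SERVER = ["python3", "-m", "vllm.entrypoints.openai.api_server"].freeze

# Hypothetical sketch of building the `docker run` argument list.
def docker_run_args(image:, vllm_args:, pre_start_cmd: nil, extra_env: [])
  args = ["docker", "run", "-d", "--gpus", "all"]
  extra_env.each { |pair| args += ["-e", pair] }

  if pre_start_cmd
    # Override the entrypoint so the patch command runs first, then
    # replace the shell with the server process via `exec`.
    serve = (VLLM_SERVER + vllm_args).shelljoin
    args += ["--entrypoint", "bash", image, "-c",
             "#{pre_start_cmd} && exec #{serve}"]
  else
    # Default entrypoint: the arguments go straight to the server.
    args += [image] + vllm_args
  end
  args
end
```

Using `&&` means a failed patch command stops the container from starting vLLM with the wrong dependency versions.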
|
|
end-to-end test on create
- cli: introduce REPO_ROOT constant so create-both/delete-both/watch
find TOML configs at the repo root instead of lib/hyperstack/
- manager: with_polling prints a heartbeat every 30s so silent waits
(SSH, VM ready, etc.) are visibly alive
- provisioning: bootstrap_guest streams SSH output in real time so
apt-lock waits and setup steps are visible as they happen
- provisioning: vLLM wait loop reads docker logs to show the current
startup stage (shard loading %, torch.compile, CUDA graphs, API up)
instead of a plain "not ready yet" counter
- manager: create automatically runs the end-to-end inference test
after provisioning completes, removing the manual 'test' step
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
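The heartbeat behaviour of with_polling can be sketched as below. Only the method name comes from the commit message; the signature, the keyword arguments, and the message format are assumptions, with the clock and sleeper made injectable so the sketch can be exercised without real waiting.

```ruby
# Hypothetical sketch: polls the block until it returns a truthy value,
# printing a heartbeat whenever `heartbeat_every` seconds have passed
# since the previous one.
def with_polling(label, interval: 5, heartbeat_every: 30, max_attempts: 100,
                 out: $stdout,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) },
                 sleeper: ->(seconds) { sleep(seconds) })
  started = last_beat = clock.call
  max_attempts.times do
    result = yield
    return result if result

    now = clock.call
    if now - last_beat >= heartbeat_every
      out.puts "#{label}: still waiting (#{(now - started).round}s elapsed)"
      last_beat = now
    end
    sleeper.call(interval)
  end
  nil # gave up after max_attempts polls
end
```

A call such as `with_polling("SSH") { ssh_reachable? }` (where `ssh_reachable?` is a placeholder) would stay quiet for short waits and print a line every 30 seconds during long ones.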
|
|
Extracts all classes from hyperstack.rb into focused library files:
- lib/hyperstack/config.rb — ConfigLoader + Config (TOML loading, validation)
- lib/hyperstack/state.rb — StateStore + PrefixedOutput (JSON state, threaded output)
- lib/hyperstack/client.rb — HyperstackClient (REST API + retry logic)
- lib/hyperstack/wireguard.rb — LocalWireGuard (wg1.conf peer management, /etc/hosts)
- lib/hyperstack/provisioning.rb — ProvisioningScripts + RemoteProvisioner (SSH bootstrap)
- lib/hyperstack/manager.rb — Manager (VM lifecycle orchestration)
- lib/hyperstack/watcher.rb — VllmWatcher (Prometheus + GPU dashboard)
- lib/hyperstack/cli.rb — CLI (OptionParser command dispatch)
hyperstack.rb becomes a 46-line entry point with require_relative calls.
All files pass `ruby -c` syntax check and `--help` runs correctly.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|