path: root/hyperstack-vm2.toml
Age | Commit message | Author
2026-05-25 | update hyperstack2 VM state and config after recreation | Paul Buetow
2026-05-24 | chore(config): revert vm2 default to n3-A100x1; simplify justfile | Paul Buetow
2026-05-24 | chore(vm2): H100 provisioning, L40 plan, and H100-specific vLLM tuning | Paul Buetow
2026-05-24 | chore(config): remove gpt-oss-120b references since qwen3.6 is better | Paul Buetow
2026-05-24 | replace qwen3-coder-next with qwen3.6-27b across configs, docs, and tooling | Paul Buetow
2026-05-24 | feat(cli): replace --config with --vm 1|2|both, remove create-both/delete-both | Paul Buetow
- Drop single-VM default hyperstack-vm.toml and @config_path/@config_explicit machinery
- Add global --vm flag (default: 1) mapping to hyperstack-vm1.toml and/or hyperstack-vm2.toml
- Fold create-both and delete-both into create/delete --vm both
- Teach status, watch, test, model to accept --vm (default: 1)
- Update help text and README/AGENTS/fish abbreviations accordingly
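
The flag-to-file mapping described in this commit could look roughly like the following. This is a minimal sketch, not the repository's cli.rb: the constant `VM_CONFIGS` and the method `config_paths_for` are hypothetical names, and only the file names and the default of 1 come from the commit message.

```ruby
require "optparse"

# Hypothetical lookup table: --vm value => config files to operate on.
# The file names are taken from the commit message; everything else is assumed.
VM_CONFIGS = {
  "1"    => ["hyperstack-vm1.toml"],
  "2"    => ["hyperstack-vm2.toml"],
  "both" => ["hyperstack-vm1.toml", "hyperstack-vm2.toml"]
}.freeze

# Parse a global --vm flag (default: 1) and return the selected config paths.
# OptionParser rejects values outside the list with OptionParser::InvalidArgument.
def config_paths_for(argv)
  vm = "1"
  parser = OptionParser.new do |opts|
    opts.on("--vm VM", VM_CONFIGS.keys, "VM to target: 1, 2, or both") { |value| vm = value }
  end
  parser.parse(argv) # non-destructive; subcommands such as "create" pass through
  VM_CONFIGS.fetch(vm)
end

puts config_paths_for(["create", "--vm", "both"]).inspect
```

With a table like this, `create --vm both` simply iterates over two paths, which is one way the separate create-both and delete-both commands become unnecessary.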
2026-04-24 | task 78: make Qwen3.6-27B the VM2 default | Paul Buetow
2026-04-11 | Rename VM1 configs: default hyperstack-vm1.toml, Nemotron in -nemotron | Paul Buetow
Move the former hyperstack-vm1-coder.toml to hyperstack-vm1.toml as the standard VM1 profile (Qwen3-Coder-Next on single GPU). Preserve the dual-H100 Nemotron-3-Super stack as hyperstack-vm1-nemotron.toml. Point create-both at hyperstack-vm1.toml and refresh README for current defaults.

Made-with: Cursor
2026-04-06 | provisioner: support docker_image and pre_start_cmd for Gemma 4 startup | Paul Buetow
Adds docker_image and pre_start_cmd config fields to config.rb and provisioning.rb so the Gemma 4 31B workarounds are baked in:
- docker_image = "vllm/vllm-openai:nightly" (stable lacks Gemma 4 support)
- pre_start_cmd = "pip install -q transformers==5.5.0" (stable pins <5)
- extra_docker_env = ["CUDA_VISIBLE_DEVICES=0"] (required with --entrypoint bash)

When pre_start_cmd is set, the provisioner switches to --entrypoint bash and chains the patch command before launching vLLM, so create-both works end-to-end without manual container replacement.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
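
The entrypoint switch described above can be sketched as follows. This is an illustration under stated assumptions rather than the code in provisioning.rb: the method name `docker_run_args`, its keyword parameters, the `server_cmd` value `vllm serve`, and the model name `example/model` are all hypothetical. Only the image tag, the pip command, and the environment variable come from the commit message.

```ruby
require "shellwords"

# Build a `docker run` argv (hypothetical helper, not the repo's implementation).
# Without pre_start_cmd the image's own entrypoint receives server_args directly.
# With pre_start_cmd the entrypoint becomes bash, and the patch command is
# chained in front of the server launch with `&&`.
def docker_run_args(image:, server_cmd:, server_args:, pre_start_cmd: nil, extra_env: [])
  args = ["docker", "run", "-d", "--gpus", "all"]
  extra_env.each { |pair| args.push("-e", pair) }
  if pre_start_cmd
    launch = (server_cmd + server_args).shelljoin
    args.push("--entrypoint", "bash", image, "-c", "#{pre_start_cmd} && #{launch}")
  else
    args.push(image, *server_args)
  end
  args
end

args = docker_run_args(
  image: "vllm/vllm-openai:nightly",
  server_cmd: ["vllm", "serve"],                              # assumed launch command
  server_args: ["example/model", "--max-model-len", "131072"], # placeholder model name
  pre_start_cmd: "pip install -q transformers==5.5.0",
  extra_env: ["CUDA_VISIBLE_DEVICES=0"]
)
puts args.shelljoin
```

Because overriding the entrypoint discards whatever the image would normally run, the caller has to supply the launch command explicitly in the bash case, which is why `server_cmd` is a separate parameter here.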
2026-04-06 | hyperstack: switch to Gemma 4 31B on VM2, Qwen3-Coder-Next on VM1 | Paul Buetow
VM1 (hyperstack-vm1-coder.toml, renamed from hyperstack-vm1-gptoss.toml):
- Default model switched from gpt-oss-120b to qwen3-coder-next
- Config file renamed to reflect actual default model

VM2 (hyperstack-vm2.toml):
- Default model switched from qwen3-coder-next to Gemma 4 31B AWQ
- Uses vLLM nightly image + transformers==5.5.0 workaround: Gemma 4 architecture is registered in transformers 5.x but vLLM stable pins <5
- max_model_len=131072 (128K context); KV cache fills ~95% of H100-80GB VRAM
- Added gemma4-31b preset

watcher.rb:
- Add loading_status field to VmSnapshot to show live model-load progress (last relevant log line during startup instead of generic "loading" message)
- fetch_vm_stats now captures both Engine 0 stats and loading-phase log lines in a single SSH call using a shell variable to avoid two docker log invocations
- clean_log_line() strips vLLM PID/timestamp prefix for readable display

cli.rb: update all hardcoded hyperstack-vm1-gptoss.toml references to hyperstack-vm1-coder.toml

hypr.fish: replace pi-hyperstack-nemotron with pi-hyperstack-coder (VM1), add pi-hyperstack-gemma4 (VM2)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
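
A prefix-stripping helper of the kind clean_log_line() is described as could be sketched like this. The log prefix shape is an assumption, not taken from the source: an optional `(Name pid=N)` tag, a log level, an `MM-DD HH:MM:SS` timestamp, and an optional `[file.py:N]` location. The name `strip_log_prefix` is likewise hypothetical, chosen to avoid implying this is the watcher.rb implementation.

```ruby
# Assumed prefix shape (hypothetical, adjust to the real log format):
#   (EngineCore_0 pid=123) INFO 04-06 12:34:56 [loader.py:10] message
LOG_PREFIX = /
  \A
  (?:\([^)]*pid=\d+\)\s*)?               # optional "(Name pid=123)" tag
  (?:DEBUG|INFO|WARNING|ERROR)\s+        # log level
  \d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}\s+     # MM-DD HH:MM:SS timestamp
  (?:\[[^\]]*\]\s*)?                     # optional "[file.py:123]" location
/x

# Return the message part of a log line; lines without the assumed prefix
# are returned unchanged apart from surrounding whitespace.
def strip_log_prefix(line)
  line.strip.sub(LOG_PREFIX, "")
end

puts strip_log_prefix("(EngineCore_0 pid=123) INFO 04-06 12:34:56 [loader.py:10] Loading weights")
```

Anchoring the pattern at the start of the line keeps message text that happens to contain timestamps or brackets intact.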
2026-03-24 | gpt-oss-120b: enable reasoning via openai_gptoss parser | Paul Buetow
- Add --reasoning-parser openai_gptoss to gpt-oss-120b vLLM config in all three toml files; extracts <|channel|>analysis thinking blocks into reasoning_content in API responses
- Mark gpt-oss-120b as reasoning: true in pi/agent/models.json for all three providers (hyperstack, hyperstack1, hyperstack2)
- Update vm1 state file

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-21 | Fix Nemotron OOM; add VM lifecycle fish abbrs; document automated setup | Paul Buetow
- hyperstack-vm1/vm2.toml: reduce nemotron-super max_model_len 262144→131072 and add --enforce-eager to disable CUDA graph capture (~3-4 GB overhead). Nemotron 120B weights (~60 GB) leave too little VRAM headroom for KV cache allocation and CUDA graph buffers at 262K context on a single A100 80GB. 131K context with eager mode is stable. README VRAM table updated to match.
- hyperstack.fish: add hyperstack-create/delete/test and hyperstack-create/delete-both abbreviations for VM lifecycle management alongside the existing pi-* aliases.
- README.md: add "Automated setup reference" section with single-VM and two-VM command flows before the manual vLLM Docker setup section.

End-to-end tested: single VM (GPT-OSS 120B), dual VM (Nemotron + Qwen3-Coder), pi queries on all three models, all passed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
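
The VRAM reasoning in the first bullet can be made concrete with a back-of-the-envelope calculation. This is a rough sketch using only the figures quoted in the commit message (80 GB card, ~60 GB weights, ~3-4 GB CUDA graph overhead, upper bound used). The method name `kv_headroom_gb` is hypothetical, and the model deliberately ignores activations, fragmentation, and any memory-utilization cap the server applies.

```ruby
# Rough VRAM left over for the KV cache, in GB.
# enforce_eager: true models --enforce-eager, which skips CUDA graph capture
# and therefore its buffer overhead.
def kv_headroom_gb(total_gb:, weights_gb:, cuda_graph_gb:, enforce_eager:)
  overhead = enforce_eager ? 0.0 : cuda_graph_gb
  total_gb - weights_gb - overhead
end

with_graphs = kv_headroom_gb(total_gb: 80.0, weights_gb: 60.0, cuda_graph_gb: 4.0, enforce_eager: false)
eager       = kv_headroom_gb(total_gb: 80.0, weights_gb: 60.0, cuda_graph_gb: 4.0, enforce_eager: true)

puts "headroom with CUDA graphs: #{with_graphs} GB"
puts "headroom with eager mode:  #{eager} GB"
```

Under these simplified numbers, eager mode recovers a few gigabytes out of a budget of roughly 20 GB, which is consistent with the commit's choice to combine it with the shorter context length.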
2026-03-21 | Remove LiteLLM and Claude Code repo references (task 301) | Paul Buetow
2026-03-21 | initial import | Paul Buetow