hypr - My "local" LLM setup with Hyperstack.

Age	Commit message	Author
2026-04-06	hyperstack: switch to Gemma 4 31B on VM2, Qwen3-Coder-Next on VM1	Paul Buetow
	VM1 (hyperstack-vm1-coder.toml, renamed from hyperstack-vm1-gptoss.toml):
	- Default model switched from gpt-oss-120b to qwen3-coder-next
	- Config file renamed to reflect the actual default model

	VM2 (hyperstack-vm2.toml):
	- Default model switched from qwen3-coder-next to Gemma 4 31B AWQ
	- Uses the vLLM nightly image + transformers==5.5.0 workaround: the Gemma 4 architecture is registered in transformers 5.x, but vLLM stable pins <5
	- max_model_len=131072 (128K context); KV cache fills ~95% of H100-80GB VRAM
	- Added gemma4-31b preset

	watcher.rb:
	- Add loading_status field to VmSnapshot to show live model-load progress (the last relevant log line during startup instead of a generic "loading" message)
	- fetch_vm_stats now captures both Engine 0 stats and loading-phase log lines in a single SSH call, using a shell variable to avoid two docker logs invocations
	- clean_log_line() strips the vLLM PID/timestamp prefix for readable display

	cli.rb: update all hardcoded hyperstack-vm1-gptoss.toml references to hyperstack-vm1-coder.toml

	hypr.fish: replace pi-hyperstack-nemotron with pi-hyperstack-coder (VM1), add pi-hyperstack-gemma4 (VM2)

	Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
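	The watcher.rb changes above can be sketched roughly as follows. This is a hypothetical re-implementation, not the code in watcher.rb: it assumes vLLM log lines shaped like `(APIServer pid=1) INFO 04-06 09:12:34 [loggers.py:123] Engine 000: ...`, and the container name `vllm`, the `--tail 500` window and the keyword list are illustrative assumptions.

```ruby
# Hypothetical remote command for a single SSH call: run `docker logs` once,
# keep the output in a shell variable, then filter that variable twice
# (latest Engine 0 stats line, latest loading-phase line).
# The container name "vllm" and the keywords are assumptions.
REMOTE_STATS_CMD = <<~'SH'
  logs=$(docker logs --tail 500 vllm 2>&1)
  printf '%s\n' "$logs" | grep 'Engine 0' | tail -n 1
  echo '---'
  printf '%s\n' "$logs" | grep -Ei 'loading|weights|cuda graph' | tail -n 1
SH

# Assumed vLLM prefix: optional "(Proc pid=N) ", log level, MM-DD date,
# HH:MM:SS time, optional "[file.py:line]" source tag.
VLLM_PREFIX = /\A(?:\([^)]*pid=\d+\)\s+)?(?:DEBUG|INFO|WARNING|ERROR|CRITICAL)\s+\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}(?:\.\d+)?\s+(?:\[[^\]]+\]\s+)?/

# Strip the PID/timestamp prefix so only the message text is displayed.
def clean_log_line(line)
  line.chomp.sub(VLLM_PREFIX, "").strip
end

# Hypothetical keywords that mark a line as loading-phase progress.
LOADING_HINTS = [/loading/i, /weights/i, /cuda graph/i, /kv cache/i].freeze

# Return the most recent loading-related line (cleaned), or nil if none.
def loading_status(log_text)
  line = log_text.lines.reverse.find { |l| LOADING_HINTS.any? { |re| re.match?(l) } }
  line && clean_log_line(line)
end
```

	Scanning from the end means the snapshot always shows the newest progress line, which is what replaces a static "loading" label.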
2026-03-26	hyperstack: tune nemotron-super preset for single A100-80GB	Paul Buetow
	Model weights occupy ~73.6 GiB, leaving ~5.6 GiB for the KV cache. Reduce max_model_len to 32768 and raise gpu_memory_utilization to 0.98 so it fits. Add --enforce-eager to disable CUDA graph capture, whose profiling phase needs ~2 GiB of headroom that simply isn't available on a single A100.

	Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
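	A back-of-the-envelope sketch of why this preset is so tight. It assumes that vLLM's KV cache gets roughly total VRAM × gpu_memory_utilization minus the weights minus other overhead (activations, CUDA graphs), and that per-token KV size follows the standard 2 × layers × KV heads × head dim × dtype bytes formula. The 80 GiB card size and the attention shape in the usage lines are illustrative assumptions, not Nemotron-Super's real layout.

```ruby
GIB = 1024**3

# Approximate KV cache budget: vLLM's cap (total * utilization) minus weights
# and any other overhead lumped into overhead_gib (e.g. CUDA graph capture).
def kv_budget_gib(total_gib:, utilization:, weights_gib:, overhead_gib: 0.0)
  total_gib * utilization - weights_gib - overhead_gib
end

# Per-token KV cache for standard multi-head/grouped-query attention:
# K and V (factor 2) for every layer, KV head and head dimension.
def kv_bytes_per_token(layers:, kv_heads:, head_dim:, dtype_bytes: 2)
  2 * layers * kv_heads * head_dim * dtype_bytes
end

# How many tokens of KV cache fit into the budget (0 if the budget is negative).
def max_tokens_in_cache(budget_gib, bytes_per_token)
  return 0 if budget_gib <= 0
  (budget_gib * GIB / bytes_per_token).floor
end

# Illustrative comparison: ~2 GiB for CUDA graph capture vs. none with
# --enforce-eager. The attention shape below is hypothetical.
per_tok = kv_bytes_per_token(layers: 20, kv_heads: 8, head_dim: 128)
[["CUDA graphs", 2.0], ["--enforce-eager", 0.0]].each do |label, overhead|
  budget = kv_budget_gib(total_gib: 80.0, utilization: 0.98, weights_gib: 73.6, overhead_gib: overhead)
  puts format("%-16s %5.2f GiB -> %d tokens", label, budget, max_tokens_in_cache(budget, per_tok))
end
```

	With only a few GiB left after the weights, a ~2 GiB fixed cost is a large share of the KV budget, which is why trading CUDA graphs for eager mode can be worth it here.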
2026-03-24	gpt-oss-120b: enable reasoning via openai_gptoss parser	Paul Buetow
	- Add --reasoning-parser openai_gptoss to gpt-oss-120b vLLM config in all three toml files; extracts <|channel|>analysis thinking blocks into reasoning_content in API responses
	- Mark gpt-oss-120b as reasoning: true in pi/agent/models.json for all three providers (hyperstack, hyperstack1, hyperstack2)
	- Update vm1 state file

	Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
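	To illustrate what the reasoning parser does, here is a much simplified sketch that splits a raw gpt-oss (harmony-format) completion into the reasoning_content and content fields of an OpenAI-style response. It is not vLLM's openai_gptoss parser: it assumes plain `<|channel|>NAME<|message|>TEXT` segments and ignores tool calls and the commentary channel.

```ruby
# One harmony segment: channel name, then message text up to the next
# control token (or end of string). Non-greedy so segments don't merge.
HARMONY_SEGMENT = /<\|channel\|>(\w+)<\|message\|>(.*?)(?=<\|end\|>|<\|return\|>|<\|start\|>|\z)/m

# Route "analysis" text to reasoning_content and "final" text to content.
def split_harmony(raw)
  reasoning = []
  content = []
  raw.scan(HARMONY_SEGMENT) do |channel, text|
    case channel
    when "analysis" then reasoning << text
    when "final" then content << text
    end
  end
  { reasoning_content: reasoning.empty? ? nil : reasoning.join("\n"), content: content.join }
end
```

	Without a reasoning parser the analysis text would either leak into content or be lost; marking the model as reasoning: true lets the client expect the separate field.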
2026-03-24	hyperstack: gpt-oss-120b + qwen3-coder-next dual-VM pair on A100x1	Paul Buetow
	- Add hyperstack-vm1-gptoss.toml: A100x1 config for gpt-oss-120b (VM1) and qwen3-coder-next (VM2) pair, replacing the H100x2 default
	- Fix pi/agent/models.json: hyperstack provider URL was pointing at hyperstack.wg1 (unresolvable); corrected to hyperstack1.wg1 (192.168.3.1)
	- Update hyperstack.rb, hypr.fish: reference vm1-gptoss.toml for create-both and pair commands; update fish abbrs for the new pair setup
	- Update ask-mode/utils.ts: allow read-only 'ask' commands in ask-mode
	- Update agent-plan-mode/utils.ts: tighten isAskCommand check
	- Add state files for provisioned vm1/vm2 instances

	Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
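	A broken provider host like hyperstack.wg1 can be caught before use with a small resolvability check. This is a sketch under assumptions: the `{"providers": {NAME: {"baseUrl": ...}}}` shape of models.json and the port in the example URLs are hypothetical, and the resolver is injectable so the check can run without touching real DNS.

```ruby
require "json"
require "resolv"
require "uri"

# Return [provider, host] pairs whose baseUrl host fails to resolve.
# Assumes a hypothetical models.json shape: {"providers": {name: {"baseUrl": url}}}.
def unresolvable_providers(models_json, resolver: ->(host) { Resolv.getaddress(host) })
  providers = JSON.parse(models_json).fetch("providers", {})
  providers.filter_map do |name, cfg|
    host = URI(cfg.fetch("baseUrl")).host
    begin
      resolver.call(host)
      nil # resolved fine, nothing to report
    rescue Resolv::ResolvError
      [name, host]
    end
  end
end
```

	Against the real file this would be called as `unresolvable_providers(File.read("pi/agent/models.json"))`, provided the assumed JSON shape matches.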