VM1 (hyperstack-vm1-coder.toml, renamed from hyperstack-vm1-gptoss.toml):
- Default model switched from gpt-oss-120b to qwen3-coder-next
- Config file renamed to reflect actual default model

VM2 (hyperstack-vm2.toml):
- Default model switched from qwen3-coder-next to Gemma 4 31B AWQ
- Use the vLLM nightly image plus a transformers==5.5.0 workaround: the
  Gemma 4 architecture is registered in transformers 5.x, but stable vLLM
  pins transformers<5
- max_model_len=131072 (128K context); the KV cache fills ~95% of the
  H100 80GB's VRAM
- Add a gemma4-31b preset

watcher.rb:
- Add a loading_status field to VmSnapshot to show live model-load progress
  (the last relevant log line during startup instead of a generic "loading"
  message)
- fetch_vm_stats now captures both the Engine 0 stats and the loading-phase
  log lines in a single SSH call, storing the docker logs output in a shell
  variable so docker logs runs once instead of twice
- clean_log_line() strips the vLLM PID/timestamp prefix from log lines for
  readable display
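
The watcher.rb changes could look roughly like the minimal Ruby sketch below. It is not the code in watcher.rb. The prefix regex assumes vLLM lines shaped like `(APIServer pid=1) INFO 04-10 12:34:56 [loggers.py:123] ...`. The following are all hypothetical: the `Engine 000` and loading-phase grep patterns, the `---LOADING---` marker, and the helpers `stats_command` and `parse_stats_output`. The sketch also reuses the names VmSnapshot, loading_status and clean_log_line from this commit, but their bodies here are illustrative.

```ruby
require "shellwords"

# ASSUMED vLLM prefix: optional "(Proc pid=N) ", level, MM-DD HH:MM:SS, "[file:line] ".
VLLM_PREFIX = /\A(?:\([^)]*pid=\d+\)\s+)?(?:DEBUG|INFO|WARNING|ERROR|CRITICAL)\s+\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}(?:\.\d+)?\s+(?:\[[^\]]+\]\s+)?/

# Strip the PID/timestamp prefix so only the message text is displayed.
def clean_log_line(line)
  line.sub(VLLM_PREFIX, "").strip
end

# Hypothetical remote command, sent as one argument to ssh (not shown):
# read `docker logs` once into $logs, then filter that variable twice,
# separated by a marker line. Both grep patterns are assumptions.
def stats_command(container, tail: 200)
  c = Shellwords.escape(container)
  <<~SH
    logs=$(docker logs --tail #{tail} #{c} 2>&1)
    printf '%s\\n' "$logs" | grep 'Engine 000' | tail -n 1
    echo '---LOADING---'
    printf '%s\\n' "$logs" | grep -E '[Ll]oading|weights|CUDA graph' | tail -n 1
  SH
end

# Split the remote output back into the stats line and the loading line.
def parse_stats_output(out)
  engine, loading = out.split("---LOADING---\n", 2)
  { engine: clean_log_line(engine.to_s), loading: clean_log_line(loading.to_s) }
end

VmSnapshot = Struct.new(:name, :state, :loading_status, keyword_init: true) do
  # While loading, prefer the latest log line over a generic "loading" label.
  def status_text
    return state unless state == "loading"
    loading_status.to_s.empty? ? "loading" : "loading: #{loading_status}"
  end
end
```

The marker split is one simple way to return two filtered results from a single SSH round trip while reading the container log only once per poll.
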

cli.rb: update all hardcoded hyperstack-vm1-gptoss.toml references to
hyperstack-vm1-coder.toml

hypr.fish: replace pi-hyperstack-nemotron with pi-hyperstack-coder (VM1)
and add pi-hyperstack-gemma4 (VM2)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>