path: root/hyperstack.rb
Age | Commit message | Author
2026-03-22 | Upgrade VM1 to H100x2 with 1M context for Nemotron-3-Super | Paul Buetow
Switch VM1 from n3-H100x1 to n3-H100x2 to run Nemotron-3-Super with a 1M-token context window via tensor parallelism. The dual-GPU setup (160 GB total VRAM) provides enough KV cache headroom to override the model's config.json limit of 262144 tokens.

Key changes:
- flavor_name: n3-H100x1 → n3-H100x2
- tensor_parallel_size: 1 → 2
- max_model_len: 131072 → 1048576 (with VLLM_ALLOW_LONG_MAX_MODEL_LEN=1)
- gpu_memory_utilization: 0.92 → 0.85 (headroom for the Mamba cache and sampler warmup)
- Remove --enforce-eager: no longer needed with the dual-GPU VRAM budget
- Disable prefix caching: on NemotronH it forces the Mamba "all" cache mode, which pre-allocates states for all max_num_seqs and runs out of memory before the sampler warmup pass; per-request allocation is cheaper at startup

Add two new vllm config fields to hyperstack.rb:
- extra_docker_env: passes -e KEY=VALUE flags to Docker before the image name (used for VLLM_ALLOW_LONG_MAX_MODEL_LEN=1 and PYTORCH_ALLOC_CONF=expandable_segments:True)
- enable_prefix_caching: makes --enable-prefix-caching conditional (defaults to true for backward compatibility; false for NemotronH)

Both fields are supported in [vllm] defaults and [vllm.presets.*] overrides, with the same fallback semantics as the existing fields.

Update pi/agent/models.json: the Nemotron vm1 entry is renamed to "Nemotron 3 Super 120B 1M [vm1]" with contextWindow 1048576.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
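The two new fields and their preset-over-defaults fallback can be sketched in Ruby as follows. This is a minimal sketch under stated assumptions, not the code in hyperstack.rb. Only the field names, the flag names, and the example values come from the commit message. The following are all hypothetical:

- the helper names `resolve` and `build_docker_command`
- the hash layout, with extra_docker_env as a Hash
- the vllm/vllm-openai image tag
- the hard-coded fallbacks

The model argument is left out because its exact form depends on the image's entrypoint.

```ruby
# Sketch only: extra_docker_env and enable_prefix_caching are named in the
# commit message; helper names, hash layout, image tag and fallback values
# are hypothetical.

IMAGE = "vllm/vllm-openai:latest" # assumed image tag

# A key present in the preset wins (even if its value is false),
# then the [vllm] default, then a hard-coded fallback.
def resolve(preset, defaults, key, fallback)
  return preset[key] if preset.key?(key)
  return defaults[key] if defaults.key?(key)

  fallback
end

# Build the argv for `docker run`; the model argument is intentionally omitted.
def build_docker_command(defaults, preset)
  cmd = %w[docker run --gpus all]

  # Docker parses its own options only before the image name;
  # anything after the image is handed to the container instead.
  resolve(preset, defaults, "extra_docker_env", {}).each do |key, value|
    cmd.push("-e", "#{key}=#{value}")
  end

  cmd << IMAGE

  # vLLM engine flags follow the image name.
  cmd.push("--tensor-parallel-size", resolve(preset, defaults, "tensor_parallel_size", 1).to_s)
  cmd.push("--max-model-len", resolve(preset, defaults, "max_model_len", 131_072).to_s)
  cmd.push("--gpu-memory-utilization", resolve(preset, defaults, "gpu_memory_utilization", 0.92).to_s)

  # Fallback true keeps older presets unchanged; a NemotronH preset sets false.
  cmd << "--enable-prefix-caching" if resolve(preset, defaults, "enable_prefix_caching", true)
  cmd
end

# Usage: a hypothetical NemotronH preset with the values from this commit.
defaults = { "tensor_parallel_size" => 1, "gpu_memory_utilization" => 0.92 }
nemotron = {
  "tensor_parallel_size" => 2,
  "max_model_len" => 1_048_576,
  "gpu_memory_utilization" => 0.85,
  "enable_prefix_caching" => false,
  "extra_docker_env" => {
    "VLLM_ALLOW_LONG_MAX_MODEL_LEN" => "1",
    "PYTORCH_ALLOC_CONF" => "expandable_segments:True"
  }
}
puts build_docker_command(defaults, nemotron).join(" ")
```

The fallback checks whether the key is present rather than whether its value is truthy. That is what lets a preset set enable_prefix_caching to false while other presets inherit true. Whether omitting --enable-prefix-caching actually turns caching off depends on the vLLM version's default. The sketch only mirrors the conditional flag described in the commit.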
2026-03-21 | Consolidate vllm-setup.txt into README.md and remove the file | Paul Buetow
Merged all still-relevant content from vllm-setup.txt into README.md:
- "Why vLLM over Ollama" section
- Full monitoring commands with engine metrics table
- Troubleshooting table
- VRAM sizing guide
- Performance characteristics table

Dropped the LiteLLM, Anthropic API, Claude Code, and OpenCode sections, which no longer apply. Removes the vllm-setup.txt file.
2026-03-21 | Remove LiteLLM and Claude Code repo references (task 301) | Paul Buetow
2026-03-21 | initial import | Paul Buetow