author: Paul Buetow <paul@buetow.org> 2026-03-18 18:52:31 +0200
committer: Paul Buetow <paul@buetow.org> 2026-03-18 18:52:31 +0200
commit: 98858030d4c9c81849dcd49d6212255cbda28755
tree: 32ba6ce9f519ca1bca9b499d62407d7489b1a957 /snippets/hyperstack/hyperstack-vm1.toml
parent: 3fe076087ea50ca56f211c4f4c00c8c08b0479da
gpt-oss-120b: raise max_model_len to 131072
The MXFP4 KV cache is compact enough that vLLM allocated about 168K tokens of KV cache (10560 blocks × 16 tokens) at 0.92 utilization. The 40K limit was too conservative and caused negative max_tokens errors in long Claude Code sessions.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
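As a rough check of the figures in the commit message, the sketch below recomputes the KV cache capacity and shows how a token budget can go negative. The helper names are hypothetical, the old limit of 40,000 is an assumption (the message only says "40K"), and the 60,000-token prompt is an invented example. The subtraction in `remaining_tokens` is one plausible way a client could arrive at a negative max_tokens, not a confirmed description of what vLLM or Claude Code does.

```python
# Figures taken from the commit message.
BLOCK_SIZE = 16             # tokens per KV cache block
NUM_BLOCKS = 10560          # blocks vLLM allocated at 0.92 utilization
NEW_MAX_MODEL_LEN = 131072  # new limit set by this commit

# Assumption: the message only says "40K"; the exact old value is unknown.
OLD_MAX_MODEL_LEN = 40_000


def kv_cache_capacity(num_blocks: int, block_size: int) -> int:
    """Total tokens the allocated KV cache blocks can hold."""
    return num_blocks * block_size


def remaining_tokens(max_model_len: int, prompt_tokens: int) -> int:
    """Hypothetical budget left for generation after the prompt."""
    return max_model_len - prompt_tokens


capacity = kv_cache_capacity(NUM_BLOCKS, BLOCK_SIZE)
print(capacity)                       # 168960
print(capacity >= NEW_MAX_MODEL_LEN)  # True

# Invented long-session prompt of 60,000 tokens.
print(remaining_tokens(OLD_MAX_MODEL_LEN, 60_000))  # -20000
print(remaining_tokens(NEW_MAX_MODEL_LEN, 60_000))  # 71072
```

Under these assumptions the allocated cache (168,960 tokens) comfortably covers a single sequence at the new 131072 limit, while the old limit leaves a negative budget as soon as the prompt outgrows it.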
Diffstat (limited to 'snippets/hyperstack/hyperstack-vm1.toml')
0 files changed, 0 insertions, 0 deletions