| author | Paul Buetow <paul@buetow.org> | 2026-03-18 18:52:31 +0200 |
|---|---|---|
| committer | Paul Buetow <paul@buetow.org> | 2026-03-18 18:52:31 +0200 |
| commit | 98858030d4c9c81849dcd49d6212255cbda28755 (patch) | |
| tree | 32ba6ce9f519ca1bca9b499d62407d7489b1a957 /snippets/hyperstack/hyperstack-vm1.toml | |
| parent | 3fe076087ea50ca56f211c4f4c00c8c08b0479da (diff) | |
gpt-oss-120b: raise max_model_len to 131072
MXFP4 KV cache is compact enough that vLLM allocated 10560 blocks of
16 tokens (about 168K tokens of KV cache) at 0.92 utilization. The 40K
limit was too conservative and caused negative max_tokens errors in
long Claude Code sessions.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
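
The arithmetic behind the message can be checked with a small sketch. The block count, block size, and new limit come from the commit message. The old limit of `40 * 1024`, the 60,000-token prompt, and the helper names are assumptions for illustration only; the commit does not show the exact previous value or how the client computes `max_tokens`.

```python
# Figures taken from the commit message.
NUM_BLOCKS = 10560
BLOCK_SIZE = 16
NEW_MAX_MODEL_LEN = 131072

# Assumption: "40K" is read as 40 * 1024; the exact old value is not shown.
OLD_MAX_MODEL_LEN = 40 * 1024


def kv_capacity_tokens(num_blocks: int, block_size: int) -> int:
    """Total tokens that fit in the allocated KV cache blocks."""
    return num_blocks * block_size


def remaining_tokens(max_model_len: int, prompt_tokens: int) -> int:
    """Hypothetical budget: context limit minus prompt length.

    A negative result corresponds to the kind of negative max_tokens
    value the commit message describes.
    """
    return max_model_len - prompt_tokens


capacity = kv_capacity_tokens(NUM_BLOCKS, BLOCK_SIZE)
print(capacity)                       # 168960
print(capacity >= NEW_MAX_MODEL_LEN)  # True: the new limit fits in the cache

prompt = 60_000  # hypothetical long-session prompt size
print(remaining_tokens(OLD_MAX_MODEL_LEN, prompt))  # -19040
print(remaining_tokens(NEW_MAX_MODEL_LEN, prompt))  # 71072
```

Under these assumptions the allocated cache holds more tokens than the new 131072 limit, while any prompt longer than the old limit leaves a negative budget.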
Diffstat (limited to 'snippets/hyperstack/hyperstack-vm1.toml')
0 files changed, 0 insertions, 0 deletions
