conf - Configuration files for the automation of my personal infrastructure (servers, laptops, workstations, phones)!

author	Paul Buetow <paul@buetow.org>	2026-03-18 18:52:31 +0200
committer	Paul Buetow <paul@buetow.org>	2026-03-18 18:52:31 +0200
commit	98858030d4c9c81849dcd49d6212255cbda28755 (patch)
tree	32ba6ce9f519ca1bca9b499d62407d7489b1a957 /snippets/hyperstack/hyperstack-vm1.toml
parent	3fe076087ea50ca56f211c4f4c00c8c08b0479da (diff)

gpt-oss-120b: raise max_model_len to 131072

MXFP4 KV cache is compact enough that vLLM allocated 10560 blocks of 16 tokens (about 168K tokens of KV cache) at 0.92 utilization. The 40K limit was too conservative and caused negative max_tokens errors in long Claude Code sessions.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
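
The arithmetic behind the commit message can be sketched as follows. This is only an illustration, not code from the repository: the helper names are hypothetical, the old limit is assumed to be 40960 tokens (the message only says "40K"), and the negative max_tokens value is assumed to come from a client subtracting the prompt length from the context limit.

```python
# Illustrative sketch; function names and the 50,000-token prompt are hypothetical.

def kv_cache_capacity(num_blocks: int, block_size: int) -> int:
    """Total tokens the KV cache can hold: blocks times tokens per block."""
    return num_blocks * block_size


def remaining_tokens(max_model_len: int, prompt_tokens: int) -> int:
    """Tokens left for generation; negative when the prompt exceeds the limit."""
    return max_model_len - prompt_tokens


# Figures from the commit message: 10560 blocks of 16 tokens each.
capacity = kv_cache_capacity(10560, 16)

OLD_LIMIT = 40_960    # assumption: the exact value behind "40K" is not stated
NEW_LIMIT = 131_072   # new max_model_len from the commit subject

prompt = 50_000       # hypothetical long-session prompt size

print(capacity)                             # 168960
print(remaining_tokens(OLD_LIMIT, prompt))  # -9040
print(remaining_tokens(NEW_LIMIT, prompt))  # 81072
```

Under these assumptions the new limit of 131072 still fits inside the roughly 168K tokens of allocated KV cache, while a prompt that overflowed the old limit no longer yields a negative budget.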

Diffstat (limited to 'snippets/hyperstack/hyperstack-vm1.toml')

0 files changed, 0 insertions, 0 deletions

