| author | Paul Buetow <paul@buetow.org> | 2026-03-26 09:15:57 +0200 |
|---|---|---|
| committer | Paul Buetow <paul@buetow.org> | 2026-03-26 09:15:57 +0200 |
| commit | acaea730cfca43bf96f4e70b3104559db7977a3f (patch) | |
| tree | c9f05056ab8de446715d6c7ce284e00217d44850 /lib/hyperstack | |
| parent | 5281c4fe10b7600da0aa5170963f1f44119058aa (diff) | |
hyperstack: tune nemotron-super preset for single A100-80GB
Model weights occupy ~73.6 GiB, leaving ~5.6 GiB for the KV cache. Reduce
max_model_len to 32768 and raise gpu_memory_utilization to 0.98 to fit.
Add --enforce-eager to disable CUDA graph capture, whose profiling phase
requires ~2 GiB of headroom that isn't available on a single A100.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
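
A minimal sketch of how the tuned values might translate into a launch command. It assumes the preset is served with vLLM (the parameter names in the message match vLLM's engine arguments, but the commit does not name the server). The `PRESET` layout, `serve_args`, and the `MODEL_ID` placeholder are hypothetical and are not taken from lib/hyperstack.

```python
# Hypothetical preset layout; only these three values come from the commit message.
PRESET = {
    "max_model_len": 32768,
    "gpu_memory_utilization": 0.98,
    "enforce_eager": True,
}


def serve_args(model, preset):
    """Build an argument list for `vllm serve` from a preset dict (assumed server)."""
    args = ["vllm", "serve", model]
    args += ["--max-model-len", str(preset["max_model_len"])]
    args += ["--gpu-memory-utilization", str(preset["gpu_memory_utilization"])]
    if preset.get("enforce_eager"):
        # Skips CUDA graph capture, trading some throughput for memory headroom.
        args.append("--enforce-eager")
    return args


def implied_budget_gib(weights_gib, kv_cache_gib):
    """Memory budget implied by the commit's figures: weights plus KV cache."""
    return round(weights_gib + kv_cache_gib, 1)


if __name__ == "__main__":
    # MODEL_ID is a placeholder; the actual model identifier is not given in the commit.
    print(" ".join(serve_args("MODEL_ID", PRESET)))
    print(implied_budget_gib(73.6, 5.6), "GiB")
```

The second helper only restates the arithmetic in the message: ~73.6 GiB of weights plus ~5.6 GiB of KV cache accounts for roughly 79.2 GiB, which leaves no room for the ~2 GiB that graph capture would need.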
Diffstat (limited to 'lib/hyperstack')
0 files changed, 0 insertions, 0 deletions
