Fix nemotron-super tool_call_parser; auto-clear WireGuard hostname from known_hosts

- hyperstack-vm.toml: set tool_call_parser=llama3_json for nemotron-super so
  vLLM accepts tool_choice requests from opencode; model won't spontaneously
  call tools so the vLLM 0.17.1 token_ids crash in llama3_json won't trigger
- hyperstack.rb: wait_for_ssh now also removes the WireGuard hostname
  (hyperstack.wg1) from known_hosts alongside the IP, preventing
  StrictHostKeyChecking failures across VM recreates

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
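The hyperstack.rb change is not part of the diff shown on this page, so the following is only a minimal sketch of the known_hosts cleanup the message describes, not the actual wait_for_ssh code. The helper names, the `runner:` hook and the example IP are hypothetical; the only real interface assumed is `ssh-keygen -R <host> -f <file>`, which drops every key recorded for a host.

```ruby
# Hypothetical helpers illustrating the cleanup; not taken from hyperstack.rb.

# Build one `ssh-keygen -R` invocation per host (IP and WireGuard hostname).
def known_hosts_cleanup_commands(hosts, known_hosts_file: File.expand_path("~/.ssh/known_hosts"))
  hosts.compact.uniq.map do |host|
    ["ssh-keygen", "-R", host, "-f", known_hosts_file]
  end
end

# Run the commands. The runner is injectable so the logic can be exercised
# without touching a real known_hosts file.
def clear_known_hosts(hosts,
                      runner: ->(cmd) { system(*cmd, out: File::NULL, err: File::NULL) },
                      **opts)
  known_hosts_cleanup_commands(hosts, **opts).each { |cmd| runner.call(cmd) }
end

# Example call (203.0.113.7 is a documentation-range placeholder IP):
#   clear_known_hosts(["203.0.113.7", "hyperstack.wg1"])
```

Clearing both names matters because a recreated VM presents a new host key under the public IP and under the WireGuard hostname; removing only one of them would still leave a stale entry that StrictHostKeyChecking rejects.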
author: Paul Buetow <paul@buetow.org> 2026-03-18 17:28:56 +0200
committer: Paul Buetow <paul@buetow.org> 2026-03-18 17:28:56 +0200
commit: bda86a3c91b307e25507e975927c3dde38f65a74 (patch)
tree: 892e7d3f03a05b34528c08bd1091172c4e6de643 /snippets/hyperstack/hyperstack-vm.toml
parent: d3821c76ecd18bf6256d7493596c304fff784d29 (diff)
1 files changed, 3 insertions, 1 deletions
diff --git a/snippets/hyperstack/hyperstack-vm.toml b/snippets/hyperstack/hyperstack-vm.toml
index f1c80a7..4ec6879 100644
--- a/snippets/hyperstack/hyperstack-vm.toml
+++ b/snippets/hyperstack/hyperstack-vm.toml
@@ -100,7 +100,9 @@ container_name = "vllm_nemotron_super"
 max_model_len = 65536
 gpu_memory_utilization = 0.92
 tensor_parallel_size = 1
-tool_call_parser = ""
+# llama3_json lets vLLM accept tool_choice requests (required by opencode).
+# Nemotron won't spontaneously call tools, so the vLLM 0.17.1 token_ids bug won't trigger.
+tool_call_parser = "llama3_json"
 trust_remote_code = true
 
 # OpenAI GPT-OSS 20B — ultra-fast MoE (3.6B active / 20B total, MXFP4), ~14 GB on A100.
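
As an illustration of why the empty string and "llama3_json" behave differently, here is a hedged sketch of how a launcher might translate `tool_call_parser` into vLLM server flags. The function name and the config hash shape are assumptions rather than the real hyperstack.rb code; `--enable-auto-tool-choice` and `--tool-call-parser` are the vLLM options that enable automatic tool choice and select the parser.

```ruby
# Hypothetical mapping from a model's TOML table to vLLM CLI arguments.
def vllm_tool_args(model_cfg)
  parser = model_cfg["tool_call_parser"].to_s.strip
  # An empty parser means tool calling stays disabled: no flags are emitted.
  return [] if parser.empty?

  ["--enable-auto-tool-choice", "--tool-call-parser", parser]
end

# Before this commit (assumed behaviour): vllm_tool_args("tool_call_parser" => "")
# After this commit:                      vllm_tool_args("tool_call_parser" => "llama3_json")
```

Under this assumption the old empty value produced no tool flags at all, which is consistent with vLLM refusing the tool_choice requests sent by opencode.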