replace qwen3-coder-next with qwen3.6-27b across configs, docs, and tooling

author: Paul Buetow <paul@buetow.org> 2026-05-24 14:02:34 +0300
committer: Paul Buetow <paul@buetow.org> 2026-05-24 14:02:34 +0300
commit: c8bd4d1e7a34ebf452d3d6c843d5cef785abe608 (patch)
tree: ec1e6c19379c3ba86f6d80d90286eceae393b983
parent: f16f4b753b3bf317e6da79f479ff5f506ed34b47 (diff)
9 files changed, 321 insertions, 109 deletions
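
A rename that spans configs, docs, and tooling can leave stray references behind. The following is a minimal sketch, not part of the commit, of a scanner for the identifiers this patch retires. The three names come from the removed lines of the diff; the function name `find_stale` and the command-line usage are assumptions made for illustration. One detail matters here: the old container name `vllm_qwen3` is a prefix of the new `vllm_qwen36_27b`, so a plain substring search would flag the new name as stale. The lookarounds in the pattern guard against that.

```python
import re
import sys

# Identifiers retired by this commit, taken from the removed ("-") lines of the diff.
STALE_NAMES = (
    "qwen3-coder-next",
    "bullpoint/Qwen3-Coder-Next-AWQ-4bit",
    "vllm_qwen3",
)

# The lookarounds reject matches that continue with a letter, digit, or
# underscore, so "vllm_qwen3" does not match inside "vllm_qwen36_27b".
STALE_PATTERN = re.compile(
    r"(?<![A-Za-z0-9_])(?:"
    + "|".join(re.escape(name) for name in STALE_NAMES)
    + r")(?![A-Za-z0-9_])",
    re.IGNORECASE,
)


def find_stale(text):
    """Return (line_number, matched_text) pairs for retired identifiers in text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in STALE_PATTERN.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits


if __name__ == "__main__":
    # Hypothetical usage: python3 find_stale.py README.md AGENTS.md ...
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as handle:
            for lineno, name in find_stale(handle.read()):
                print(f"{path}:{lineno}: {name}")
```

Matching is case-insensitive so that the prose spelling `Qwen3-Coder-Next` and the preset spelling `qwen3-coder-next` are both caught, while `Qwen3-Coder-30B` is left alone.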
diff --git a/AGENTS.md b/AGENTS.md
index ced462c..f7f3491 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -168,7 +168,7 @@ inference is ready. On an A100 with a warm HuggingFace cache:
 **Monitor startup:**
 
 ```bash
-ssh ubuntu@<vm-public-ip> 'sudo docker logs -f vllm_qwen3 2>&1' \
+ssh ubuntu@<vm-public-ip> 'sudo docker logs -f vllm_qwen36_27b 2>&1' \
     | grep -E "startup complete|Error|Loading|Downloading"
 ```
 
@@ -176,7 +176,7 @@ After `Application startup complete.`, the model responds immediately.
 If the container crashes before that line, check for CUDA errors:
 
 ```bash
-ssh ubuntu@<vm-public-ip> 'sudo docker logs vllm_qwen3 2>&1 | grep -i "error\|cuda"'
+ssh ubuntu@<vm-public-ip> 'sudo docker logs vllm_qwen36_27b 2>&1 | grep -i "error\|cuda"'
 ```
 
 A `CUDA error: operation not permitted` on the first engine process (pid visible in
diff --git a/README.md b/README.md
index 0c0df1b..cdb4df4 100644
--- a/README.md
+++ b/README.md
@@ -27,7 +27,7 @@ Runs two A100 VMs concurrently — each serving a different model — with [Pi](
   │  │  │  │ pane 0: pi-coder      │ pane 1: pi-gemma4       │        │  │  │
   │  │  │  │                       │                         │        │  │  │
   │  │  │  │ Pi                    │ Pi                      │        │  │  │
-  │  │  │  │ Qwen3-Coder-Next      │ Gemma 4 31B             │        │  │  │
+  │  │  │  │ Qwen3.6 27B FP8       │ Gemma 4 31B             │        │  │  │
   │  │  │  └──────────┬────────────┘└────────────┬───────────┘        │  │  │
   │  │  │             │ OpenAI API               │ OpenAI API         │  │  │
   │  │  │             │ /v1/chat/completions      │ /v1/chat/completions│ │  │
@@ -45,7 +45,7 @@ Runs two A100 VMs concurrently — each serving a different model — with [Pi](
   │ hyperstack1.wg1          │  │ hyperstack2.wg1          │
   │                          │  │                          │
   │ vLLM :11434              │  │ vLLM :11434              │
-  │ Qwen3-Coder-Next         │  │ Gemma 4 31B IT           │
+  │ Qwen3.6 27B FP8          │  │ Gemma 4 31B IT           │
   │ (MoE, AWQ-4bit)          │  │ (dense, AWQ-4bit)        │
   └──────────────────────────┘  └──────────────────────────┘
 ```
@@ -167,7 +167,7 @@ Source `hyperstack.fish` or copy the abbreviations into your Fish config:
 
 ```fish
 abbr pi-hyperstack         pi --model hyperstack/openai/gpt-oss-120b
-abbr pi-hyperstack-coder   pi --model hyperstack1/bullpoint/Qwen3-Coder-Next-AWQ-4bit
+abbr pi-hyperstack-coder   pi --model hyperstack1/Qwen/Qwen3.6-27B-FP8
 abbr pi-hyperstack-qwen36  pi --model hyperstack2/Qwen/Qwen3.6-27B-FP8
 abbr pi-hyperstack-gemma4  pi --model hyperstack2/cyankiwi/gemma-4-31B-it-AWQ-4bit
 ```
@@ -176,7 +176,7 @@ Then launch a session after the VM(s) are up:
 
 ```fish
 pi-hyperstack            # GPT-OSS 120B on VM1
-pi-hyperstack-coder      # Qwen3-Coder-Next on VM1
+pi-hyperstack-coder      # Qwen3.6 27B FP8 on VM1
 pi-hyperstack-qwen36     # Qwen3.6 27B FP8 on VM2
 pi-hyperstack-gemma4     # Gemma 4 31B on VM2
 ```
@@ -188,7 +188,7 @@ Three providers are defined, one per setup, each pointing at its vLLM endpoint o
 | Provider | Base URL | Primary model |
 |----------|----------|---------------|
 | `hyperstack` | `http://hyperstack.wg1:11434/v1` | GPT-OSS 120B (single-VM) |
-| `hyperstack1` | `http://hyperstack1.wg1:11434/v1` | Qwen3-Coder-Next (default; presets in TOML) |
+| `hyperstack1` | `http://hyperstack1.wg1:11434/v1` | Qwen3.6 27B FP8 (default; presets in TOML) |
 | `hyperstack2` | `http://hyperstack2.wg1:11434/v1` | Gemma 4 31B (default; presets in TOML) |
 
 All model presets from the TOML configs are registered under each provider, so any
@@ -255,7 +255,7 @@ No API key or account required. Uses DuckDuckGo's free HTML endpoint.
 
 | Config file | Default model | WireGuard IP | Hostname |
 |---|---|---|---|
-| `hyperstack-vm1.toml` | Qwen3-Coder-Next (AWQ-4bit) | `192.168.3.1` | `hyperstack1.wg1` |
+| `hyperstack-vm1.toml` | Qwen3.6 27B FP8 | `192.168.3.1` | `hyperstack1.wg1` |
 | `hyperstack-vm2.toml` | Gemma 4 31B IT (AWQ-4bit) | `192.168.3.3` | `hyperstack2.wg1` |
 
 Each VM has independent state files so they can be managed separately:
@@ -270,8 +270,8 @@ ruby hyperstack.rb --vm 2 status
 Each VM has named model presets in its TOML config. Hot-switch without reprovisioning:
 
 ```bash
-ruby hyperstack.rb --vm 1 model switch qwen3-coder-next
-ruby hyperstack.rb --vm 2 model switch qwen3-coder-next
+ruby hyperstack.rb --vm 1 model switch qwen36-27b
+ruby hyperstack.rb --vm 2 model switch qwen36-27b
 ```
 
 Available presets (both VMs share the same set):
@@ -280,7 +280,7 @@ Available presets (both VMs share the same set):
 |---|---|---|---|
 | `gemma4-31b` | Gemma 4 31B IT (AWQ-4bit) | ~19 GB | 32K–128K (see TOML) |
 | `nemotron-super` | Nemotron-3-Super 120B (Mamba+MoE, 12B active) | ~60 GB | 131K |
-| `qwen3-coder-next` | Qwen3-Coder-Next 80B (MoE, AWQ-4bit) | ~45 GB | 262K |
+| `qwen36-27b` | Qwen3.6 27B FP8 | ~45 GB | 262K |
 | `gpt-oss-120b` | GPT-OSS 120B (MoE, MXFP4) | ~65 GB | 131K |
 | `gpt-oss-20b` | GPT-OSS 20B (MoE, MXFP4) | ~14 GB | 65K |
 | `qwen25-coder-32b` | Qwen2.5-Coder-32B-Instruct (AWQ) | ~18 GB | 32K |
@@ -349,7 +349,7 @@ ruby hyperstack.rb test --vm 1
 ruby hyperstack.rb test --vm 2
 
 # Launch Pi coding agents — one per terminal
-pi-hyperstack-coder      # fish abbreviation → Qwen3-Coder-Next on VM1
+pi-hyperstack-coder      # fish abbreviation → Qwen3.6 27B FP8 on VM1
 pi-hyperstack-qwen36     # fish abbreviation → Qwen3.6 27B FP8 on VM2
 pi-hyperstack-gemma4     # fish abbreviation → Gemma 4 31B on VM2
 
@@ -361,8 +361,8 @@ ruby hyperstack.rb delete --vm both
 
 ```bash
 # Switch the running vLLM container to a different model preset
-ruby hyperstack.rb --vm 1 model switch qwen3-coder-next
-ruby hyperstack.rb --vm 2 model switch qwen3-coder-next
+ruby hyperstack.rb --vm 1 model switch qwen36-27b
+ruby hyperstack.rb --vm 2 model switch qwen36-27b
 ```
 
 See the [VM configuration](#vm-configuration) and [Switching models](#switching-models)
@@ -403,7 +403,7 @@ docker run -d \
   --restart always \
   -v /ephemeral/hug:/root/.cache/huggingface \
   vllm/vllm-openai:latest \
-  --model bullpoint/Qwen3-Coder-Next-AWQ-4bit \
+  --model Qwen/Qwen3.6-27B-FP8 \
   --tensor-parallel-size 1 \
   --enable-auto-tool-choice \
   --tool-call-parser qwen3_coder \
@@ -445,7 +445,7 @@ curl -s http://localhost:11434/v1/models | python3 -m json.tool
 curl -s http://localhost:11434/v1/chat/completions \
   -H "Content-Type: application/json" \
   -H "Authorization: Bearer EMPTY" \
-  -d '{"model":"bullpoint/Qwen3-Coder-Next-AWQ-4bit",
+  -d '{"model":"Qwen/Qwen3.6-27B-FP8",
        "messages":[{"role":"user","content":"Hello"}],
        "max_tokens":50}'
 ```
@@ -600,9 +600,9 @@ Search HuggingFace for vLLM-compatible quantized models:
 
 ## Performance characteristics
 
-Measured on A100 80 GB PCIe (single GPU) with Qwen3-Coder-Next AWQ 4-bit:
+Measured on A100 80 GB PCIe (single GPU) with Qwen3.6 27B FP8:
 
-| Metric | vLLM (AWQ 4-bit) | Ollama (Q4_K_M) |
+| Metric | vLLM (FP8) | Ollama (Q4_K_M) |
 |--------|-------------------|-----------------|
 | Prefill throughput | 5,000–11,000 tok/s | ~1,000 tok/s (est.) |
 | Decode throughput | 40–99 tok/s | ~40 tok/s |
diff --git a/hyperstack-vm1.toml b/hyperstack-vm1.toml
index c6fb2df..75c313c 100644
--- a/hyperstack-vm1.toml
+++ b/hyperstack-vm1.toml
@@ -13,13 +13,13 @@ name_prefix = "hyperstack1"
 hostname = "hyperstack1"
 environment_name = "snonux-ollama"
 
-# A100-80GB single GPU for qwen3-coder-next (default); H100 fallback if n3-A100x1 unavailable.
+# A100-80GB single GPU for Qwen3.6 27B (default); H100 fallback if n3-A100x1 unavailable.
 flavor_name = "n3-A100x1"
 image_name = "Ubuntu Server 24.04 LTS R570 CUDA 12.8 with Docker"
 assign_floating_ip = true
 create_bootable_volume = false
 enable_port_randomization = false
-labels = ["qwen3-coder-next", "wireguard"]
+labels = ["qwen36-27b", "wireguard"]
 
 [ssh]
 username = "ubuntu"
@@ -55,16 +55,16 @@ listen_host = "0.0.0.0:11434"
 gpu_overhead_mb = 2000
 num_parallel = 1
 context_length = 32768
-pull_models = ["qwen3-coder-next", "qwen3-coder:30b", "gpt-oss:20b", "gpt-oss:120b", "nemotron-3-super"]
+pull_models = ["qwen36-27b", "qwen3-coder:30b", "gpt-oss:20b", "gpt-oss:120b", "nemotron-3-super"]
 
 # vLLM serves one model via Docker on the OpenAI-compatible API.
-# VM1 defaults to qwen3-coder-next; use 'model switch' to load any other preset.
+# VM1 defaults to Qwen3.6 27B; use 'model switch' to load any other preset.
 [vllm]
 install = true
-model = "bullpoint/Qwen3-Coder-Next-AWQ-4bit"
+model = "Qwen/Qwen3.6-27B-FP8"
 # HuggingFace model cache on ephemeral NVMe (fast; survives reboots on most providers).
 hug_cache_dir = "/ephemeral/hug"
-container_name = "vllm_qwen3"
+container_name = "vllm_qwen36_27b"
 max_model_len = 262144
 gpu_memory_utilization = 0.92
 tensor_parallel_size = 1
@@ -73,13 +73,16 @@ tool_call_parser = "qwen3_coder"
 # Named model presets for 'ruby hyperstack.rb --vm 1 model switch <name>'.
 # Each preset overrides the matching [vllm] field; unset fields fall back to [vllm] defaults.
 
-[vllm.presets.qwen3-coder-next]
-model = "bullpoint/Qwen3-Coder-Next-AWQ-4bit"
-container_name = "vllm_qwen3"
+# Qwen3.6-27B FP8 — dense 27B multimodal model with native 262K context.
+# Uses qwen3 reasoning parsing plus qwen3_coder tool calling on vLLM >=0.19.0.
+[vllm.presets.qwen36-27b]
+model = "Qwen/Qwen3.6-27B-FP8"
+container_name = "vllm_qwen36_27b"
 max_model_len = 262144
 gpu_memory_utilization = 0.92
 tensor_parallel_size = 1
 tool_call_parser = "qwen3_coder"
+extra_vllm_args = ["--reasoning-parser", "qwen3"]
 
 # NVIDIA Nemotron-3-Super-120B-A12B AWQ 4-bit — hybrid Mamba+MoE (12B active / 120B total).
 # Single-GPU (A100-80GB) config: tensor_parallel_size=1, context capped at 32k to fit in VRAM.
diff --git a/hyperstack-vm2.toml b/hyperstack-vm2.toml
index c3605ff..faa8054 100644
--- a/hyperstack-vm2.toml
+++ b/hyperstack-vm2.toml
@@ -55,7 +55,7 @@ listen_host = "0.0.0.0:11434"
 gpu_overhead_mb = 2000
 num_parallel = 1
 context_length = 32768
-pull_models = ["qwen3-coder-next"]
+pull_models = ["qwen36-27b"]
 
 # vLLM serves one model via Docker on the OpenAI-compatible API.
 # VM2 defaults to Qwen3.6 27B; use 'model switch' to load any other preset.
@@ -102,14 +102,6 @@ docker_image = "vllm/vllm-openai:nightly"
 pre_start_cmd = "pip install -q transformers==5.5.0 2>/dev/null"
 extra_docker_env = ["CUDA_VISIBLE_DEVICES=0"]
 
-[vllm.presets.qwen3-coder-next]
-model = "bullpoint/Qwen3-Coder-Next-AWQ-4bit"
-container_name = "vllm_qwen3"
-max_model_len = 262144
-gpu_memory_utilization = 0.92
-tensor_parallel_size = 1
-tool_call_parser = "qwen3_coder"
-
 # NVIDIA Nemotron-3-Super-120B-A12B AWQ 4-bit — hybrid Mamba+MoE (12B active / 120B total).
 # ~60 GB weights on A100 80GB; ~13 GB remaining for KV cache at 0.92 utilisation.
 # Uses NoPE so any context length is valid; capped at 131072 to keep KV cache within VRAM budget.
diff --git a/hypr.fish b/hypr.fish
index e2de7d2..60f8356 100644
--- a/hypr.fish
+++ b/hypr.fish
@@ -1,5 +1,5 @@
 # Dual-VM setup (hyperstack-vm1/vm2.toml -> hyperstack1/2.wg1)
-abbr pi-hyperstack-coder   pi --model hyperstack1/bullpoint/Qwen3-Coder-Next-AWQ-4bit
+abbr pi-hyperstack-coder   pi --model hyperstack1/Qwen/Qwen3.6-27B-FP8
 abbr pi-hyperstack-qwen36  pi --model hyperstack2/Qwen/Qwen3.6-27B-FP8
 abbr pi-hyperstack-gemma4  pi --model hyperstack2/cyankiwi/gemma-4-31B-it-AWQ-4bit
 abbr hyperstack-create      ruby ~/git/hyperstack/hyperstack.rb create
diff --git a/lib/hyperstack/config.rb b/lib/hyperstack/config.rb
index ba143e7..7057b4f 100644
--- a/lib/hyperstack/config.rb
+++ b/lib/hyperstack/config.rb
@@ -85,9 +85,9 @@ module HyperstackVM
       },
       'vllm' => {
         'install' => true,
-        'model' => 'bullpoint/Qwen3-Coder-Next-AWQ-4bit',
+        'model' => 'Qwen/Qwen3.6-27B-FP8',
         'hug_cache_dir' => '/ephemeral/hug',
-        'container_name' => 'vllm_qwen3',
+        'container_name' => 'vllm_qwen36_27b',
         'max_model_len' => 262_144,
         'gpu_memory_utilization' => 0.92,
         'tensor_parallel_size' => 1,
diff --git a/logo.svg b/logo.svg
index 6bf71b4..a405085 100644
--- a/logo.svg
+++ b/logo.svg
@@ -378,7 +378,7 @@
       <tspan fill="#8b949e"> pi --model hyperstack2/qwen3</tspan>
     </text>
     <text x="1112" y="286" fill="#6e7681">  Connecting to hyperstack2.wg1…</text>
-    <text x="1112" y="302" fill="#58a6ff">  » I am Qwen3-Coder, let's build!</text>
+    <text x="1112" y="302" fill="#58a6ff">  » I am Qwen3.6, let's build!</text>
 
     <!-- Blinking cursor -->
     <rect x="1112" y="322" width="8" height="14" fill="#58a6ff" opacity="0.8"/>
diff --git a/pi/agent/extensions/nemotron-tool-repair/index.ts b/pi/agent/extensions/nemotron-tool-repair/index.ts
index 9bb8f94..ae59a66 100644
--- a/pi/agent/extensions/nemotron-tool-repair/index.ts
+++ b/pi/agent/extensions/nemotron-tool-repair/index.ts
@@ -20,7 +20,7 @@ import type { ExtensionAPI, ExtensionContext } from "@mariozechner/pi-coding-age
 const CUSTOM_API = "hyperstack-openai-completions-repaired";
 const TARGET_PROVIDERS = new Set(["hyperstack1", "hyperstack2"]);
 const NEMOTRON_MODEL_PATTERN = /NVIDIA-Nemotron-3-Super/i;
-// Matches all Qwen Coder variants (Qwen3-Coder-Next, Qwen3-Coder-30B, etc.)
+// Matches Qwen3 Coder variants (Qwen3-Coder-30B, etc.)
 const QWEN_CODER_MODEL_PATTERN = /Qwen.*Coder/i;
 const MODELS_JSON_PATH = path.resolve(
 	path.dirname(fileURLToPath(import.meta.url)),
diff --git a/pi/agent/models.json b/pi/agent/models.json
index 48cd0e9..a5e8200 100644
--- a/pi/agent/models.json
+++ b/pi/agent/models.json
@@ -14,8 +14,15 @@
           "id": "openai/gpt-oss-120b",
           "name": "GPT-OSS 120B [vm]",
           "reasoning": true,
-          "input": ["text"],
-          "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 },
+          "input": [
+            "text"
+          ],
+          "cost": {
+            "input": 0,
+            "output": 0,
+            "cacheRead": 0,
+            "cacheWrite": 0
+          },
           "contextWindow": 131072,
           "maxTokens": 8192
         },
@@ -23,17 +30,31 @@
           "id": "openai/gpt-oss-20b",
           "name": "GPT-OSS 20B [vm]",
           "reasoning": false,
-          "input": ["text"],
-          "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 },
+          "input": [
+            "text"
+          ],
+          "cost": {
+            "input": 0,
+            "output": 0,
+            "cacheRead": 0,
+            "cacheWrite": 0
+          },
           "contextWindow": 65536,
           "maxTokens": 8192
         },
         {
-          "id": "bullpoint/Qwen3-Coder-Next-AWQ-4bit",
-          "name": "Qwen3 Coder Next [vm]",
+          "id": "Qwen/Qwen3.6-27B-FP8",
+          "name": "Qwen3.6 27B FP8 [vm]",
           "reasoning": true,
-          "input": ["text"],
-          "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 },
+          "input": [
+            "text"
+          ],
+          "cost": {
+            "input": 0,
+            "output": 0,
+            "cacheRead": 0,
+            "cacheWrite": 0
+          },
           "contextWindow": 262144,
           "maxTokens": 8192,
           "compat": {
@@ -47,8 +68,15 @@
           "id": "cyankiwi/gemma-4-31B-it-AWQ-4bit",
           "name": "Gemma 4 31B IT [vm]",
           "reasoning": false,
-          "input": ["text"],
-          "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 },
+          "input": [
+            "text"
+          ],
+          "cost": {
+            "input": 0,
+            "output": 0,
+            "cacheRead": 0,
+            "cacheWrite": 0
+          },
           "contextWindow": 131072,
           "maxTokens": 8192
         },
@@ -56,8 +84,15 @@
           "id": "cyankiwi/NVIDIA-Nemotron-3-Super-120B-A12B-AWQ-4bit",
           "name": "Nemotron 3 Super 120B [vm]",
           "reasoning": false,
-          "input": ["text"],
-          "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 },
+          "input": [
+            "text"
+          ],
+          "cost": {
+            "input": 0,
+            "output": 0,
+            "cacheRead": 0,
+            "cacheWrite": 0
+          },
           "contextWindow": 262144,
           "maxTokens": 8192
         },
@@ -65,8 +100,15 @@
           "id": "Qwen/Qwen2.5-Coder-32B-Instruct-AWQ",
           "name": "Qwen2.5 Coder 32B [vm]",
           "reasoning": false,
-          "input": ["text"],
-          "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 },
+          "input": [
+            "text"
+          ],
+          "cost": {
+            "input": 0,
+            "output": 0,
+            "cacheRead": 0,
+            "cacheWrite": 0
+          },
           "contextWindow": 32768,
           "maxTokens": 8192
         },
@@ -74,8 +116,15 @@
           "id": "QuantTrio/Qwen3-Coder-30B-A3B-Instruct-AWQ",
           "name": "Qwen3 Coder 30B [vm]",
           "reasoning": true,
-          "input": ["text"],
-          "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 },
+          "input": [
+            "text"
+          ],
+          "cost": {
+            "input": 0,
+            "output": 0,
+            "cacheRead": 0,
+            "cacheWrite": 0
+          },
           "contextWindow": 65536,
           "maxTokens": 8192,
           "compat": {
@@ -89,8 +138,15 @@
           "id": "casperhansen/deepseek-r1-distill-qwen-32b-awq",
           "name": "DeepSeek-R1-Distill 32B [vm]",
           "reasoning": true,
-          "input": ["text"],
-          "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 },
+          "input": [
+            "text"
+          ],
+          "cost": {
+            "input": 0,
+            "output": 0,
+            "cacheRead": 0,
+            "cacheWrite": 0
+          },
           "contextWindow": 32768,
           "maxTokens": 8192
         },
@@ -98,8 +154,15 @@
           "id": "Qwen/Qwen3-32B-AWQ",
           "name": "Qwen3 32B [vm]",
           "reasoning": true,
-          "input": ["text"],
-          "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 },
+          "input": [
+            "text"
+          ],
+          "cost": {
+            "input": 0,
+            "output": 0,
+            "cacheRead": 0,
+            "cacheWrite": 0
+          },
           "contextWindow": 32768,
           "maxTokens": 8192,
           "compat": {
@@ -113,8 +176,15 @@
           "id": "cyankiwi/Devstral-Small-2507-AWQ-4bit",
           "name": "Devstral Small 2507 [vm]",
           "reasoning": false,
-          "input": ["text"],
-          "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 },
+          "input": [
+            "text"
+          ],
+          "cost": {
+            "input": 0,
+            "output": 0,
+            "cacheRead": 0,
+            "cacheWrite": 0
+          },
           "contextWindow": 32768,
           "maxTokens": 8192
         }
@@ -134,8 +204,15 @@
           "id": "cyankiwi/gemma-4-31B-it-AWQ-4bit",
           "name": "Gemma 4 31B IT [vm1]",
           "reasoning": false,
-          "input": ["text"],
-          "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 },
+          "input": [
+            "text"
+          ],
+          "cost": {
+            "input": 0,
+            "output": 0,
+            "cacheRead": 0,
+            "cacheWrite": 0
+          },
           "contextWindow": 131072,
           "maxTokens": 8192
         },
@@ -143,17 +220,31 @@
           "id": "cyankiwi/NVIDIA-Nemotron-3-Super-120B-A12B-AWQ-4bit",
           "name": "Nemotron 3 Super 120B 1M [vm1]",
           "reasoning": false,
-          "input": ["text"],
-          "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 },
+          "input": [
+            "text"
+          ],
+          "cost": {
+            "input": 0,
+            "output": 0,
+            "cacheRead": 0,
+            "cacheWrite": 0
+          },
           "contextWindow": 1048576,
           "maxTokens": 8192
         },
         {
-          "id": "bullpoint/Qwen3-Coder-Next-AWQ-4bit",
-          "name": "Qwen3 Coder Next [vm1]",
+          "id": "Qwen/Qwen3.6-27B-FP8",
+          "name": "Qwen3.6 27B FP8 [vm1]",
           "reasoning": true,
-          "input": ["text"],
-          "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 },
+          "input": [
+            "text"
+          ],
+          "cost": {
+            "input": 0,
+            "output": 0,
+            "cacheRead": 0,
+            "cacheWrite": 0
+          },
           "contextWindow": 262144,
           "maxTokens": 8192,
           "compat": {
@@ -167,8 +258,15 @@
           "id": "openai/gpt-oss-20b",
           "name": "GPT-OSS 20B [vm1]",
           "reasoning": false,
-          "input": ["text"],
-          "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 },
+          "input": [
+            "text"
+          ],
+          "cost": {
+            "input": 0,
+            "output": 0,
+            "cacheRead": 0,
+            "cacheWrite": 0
+          },
           "contextWindow": 65536,
           "maxTokens": 8192
         },
@@ -176,8 +274,15 @@
           "id": "openai/gpt-oss-120b",
           "name": "GPT-OSS 120B [vm1]",
           "reasoning": true,
-          "input": ["text"],
-          "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 },
+          "input": [
+            "text"
+          ],
+          "cost": {
+            "input": 0,
+            "output": 0,
+            "cacheRead": 0,
+            "cacheWrite": 0
+          },
           "contextWindow": 131072,
           "maxTokens": 8192
         },
@@ -185,8 +290,15 @@
           "id": "Qwen/Qwen2.5-Coder-32B-Instruct-AWQ",
           "name": "Qwen2.5 Coder 32B [vm1]",
           "reasoning": false,
-          "input": ["text"],
-          "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 },
+          "input": [
+            "text"
+          ],
+          "cost": {
+            "input": 0,
+            "output": 0,
+            "cacheRead": 0,
+            "cacheWrite": 0
+          },
           "contextWindow": 32768,
           "maxTokens": 8192
         },
@@ -194,8 +306,15 @@
           "id": "QuantTrio/Qwen3-Coder-30B-A3B-Instruct-AWQ",
           "name": "Qwen3 Coder 30B [vm1]",
           "reasoning": true,
-          "input": ["text"],
-          "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 },
+          "input": [
+            "text"
+          ],
+          "cost": {
+            "input": 0,
+            "output": 0,
+            "cacheRead": 0,
+            "cacheWrite": 0
+          },
           "contextWindow": 65536,
           "maxTokens": 8192,
           "compat": {
@@ -209,8 +328,15 @@
           "id": "casperhansen/deepseek-r1-distill-qwen-32b-awq",
           "name": "DeepSeek-R1-Distill 32B [vm1]",
           "reasoning": true,
-          "input": ["text"],
-          "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 },
+          "input": [
+            "text"
+          ],
+          "cost": {
+            "input": 0,
+            "output": 0,
+            "cacheRead": 0,
+            "cacheWrite": 0
+          },
           "contextWindow": 32768,
           "maxTokens": 8192
         },
@@ -218,8 +344,15 @@
           "id": "Qwen/Qwen3-32B-AWQ",
           "name": "Qwen3 32B [vm1]",
           "reasoning": true,
-          "input": ["text"],
-          "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 },
+          "input": [
+            "text"
+          ],
+          "cost": {
+            "input": 0,
+            "output": 0,
+            "cacheRead": 0,
+            "cacheWrite": 0
+          },
           "contextWindow": 32768,
           "maxTokens": 8192,
           "compat": {
@@ -233,8 +366,15 @@
           "id": "cyankiwi/Devstral-Small-2507-AWQ-4bit",
           "name": "Devstral Small 2507 [vm1]",
           "reasoning": false,
-          "input": ["text"],
-          "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 },
+          "input": [
+            "text"
+          ],
+          "cost": {
+            "input": 0,
+            "output": 0,
+            "cacheRead": 0,
+            "cacheWrite": 0
+          },
           "contextWindow": 32768,
           "maxTokens": 8192
         }
@@ -254,8 +394,15 @@
           "id": "Qwen/Qwen3.6-27B-FP8",
           "name": "Qwen3.6 27B FP8 [vm2]",
           "reasoning": true,
-          "input": ["text"],
-          "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 },
+          "input": [
+            "text"
+          ],
+          "cost": {
+            "input": 0,
+            "output": 0,
+            "cacheRead": 0,
+            "cacheWrite": 0
+          },
           "contextWindow": 262144,
           "maxTokens": 8192,
           "compat": {
@@ -269,17 +416,31 @@
           "id": "cyankiwi/gemma-4-31B-it-AWQ-4bit",
           "name": "Gemma 4 31B IT [vm2]",
           "reasoning": false,
-          "input": ["text"],
-          "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 },
+          "input": [
+            "text"
+          ],
+          "cost": {
+            "input": 0,
+            "output": 0,
+            "cacheRead": 0,
+            "cacheWrite": 0
+          },
           "contextWindow": 131072,
           "maxTokens": 8192
         },
         {
-          "id": "bullpoint/Qwen3-Coder-Next-AWQ-4bit",
-          "name": "Qwen3 Coder Next [vm2]",
+          "id": "Qwen/Qwen3.6-27B-FP8",
+          "name": "Qwen3.6 27B FP8 [vm2]",
           "reasoning": true,
-          "input": ["text"],
-          "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 },
+          "input": [
+            "text"
+          ],
+          "cost": {
+            "input": 0,
+            "output": 0,
+            "cacheRead": 0,
+            "cacheWrite": 0
+          },
           "contextWindow": 262144,
           "maxTokens": 8192,
           "compat": {
@@ -293,8 +454,15 @@
           "id": "cyankiwi/NVIDIA-Nemotron-3-Super-120B-A12B-AWQ-4bit",
           "name": "Nemotron 3 Super 120B [vm2]",
           "reasoning": false,
-          "input": ["text"],
-          "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 },
+          "input": [
+            "text"
+          ],
+          "cost": {
+            "input": 0,
+            "output": 0,
+            "cacheRead": 0,
+            "cacheWrite": 0
+          },
           "contextWindow": 262144,
           "maxTokens": 8192
         },
@@ -302,8 +470,15 @@
           "id": "openai/gpt-oss-20b",
           "name": "GPT-OSS 20B [vm2]",
           "reasoning": false,
-          "input": ["text"],
-          "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 },
+          "input": [
+            "text"
+          ],
+          "cost": {
+            "input": 0,
+            "output": 0,
+            "cacheRead": 0,
+            "cacheWrite": 0
+          },
           "contextWindow": 65536,
           "maxTokens": 8192
         },
@@ -311,8 +486,15 @@
           "id": "openai/gpt-oss-120b",
           "name": "GPT-OSS 120B [vm2]",
           "reasoning": true,
-          "input": ["text"],
-          "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 },
+          "input": [
+            "text"
+          ],
+          "cost": {
+            "input": 0,
+            "output": 0,
+            "cacheRead": 0,
+            "cacheWrite": 0
+          },
           "contextWindow": 131072,
           "maxTokens": 8192
         },
@@ -320,8 +502,15 @@
           "id": "Qwen/Qwen2.5-Coder-32B-Instruct-AWQ",
           "name": "Qwen2.5 Coder 32B [vm2]",
           "reasoning": false,
-          "input": ["text"],
-          "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 },
+          "input": [
+            "text"
+          ],
+          "cost": {
+            "input": 0,
+            "output": 0,
+            "cacheRead": 0,
+            "cacheWrite": 0
+          },
           "contextWindow": 32768,
           "maxTokens": 8192
         },
@@ -329,8 +518,15 @@
           "id": "QuantTrio/Qwen3-Coder-30B-A3B-Instruct-AWQ",
           "name": "Qwen3 Coder 30B [vm2]",
           "reasoning": true,
-          "input": ["text"],
-          "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 },
+          "input": [
+            "text"
+          ],
+          "cost": {
+            "input": 0,
+            "output": 0,
+            "cacheRead": 0,
+            "cacheWrite": 0
+          },
           "contextWindow": 65536,
           "maxTokens": 8192,
           "compat": {
@@ -344,8 +540,15 @@
           "id": "casperhansen/deepseek-r1-distill-qwen-32b-awq",
           "name": "DeepSeek-R1-Distill 32B [vm2]",
           "reasoning": true,
-          "input": ["text"],
-          "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 },
+          "input": [
+            "text"
+          ],
+          "cost": {
+            "input": 0,
+            "output": 0,
+            "cacheRead": 0,
+            "cacheWrite": 0
+          },
           "contextWindow": 32768,
           "maxTokens": 8192
         },
@@ -353,8 +556,15 @@
           "id": "Qwen/Qwen3-32B-AWQ",
           "name": "Qwen3 32B [vm2]",
           "reasoning": true,
-          "input": ["text"],
-          "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 },
+          "input": [
+            "text"
+          ],
+          "cost": {
+            "input": 0,
+            "output": 0,
+            "cacheRead": 0,
+            "cacheWrite": 0
+          },
           "contextWindow": 32768,
           "maxTokens": 8192,
           "compat": {
@@ -368,8 +578,15 @@
           "id": "cyankiwi/Devstral-Small-2507-AWQ-4bit",
           "name": "Devstral Small 2507 [vm2]",
           "reasoning": false,
-          "input": ["text"],
-          "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 },
+          "input": [
+            "text"
+          ],
+          "cost": {
+            "input": 0,
+            "output": 0,
+            "cacheRead": 0,
+            "cacheWrite": 0
+          },
           "contextWindow": 32768,
           "maxTokens": 8192
         }
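
In the `pi/agent/models.json` hunks above, the `[vm2]` block already shows an entry with id `Qwen/Qwen3.6-27B-FP8` as context, and a later hunk in the same block renames the old Qwen3-Coder-Next entry to that same id. The block therefore appears to list one model id twice. Below is a minimal sketch of a check for this, not part of the commit. It assumes the file has a top-level `providers` object whose values each hold a `models` array; the diff does not show that outer structure, so treat those key names and the function name `duplicate_model_ids` as assumptions.

```python
import json
import sys
from collections import Counter


def duplicate_model_ids(config):
    """Map each provider name to the sorted model ids it lists more than once."""
    duplicates = {}
    # Assumed layout: {"providers": {"<name>": {"models": [{"id": ...}, ...]}}}
    for provider, body in config.get("providers", {}).items():
        counts = Counter(model["id"] for model in body.get("models", []))
        repeated = sorted(mid for mid, seen in counts.items() if seen > 1)
        if repeated:
            duplicates[provider] = repeated
    return duplicates


if __name__ == "__main__":
    # Hypothetical usage: python3 check_models.py pi/agent/models.json
    with open(sys.argv[1], encoding="utf-8") as handle:
        found = duplicate_model_ids(json.load(handle))
    for provider, ids in found.items():
        print(f"{provider}: duplicate ids {ids}")
    sys.exit(1 if found else 0)
```

The non-zero exit status on duplicates is a design choice that lets the check run as a pre-commit or CI step.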