12 files changed, 185 insertions, 268 deletions
diff --git a/PLAN-L40.md b/PLAN-L40.md
deleted file mode 100644
index 3d0b1ff..0000000
--- a/PLAN-L40.md
+++ /dev/null
@@ -1,157 +0,0 @@
-# Plan: VM1 on Hyperstack L40 with Qwen3.6 MoE + TurboQuant
-
-**Prepared:** 2026-05-24  
-**Scope:** Research and planning only — no code changes, no provisioning.
-
----
-
-## 1. GPU and VM sizing (Hyperstack L40)
-
-| Item | Assessment |
-|---|---|
-| **Flavor** | Hyperstack’s GPU flavors use the `n3-*` prefix (see current `n3-A100x1` / `n3-H100x1`). The L40 48 GB flavor is expected to be named `n3-L40x1` or `n3-L40Sx1`; exact string must be verified via the Hyperstack console/API before updating `hyperstack-vm1.toml`. |
-| **VRAM** | 48 GB (vs 80 GB on the current A100). That is a hard ceiling for both model weights and KV cache. |
-| **Cost** | L40/L40S nodes are generally cheaper than A100/H100 on Hyperstack. Assuming the tiered pricing model, an L40 should reduce the hourly cost of VM1, but the final price depends on the exact `flavor_name` and any egress charges. |
-
-## 2. Model choice: what actually fits on 48 GB
-
-The prompt mentions **Qwen3.6 MoE (e.g. 235B-A22B)**. A 235B-parameter model in BF16 would require **> 400 GB** of VRAM, which is impossible on a single L40. The only Qwen3.6 MoE that is publicly released and could *potentially* fit is **Qwen3.6-35B-A3B** (35B total / 3B active), but even that is **~70 GB in BF16**.
-
-**Realistic options to make it fit in 48 GB:**
-
-| Option | Weight size (est.) | Fit on 48 GB? | Notes |
-|---|---|---|---|
-| **AWQ 4-bit** Qwen3.6-35B-A3B | ~18 GB | Yes | Needs a community or official AWQ checkpoint (not yet listed as official at the time of writing, but AWQ/GPTQ variants usually appear quickly). |
-| **FP8** Qwen3.6-35B-A3B (if available) | ~35 GB | Tight | Leaves ~10 GB for KV cache, activations and CUDA graphs. vLLM profiling may tip it over. |
-| **Qwen3.6-27B dense** (current VM2 default) | ~27 GB FP8 | Yes | Not MoE; defeats the purpose of the task. |
-
-**Recommendation:** Target an **AWQ 4-bit (or GPTQ 4-bit) Qwen3.6-35B-A3B** checkpoint, or wait for an official **FP8** checkpoint and accept a reduced `max_model_len`. Do not attempt the 235B-A22B variant on a single L40.
-
-## 3. vLLM + TurboQuant compatibility
-
-TurboQuant is a KV-cache compression backend in vLLM. Key upstream state:
-
-- **PR #39931** (merged 2026-05-05) added TurboQuant support for *hybrid* architectures (attention + Mamba/MoE).
-- **Issue #41726** reports a fatal crash during **chunked continuation prefill** on hybrid MoE models (e.g. Qwen3.5-9B NVFP4). Root cause: TurboQuant’s `_continuation_prefill` path requests workspace memory that was not reserved during warmup.
-- **PR #40798** is open as a candidate fix but **not yet merged**.
-
-**Implications for Qwen3.6-35B-A3B:**
-- Because Qwen3.6 uses a hybrid attention+Mamba architecture, it is in the exact class of models affected by #41726.
-- If TurboQuant is enabled (`--kv-cache-dtype turboquant_k8v4`, `--kv-cache-dtype turboquant_4bit_nc`, etc.), any long prompt that crosses a chunked-prefill boundary will likely trigger:
-  ```
-  AssertionError: Workspace is locked but allocation ... requires X MB, current size is Y MB.
-  ```
-
-**Mitigations available today:**
-1. **Disable chunked prefill:** Pass `--no-enable-chunked-prefill` in `extra_vllm_args`. This avoids the `_continuation_prefill` path entirely. Trade-off: large prefills are no longer split into chunks, which can increase latency for long inputs and may OOM if a single prefill is very large.
-2. **Use `--enforce-eager`:** Disables CUDA graph capture, which slightly changes memory layout but does **not** solve the workspace lock issue by itself. It is useful mainly to save a few GB of VRAM on tight GPUs.
-3. **Wait for PR #40798** to merge and land in a stable vLLM image.
-
-## 4. Recommended `hyperstack-vm1.toml` changes (conceptual)
-
-```toml
-[vm]
-# Verify exact flavor string with Hyperstack API before deploying.
-flavor_name = "n3-L40x1"          # or n3-L40Sx1
-labels = ["qwen36-moe", "wireguard"]
-
-[vllm]
-install = true
-model = "Qwen/Qwen3.6-35B-A3B-AWQ"   # or the best available quantized MoE
-container_name = "vllm_qwen36_moe"
-max_model_len = 65536                  # conservative for 48 GB; can raise if AWQ
-gpu_memory_utilization = 0.92
-tensor_parallel_size = 1
-tool_call_parser = "qwen3_coder"
-
-# TurboQuant KV cache on a hybrid MoE
-extra_vllm_args = [
-  "--reasoning-parser", "qwen3",
-  "--kv-cache-dtype", "turboquant_k8v4",
-  "--no-enable-chunked-prefill"        # mitigation for issue #41726
-]
-
-# Nightly image post-PR-39931 is required; pin to a known-good digest until 0.20.2+
-docker_image = "vllm/vllm-openai:nightly"
-```
-
-**VRAM estimate (AWQ 4-bit + TurboQuant K8V4 on L40 48 GB):**
-
-| Consumer | Est. size |
-|---|---|
-| AWQ weights (35B params @ 4-bit) | ~18 GB |
-| Activations / MoE routing / logits | ~4–6 GB |
-| CUDA graphs (if not eager) | ~2 GB |
-| KV cache (TurboQuant) | ~20–24 GB |
-| **Headroom** | **~0–4 GB** |
-
-Because headroom is thin, `gpu_memory_utilization=0.92` is appropriate. If profiling OOMs, raise it to `0.95` or drop `max_model_len`. If vLLM still OOMs during startup, try `--enforce-eager` to reclaim the CUDA-graph memory.
-
-## 5. CLI and WireGuard implications
-
-| Area | Impact |
-|---|---|
-| `--vm 1 / 2 / both` | No structural changes. The CLI already resolves `hyperstack-vm1.toml` independently via its own state file. Switching the flavor/model is transparent to `--vm 2`. |
-| WireGuard | `wireguard_server_ip = "192.168.3.1"` stays the same. Recreating VM1 yields a new public IP, so the local `wg1.conf` peer endpoint must be refreshed (`ruby hyperstack.rb --vm 1 create` already handles this via `wg1-setup.sh`). The tunnel subnet `192.168.3.0/24` is unchanged. |
-| Port 11434 / firewall | Unchanged. Port 56710 UDP and 22 TCP remain locked to `allowed_wireguard_cidrs` / `allowed_ssh_cidrs`. |
-| Dual-VM routing | The client can continue to round-robin or fallback between `192.168.3.1` (VM1, MoE) and `192.168.3.3` (VM2, dense). No code changes needed. |
-
-## 6. Risks
-
-| Risk | Severity | Mitigation |
-|---|---|---|
-| **TurboQuant crash (#41726)** on hybrid MoE | High | Disable chunked prefill now; migrate to fixed vLLM nightly once PR #40798 lands. |
-| **Model does not fit** in 48 GB if no AWQ/FP8 checkpoint exists | High | Confirm a 4-bit or FP8 checkpoint is on HuggingFace before provisioning. Fallback to Qwen3.6-27B dense (moves goalposts). |
-| **Performance regression** from no chunked prefill | Medium | Expect higher TTFB on long prompts. Monitor with `ruby hyperstack.rb --vm 1 test`. |
-| **Flavor unavailability** | Medium | Have a fallback flavor ready (e.g. `n3-A100x1` on VM1 if L40 is sold out), or accept A100 pricing. |
-| **Nightly Docker image instability** | Medium | Pin to a specific digest (`vllm/vllm-openai@sha256:...`) after first successful smoke test. |
-
-## 7. Step-by-step migration plan (if you decide to proceed)
-
-1. **Verify asset availability**
-   - Confirm Hyperstack offers an L40 flavor and note its exact name.
-   - Locate a Qwen3.6-35B-A3B AWQ/FP8 checkpoint on HuggingFace. If none exists, abort or pivot to the dense 27B.
-
-2. **Snapshot / backup**
-   - Ensure VM2 (A100 dense) is stable and passing tests (`ruby hyperstack.rb --vm 2 test`).
-   - Save current VM1 state file as `.hyperstack-vm1-state.json.bak` in case a fast rollback is needed.
-
-3. **Update configuration**
-   - Edit `hyperstack-vm1.toml`:
-     - `flavor_name` → L40 flavor.
-     - `[vllm]` block → new model ID, container name, conservative `max_model_len`.
-     - Add `docker_image = "vllm/vllm-openai:nightly"` (or a pinned digest).
-     - Add TurboQuant arg and chunked-prefill mitigation to `extra_vllm_args`.
-   - Update `[vm] labels` to reflect the new model.
-
-4. **Provision**
-   ```bash
-   ruby hyperstack.rb --vm 1 create --replace
-   ```
-   The `--replace` flag tears down the old A100 VM1 and rebuilds it on L40.
-
-5. **Post-create validation**
-   - Check WireGuard handshake: `sudo wg show wg1 latest-handshakes`.
-   - Ping tunnel IP: `ping -c 3 192.168.3.1`.
-   - Query vLLM: `curl -s http://192.168.3.1:11434/v1/models`.
-   - Run the automated test suite: `ruby hyperstack.rb --vm 1 test`.
-
-6. **Smoke test for TurboQuant stability**
-   - Send a conversation with a very long system prompt (> 4096 tokens) and tool schemas to force a chunked-prefill boundary.
-   - If the engine crashes with the workspace assertion, apply the fallback:
-     - Add `--enforce-eager` to `extra_vllm_args`, or
-     - Fall back to `--kv-cache-dtype fp8` (loses TurboQuant compression but is stable).
-
-7. **Dual-VM confirmation**
-   - Run `ruby hyperstack.rb --vm both test` to ensure both endpoints are healthy and reachable through the WireGuard tunnel.
-
-8. **Monitor and iterate**
-   - Watch VRAM usage with `nvidia-smi` inside the VM.
-   - Adjust `max_model_len` and `gpu_memory_utilization` as needed.
-   - Once upstream PR #40798 merges, rebuild the Docker image with the fixed vLLM version and re-enable chunked prefill.
-
----
-
-## Bottom line
-
-The L40 is a cost-efficient target *if* a quantized Qwen3.6-35B-A3B checkpoint is available. The biggest blocker is the open vLLM issue #41726 (TurboQuant + hybrid MoE crash on chunked prefill). Disabling chunked prefill is a viable short-term workaround, but it comes with a latency trade-off and must be validated before making VM1 the default endpoint.
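The weight sizes quoted in the removed plan come from simple parameter-count arithmetic. Below is a minimal Ruby sketch of that arithmetic. It assumes weights dominate the footprint and ignores quantization scales, layers kept at higher precision, and runtime overhead. The `weight_gb` helper is hypothetical.

```ruby
# Approximate weight footprint in decimal GB: params * bits_per_param / 8.
# Hypothetical helper; ignores AWQ scales/zero-points, unquantized layers and
# runtime overhead (activations, CUDA graphs, KV cache).
def weight_gb(params_billion, bits_per_param)
  params_billion * bits_per_param / 8.0
end

# Qwen3.6-35B-A3B at the precisions discussed in the plan.
{ 'BF16' => 16, 'FP8' => 8, 'AWQ 4-bit' => 4 }.each do |label, bits|
  puts format('%-9s 35B: %6.1f GB', label, weight_gb(35, bits))
end

# The 235B-A22B variant in BF16, for comparison with a single 48 GB L40.
puts format('BF16      235B: %6.1f GB', weight_gb(235, 16))
```

This gives 70, 35 and 17.5 GB for the 35B model and 470 GB for the 235B model. These values match the plan's "~70 GB", "~35 GB", "~18 GB" and "> 400 GB" figures.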
diff --git a/README.md b/README.md
index a27ddbd..ecd5714 100644
--- a/README.md
+++ b/README.md
@@ -154,7 +154,7 @@ Each Hyperstack VM runs a vLLM instance; Pi connects to it directly over the Wir
 Install Pi from [pi.dev](https://pi.dev), then link the project-local config into place:
 
 ```bash
-ln -s /path/to/hyperstack/pi ~/.pi
+ln -s /path/to/hypr/pi ~/.pi
 ```
 
 This symlink makes Pi pick up `pi/agent/models.json` and `pi/agent/settings.json`
@@ -163,11 +163,11 @@ definitions are available without any manual config editing.
 
 ### Fish shell abbreviations
 
-Source `hyperstack.fish` or copy the abbreviations into your Fish config:
+Source `hypr.fish` or copy the abbreviations into your Fish config:
 
 ```fish
 abbr pi-hyperstack         pi --model hyperstack1/Qwen/Qwen3.6-27B-FP8
-abbr pi-hyperstack-coder pi --model hyperstack1/Qwen/Qwen3.6-27B-FP8
+abbr pi-hyperstack-coder   pi --model hyperstack1/Qwen/Qwen3.6-27B-FP8
 abbr pi-hyperstack-qwen36  pi --model hyperstack2/Qwen/Qwen3.6-27B-FP8
 abbr pi-hyperstack-gemma4  pi --model hyperstack2/cyankiwi/gemma-4-31B-it-AWQ-4bit
 ```
@@ -176,7 +176,7 @@ Then launch a session after the VM(s) are up:
 
 ```fish
 pi-hyperstack            # Qwen3.6 27B FP8 on VM1
-pi-hyperstack-coder   # Qwen3.6 27B FP8 on VM1
+pi-hyperstack-coder      # Qwen3.6 27B FP8 on VM1
 pi-hyperstack-qwen36     # Qwen3.6 27B FP8 on VM2
 pi-hyperstack-gemma4     # Gemma 4 31B on VM2
 ```
@@ -280,10 +280,8 @@ Available presets (both VMs share the same set):
 |---|---|---|---|
 | `gemma4-31b` | Gemma 4 31B IT (AWQ-4bit) | ~19 GB | 32K–128K (see TOML) |
 | `nemotron-super` | Nemotron-3-Super 120B (Mamba+MoE, 12B active) | ~60 GB | 131K |
-| `qwen36-35b-a3b` | Qwen3.6-35B-A3B MoE (AWQ, 3B active) | ~18 GB | 65K* |
+| `qwen36-35b-a3b` | Qwen3.6-35B-A3B MoE (AWQ, 3B active) | ~18 GB | 65K (requires a quantized checkpoint on HuggingFace) |
 | `qwen36-27b` | Qwen3.6 27B FP8 | ~45 GB | 262K |
-
-\* Needs a quantized checkpoint on HuggingFace before it can run on a single GPU.
 | `qwen25-coder-32b` | Qwen2.5-Coder-32B-Instruct (AWQ) | ~18 GB | 32K |
 | `qwen3-coder-30b` | Qwen3-Coder-30B-A3B (MoE, AWQ) | ~18 GB | 65K |
 | `deepseek-r1-32b` | DeepSeek-R1-Distill-Qwen-32B (AWQ) | ~18 GB | 32K |
@@ -317,7 +315,6 @@ All commands accept --vm 1|2|both (default: 1).
 ## Configuration
 
 Edit `hyperstack-vm1.toml` / `hyperstack-vm2.toml`.
-Use `hyperstack-vm1-nemotron.toml` for a dual-H100 Nemotron-3-Super profile on the VM1 slot (same state file as `hyperstack-vm1.toml` — use one or the other).
 Key sections:
 
 | Section | Purpose |
diff --git a/hypr.fish b/hypr.fish
index d75dccb..78e1f7a 100644
--- a/hypr.fish
+++ b/hypr.fish
@@ -3,7 +3,7 @@ abbr pi-hyperstack pi --model hyperstack1/Qwen/Qwen3.6-27B-FP8
 abbr pi-hyperstack-coder pi --model hyperstack1/Qwen/Qwen3.6-27B-FP8
 abbr pi-hyperstack-qwen36 pi --model hyperstack2/Qwen/Qwen3.6-27B-FP8
 abbr pi-hyperstack-gemma4 pi --model hyperstack2/cyankiwi/gemma-4-31B-it-AWQ-4bit
-abbr hyperstack-create ruby ~/git/hyperstack/hyperstack.rb create
+abbr hyperstack-create ruby ~/git/hypr/hyperstack.rb create
 
 # Ollama (local endpoint pointing at cloud models)
 abbr pi-ollama-kimi pi --provider ollama --model kimi-k2.6:cloud
diff --git a/lib/hyperstack/cli.rb b/lib/hyperstack/cli.rb
index 76f158e..b5bcaff 100644
--- a/lib/hyperstack/cli.rb
+++ b/lib/hyperstack/cli.rb
@@ -1,5 +1,6 @@
 # frozen_string_literal: true
 
+require 'json'
 require 'optparse'
 require 'socket'
 
@@ -383,9 +384,9 @@ module HyperstackVM
         hostnames = loaders.map { |loader| loader.config.wireguard_gateway_hostname }
         begin
           local_manager = build_manager(loaders.first.config, out: local_wg_out)
-          cleanup = local_manager.send(:cleanup_local_access, dry_run: dry_run, hostnames: hostnames,
+          cleanup = local_manager.cleanup_local_access(dry_run: dry_run, hostnames: hostnames,
                                                               allowed_ips: allowed_ips)
-          local_manager.send(:report_local_cleanup, local_wg_out, cleanup, dry_run: dry_run)
+          local_manager.report_local_cleanup(local_wg_out, cleanup, dry_run: dry_run)
         rescue Error => e
           errors[:local_wireguard] = e.message
         end
diff --git a/lib/hyperstack/manager.rb b/lib/hyperstack/manager.rb
index 2150554..cecf11d 100644
--- a/lib/hyperstack/manager.rb
+++ b/lib/hyperstack/manager.rb
@@ -1,5 +1,6 @@
 # frozen_string_literal: true
 
+require_relative 'provisioning'
 require_relative 'ssh_runner'
 require_relative 'vm_lifecycle'
 require_relative 'wireguard_setup'
@@ -69,26 +70,19 @@ module HyperstackVM
 
     def create(replace: false, dry_run: false, install_vllm: nil, install_ollama: nil,
                flavor_name: nil, vllm_preset: nil)
-      raise Error, "DRY RUN is not supported." if dry_run
-
-      if replace
-        existing = @state_store.load
-        if existing && existing['vm_id']
-          @vm_lifecycle.delete(vm_id: existing['vm_id'])
-        end
-      end
-
       install_vllm   = @config.vllm_install_enabled?   if install_vllm.nil?
       install_ollama = @config.ollama_install_enabled? if install_ollama.nil?
 
       state = @vm_lifecycle.create(
+        replace: replace,
+        dry_run: dry_run,
         flavor_name: flavor_name,
         vllm_preset: vllm_preset,
         install_vllm: install_vllm,
         install_ollama: install_ollama
-      ) do |s|
-        @local_wireguard.show_local_wireguard(s['public_ip'])
-      end
+      ) { |s| show_local_wireguard([s['public_ip']].compact) }
+
+      return if state.nil?
 
       @orchestrator.run(
         state,
@@ -112,7 +106,7 @@ module HyperstackVM
 
     def status(include_local_wireguard: true)
       ip = @vm_lifecycle.status
-      @local_wireguard.show_local_wireguard(ip) if include_local_wireguard
+      show_local_wireguard([ip].compact) if include_local_wireguard
       ip
     end
 
@@ -132,5 +126,35 @@ module HyperstackVM
     def list_models
       @vm_lifecycle.list_models
     end
+
+    def cleanup_local_access(dry_run:, hostnames:, allowed_ips:)
+      peers = @local_wireguard.remove_peers_by_allowed_ips(allowed_ips, dry_run: dry_run)
+      removed_hosts = @local_wireguard.remove_hostnames(hostnames, dry_run: dry_run)
+      { peers: peers, hostnames: removed_hosts }
+    end
+
+    def report_local_cleanup(output, cleanup, dry_run:)
+      peer_summary = cleanup[:peers].map { |peer| peer['AllowedIPs'] || peer['Endpoint'] }.join(', ')
+      host_summary = cleanup[:hostnames].join(', ')
+
+      if dry_run
+        if cleanup[:peers].empty? && cleanup[:hostnames].empty?
+          output.puts('DRY RUN: no matching local WireGuard peers or host entries would be removed.')
+          return
+        end
+        unless cleanup[:peers].empty?
+          output.puts("DRY RUN: local WireGuard peers would be removed for #{peer_summary}.")
+        end
+        unless cleanup[:hostnames].empty?
+          output.puts("DRY RUN: local host entries would be removed for #{host_summary}.")
+        end
+        return
+      end
+
+      output.puts('No matching local WireGuard peers needed removal.') if cleanup[:peers].empty?
+      output.puts('No matching local host entries needed removal.') if cleanup[:hostnames].empty?
+      output.puts("Local WireGuard peers removed for #{peer_summary}.") unless cleanup[:peers].empty?
+      output.puts("Local host entries removed for #{host_summary}.") unless cleanup[:hostnames].empty?
+    end
   end
 end
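The `cli.rb` change replaces `send(:cleanup_local_access, ...)` and `send(:report_local_cleanup, ...)` with direct calls. Direct calls only work if `Manager` exposes those methods publicly: `send` bypasses visibility, but a plain call does not. The standalone Ruby sketch below shows that difference. `DemoManager` and `hidden_cleanup` are hypothetical stand-ins, not names from this repo.

```ruby
# Hypothetical stand-in for HyperstackVM::Manager; only method visibility matters.
class DemoManager
  # Public: callable directly, as cli.rb now does.
  def cleanup_local_access(dry_run:)
    { dry_run: dry_run }
  end

  private

  # Private: reachable via send, but a direct call raises NoMethodError.
  def hidden_cleanup(dry_run:)
    { dry_run: dry_run }
  end
end

manager = DemoManager.new
p manager.cleanup_local_access(dry_run: true)
p manager.send(:hidden_cleanup, dry_run: true)

begin
  manager.hidden_cleanup(dry_run: true)
rescue NoMethodError => e
  puts "direct call to a private method raises #{e.class}"
end
```

The diff does not show the surrounding class body. If `manager.rb` declares `private` above the newly appended methods, the direct calls in `cli.rb` would raise `NoMethodError` at runtime.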
diff --git a/lib/hyperstack/provisioning_orchestrator.rb b/lib/hyperstack/provisioning_orchestrator.rb
index f3222d9..8abfec8 100644
--- a/lib/hyperstack/provisioning_orchestrator.rb
+++ b/lib/hyperstack/provisioning_orchestrator.rb
@@ -75,7 +75,6 @@ module HyperstackVM
       @state_store.save(state)
 
       info "VM ready: #{state['public_ip']} (id=#{state['vm_id']})"
-      @inference_tester.config.show_local_wireguard(state['public_ip']) rescue nil
       @inference_tester.test(state)
       state
     end
@@ -106,6 +105,18 @@ module HyperstackVM
         info "Adding Hyperstack firewall rule #{rule['protocol']} #{rule['remote_ip_prefix']} #{rule['port_range_min']}..."
         @client.create_vm_rule(vm['id'], rule)
       end
+
+      legacy_litellm_rules(existing).each do |rule|
+        rule_id = rule['id'] || rule['rule_id']
+        unless rule_id
+          warn_out 'Found legacy Hyperstack firewall rule for port 4000, but the API payload has no rule id; remove it manually from the Hyperstack console.'
+          next
+        end
+        info "Removing legacy Hyperstack firewall rule #{rule['protocol']} #{rule['remote_ip_prefix']} #{rule['port_range_min']}..."
+        @client.delete_vm_rule(vm['id'], rule_id)
+      rescue Error => e
+        warn_out "Failed to remove legacy Hyperstack firewall rule #{rule_id}: #{e.message}"
+      end
     end
 
     def effective_ollama?
@@ -165,6 +176,16 @@ module HyperstackVM
       %w[ACTIVE SHUTOFF HIBERNATED].include?(vm['status'].to_s.upcase)
     end
 
+    def legacy_litellm_rules(rules)
+      Array(rules).select do |rule|
+        normalized = normalize_rule(rule)
+        normalized['protocol'] == 'tcp' &&
+          normalized['port_range_min'] == 4000 &&
+          normalized['port_range_max'] == 4000 &&
+          normalized['remote_ip_prefix'] == @config.wireguard_subnet
+      end
+    end
+
     private
 
     def with_polling(description, timeout: 900, interval: 5)
@@ -183,5 +204,9 @@ module HyperstackVM
     def info(msg)
       @out.puts(msg)
     end
+
+    def warn_out(msg)
+      @out.puts("WARN: #{msg}")
+    end
   end
 end
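`legacy_litellm_rules` keeps only rules that match exactly TCP, port 4000 to 4000, from the WireGuard subnet. Below is a self-contained Ruby sketch of that filter. It assumes a `normalize_rule` that lowercases the protocol and coerces ports to integers. The real `normalize_rule` is not part of this diff, so this version is hypothetical. The subnet is only an example value taken from the removed plan.

```ruby
# Hypothetical normalizer; the project's normalize_rule is not shown in this diff.
def normalize_rule(rule)
  {
    'protocol' => rule['protocol'].to_s.downcase,
    'port_range_min' => rule['port_range_min'].to_i,
    'port_range_max' => rule['port_range_max'].to_i,
    'remote_ip_prefix' => rule['remote_ip_prefix'].to_s
  }
end

# Same predicate as legacy_litellm_rules, with the subnet passed in explicitly.
def legacy_litellm_rules(rules, wireguard_subnet)
  Array(rules).select do |rule|
    normalized = normalize_rule(rule)
    normalized['protocol'] == 'tcp' &&
      normalized['port_range_min'] == 4000 &&
      normalized['port_range_max'] == 4000 &&
      normalized['remote_ip_prefix'] == wireguard_subnet
  end
end

subnet = '192.168.3.0/24' # example value from the removed plan
rules = [
  { 'id' => 1, 'protocol' => 'TCP', 'port_range_min' => '4000', 'port_range_max' => '4000', 'remote_ip_prefix' => subnet },
  { 'id' => 2, 'protocol' => 'tcp', 'port_range_min' => 11_434, 'port_range_max' => 11_434, 'remote_ip_prefix' => subnet },
  { 'id' => 3, 'protocol' => 'tcp', 'port_range_min' => 4000, 'port_range_max' => 4000, 'remote_ip_prefix' => '0.0.0.0/0' }
]
p legacy_litellm_rules(rules, subnet).map { |rule| rule['id'] } # prints [1]
```

Only rule 1 is selected. Rule 2 targets the inference port, and rule 3 is open to any source, so neither counts as the legacy LiteLLM rule.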
diff --git a/lib/hyperstack/ssh_runner.rb b/lib/hyperstack/ssh_runner.rb
index f41859d..e4440b6 100644
--- a/lib/hyperstack/ssh_runner.rb
+++ b/lib/hyperstack/ssh_runner.rb
@@ -1,5 +1,6 @@
 # frozen_string_literal: true
 
+require 'fileutils'
 require 'open3'
 require 'socket'
 
diff --git a/lib/hyperstack/vm_lifecycle.rb b/lib/hyperstack/vm_lifecycle.rb
index 972c896..cc52880 100644
--- a/lib/hyperstack/vm_lifecycle.rb
+++ b/lib/hyperstack/vm_lifecycle.rb
@@ -1,5 +1,8 @@
 # frozen_string_literal: true
 
+require 'json'
+require_relative 'provisioning'
+
 module HyperstackVM
   # Orchestrates the VM lifecycle from creation through deletion.
   class VmLifecycle
@@ -9,27 +12,52 @@ module HyperstackVM
       @state_store = state_store
       @local_wireguard = local_wireguard
       @out = out
+      @scripts = ProvisioningScripts.new(config: config)
     end
 
     attr_reader :config, :client, :state_store
 
-    def create(flavor_name: nil, vllm_preset: nil, install_vllm: nil, install_ollama: nil, &block)
+    def create(replace: false, dry_run: false, flavor_name: nil, vllm_preset: nil,
+               install_vllm: nil, install_ollama: nil, &block)
       @effective_flavor_name = flavor_name.nil? ? @config.flavor_name : flavor_name
       @state_store.load if defined?(@state_store) # force load
       existing = @state_store.load
       if existing && existing['vm_id']
-        raise Error,
-              "State file #{@state_store.path} already tracks VM #{existing['vm_id']}. Use --replace or delete first."
+        if replace
+          if dry_run
+            info "DRY RUN: would delete tracked VM #{existing['vm_id']} before creating a replacement."
+            show_local_wireguard([])
+            return nil
+          else
+            delete(vm_id: existing['vm_id'])
+          end
+        elsif resumable_state?(existing)
+          if dry_run
+            print_resume_dry_run(existing, install_vllm: install_vllm, install_ollama: install_ollama, vllm_preset: vllm_preset)
+            return nil
+          end
+          info "Resuming tracked VM #{existing['vm_id']} provisioning..."
+          return existing
+        else
+          raise Error,
+                "State file #{@state_store.path} already tracks VM #{existing['vm_id']}. Use --replace or delete first."
+        end
       end
 
       resolved = resolve_dependencies
       vm_name  = @config.generated_vm_name
-      info "Creating VM #{vm_name} in #{resolved[:environment]['name']} using #{@effective_flavor_name}..."
+      info "#{dry_run ? 'Planning' : 'Creating'} VM #{vm_name} in #{resolved[:environment]['name']} using #{@effective_flavor_name}..."
 
       payload = build_payload(vm_name, resolved, install_vllm: install_vllm, install_ollama: install_ollama)
+      if dry_run
+        print_create_dry_run(vm_name, resolved, payload, install_vllm: install_vllm, install_ollama: install_ollama, vllm_preset: vllm_preset)
+        show_local_wireguard([])
+        return nil
+      end
+
       response = @client.create_vm(payload)
       instance = Array(response['instances']).first
-      raise Error, 'Hyperstack create response did not include an instance ID.' unless instance&&['id']
+      raise Error, 'Hyperstack create response did not include an instance ID.' unless instance && instance['id']
 
       state = build_state(vm_name, instance, resolved)
       sync_service_mode(state, install_vllm: install_vllm, install_ollama: install_ollama)
@@ -87,20 +115,23 @@ module HyperstackVM
         info "Missing firewall rules: #{missing.empty? ? 'none' : missing.size}"
       rescue Error => e
         warn_out "Unable to load VM #{state['vm_id']}: #{e.message}"
+        return state&.dig('public_ip')
       end
       connect_host_for(vm)
     end
 
-    def resolve_dependencies
+    def resolve_dependencies(flavor_name: nil)
+      flavor_name = @effective_flavor_name if flavor_name.nil? && @effective_flavor_name
+      flavor_name = @config.flavor_name if flavor_name.nil?
       environment = @client.list_environments.find { |item| item['name'] == @config.environment_name }
       raise Error, "Environment #{@config.environment_name.inspect} was not found in Hyperstack." unless environment
 
       flavor = @client.list_flavors.find do |item|
-        item['name'] == @effective_flavor_name && item['region_name'] == environment['region']
+        item['name'] == flavor_name && item['region_name'] == environment['region']
       end
-      raise Error, "Flavor #{@effective_flavor_name.inspect} is not available in #{environment['region']}." unless flavor
+      raise Error, "Flavor #{flavor_name.inspect} is not available in #{environment['region']}." unless flavor
       if flavor['stock_available'] == false
-        raise Error, "Flavor #{@effective_flavor_name.inspect} exists in #{environment['region']} but is out of stock."
+        raise Error, "Flavor #{flavor_name.inspect} exists in #{environment['region']} but is out of stock."
       end
 
       image = @client.list_images.find do |item|
@@ -146,29 +177,6 @@ module HyperstackVM
       end
     end
 
-    def ensure_security_rules(vm)
-      existing = Array(vm['security_rules'])
-      existing_norm = existing.map { |r| normalize_rule(r) }
-      desired = desired_rules.map { |r| normalize_rule(r) }
-
-      (desired - existing_norm).each do |rule|
-        info "Adding Hyperstack firewall rule #{rule['protocol']} #{rule['remote_ip_prefix']} #{rule['port_range_min']}..."
-        @client.create_vm_rule(vm['id'], rule)
-      end
-
-      legacy_litellm(existing).each do |rule|
-        rule_id = rule['id'] || rule['rule_id']
-        unless rule_id
-          warn_out 'Found legacy Hyperstack firewall rule for port 4000, but the API payload has no rule id; remove it manually from the Hyperstack console.'
-          next
-        end
-        info "Removing legacy Hyperstack firewall rule #{rule['protocol']} #{rule['remote_ip_prefix']} #{rule['port_range_min']}..."
-        @client.delete_vm_rule(vm['id'], rule_id)
-      rescue Error => e
-        warn_out "Failed to remove legacy Hyperstack firewall rule #{rule_id}: #{e.message}"
-      end
-    end
-
     def connect_host_for(vm)
       return vm['floating_ip'] if @config.assign_floating_ip?
       vm['floating_ip'] || vm['fixed_ip']
@@ -246,6 +254,74 @@ module HyperstackVM
 
     private
 
+    def resumable_state?(state)
+      state && state['vm_id'] && state['provisioned_at'].nil?
+    end
+
+    def print_create_dry_run(vm_name, resolved, payload, install_vllm:, install_ollama:, vllm_preset:)
+      info 'DRY RUN: no VM or state file will be created.'
+      info "State file: #{@state_store.path}"
+      info "Resolved environment: #{resolved[:environment]['name']} (region #{resolved[:environment]['region']})"
+      info "Resolved flavor: #{format_flavor(resolved[:flavor])}"
+      info "Resolved image: #{resolved[:image]['name']}"
+      info "Resolved SSH keypair: #{resolved[:keypair]['name']}"
+      info "Planned VM name: #{vm_name}"
+      info "Allowed SSH CIDRs: #{@config.allowed_ssh_cidrs.join(', ')}"
+      info "Allowed WireGuard CIDRs: #{@config.allowed_wireguard_cidrs.join(', ')}"
+      info 'Create payload:'
+      @out.puts(JSON.pretty_generate(payload))
+      if @config.guest_bootstrap_enabled?
+        info 'Guest bootstrap script:'
+        @out.puts(@scripts.guest_bootstrap_script)
+      else
+        info 'Guest bootstrap is disabled in config.'
+      end
+      if install_ollama
+        info "Ollama will be installed with models stored under #{@config.ollama_models_dir}"
+        models = @scripts.desired_ollama_models
+        info "Ollama models to pre-pull: #{models.join(', ')}" unless models.empty?
+      end
+      if install_vllm
+        preset_cfg = vllm_preset ? @config.vllm_preset(vllm_preset) : nil
+        vllm_m      = preset_cfg&.dig('model')          || @config.vllm_model
+        vllm_cname  = preset_cfg&.dig('container_name') || @config.vllm_container_name
+        vllm_maxlen = preset_cfg&.dig('max_model_len')  || @config.vllm_max_model_len
+        preset_note = vllm_preset ? " (preset: #{vllm_preset})" : ''
+        info "vLLM will be installed: #{vllm_m}#{preset_note}"
+        info "  Container: #{vllm_cname}, port #{@config.ollama_port}, max_model_len #{vllm_maxlen}"
+      end
+      if @config.wireguard_auto_setup?
+        info "WireGuard auto-setup script: #{@config.wireguard_setup_script} <vm_public_ip>"
+      end
+    end
+
+    def print_resume_dry_run(state, install_vllm:, install_ollama:, vllm_preset:)
+      info "DRY RUN: would resume provisioning tracked VM #{state['vm_id']}."
+      begin
+        vm = @client.get_vm(state['vm_id'])
+        info "Tracked VM status: #{vm['status']} / #{vm['vm_state']}"
+        ip = vm['floating_ip'] || vm['fixed_ip']
+        info "Tracked VM public IP: #{ip || 'none'}"
+      rescue Error => e
+        warn_out "Unable to inspect tracked VM #{state['vm_id']}: #{e.message}"
+      end
+      if @config.guest_bootstrap_enabled? && state['bootstrapped_at'].nil?
+        info 'Guest bootstrap script:'
+        @out.puts(@scripts.guest_bootstrap_script)
+      end
+      if install_ollama && state['ollama_installed_at'].nil?
+        info "Ollama would be installed with models stored under #{@config.ollama_models_dir}"
+        models = @scripts.desired_ollama_models
+        info "Ollama models to pre-pull: #{models.join(', ')}" unless models.empty?
+      end
+      if install_vllm && state['vllm_setup_at'].nil?
+        info "vLLM would be installed: #{state['vllm_model'] || @config.vllm_model}"
+      end
+      if @config.wireguard_auto_setup? && state['wireguard_setup_at'].nil?
+        info "WireGuard auto-setup script would run: #{@config.wireguard_setup_script} #{state['public_ip'] || '<pending-public_ip>'}"
+      end
+    end
+
     def build_payload(vm_name, resolved, install_vllm: nil, install_ollama: nil)
       payload = {
         'name' => vm_name,
@@ -306,16 +382,6 @@ module HyperstackVM
       parts.empty? ? 'All inference services disabled' : "#{parts.join(', ')} enabled"
     end
 
-    def legacy_litellm(rules)
-      Array(rules).select do |rule|
-        normalized = normalize_rule(rule)
-        normalized['protocol'] == 'tcp' &&
-          normalized['port_range_min'] == 4000 &&
-          normalized['port_range_max'] == 4000 &&
-          normalized['remote_ip_prefix'] == @config.wireguard_subnet
-      end
-    end
-
     def perform_local_cleanup(dry_run:)
       peers = @local_wireguard.remove_peers_by_allowed_ips(
         ["#{@config.wireguard_gateway_ip}/32"], dry_run: dry_run
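The reworked `VmLifecycle#create` chooses among create, replace, resume and refuse before calling the Hyperstack API. The branch order can be sketched as a pure function. `create_action` and its return symbols are hypothetical labels. Like `resumable_state?`, the sketch assumes that a missing `provisioned_at` marks an unfinished provisioning run.

```ruby
# Hypothetical pure helper mirroring the branch order in VmLifecycle#create.
# Returns a label instead of performing API calls.
def create_action(existing, replace:, dry_run:)
  tracked = existing && existing['vm_id']

  if !tracked
    # No tracked VM: plan (dry run) or create a new one.
    dry_run ? :plan_create : :create
  elsif replace
    # --replace: a dry run only reports the deletion and stops; otherwise delete, then create.
    dry_run ? :report_replace : :delete_then_create
  elsif existing['provisioned_at'].nil?
    # Tracked but provisioning never finished (resumable_state?): resume.
    dry_run ? :report_resume : :resume
  else
    # Fully provisioned VM already tracked: refuse, like the Error branch.
    :refuse
  end
end

p create_action(nil, replace: false, dry_run: true)                 # prints :plan_create
p create_action({ 'vm_id' => 'vm-1' }, replace: false, dry_run: false) # prints :resume
```

A replace dry run returns after reporting the deletion, so it never prints the create payload for the replacement VM.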
diff --git a/pi/agent/extensions/fresh-subagent/README.md b/pi/agent/extensions/fresh-subagent/README.md
index ac2cde2..26630f0 100644
--- a/pi/agent/extensions/fresh-subagent/README.md
+++ b/pi/agent/extensions/fresh-subagent/README.md
@@ -141,11 +141,7 @@ In one-shot or print mode it runs the editor command directly.
 Alias with the same watched behavior:
 
 ```text
-/subagent-watch <prompt
-
-> Unknown command "/subagent-watch <prompt". Try /help?>
-
-
+/subagent-watch <prompt>
 ```
 
 Launch a visible fresh Pi session instead of a headless child:
diff --git a/pi/agent/extensions/loop-scheduler/loop-presets.md b/pi/agent/extensions/loop-scheduler/loop-presets.md
index f90cad3..1df5b22 100644
--- a/pi/agent/extensions/loop-scheduler/loop-presets.md
+++ b/pi/agent/extensions/loop-scheduler/loop-presets.md
@@ -8,6 +8,6 @@
 # * monitor: 10m check if there are any errors in the logs
 
 * tasks: 1m automatically start with the next task with fresh context if the current task completed following the agent-task-management skill.
-* proceed: 1m proceed with the next task following agent-task-management if the previous or currently tasks being worked on is completed and committed to git.
-* review: 1m review all code changes since the last review and add code review comments using agent-task-management skill. use go-bestpractices and SOLID skills.
+* proceed: 1m proceed with the next task following the agent-task-management skill if the previous or current task being worked on is completed and committed to git.
+* review: 1m review all code changes since the last review and add code review comments using the agent-task-management skill. Use the go-best-practices and solid-principles skills.
 * scifi: 1m write a scifi story about the current project or continue writing the story into STORY.md. 
diff --git a/pi/agent/extensions/nemotron-tool-repair/README.md b/pi/agent/extensions/nemotron-tool-repair/README.md
index 69fcb27..ff06401 100644
--- a/pi/agent/extensions/nemotron-tool-repair/README.md
+++ b/pi/agent/extensions/nemotron-tool-repair/README.md
@@ -24,14 +24,7 @@ same model IDs, but they do not go through the Nemotron repair path.
 
 ## Usage Flow
 
-Start Pi the same way as before:
-
-```bash
-cd /home/paul/git/conf/snippets/hyperstack
-./pi-vm1
-```
-
-or explicitly:
+Start Pi with the Nemotron model:
 
 ```bash
 pi --model 'hyperstack1/cyankiwi/NVIDIA-Nemotron-3-Super-120B-A12B-AWQ-4bit'
diff --git a/pi/plans/gt-plan.md b/pi/plans/gt-plan.md
deleted file mode 100644
index 7cf5a38..0000000
--- a/pi/plans/gt-plan.md
+++ /dev/null
@@ -1,29 +0,0 @@
-# Project gt – Gap Analysis and Improvement Plan
-
-## Overall Picture & Goals
-
-- Provide a reliable, well‑documented command‑line percentage calculator with RPN and rational number support.
-- Deliver a smooth developer experience: clear contribution guidelines, automated CI/CD, and proper versioning.
-- Ensure the codebase follows Go best practices, has comprehensive tests, and ships a stable binary.
-
-Plan:
-
-1. **Fix CI build step** – Update GitHub Actions workflow to build the correct binary path (`./cmd/gt` instead of `./cmd/perc`).
-2. **Update `go.mod` Go version** – Change the `go` directive to a supported version (e.g. `go 1.22`) to match the CI Go version.
-3. **Add `CONTRIBUTING.md`** – Provide guidelines for building, testing, using `mage`, and submitting pull requests.
-4. **Expand README** – Include concrete examples for rational‑mode (`rat on/off/toggle`) and hyper‑operators (`[+]`, `[*]`, etc.).
-5. **Add badges to README** – CI status, test coverage, and Go Report Card badges.
-6. **Add end‑to‑end CLI tests** – Test the built binary for commands like `gt version`, `gt 20% of 150`, and `gt help`.
-7. **Add REPL command tests** – Cover built‑in commands (`help`, `clear`, `quit`, `rat`) and variable management (`vars`, `clear`, `name d`).
-8. **Add `.goreleaser.yml`** – Set up automated release builds and GitHub releases.
-9. **Implement version bump workflow** – Use the `increment-version-and-push` skill to bump the version, tag, and push.
-10. **Document variable management** – Add a dedicated README section describing `vars`, `clear`, and variable deletion commands.
-11. **Update Magefile** – Add shortcuts for build, test, lint, and release.
-12. **Add missing Go documentation** – Ensure all exported functions in the REPL package (`NewREPL`, `RunREPL`, `executor`, `defaultExecutor`, `defaultCompleter`, `defaultGetCommandDescription`) have godoc comments.
-13. **Add go vet step to CI workflow** – Include a `go vet ./...` step in the GitHub Actions CI configuration to catch static analysis issues.
-14. **Add godoc comments for exported TTYChecker methods (IsTTY, EnsureTTY)**.
-15. **Add godoc comments for exported SignalHandler methods (Start, Stop)**.
-16. **Add SPDX license headers to all .go source files**.
-17. **Wrap errors with %w where appropriate for better error chaining**.
-18. **Design a nice logo for the gt project (e.g., stylized 'gt' with calculator motif)**.
-