-rw-r--r-- PLAN-L40.md 157
-rw-r--r-- README.md 13
-rw-r--r-- hypr.fish 2
-rw-r--r-- lib/hyperstack/cli.rb 5
-rw-r--r-- lib/hyperstack/manager.rb 50
-rw-r--r-- lib/hyperstack/provisioning_orchestrator.rb 27
-rw-r--r-- lib/hyperstack/ssh_runner.rb 1
-rw-r--r-- lib/hyperstack/vm_lifecycle.rb 150
-rw-r--r-- pi/agent/extensions/fresh-subagent/README.md 6
-rw-r--r-- pi/agent/extensions/loop-scheduler/loop-presets.md 4
-rw-r--r-- pi/agent/extensions/nemotron-tool-repair/README.md 9
-rw-r--r-- pi/plans/gt-plan.md 29
12 files changed, 185 insertions, 268 deletions
diff --git a/PLAN-L40.md b/PLAN-L40.md
deleted file mode 100644
index 3d0b1ff..0000000
--- a/PLAN-L40.md
+++ /dev/null
@@ -1,157 +0,0 @@
-# Plan: VM1 on Hyperstack L40 with Qwen3.6 MoE + TurboQuant
-
-**Prepared:** 2026-05-24
-**Scope:** Research and planning only — no code changes, no provisioning.
-
----
-
-## 1. GPU and VM sizing (Hyperstack L40)
-
-| Item | Assessment |
-|---|---|
-| **Flavor** | Hyperstack’s GPU flavors use the `n3-*` prefix (see current `n3-A100x1` / `n3-H100x1`). The L40 48 GB flavor is expected to be named `n3-L40x1` or `n3-L40Sx1`; exact string must be verified via the Hyperstack console/API before updating `hyperstack-vm1.toml`. |
-| **VRAM** | 48 GB (vs 80 GB on the current A100). That is a hard ceiling for both model weights and KV cache. |
-| **Cost** | L40/L40S nodes are generally cheaper than A100/H100 on Hyperstack. Assuming the tiered pricing model, an L40 should reduce the hourly cost of VM1, but the final price depends on the exact `flavor_name` and any egress charges. |
-
-## 2. Model choice: what actually fits on 48 GB
-
-The prompt mentions **Qwen3.6 MoE (e.g. 235B-A22B)**. A 235B-parameter model in BF16 would require **> 400 GB** of VRAM, which is impossible on a single L40. The only Qwen3.6 MoE that is publicly released and could *potentially* fit is **Qwen3.6-35B-A3B** (35B total / 3B active), but even that is **~70 GB in BF16**.
-
-**Realistic options to make it fit in 48 GB:**
-
-| Option | Weight size (est.) | Fit on 48 GB? | Notes |
-|---|---|---|---|
-| **AWQ 4-bit** Qwen3.6-35B-A3B | ~18 GB | Yes | Needs a community or official AWQ checkpoint (not yet listed as official at the time of writing, but AWQ/GPTQ variants usually appear quickly). |
-| **FP8** Qwen3.6-35B-A3B (if available) | ~35 GB | Tight | Leaves ~10 GB for KV cache, activations and CUDA graphs. vLLM profiling may tip it over. |
-| **Qwen3.6-27B dense** (current VM2 default) | ~27 GB FP8 | Yes | Not MoE; defeats the purpose of the task. |
-
-**Recommendation:** Target an **AWQ 4-bit (or GPTQ 4-bit) Qwen3.6-35B-A3B** checkpoint, or wait for an official **FP8** checkpoint and accept a reduced `max_model_len`. Do not attempt the 235B-A22B variant on a single L40.
-
-## 3. vLLM + TurboQuant compatibility
-
-TurboQuant is a KV-cache compression backend in vLLM. Key upstream state:
-
-- **PR #39931** (merged 2026-05-05) added TurboQuant support for *hybrid* architectures (attention + Mamba/MoE).
-- **Issue #41726** reports a fatal crash during **chunked continuation prefill** on hybrid MoE models (e.g. Qwen3.5-9B NVFP4). Root cause: TurboQuant’s `_continuation_prefill` path requests workspace memory that was not reserved during warmup.
-- **PR #40798** is open as a candidate fix but **not yet merged**.
-
-**Implications for Qwen3.6-35B-A3B:**
-- Because Qwen3.6 uses a hybrid attention+Mamba architecture, it is in the exact class of models affected by #41726.
-- If TurboQuant is enabled (`--kv-cache-dtype turboquant_k8v4`, `--kv-cache-dtype turboquant_4bit_nc`, etc.), any long prompt that crosses a chunked-prefill boundary will likely trigger:
- ```
- AssertionError: Workspace is locked but allocation ... requires X MB, current size is Y MB.
- ```
-
-**Mitigations available today:**
-1. **Disable chunked prefill:** Pass `--no-enable-chunked-prefill` in `extra_vllm_args`. This avoids the `_continuation_prefill` path entirely. Trade-off: large prefills are no longer split into chunks, which can increase latency for long inputs and may OOM if a single prefill is very large.
-2. **Use `--enforce-eager`:** Disables CUDA graph capture, which slightly changes memory layout but does **not** solve the workspace lock issue by itself. It is useful mainly to save a few GB of VRAM on tight GPUs.
-3. **Wait for PR #40798** to merge and land in a stable vLLM image.
-
-## 4. Recommended `hyperstack-vm1.toml` changes (conceptual)
-
-```toml
-[vm]
-# Verify exact flavor string with Hyperstack API before deploying.
-flavor_name = "n3-L40x1" # or n3-L40Sx1
-labels = ["qwen36-moe", "wireguard"]
-
-[vllm]
-install = true
-model = "Qwen/Qwen3.6-35B-A3B-AWQ" # or the best available quantized MoE
-container_name = "vllm_qwen36_moe"
-max_model_len = 65536 # conservative for 48 GB; can raise if AWQ
-gpu_memory_utilization = 0.92
-tensor_parallel_size = 1
-tool_call_parser = "qwen3_coder"
-
-# TurboQuant KV cache on a hybrid MoE
-extra_vllm_args = [
- "--reasoning-parser", "qwen3",
- "--kv-cache-dtype", "turboquant_k8v4",
- "--no-enable-chunked-prefill" # mitigation for issue #41726
-]
-
-# Nightly image post-PR-39931 is required; pin to a known-good digest until 0.20.2+
-docker_image = "vllm/vllm-openai:nightly"
-```
-
-**VRAM estimate (AWQ 4-bit + TurboQuant K8V4 on L40 48 GB):**
-
-| Consumer | Est. size |
-|---|---|
-| AWQ weights (35B params @ 4-bit) | ~18 GB |
-| Activations / MoE routing / logits | ~4–6 GB |
-| CUDA graphs (if not eager) | ~2 GB |
-| KV cache (TurboQuant) | ~20–24 GB |
-| **Headroom** | **~0–4 GB** |
-
-Because headroom is thin, `gpu_memory_utilization=0.92` is appropriate. If profiling OOMs, raise it to `0.95` or drop `max_model_len`. If vLLM still OOMs during startup, try `--enforce-eager` to reclaim the CUDA-graph memory.
-
-## 5. CLI and WireGuard implications
-
-| Area | Impact |
-|---|---|
-| `--vm 1 / 2 / both` | No structural changes. The CLI already resolves `hyperstack-vm1.toml` independently via its own state file. Switching the flavor/model is transparent to `--vm 2`. |
-| WireGuard | `wireguard_server_ip = "192.168.3.1"` stays the same. Recreating VM1 yields a new public IP, so the local `wg1.conf` peer endpoint must be refreshed (`ruby hyperstack.rb --vm 1 create` already handles this via `wg1-setup.sh`). The tunnel subnet `192.168.3.0/24` is unchanged. |
-| Port 11434 / firewall | Unchanged. Port 56710 UDP and 22 TCP remain locked to `allowed_wireguard_cidrs` / `allowed_ssh_cidrs`. |
-| Dual-VM routing | The client can continue to round-robin or fallback between `192.168.3.1` (VM1, MoE) and `192.168.3.3` (VM2, dense). No code changes needed. |
-
-## 6. Risks
-
-| Risk | Severity | Mitigation |
-|---|---|---|
-| **TurboQuant crash (#41726)** on hybrid MoE | High | Disable chunked prefill now; migrate to fixed vLLM nightly once PR #40798 lands. |
-| **Model does not fit** in 48 GB if no AWQ/FP8 checkpoint exists | High | Confirm a 4-bit or FP8 checkpoint is on HuggingFace before provisioning. Fallback to Qwen3.6-27B dense (moves goalposts). |
-| **Performance regression** from no chunked prefill | Medium | Expect higher TTFB on long prompts. Monitor with `ruby hyperstack.rb --vm 1 test`. |
-| **Flavor unavailability** | Medium | Have a fallback flavor ready (e.g. `n3-A100x1` on VM1 if L40 is sold out), or accept A100 pricing. |
-| **Nightly Docker image instability** | Medium | Pin to a specific digest (`vllm/vllm-openai@sha256:...`) after first successful smoke test. |
-
-## 7. Step-by-step migration plan (if you decide to proceed)
-
-1. **Verify asset availability**
- - Confirm Hyperstack offers an L40 flavor and note its exact name.
- - Locate a Qwen3.6-35B-A3B AWQ/FP8 checkpoint on HuggingFace. If none exists, abort or pivot to the dense 27B.
-
-2. **Snapshot / backup**
- - Ensure VM2 (A100 dense) is stable and passing tests (`ruby hyperstack.rb --vm 2 test`).
- - Save current VM1 state file as `.hyperstack-vm1-state.json.bak` in case a fast rollback is needed.
-
-3. **Update configuration**
- - Edit `hyperstack-vm1.toml`:
- - `flavor_name` → L40 flavor.
- - `[vllm]` block → new model ID, container name, conservative `max_model_len`.
- - Add `docker_image = "vllm/vllm-openai:nightly"` (or a pinned digest).
- - Add TurboQuant arg and chunked-prefill mitigation to `extra_vllm_args`.
- - Update `[vm] labels` to reflect the new model.
-
-4. **Provision**
- ```bash
- ruby hyperstack.rb --vm 1 create --replace
- ```
- The `--replace` flag tears down the old A100 VM1 and rebuilds it on L40.
-
-5. **Post-create validation**
- - Check WireGuard handshake: `sudo wg show wg1 latest-handshakes`.
- - Ping tunnel IP: `ping -c 3 192.168.3.1`.
- - Query vLLM: `curl -s http://192.168.3.1:11434/v1/models`.
- - Run the automated test suite: `ruby hyperstack.rb --vm 1 test`.
-
-6. **Smoke test for TurboQuant stability**
- - Send a conversation with a very long system prompt (> 4096 tokens) and tool schemas to force a chunked-prefill boundary.
- - If the engine crashes with the workspace assertion, apply the fallback:
- - Add `--enforce-eager` to `extra_vllm_args`, or
- - Fall back to `--kv-cache-dtype fp8` (loses TurboQuant compression but is stable).
-
-7. **Dual-VM confirmation**
- - Run `ruby hyperstack.rb --vm both test` to ensure both endpoints are healthy and reachable through the WireGuard tunnel.
-
-8. **Monitor and iterate**
- - Watch VRAM usage with `nvidia-smi` inside the VM.
- - Adjust `max_model_len` and `gpu_memory_utilization` as needed.
- - Once upstream PR #40798 merges, rebuild the Docker image with the fixed vLLM version and re-enable chunked prefill.
-
----
-
-## Bottom line
-
-The L40 is a cost-efficient target *if* a quantized Qwen3.6-35B-A3B checkpoint is available. The biggest blocker is the open vLLM issue #41726 (TurboQuant + hybrid MoE crash on chunked prefill). Disabling chunked prefill is a viable short-term workaround, but it comes with a latency trade-off and must be validated before making VM1 the default endpoint.
diff --git a/README.md b/README.md
index a27ddbd..ecd5714 100644
--- a/README.md
+++ b/README.md
@@ -154,7 +154,7 @@ Each Hyperstack VM runs a vLLM instance; Pi connects to it directly over the Wir
Install Pi from [pi.dev](https://pi.dev), then link the project-local config into place:
```bash
-ln -s /path/to/hyperstack/pi ~/.pi
+ln -s /path/to/hypr/pi ~/.pi
```
This symlink makes Pi pick up `pi/agent/models.json` and `pi/agent/settings.json`
@@ -163,11 +163,11 @@ definitions are available without any manual config editing.
### Fish shell abbreviations
-Source `hyperstack.fish` or copy the abbreviations into your Fish config:
+Source `hypr.fish` or copy the abbreviations into your Fish config:
```fish
abbr pi-hyperstack pi --model hyperstack1/Qwen/Qwen3.6-27B-FP8
-abbr pi-hyperstack-coder pi --model hyperstack1/Qwen/Qwen3.6-27B-FP8
+abbr pi-hyperstack-coder pi --model hyperstack1/Qwen/Qwen3.6-27B-FP8
abbr pi-hyperstack-qwen36 pi --model hyperstack2/Qwen/Qwen3.6-27B-FP8
abbr pi-hyperstack-gemma4 pi --model hyperstack2/cyankiwi/gemma-4-31B-it-AWQ-4bit
```
@@ -176,7 +176,7 @@ Then launch a session after the VM(s) are up:
```fish
pi-hyperstack # Qwen3.6 27B FP8 on VM1
-pi-hyperstack-coder # Qwen3.6 27B FP8 on VM1
+pi-hyperstack-coder # Qwen3.6 27B FP8 on VM1
pi-hyperstack-qwen36 # Qwen3.6 27B FP8 on VM2
pi-hyperstack-gemma4 # Gemma 4 31B on VM2
```
@@ -280,10 +280,8 @@ Available presets (both VMs share the same set):
|---|---|---|---|
| `gemma4-31b` | Gemma 4 31B IT (AWQ-4bit) | ~19 GB | 32K–128K (see TOML) |
| `nemotron-super` | Nemotron-3-Super 120B (Mamba+MoE, 12B active) | ~60 GB | 131K |
-| `qwen36-35b-a3b` | Qwen3.6-35B-A3B MoE (AWQ, 3B active) | ~18 GB | 65K* |
+| `qwen36-35b-a3b` | Qwen3.6-35B-A3B MoE (AWQ, 3B active) | ~18 GB | 65K* (needs a quantized checkpoint) |
| `qwen36-27b` | Qwen3.6 27B FP8 | ~45 GB | 262K |
-
-\* Needs a quantized checkpoint on HuggingFace before it can run on a single GPU.
| `qwen25-coder-32b` | Qwen2.5-Coder-32B-Instruct (AWQ) | ~18 GB | 32K |
| `qwen3-coder-30b` | Qwen3-Coder-30B-A3B (MoE, AWQ) | ~18 GB | 65K |
| `deepseek-r1-32b` | DeepSeek-R1-Distill-Qwen-32B (AWQ) | ~18 GB | 32K |
@@ -317,7 +315,6 @@ All commands accept --vm 1|2|both (default: 1).
## Configuration
Edit `hyperstack-vm1.toml` / `hyperstack-vm2.toml`.
-Use `hyperstack-vm1-nemotron.toml` for a dual-H100 Nemotron-3-Super profile on the VM1 slot (same state file as `hyperstack-vm1.toml` — use one or the other).
Key sections:
| Section | Purpose |
diff --git a/hypr.fish b/hypr.fish
index d75dccb..78e1f7a 100644
--- a/hypr.fish
+++ b/hypr.fish
@@ -3,7 +3,7 @@ abbr pi-hyperstack pi --model hyperstack1/Qwen/Qwen3.6-27B-FP8
abbr pi-hyperstack-coder pi --model hyperstack1/Qwen/Qwen3.6-27B-FP8
abbr pi-hyperstack-qwen36 pi --model hyperstack2/Qwen/Qwen3.6-27B-FP8
abbr pi-hyperstack-gemma4 pi --model hyperstack2/cyankiwi/gemma-4-31B-it-AWQ-4bit
-abbr hyperstack-create ruby ~/git/hyperstack/hyperstack.rb create
+abbr hyperstack-create ruby ~/git/hypr/hyperstack.rb create
# Ollama (local endpoint pointing at cloud models)
abbr pi-ollama-kimi pi --provider ollama --model kimi-k2.6:cloud
diff --git a/lib/hyperstack/cli.rb b/lib/hyperstack/cli.rb
index 76f158e..b5bcaff 100644
--- a/lib/hyperstack/cli.rb
+++ b/lib/hyperstack/cli.rb
@@ -1,5 +1,6 @@
# frozen_string_literal: true
+require 'json'
require 'optparse'
require 'socket'
@@ -383,9 +384,9 @@ module HyperstackVM
hostnames = loaders.map { |loader| loader.config.wireguard_gateway_hostname }
begin
local_manager = build_manager(loaders.first.config, out: local_wg_out)
- cleanup = local_manager.send(:cleanup_local_access, dry_run: dry_run, hostnames: hostnames,
+ cleanup = local_manager.cleanup_local_access(dry_run: dry_run, hostnames: hostnames,
allowed_ips: allowed_ips)
- local_manager.send(:report_local_cleanup, local_wg_out, cleanup, dry_run: dry_run)
+ local_manager.report_local_cleanup(local_wg_out, cleanup, dry_run: dry_run)
rescue Error => e
errors[:local_wireguard] = e.message
end
diff --git a/lib/hyperstack/manager.rb b/lib/hyperstack/manager.rb
index 2150554..cecf11d 100644
--- a/lib/hyperstack/manager.rb
+++ b/lib/hyperstack/manager.rb
@@ -1,5 +1,6 @@
# frozen_string_literal: true
+require_relative 'provisioning'
require_relative 'ssh_runner'
require_relative 'vm_lifecycle'
require_relative 'wireguard_setup'
@@ -69,26 +70,19 @@ module HyperstackVM
def create(replace: false, dry_run: false, install_vllm: nil, install_ollama: nil,
flavor_name: nil, vllm_preset: nil)
- raise Error, "DRY RUN is not supported." if dry_run
-
- if replace
- existing = @state_store.load
- if existing && existing['vm_id']
- @vm_lifecycle.delete(vm_id: existing['vm_id'])
- end
- end
-
install_vllm = @config.vllm_install_enabled? if install_vllm.nil?
install_ollama = @config.ollama_install_enabled? if install_ollama.nil?
state = @vm_lifecycle.create(
+ replace: replace,
+ dry_run: dry_run,
flavor_name: flavor_name,
vllm_preset: vllm_preset,
install_vllm: install_vllm,
install_ollama: install_ollama
- ) do |s|
- @local_wireguard.show_local_wireguard(s['public_ip'])
- end
+ ) { |s| show_local_wireguard([s['public_ip']].compact) }
+
+ return if state.nil?
@orchestrator.run(
state,
@@ -112,7 +106,7 @@ module HyperstackVM
def status(include_local_wireguard: true)
ip = @vm_lifecycle.status
- @local_wireguard.show_local_wireguard(ip) if include_local_wireguard
+ show_local_wireguard([ip].compact) if include_local_wireguard
ip
end
@@ -132,5 +126,35 @@ module HyperstackVM
def list_models
@vm_lifecycle.list_models
end
+
+ def cleanup_local_access(dry_run:, hostnames:, allowed_ips:)
+ peers = @local_wireguard.remove_peers_by_allowed_ips(allowed_ips, dry_run: dry_run)
+ removed_hosts = @local_wireguard.remove_hostnames(hostnames, dry_run: dry_run)
+ { peers: peers, hostnames: removed_hosts }
+ end
+
+ def report_local_cleanup(output, cleanup, dry_run:)
+ peer_summary = cleanup[:peers].map { |peer| peer['AllowedIPs'] || peer['Endpoint'] }.join(', ')
+ host_summary = cleanup[:hostnames].join(', ')
+
+ if dry_run
+ if cleanup[:peers].empty? && cleanup[:hostnames].empty?
+ output.puts('DRY RUN: no matching local WireGuard peers or host entries would be removed.')
+ return
+ end
+ unless cleanup[:peers].empty?
+ output.puts("DRY RUN: local WireGuard peers would be removed for #{peer_summary}.")
+ end
+ unless cleanup[:hostnames].empty?
+ output.puts("DRY RUN: local host entries would be removed for #{host_summary}.")
+ end
+ return
+ end
+
+ output.puts('No matching local WireGuard peers needed removal.') if cleanup[:peers].empty?
+ output.puts('No matching local host entries needed removal.') if cleanup[:hostnames].empty?
+ output.puts("Local WireGuard peers removed for #{peer_summary}.") unless cleanup[:peers].empty?
+ output.puts("Local host entries removed for #{host_summary}.") unless cleanup[:hostnames].empty?
+ end
end
end
diff --git a/lib/hyperstack/provisioning_orchestrator.rb b/lib/hyperstack/provisioning_orchestrator.rb
index f3222d9..8abfec8 100644
--- a/lib/hyperstack/provisioning_orchestrator.rb
+++ b/lib/hyperstack/provisioning_orchestrator.rb
@@ -75,7 +75,6 @@ module HyperstackVM
@state_store.save(state)
info "VM ready: #{state['public_ip']} (id=#{state['vm_id']})"
- @inference_tester.config.show_local_wireguard(state['public_ip']) rescue nil
@inference_tester.test(state)
state
end
@@ -106,6 +105,18 @@ module HyperstackVM
info "Adding Hyperstack firewall rule #{rule['protocol']} #{rule['remote_ip_prefix']} #{rule['port_range_min']}..."
@client.create_vm_rule(vm['id'], rule)
end
+
+ legacy_litellm_rules(existing).each do |rule|
+ rule_id = rule['id'] || rule['rule_id']
+ unless rule_id
+ warn_out 'Found legacy Hyperstack firewall rule for port 4000, but the API payload has no rule id; remove it manually from the Hyperstack console.'
+ next
+ end
+ info "Removing legacy Hyperstack firewall rule #{rule['protocol']} #{rule['remote_ip_prefix']} #{rule['port_range_min']}..."
+ @client.delete_vm_rule(vm['id'], rule_id)
+ rescue Error => e
+ warn_out "Failed to remove legacy Hyperstack firewall rule #{rule_id}: #{e.message}"
+ end
end
def effective_ollama?
@@ -165,6 +176,16 @@ module HyperstackVM
%w[ACTIVE SHUTOFF HIBERNATED].include?(vm['status'].to_s.upcase)
end
+ def legacy_litellm_rules(rules)
+ Array(rules).select do |rule|
+ normalized = normalize_rule(rule)
+ normalized['protocol'] == 'tcp' &&
+ normalized['port_range_min'] == 4000 &&
+ normalized['port_range_max'] == 4000 &&
+ normalized['remote_ip_prefix'] == @config.wireguard_subnet
+ end
+ end
+
private
def with_polling(description, timeout: 900, interval: 5)
@@ -183,5 +204,9 @@ module HyperstackVM
def info(msg)
@out.puts(msg)
end
+
+ def warn_out(msg)
+ @out.puts("WARN: #{msg}")
+ end
end
end
diff --git a/lib/hyperstack/ssh_runner.rb b/lib/hyperstack/ssh_runner.rb
index f41859d..e4440b6 100644
--- a/lib/hyperstack/ssh_runner.rb
+++ b/lib/hyperstack/ssh_runner.rb
@@ -1,5 +1,6 @@
# frozen_string_literal: true
+require 'fileutils'
require 'open3'
require 'socket'
diff --git a/lib/hyperstack/vm_lifecycle.rb b/lib/hyperstack/vm_lifecycle.rb
index 972c896..cc52880 100644
--- a/lib/hyperstack/vm_lifecycle.rb
+++ b/lib/hyperstack/vm_lifecycle.rb
@@ -1,5 +1,8 @@
# frozen_string_literal: true
+require 'json'
+require_relative 'provisioning'
+
module HyperstackVM
# Orchestrates the VM lifecycle from creation through deletion.
class VmLifecycle
@@ -9,27 +12,52 @@ module HyperstackVM
@state_store = state_store
@local_wireguard = local_wireguard
@out = out
+ @scripts = ProvisioningScripts.new(config: config)
end
attr_reader :config, :client, :state_store
- def create(flavor_name: nil, vllm_preset: nil, install_vllm: nil, install_ollama: nil, &block)
+ def create(replace: false, dry_run: false, flavor_name: nil, vllm_preset: nil,
+ install_vllm: nil, install_ollama: nil, &block)
@effective_flavor_name = flavor_name.nil? ? @config.flavor_name : flavor_name
@state_store.load if defined?(@state_store) # force load
existing = @state_store.load
if existing && existing['vm_id']
- raise Error,
- "State file #{@state_store.path} already tracks VM #{existing['vm_id']}. Use --replace or delete first."
+ if replace
+ if dry_run
+ info "DRY RUN: would delete tracked VM #{existing['vm_id']} before creating a replacement."
+ show_local_wireguard([])
+ return nil
+ else
+ delete(vm_id: existing['vm_id'])
+ end
+ elsif resumable_state?(existing)
+ if dry_run
+ print_resume_dry_run(existing, install_vllm: install_vllm, install_ollama: install_ollama, vllm_preset: vllm_preset)
+ return nil
+ end
+ info "Resuming tracked VM #{existing['vm_id']} provisioning..."
+ return existing
+ else
+ raise Error,
+ "State file #{@state_store.path} already tracks VM #{existing['vm_id']}. Use --replace or delete first."
+ end
end
resolved = resolve_dependencies
vm_name = @config.generated_vm_name
- info "Creating VM #{vm_name} in #{resolved[:environment]['name']} using #{@effective_flavor_name}..."
+ info (dry_run ? "Planning" : "Creating") + " VM #{vm_name} in #{resolved[:environment]['name']} using #{@effective_flavor_name}..."
payload = build_payload(vm_name, resolved, install_vllm: install_vllm, install_ollama: install_ollama)
+ if dry_run
+ print_create_dry_run(vm_name, resolved, payload, install_vllm: install_vllm, install_ollama: install_ollama, vllm_preset: vllm_preset)
+ show_local_wireguard([])
+ return nil
+ end
+
response = @client.create_vm(payload)
instance = Array(response['instances']).first
- raise Error, 'Hyperstack create response did not include an instance ID.' unless instance&&['id']
+ raise Error, 'Hyperstack create response did not include an instance ID.' unless instance && instance['id']
state = build_state(vm_name, instance, resolved)
sync_service_mode(state, install_vllm: install_vllm, install_ollama: install_ollama)
@@ -87,20 +115,23 @@ module HyperstackVM
info "Missing firewall rules: #{missing.empty? ? 'none' : missing.size}"
rescue Error => e
warn_out "Unable to load VM #{state['vm_id']}: #{e.message}"
+ return state&.dig('public_ip')
end
connect_host_for(vm)
end
- def resolve_dependencies
+ def resolve_dependencies(flavor_name: nil)
+ flavor_name = @effective_flavor_name if flavor_name.nil? && @effective_flavor_name
+ flavor_name = @config.flavor_name if flavor_name.nil?
environment = @client.list_environments.find { |item| item['name'] == @config.environment_name }
raise Error, "Environment #{@config.environment_name.inspect} was not found in Hyperstack." unless environment
flavor = @client.list_flavors.find do |item|
- item['name'] == @effective_flavor_name && item['region_name'] == environment['region']
+ item['name'] == flavor_name && item['region_name'] == environment['region']
end
- raise Error, "Flavor #{@effective_flavor_name.inspect} is not available in #{environment['region']}." unless flavor
+ raise Error, "Flavor #{flavor_name.inspect} is not available in #{environment['region']}." unless flavor
if flavor['stock_available'] == false
- raise Error, "Flavor #{@effective_flavor_name.inspect} exists in #{environment['region']} but is out of stock."
+ raise Error, "Flavor #{flavor_name.inspect} exists in #{environment['region']} but is out of stock."
end
image = @client.list_images.find do |item|
@@ -146,29 +177,6 @@ module HyperstackVM
end
end
- def ensure_security_rules(vm)
- existing = Array(vm['security_rules'])
- existing_norm = existing.map { |r| normalize_rule(r) }
- desired = desired_rules.map { |r| normalize_rule(r) }
-
- (desired - existing_norm).each do |rule|
- info "Adding Hyperstack firewall rule #{rule['protocol']} #{rule['remote_ip_prefix']} #{rule['port_range_min']}..."
- @client.create_vm_rule(vm['id'], rule)
- end
-
- legacy_litellm(existing).each do |rule|
- rule_id = rule['id'] || rule['rule_id']
- unless rule_id
- warn_out 'Found legacy Hyperstack firewall rule for port 4000, but the API payload has no rule id; remove it manually from the Hyperstack console.'
- next
- end
- info "Removing legacy Hyperstack firewall rule #{rule['protocol']} #{rule['remote_ip_prefix']} #{rule['port_range_min']}..."
- @client.delete_vm_rule(vm['id'], rule_id)
- rescue Error => e
- warn_out "Failed to remove legacy Hyperstack firewall rule #{rule_id}: #{e.message}"
- end
- end
-
def connect_host_for(vm)
return vm['floating_ip'] if @config.assign_floating_ip?
vm['floating_ip'] || vm['fixed_ip']
@@ -246,6 +254,74 @@ module HyperstackVM
private
+ def resumable_state?(state)
+ state && state['vm_id'] && state['provisioned_at'].nil?
+ end
+
+ def print_create_dry_run(vm_name, resolved, payload, install_vllm:, install_ollama:, vllm_preset:)
+ info 'DRY RUN: no VM or state file will be created.'
+ info "State file: #{@state_store.path}"
+ info "Resolved environment: #{resolved[:environment]['name']} (region #{resolved[:environment]['region']})"
+ info "Resolved flavor: #{format_flavor(resolved[:flavor])}"
+ info "Resolved image: #{resolved[:image]['name']}"
+ info "Resolved SSH keypair: #{resolved[:keypair]['name']}"
+ info "Planned VM name: #{vm_name}"
+ info "Allowed SSH CIDRs: #{@config.allowed_ssh_cidrs.join(', ')}"
+ info "Allowed WireGuard CIDRs: #{@config.allowed_wireguard_cidrs.join(', ')}"
+ info 'Create payload:'
+ @out.puts(JSON.pretty_generate(payload))
+ if @config.guest_bootstrap_enabled?
+ info 'Guest bootstrap script:'
+ @out.puts(@scripts.guest_bootstrap_script)
+ else
+ info 'Guest bootstrap is disabled in config.'
+ end
+ if install_ollama
+ info "Ollama will be installed with models stored under #{@config.ollama_models_dir}"
+ models = @scripts.desired_ollama_models
+ info "Ollama models to pre-pull: #{models.join(', ')}" unless models.empty?
+ end
+ if install_vllm
+ preset_cfg = vllm_preset ? @config.vllm_preset(vllm_preset) : nil
+ vllm_m = preset_cfg&.dig('model') || @config.vllm_model
+ vllm_cname = preset_cfg&.dig('container_name') || @config.vllm_container_name
+ vllm_maxlen = preset_cfg&.dig('max_model_len') || @config.vllm_max_model_len
+ preset_note = vllm_preset ? " (preset: #{vllm_preset})" : ''
+ info "vLLM will be installed: #{vllm_m}#{preset_note}"
+ info " Container: #{vllm_cname}, port #{@config.ollama_port}, max_model_len #{vllm_maxlen}"
+ end
+ if @config.wireguard_auto_setup?
+ info "WireGuard auto-setup script: #{@config.wireguard_setup_script} <vm_public_ip>"
+ end
+ end
+
+ def print_resume_dry_run(state, install_vllm:, install_ollama:, vllm_preset:)
+ info "DRY RUN: would resume provisioning tracked VM #{state['vm_id']}."
+ begin
+ vm = @client.get_vm(state['vm_id'])
+ info "Tracked VM status: #{vm['status']} / #{vm['vm_state']}"
+ ip = vm['floating_ip'] || vm['fixed_ip']
+ info "Tracked VM public IP: #{ip || 'none'}"
+ rescue Error => e
+ warn_out "Unable to inspect tracked VM #{state['vm_id']}: #{e.message}"
+ end
+ if @config.guest_bootstrap_enabled? && state['bootstrapped_at'].nil?
+ info 'Guest bootstrap script:'
+ @out.puts(@scripts.guest_bootstrap_script)
+ end
+ if install_ollama && state['ollama_installed_at'].nil?
+ info "Ollama would be installed with models stored under #{@config.ollama_models_dir}"
+ models = @scripts.desired_ollama_models
+ info "Ollama models to pre-pull: #{models.join(', ')}" unless models.empty?
+ end
+ if install_vllm && state['vllm_setup_at'].nil?
+ info "vLLM would be installed: #{state['vllm_model'] || @config.vllm_model}"
+ end
+ if @config.wireguard_auto_setup? && state['wireguard_setup_at'].nil?
+ info "WireGuard auto-setup script would run: #{@config.wireguard_setup_script} #{state['public_ip'] || '<pending-public_ip>'}"
+ end
+ end
+
def build_payload(vm_name, resolved, install_vllm: nil, install_ollama: nil)
payload = {
'name' => vm_name,
@@ -306,16 +382,6 @@ module HyperstackVM
parts.empty? ? 'All inference services disabled' : "#{parts.join(', ')} enabled"
end
- def legacy_litellm(rules)
- Array(rules).select do |rule|
- normalized = normalize_rule(rule)
- normalized['protocol'] == 'tcp' &&
- normalized['port_range_min'] == 4000 &&
- normalized['port_range_max'] == 4000 &&
- normalized['remote_ip_prefix'] == @config.wireguard_subnet
- end
- end
-
def perform_local_cleanup(dry_run:)
peers = @local_wireguard.remove_peers_by_allowed_ips(
["#{@config.wireguard_gateway_ip}/32"], dry_run: dry_run
diff --git a/pi/agent/extensions/fresh-subagent/README.md b/pi/agent/extensions/fresh-subagent/README.md
index ac2cde2..26630f0 100644
--- a/pi/agent/extensions/fresh-subagent/README.md
+++ b/pi/agent/extensions/fresh-subagent/README.md
@@ -141,11 +141,7 @@ In one-shot or print mode it runs the editor command directly.
Alias with the same watched behavior:
```text
-/subagent-watch <prompt
-
-> Unknown command "/subagent-watch <prompt". Try /help?>
-
-
+/subagent-watch <prompt>
```
Launch a visible fresh Pi session instead of a headless child:
diff --git a/pi/agent/extensions/loop-scheduler/loop-presets.md b/pi/agent/extensions/loop-scheduler/loop-presets.md
index f90cad3..1df5b22 100644
--- a/pi/agent/extensions/loop-scheduler/loop-presets.md
+++ b/pi/agent/extensions/loop-scheduler/loop-presets.md
@@ -8,6 +8,6 @@
# * monitor: 10m check if there are any errors in the logs
* tasks: 1m automatically start with the next task with fresh context if the current task completed following the agent-task-management skill.
-* proceed: 1m proceed with the next task following agent-task-management if the previous or currently tasks being worked on is completed and committed to git.
-* review: 1m review all code changes since the last review and add code review comments using agent-task-management skill. use go-bestpractices and SOLID skills.
+* proceed: 1m proceed with the next task following agent-task-management if the previous or current task being worked on is completed and committed to git.
+* review: 1m review all code changes since the last review and add code review comments using agent-task-management skill. use go-best-practices and solid-principles skills.
* scifi: 1m write a scifi story about the current project or continue writing the story into STORY.md.
diff --git a/pi/agent/extensions/nemotron-tool-repair/README.md b/pi/agent/extensions/nemotron-tool-repair/README.md
index 69fcb27..ff06401 100644
--- a/pi/agent/extensions/nemotron-tool-repair/README.md
+++ b/pi/agent/extensions/nemotron-tool-repair/README.md
@@ -24,14 +24,7 @@ same model IDs, but they do not go through the Nemotron repair path.
## Usage Flow
-Start Pi the same way as before:
-
-```bash
-cd /home/paul/git/conf/snippets/hyperstack
-./pi-vm1
-```
-
-or explicitly:
+Start Pi with the Nemotron model:
```bash
pi --model 'hyperstack1/cyankiwi/NVIDIA-Nemotron-3-Super-120B-A12B-AWQ-4bit'
diff --git a/pi/plans/gt-plan.md b/pi/plans/gt-plan.md
deleted file mode 100644
index 7cf5a38..0000000
--- a/pi/plans/gt-plan.md
+++ /dev/null
@@ -1,29 +0,0 @@
-# Project gt – Gap Analysis and Improvement Plan
-
-## Overall Picture & Goals
-
-- Provide a reliable, well‑documented command‑line percentage calculator with RPN and rational number support.
-- Deliver a smooth developer experience: clear contribution guidelines, automated CI/CD, and proper versioning.
-- Ensure the codebase follows Go best practices, has comprehensive tests, and ships a stable binary.
-
-Plan:
-
-1. **Fix CI build step** – Update GitHub Actions workflow to build the correct binary path (`./cmd/gt` instead of `./cmd/perc`).
-2. **Update `go.mod` Go version** – Change the `go` directive to a supported version (e.g. `go 1.22`) to match the CI Go version.
-3. **Add `CONTRIBUTING.md`** – Provide guidelines for building, testing, using `mage`, and submitting pull requests.
-4. **Expand README** – Include concrete examples for rational‑mode (`rat on/off/toggle`) and hyper‑operators (`[+]`, `[*]`, etc.).
-5. **Add badges to README** – CI status, test coverage, and Go Report Card badges.
-6. **Add end‑to‑end CLI tests** – Test the built binary for commands like `gt version`, `gt 20% of 150`, and `gt help`.
-7. **Add REPL command tests** – Cover built‑in commands (`help`, `clear`, `quit`, `rat`) and variable management (`vars`, `clear`, `name d`).
-8. **Add `.goreleaser.yml`** – Set up automated release builds and GitHub releases.
-9. **Implement version bump workflow** – Use the `increment-version-and-push` skill to bump the version, tag, and push.
-10. **Document variable management** – Add a dedicated README section describing `vars`, `clear`, and variable deletion commands.
-11. **Update Magefile** – Add shortcuts for build, test, lint, and release.
-12. **Add missing Go documentation** – Ensure all exported functions in the REPL package (`NewREPL`, `RunREPL`, `executor`, `defaultExecutor`, `defaultCompleter`, `defaultGetCommandDescription`) have godoc comments.
-13. **Add go vet step to CI workflow** – Include a `go vet ./...` step in the GitHub Actions CI configuration to catch static analysis issues.
-14. **Add godoc comments for exported TTYChecker methods (IsTTY, EnsureTTY)**.
-15. **Add godoc comments for exported SignalHandler methods (Start, Stop)**.
-16. **Add SPDX license headers to all .go source files**.
-17. **Wrap errors with %w where appropriate for better error chaining**.
-18. **Design a nice logo for the gt project (e.g., stylized 'gt' with calculator motif)**.
-