| Mode | File | Lines changed |
|---|---|---|
| -rw-r--r-- | PLAN-L40.md | 157 |
| -rw-r--r-- | README.md | 13 |
| -rw-r--r-- | hypr.fish | 2 |
| -rw-r--r-- | lib/hyperstack/cli.rb | 5 |
| -rw-r--r-- | lib/hyperstack/manager.rb | 50 |
| -rw-r--r-- | lib/hyperstack/provisioning_orchestrator.rb | 27 |
| -rw-r--r-- | lib/hyperstack/ssh_runner.rb | 1 |
| -rw-r--r-- | lib/hyperstack/vm_lifecycle.rb | 150 |
| -rw-r--r-- | pi/agent/extensions/fresh-subagent/README.md | 6 |
| -rw-r--r-- | pi/agent/extensions/loop-scheduler/loop-presets.md | 4 |
| -rw-r--r-- | pi/agent/extensions/nemotron-tool-repair/README.md | 9 |
| -rw-r--r-- | pi/plans/gt-plan.md | 29 |
12 files changed, 185 insertions, 268 deletions
diff --git a/PLAN-L40.md b/PLAN-L40.md deleted file mode 100644 index 3d0b1ff..0000000 --- a/PLAN-L40.md +++ /dev/null @@ -1,157 +0,0 @@ -# Plan: VM1 on Hyperstack L40 with Qwen3.6 MoE + TurboQuant - -**Prepared:** 2026-05-24 -**Scope:** Research and planning only — no code changes, no provisioning. - ---- - -## 1. GPU and VM sizing (Hyperstack L40) - -| Item | Assessment | -|---|---| -| **Flavor** | Hyperstack’s GPU flavors use the `n3-*` prefix (see current `n3-A100x1` / `n3-H100x1`). The L40 48 GB flavor is expected to be named `n3-L40x1` or `n3-L40Sx1`; exact string must be verified via the Hyperstack console/API before updating `hyperstack-vm1.toml`. | -| **VRAM** | 48 GB (vs 80 GB on the current A100). That is a hard ceiling for both model weights and KV cache. | -| **Cost** | L40/L40S nodes are generally cheaper than A100/H100 on Hyperstack. Assuming the tiered pricing model, an L40 should reduce the hourly cost of VM1, but the final price depends on the exact `flavor_name` and any egress charges. | - -## 2. Model choice: what actually fits on 48 GB - -The prompt mentions **Qwen3.6 MoE (e.g. 235B-A22B)**. A 235B-parameter model in BF16 would require **> 400 GB** of VRAM, which is impossible on a single L40. The only Qwen3.6 MoE that is publicly released and could *potentially* fit is **Qwen3.6-35B-A3B** (35B total / 3B active), but even that is **~70 GB in BF16**. - -**Realistic options to make it fit in 48 GB:** - -| Option | Weight size (est.) | Fit on 48 GB? | Notes | -|---|---|---|---| -| **AWQ 4-bit** Qwen3.6-35B-A3B | ~18 GB | Yes | Needs a community or official AWQ checkpoint (not yet listed as official at the time of writing, but AWQ/GPTQ variants usually appear quickly). | -| **FP8** Qwen3.6-35B-A3B (if available) | ~35 GB | Tight | Leaves ~10 GB for KV cache, activations and CUDA graphs. vLLM profiling may tip it over. | -| **Qwen3.6-27B dense** (current VM2 default) | ~27 GB FP8 | Yes | Not MoE; defeats the purpose of the task. 
| - -**Recommendation:** Target an **AWQ 4-bit (or GPTQ 4-bit) Qwen3.6-35B-A3B** checkpoint, or wait for an official **FP8** checkpoint and accept a reduced `max_model_len`. Do not attempt the 235B-A22B variant on a single L40. - -## 3. vLLM + TurboQuant compatibility - -TurboQuant is a KV-cache compression backend in vLLM. Key upstream state: - -- **PR #39931** (merged 2026-05-05) added TurboQuant support for *hybrid* architectures (attention + Mamba/MoE). -- **Issue #41726** reports a fatal crash during **chunked continuation prefill** on hybrid MoE models (e.g. Qwen3.5-9B NVFP4). Root cause: TurboQuant’s `_continuation_prefill` path requests workspace memory that was not reserved during warmup. -- **PR #40798** is open as a candidate fix but **not yet merged**. - -**Implications for Qwen3.6-35B-A3B:** -- Because Qwen3.6 uses a hybrid attention+Mamba architecture, it is in the exact class of models affected by #41726. -- If TurboQuant is enabled (`--kv-cache-dtype turboquant_k8v4`, `--kv-cache-dtype turboquant_4bit_nc`, etc.), any long prompt that crosses a chunked-prefill boundary will likely trigger: - ``` - AssertionError: Workspace is locked but allocation ... requires X MB, current size is Y MB. - ``` - -**Mitigations available today:** -1. **Disable chunked prefill:** Pass `--no-enable-chunked-prefill` in `extra_vllm_args`. This avoids the `_continuation_prefill` path entirely. Trade-off: large prefills are no longer split into chunks, which can increase latency for long inputs and may OOM if a single prefill is very large. -2. **Use `--enforce-eager`:** Disables CUDA graph capture, which slightly changes memory layout but does **not** solve the workspace lock issue by itself. It is useful mainly to save a few GB of VRAM on tight GPUs. -3. **Wait for PR #40798** to merge and land in a stable vLLM image. - -## 4. Recommended `hyperstack-vm1.toml` changes (conceptual) - -```toml -[vm] -# Verify exact flavor string with Hyperstack API before deploying. 
-flavor_name = "n3-L40x1" # or n3-L40Sx1 -labels = ["qwen36-moe", "wireguard"] - -[vllm] -install = true -model = "Qwen/Qwen3.6-35B-A3B-AWQ" # or the best available quantized MoE -container_name = "vllm_qwen36_moe" -max_model_len = 65536 # conservative for 48 GB; can raise if AWQ -gpu_memory_utilization = 0.92 -tensor_parallel_size = 1 -tool_call_parser = "qwen3_coder" - -# TurboQuant KV cache on a hybrid MoE -extra_vllm_args = [ - "--reasoning-parser", "qwen3", - "--kv-cache-dtype", "turboquant_k8v4", - "--no-enable-chunked-prefill" # mitigation for issue #41726 -] - -# Nightly image post-PR-39931 is required; pin to a known-good digest until 0.20.2+ -docker_image = "vllm/vllm-openai:nightly" -``` - -**VRAM estimate (AWQ 4-bit + TurboQuant K8V4 on L40 48 GB):** - -| Consumer | Est. size | -|---|---| -| AWQ weights (35B params @ 4-bit) | ~18 GB | -| Activations / MoE routing / logits | ~4–6 GB | -| CUDA graphs (if not eager) | ~2 GB | -| KV cache (TurboQuant) | ~20–24 GB | -| **Headroom** | **~0–4 GB** | - -Because headroom is thin, `gpu_memory_utilization=0.92` is appropriate. If profiling OOMs, raise it to `0.95` or drop `max_model_len`. If vLLM still OOMs during startup, try `--enforce-eager` to reclaim the CUDA-graph memory. - -## 5. CLI and WireGuard implications - -| Area | Impact | -|---|---| -| `--vm 1 / 2 / both` | No structural changes. The CLI already resolves `hyperstack-vm1.toml` independently via its own state file. Switching the flavor/model is transparent to `--vm 2`. | -| WireGuard | `wireguard_server_ip = "192.168.3.1"` stays the same. Recreating VM1 yields a new public IP, so the local `wg1.conf` peer endpoint must be refreshed (`ruby hyperstack.rb --vm 1 create` already handles this via `wg1-setup.sh`). The tunnel subnet `192.168.3.0/24` is unchanged. | -| Port 11434 / firewall | Unchanged. Port 56710 UDP and 22 TCP remain locked to `allowed_wireguard_cidrs` / `allowed_ssh_cidrs`. 
| -| Dual-VM routing | The client can continue to round-robin or fallback between `192.168.3.1` (VM1, MoE) and `192.168.3.3` (VM2, dense). No code changes needed. | - -## 6. Risks - -| Risk | Severity | Mitigation | -|---|---|---| -| **TurboQuant crash (#41726)** on hybrid MoE | High | Disable chunked prefill now; migrate to fixed vLLM nightly once PR #40798 lands. | -| **Model does not fit** in 48 GB if no AWQ/FP8 checkpoint exists | High | Confirm a 4-bit or FP8 checkpoint is on HuggingFace before provisioning. Fallback to Qwen3.6-27B dense (moves goalposts). | -| **Performance regression** from no chunked prefill | Medium | Expect higher TTFB on long prompts. Monitor with `ruby hyperstack.rb --vm 1 test`. | -| **Flavor unavailability** | Medium | Have a fallback flavor ready (e.g. `n3-A100x1` on VM1 if L40 is sold out), or accept A100 pricing. | -| **Nightly Docker image instability** | Medium | Pin to a specific digest (`vllm/vllm-openai@sha256:...`) after first successful smoke test. | - -## 7. Step-by-step migration plan (if you decide to proceed) - -1. **Verify asset availability** - - Confirm Hyperstack offers an L40 flavor and note its exact name. - - Locate a Qwen3.6-35B-A3B AWQ/FP8 checkpoint on HuggingFace. If none exists, abort or pivot to the dense 27B. - -2. **Snapshot / backup** - - Ensure VM2 (A100 dense) is stable and passing tests (`ruby hyperstack.rb --vm 2 test`). - - Save current VM1 state file as `.hyperstack-vm1-state.json.bak` in case a fast rollback is needed. - -3. **Update configuration** - - Edit `hyperstack-vm1.toml`: - - `flavor_name` → L40 flavor. - - `[vllm]` block → new model ID, container name, conservative `max_model_len`. - - Add `docker_image = "vllm/vllm-openai:nightly"` (or a pinned digest). - - Add TurboQuant arg and chunked-prefill mitigation to `extra_vllm_args`. - - Update `[vm] labels` to reflect the new model. - -4. 
**Provision** - ```bash - ruby hyperstack.rb --vm 1 create --replace - ``` - The `--replace` flag tears down the old A100 VM1 and rebuilds it on L40. - -5. **Post-create validation** - - Check WireGuard handshake: `sudo wg show wg1 latest-handshakes`. - - Ping tunnel IP: `ping -c 3 192.168.3.1`. - - Query vLLM: `curl -s http://192.168.3.1:11434/v1/models`. - - Run the automated test suite: `ruby hyperstack.rb --vm 1 test`. - -6. **Smoke test for TurboQuant stability** - - Send a conversation with a very long system prompt (> 4096 tokens) and tool schemas to force a chunked-prefill boundary. - - If the engine crashes with the workspace assertion, apply the fallback: - - Add `--enforce-eager` to `extra_vllm_args`, or - - Fall back to `--kv-cache-dtype fp8` (loses TurboQuant compression but is stable). - -7. **Dual-VM confirmation** - - Run `ruby hyperstack.rb --vm both test` to ensure both endpoints are healthy and reachable through the WireGuard tunnel. - -8. **Monitor and iterate** - - Watch VRAM usage with `nvidia-smi` inside the VM. - - Adjust `max_model_len` and `gpu_memory_utilization` as needed. - - Once upstream PR #40798 merges, rebuild the Docker image with the fixed vLLM version and re-enable chunked prefill. - ---- - -## Bottom line - -The L40 is a cost-efficient target *if* a quantized Qwen3.6-35B-A3B checkpoint is available. The biggest blocker is the open vLLM issue #41726 (TurboQuant + hybrid MoE crash on chunked prefill). Disabling chunked prefill is a viable short-term workaround, but it comes with a latency trade-off and must be validated before making VM1 the default endpoint. 
@@ -154,7 +154,7 @@ Each Hyperstack VM runs a vLLM instance; Pi connects to it directly over the Wir Install Pi from [pi.dev](https://pi.dev), then link the project-local config into place: ```bash -ln -s /path/to/hyperstack/pi ~/.pi +ln -s /path/to/hypr/pi ~/.pi ``` This symlink makes Pi pick up `pi/agent/models.json` and `pi/agent/settings.json` @@ -163,11 +163,11 @@ definitions are available without any manual config editing. ### Fish shell abbreviations -Source `hyperstack.fish` or copy the abbreviations into your Fish config: +Source `hypr.fish` or copy the abbreviations into your Fish config: ```fish abbr pi-hyperstack pi --model hyperstack1/Qwen/Qwen3.6-27B-FP8 -abbr pi-hyperstack-coder pi --model hyperstack1/Qwen/Qwen3.6-27B-FP8 +abbr pi-hyperstack-coder pi --model hyperstack1/Qwen/Qwen3.6-27B-FP8 abbr pi-hyperstack-qwen36 pi --model hyperstack2/Qwen/Qwen3.6-27B-FP8 abbr pi-hyperstack-gemma4 pi --model hyperstack2/cyankiwi/gemma-4-31B-it-AWQ-4bit ``` @@ -176,7 +176,7 @@ Then launch a session after the VM(s) are up: ```fish pi-hyperstack # Qwen3.6 27B FP8 on VM1 -pi-hyperstack-coder # Qwen3.6 27B FP8 on VM1 +pi-hyperstack-coder # Qwen3.6 27B FP8 on VM1 pi-hyperstack-qwen36 # Qwen3.6 27B FP8 on VM2 pi-hyperstack-gemma4 # Gemma 4 31B on VM2 ``` @@ -280,10 +280,8 @@ Available presets (both VMs share the same set): |---|---|---|---| | `gemma4-31b` | Gemma 4 31B IT (AWQ-4bit) | ~19 GB | 32K–128K (see TOML) | | `nemotron-super` | Nemotron-3-Super 120B (Mamba+MoE, 12B active) | ~60 GB | 131K | -| `qwen36-35b-a3b` | Qwen3.6-35B-A3B MoE (AWQ, 3B active) | ~18 GB | 65K* | +| `qwen36-35b-a3b` | Qwen3.6-35B-A3B MoE (AWQ, 3B active) | ~18 GB | 65K* (needs a quantized checkpoint) | | `qwen36-27b` | Qwen3.6 27B FP8 | ~45 GB | 262K | - -\* Needs a quantized checkpoint on HuggingFace before it can run on a single GPU. 
| `qwen25-coder-32b` | Qwen2.5-Coder-32B-Instruct (AWQ) | ~18 GB | 32K | | `qwen3-coder-30b` | Qwen3-Coder-30B-A3B (MoE, AWQ) | ~18 GB | 65K | | `deepseek-r1-32b` | DeepSeek-R1-Distill-Qwen-32B (AWQ) | ~18 GB | 32K | @@ -317,7 +315,6 @@ All commands accept --vm 1|2|both (default: 1). ## Configuration Edit `hyperstack-vm1.toml` / `hyperstack-vm2.toml`. -Use `hyperstack-vm1-nemotron.toml` for a dual-H100 Nemotron-3-Super profile on the VM1 slot (same state file as `hyperstack-vm1.toml` — use one or the other). Key sections: | Section | Purpose | @@ -3,7 +3,7 @@ abbr pi-hyperstack pi --model hyperstack1/Qwen/Qwen3.6-27B-FP8 abbr pi-hyperstack-coder pi --model hyperstack1/Qwen/Qwen3.6-27B-FP8 abbr pi-hyperstack-qwen36 pi --model hyperstack2/Qwen/Qwen3.6-27B-FP8 abbr pi-hyperstack-gemma4 pi --model hyperstack2/cyankiwi/gemma-4-31B-it-AWQ-4bit -abbr hyperstack-create ruby ~/git/hyperstack/hyperstack.rb create +abbr hyperstack-create ruby ~/git/hypr/hyperstack.rb create # Ollama (local endpoint pointing at cloud models) abbr pi-ollama-kimi pi --provider ollama --model kimi-k2.6:cloud diff --git a/lib/hyperstack/cli.rb b/lib/hyperstack/cli.rb index 76f158e..b5bcaff 100644 --- a/lib/hyperstack/cli.rb +++ b/lib/hyperstack/cli.rb @@ -1,5 +1,6 @@ # frozen_string_literal: true +require 'json' require 'optparse' require 'socket' @@ -383,9 +384,9 @@ module HyperstackVM hostnames = loaders.map { |loader| loader.config.wireguard_gateway_hostname } begin local_manager = build_manager(loaders.first.config, out: local_wg_out) - cleanup = local_manager.send(:cleanup_local_access, dry_run: dry_run, hostnames: hostnames, + cleanup = local_manager.cleanup_local_access(dry_run: dry_run, hostnames: hostnames, allowed_ips: allowed_ips) - local_manager.send(:report_local_cleanup, local_wg_out, cleanup, dry_run: dry_run) + local_manager.report_local_cleanup(local_wg_out, cleanup, dry_run: dry_run) rescue Error => e errors[:local_wireguard] = e.message end diff --git a/lib/hyperstack/manager.rb 
b/lib/hyperstack/manager.rb index 2150554..cecf11d 100644 --- a/lib/hyperstack/manager.rb +++ b/lib/hyperstack/manager.rb @@ -1,5 +1,6 @@ # frozen_string_literal: true +require_relative 'provisioning' require_relative 'ssh_runner' require_relative 'vm_lifecycle' require_relative 'wireguard_setup' @@ -69,26 +70,19 @@ module HyperstackVM def create(replace: false, dry_run: false, install_vllm: nil, install_ollama: nil, flavor_name: nil, vllm_preset: nil) - raise Error, "DRY RUN is not supported." if dry_run - - if replace - existing = @state_store.load - if existing && existing['vm_id'] - @vm_lifecycle.delete(vm_id: existing['vm_id']) - end - end - install_vllm = @config.vllm_install_enabled? if install_vllm.nil? install_ollama = @config.ollama_install_enabled? if install_ollama.nil? state = @vm_lifecycle.create( + replace: replace, + dry_run: dry_run, flavor_name: flavor_name, vllm_preset: vllm_preset, install_vllm: install_vllm, install_ollama: install_ollama - ) do |s| - @local_wireguard.show_local_wireguard(s['public_ip']) - end + ) { |s| show_local_wireguard([s['public_ip']].compact) } + + return if state.nil? 
@orchestrator.run( state, @@ -112,7 +106,7 @@ module HyperstackVM def status(include_local_wireguard: true) ip = @vm_lifecycle.status - @local_wireguard.show_local_wireguard(ip) if include_local_wireguard + show_local_wireguard([ip].compact) if include_local_wireguard ip end @@ -132,5 +126,35 @@ module HyperstackVM def list_models @vm_lifecycle.list_models end + + def cleanup_local_access(dry_run:, hostnames:, allowed_ips:) + peers = @local_wireguard.remove_peers_by_allowed_ips(allowed_ips, dry_run: dry_run) + removed_hosts = @local_wireguard.remove_hostnames(hostnames, dry_run: dry_run) + { peers: peers, hostnames: removed_hosts } + end + + def report_local_cleanup(output, cleanup, dry_run:) + peer_summary = cleanup[:peers].map { |peer| peer['AllowedIPs'] || peer['Endpoint'] }.join(', ') + host_summary = cleanup[:hostnames].join(', ') + + if dry_run + if cleanup[:peers].empty? && cleanup[:hostnames].empty? + output.puts('DRY RUN: no matching local WireGuard peers or host entries would be removed.') + return + end + unless cleanup[:peers].empty? + output.puts("DRY RUN: local WireGuard peers would be removed for #{peer_summary}.") + end + unless cleanup[:hostnames].empty? + output.puts("DRY RUN: local host entries would be removed for #{host_summary}.") + end + return + end + + output.puts('No matching local WireGuard peers needed removal.') if cleanup[:peers].empty? + output.puts('No matching local host entries needed removal.') if cleanup[:hostnames].empty? + output.puts("Local WireGuard peers removed for #{peer_summary}.") unless cleanup[:peers].empty? + output.puts("Local host entries removed for #{host_summary}.") unless cleanup[:hostnames].empty? 
+ end end end diff --git a/lib/hyperstack/provisioning_orchestrator.rb b/lib/hyperstack/provisioning_orchestrator.rb index f3222d9..8abfec8 100644 --- a/lib/hyperstack/provisioning_orchestrator.rb +++ b/lib/hyperstack/provisioning_orchestrator.rb @@ -75,7 +75,6 @@ module HyperstackVM @state_store.save(state) info "VM ready: #{state['public_ip']} (id=#{state['vm_id']})" - @inference_tester.config.show_local_wireguard(state['public_ip']) rescue nil @inference_tester.test(state) state end @@ -106,6 +105,18 @@ module HyperstackVM info "Adding Hyperstack firewall rule #{rule['protocol']} #{rule['remote_ip_prefix']} #{rule['port_range_min']}..." @client.create_vm_rule(vm['id'], rule) end + + legacy_litellm_rules(existing).each do |rule| + rule_id = rule['id'] || rule['rule_id'] + unless rule_id + warn_out 'Found legacy Hyperstack firewall rule for port 4000, but the API payload has no rule id; remove it manually from the Hyperstack console.' + next + end + info "Removing legacy Hyperstack firewall rule #{rule['protocol']} #{rule['remote_ip_prefix']} #{rule['port_range_min']}..." + @client.delete_vm_rule(vm['id'], rule_id) + rescue Error => e + warn_out "Failed to remove legacy Hyperstack firewall rule #{rule_id}: #{e.message}" + end end def effective_ollama? 
@@ -165,6 +176,16 @@ module HyperstackVM %w[ACTIVE SHUTOFF HIBERNATED].include?(vm['status'].to_s.upcase) end + def legacy_litellm_rules(rules) + Array(rules).select do |rule| + normalized = normalize_rule(rule) + normalized['protocol'] == 'tcp' && + normalized['port_range_min'] == 4000 && + normalized['port_range_max'] == 4000 && + normalized['remote_ip_prefix'] == @config.wireguard_subnet + end + end + private def with_polling(description, timeout: 900, interval: 5) @@ -183,5 +204,9 @@ module HyperstackVM def info(msg) @out.puts(msg) end + + def warn_out(msg) + @out.puts("WARN: #{msg}") + end end end diff --git a/lib/hyperstack/ssh_runner.rb b/lib/hyperstack/ssh_runner.rb index f41859d..e4440b6 100644 --- a/lib/hyperstack/ssh_runner.rb +++ b/lib/hyperstack/ssh_runner.rb @@ -1,5 +1,6 @@ # frozen_string_literal: true +require 'fileutils' require 'open3' require 'socket' diff --git a/lib/hyperstack/vm_lifecycle.rb b/lib/hyperstack/vm_lifecycle.rb index 972c896..cc52880 100644 --- a/lib/hyperstack/vm_lifecycle.rb +++ b/lib/hyperstack/vm_lifecycle.rb @@ -1,5 +1,8 @@ # frozen_string_literal: true +require 'json' +require_relative 'provisioning' + module HyperstackVM # Orchestrates the VM lifecycle from creation through deletion. class VmLifecycle @@ -9,27 +12,52 @@ module HyperstackVM @state_store = state_store @local_wireguard = local_wireguard @out = out + @scripts = ProvisioningScripts.new(config: config) end attr_reader :config, :client, :state_store - def create(flavor_name: nil, vllm_preset: nil, install_vllm: nil, install_ollama: nil, &block) + def create(replace: false, dry_run: false, flavor_name: nil, vllm_preset: nil, + install_vllm: nil, install_ollama: nil, &block) @effective_flavor_name = flavor_name.nil? ? @config.flavor_name : flavor_name @state_store.load if defined?(@state_store) # force load existing = @state_store.load if existing && existing['vm_id'] - raise Error, - "State file #{@state_store.path} already tracks VM #{existing['vm_id']}. 
Use --replace or delete first." + if replace + if dry_run + info "DRY RUN: would delete tracked VM #{existing['vm_id']} before creating a replacement." + show_local_wireguard([]) + return nil + else + delete(vm_id: existing['vm_id']) + end + elsif resumable_state?(existing) + if dry_run + print_resume_dry_run(existing, install_vllm: install_vllm, install_ollama: install_ollama, vllm_preset: vllm_preset) + return nil + end + info "Resuming tracked VM #{existing['vm_id']} provisioning..." + return existing + else + raise Error, + "State file #{@state_store.path} already tracks VM #{existing['vm_id']}. Use --replace or delete first." + end end resolved = resolve_dependencies vm_name = @config.generated_vm_name - info "Creating VM #{vm_name} in #{resolved[:environment]['name']} using #{@effective_flavor_name}..." + info (dry_run ? "Planning" : "Creating") + " VM #{vm_name} in #{resolved[:environment]['name']} using #{@effective_flavor_name}..." payload = build_payload(vm_name, resolved, install_vllm: install_vllm, install_ollama: install_ollama) + if dry_run + print_create_dry_run(vm_name, resolved, payload, install_vllm: install_vllm, install_ollama: install_ollama, vllm_preset: vllm_preset) + show_local_wireguard([]) + return nil + end + response = @client.create_vm(payload) instance = Array(response['instances']).first - raise Error, 'Hyperstack create response did not include an instance ID.' unless instance&&['id'] + raise Error, 'Hyperstack create response did not include an instance ID.' unless instance && instance['id'] state = build_state(vm_name, instance, resolved) sync_service_mode(state, install_vllm: install_vllm, install_ollama: install_ollama) @@ -87,20 +115,23 @@ module HyperstackVM info "Missing firewall rules: #{missing.empty? ? 
'none' : missing.size}" rescue Error => e warn_out "Unable to load VM #{state['vm_id']}: #{e.message}" + return state&.dig('public_ip') end connect_host_for(vm) end - def resolve_dependencies + def resolve_dependencies(flavor_name: nil) + flavor_name = @effective_flavor_name if flavor_name.nil? && @effective_flavor_name + flavor_name = @config.flavor_name if flavor_name.nil? environment = @client.list_environments.find { |item| item['name'] == @config.environment_name } raise Error, "Environment #{@config.environment_name.inspect} was not found in Hyperstack." unless environment flavor = @client.list_flavors.find do |item| - item['name'] == @effective_flavor_name && item['region_name'] == environment['region'] + item['name'] == flavor_name && item['region_name'] == environment['region'] end - raise Error, "Flavor #{@effective_flavor_name.inspect} is not available in #{environment['region']}." unless flavor + raise Error, "Flavor #{flavor_name.inspect} is not available in #{environment['region']}." unless flavor if flavor['stock_available'] == false - raise Error, "Flavor #{@effective_flavor_name.inspect} exists in #{environment['region']} but is out of stock." + raise Error, "Flavor #{flavor_name.inspect} exists in #{environment['region']} but is out of stock." end image = @client.list_images.find do |item| @@ -146,29 +177,6 @@ module HyperstackVM end end - def ensure_security_rules(vm) - existing = Array(vm['security_rules']) - existing_norm = existing.map { |r| normalize_rule(r) } - desired = desired_rules.map { |r| normalize_rule(r) } - - (desired - existing_norm).each do |rule| - info "Adding Hyperstack firewall rule #{rule['protocol']} #{rule['remote_ip_prefix']} #{rule['port_range_min']}..." 
- @client.create_vm_rule(vm['id'], rule) - end - - legacy_litellm(existing).each do |rule| - rule_id = rule['id'] || rule['rule_id'] - unless rule_id - warn_out 'Found legacy Hyperstack firewall rule for port 4000, but the API payload has no rule id; remove it manually from the Hyperstack console.' - next - end - info "Removing legacy Hyperstack firewall rule #{rule['protocol']} #{rule['remote_ip_prefix']} #{rule['port_range_min']}..." - @client.delete_vm_rule(vm['id'], rule_id) - rescue Error => e - warn_out "Failed to remove legacy Hyperstack firewall rule #{rule_id}: #{e.message}" - end - end - def connect_host_for(vm) return vm['floating_ip'] if @config.assign_floating_ip? vm['floating_ip'] || vm['fixed_ip'] @@ -246,6 +254,74 @@ module HyperstackVM private + def resumable_state?(state) + state && state['vm_id'] && state['provisioned_at'].nil? + end + + def print_create_dry_run(vm_name, resolved, payload, install_vllm:, install_ollama:, vllm_preset:) + info 'DRY RUN: no VM or state file will be created.' + info "State file: #{@state_store.path}" + info "Resolved environment: #{resolved[:environment]['name']} (region #{resolved[:environment]['region']})" + info "Resolved flavor: #{format_flavor(resolved[:flavor])}" + info "Resolved image: #{resolved[:image]['name']}" + info "Resolved SSH keypair: #{resolved[:keypair]['name']}" + info "Planned VM name: #{vm_name}" + info "Allowed SSH CIDRs: #{@config.allowed_ssh_cidrs.join(', ')}" + info "Allowed WireGuard CIDRs: #{@config.allowed_wireguard_cidrs.join(', ')}" + info 'Create payload:' + @out.puts(JSON.pretty_generate(payload)) + if @config.guest_bootstrap_enabled? + info 'Guest bootstrap script:' + @out.puts(@scripts.guest_bootstrap_script) + else + info 'Guest bootstrap is disabled in config.' 
+ end + if install_ollama + info "Ollama will be installed with models stored under #{@config.ollama_models_dir}" + models = @scripts.desired_ollama_models + info "Ollama models to pre-pull: #{models.join(', ')}" unless models.empty? + end + if install_vllm + preset_cfg = vllm_preset ? @config.vllm_preset(vllm_preset) : nil + vllm_m = preset_cfg&.dig('model') || @config.vllm_model + vllm_cname = preset_cfg&.dig('container_name') || @config.vllm_container_name + vllm_maxlen = preset_cfg&.dig('max_model_len') || @config.vllm_max_model_len + preset_note = vllm_preset ? " (preset: #{vllm_preset})" : '' + info "vLLM will be installed: #{vllm_m}#{preset_note}" + info " Container: #{vllm_cname}, port #{@config.ollama_port}, max_model_len #{vllm_maxlen}" + end + if @config.wireguard_auto_setup? + info "WireGuard auto-setup script: #{@config.wireguard_setup_script} <vm_public_ip>" + end + end + + def print_resume_dry_run(state, install_vllm:, install_ollama:, vllm_preset:) + info "DRY RUN: would resume provisioning tracked VM #{state['vm_id']}." + begin + vm = @client.get_vm(state['vm_id']) + info "Tracked VM status: #{vm['status']} / #{vm['vm_state']}" + ip = vm['floating_ip'] || vm['fixed_ip'] + info "Tracked VM public IP: #{ip || 'none'}" + rescue Error => e + warn_out "Unable to inspect tracked VM #{state['vm_id']}: #{e.message}" + end + if @config.guest_bootstrap_enabled? && state['bootstrapped_at'].nil? + info 'Guest bootstrap script:' + @out.puts(@scripts.guest_bootstrap_script) + end + if install_ollama && state['ollama_installed_at'].nil? + info "Ollama would be installed with models stored under #{@config.ollama_models_dir}" + models = @scripts.desired_ollama_models + info "Ollama models to pre-pull: #{models.join(', ')}" unless models.empty? + end + if install_vllm && state['vllm_setup_at'].nil? + info "vLLM would be installed: #{state['vllm_model'] || @config.vllm_model}" + end + if @config.wireguard_auto_setup? && state['wireguard_setup_at'].nil? 
+ info "WireGuard auto-setup script would run: #{@config.wireguard_setup_script} #{state['public_ip'] || '<pending-public_ip>'}" + end + end + def build_payload(vm_name, resolved, install_vllm: nil, install_ollama: nil) payload = { 'name' => vm_name, @@ -306,16 +382,6 @@ module HyperstackVM parts.empty? ? 'All inference services disabled' : "#{parts.join(', ')} enabled" end - def legacy_litellm(rules) - Array(rules).select do |rule| - normalized = normalize_rule(rule) - normalized['protocol'] == 'tcp' && - normalized['port_range_min'] == 4000 && - normalized['port_range_max'] == 4000 && - normalized['remote_ip_prefix'] == @config.wireguard_subnet - end - end - def perform_local_cleanup(dry_run:) peers = @local_wireguard.remove_peers_by_allowed_ips( ["#{@config.wireguard_gateway_ip}/32"], dry_run: dry_run diff --git a/pi/agent/extensions/fresh-subagent/README.md b/pi/agent/extensions/fresh-subagent/README.md index ac2cde2..26630f0 100644 --- a/pi/agent/extensions/fresh-subagent/README.md +++ b/pi/agent/extensions/fresh-subagent/README.md @@ -141,11 +141,7 @@ In one-shot or print mode it runs the editor command directly. Alias with the same watched behavior: ```text -/subagent-watch <prompt - -> Unknown command "/subagent-watch <prompt". Try /help?> - - +/subagent-watch <prompt> ``` Launch a visible fresh Pi session instead of a headless child: diff --git a/pi/agent/extensions/loop-scheduler/loop-presets.md b/pi/agent/extensions/loop-scheduler/loop-presets.md index f90cad3..1df5b22 100644 --- a/pi/agent/extensions/loop-scheduler/loop-presets.md +++ b/pi/agent/extensions/loop-scheduler/loop-presets.md @@ -8,6 +8,6 @@ # * monitor: 10m check if there are any errors in the logs * tasks: 1m automatically start with the next task with fresh context if the current task completed following the agent-task-management skill. 
-* proceed: 1m proceed with the next task following agent-task-management if the previous or currently tasks being worked on is completed and committed to git. -* review: 1m review all code changes since the last review and add code review comments using agent-task-management skill. use go-bestpractices and SOLID skills. +* proceed: 1m proceed with the next task following agent-task-management if the previous or current task being worked on is completed and committed to git. +* review: 1m review all code changes since the last review and add code review comments using agent-task-management skill. use go-best-practices and solid-principles skills. * scifi: 1m write a scifi story about the current project or continue writing the story into STORY.md. diff --git a/pi/agent/extensions/nemotron-tool-repair/README.md b/pi/agent/extensions/nemotron-tool-repair/README.md index 69fcb27..ff06401 100644 --- a/pi/agent/extensions/nemotron-tool-repair/README.md +++ b/pi/agent/extensions/nemotron-tool-repair/README.md @@ -24,14 +24,7 @@ same model IDs, but they do not go through the Nemotron repair path. ## Usage Flow -Start Pi the same way as before: - -```bash -cd /home/paul/git/conf/snippets/hyperstack -./pi-vm1 -``` - -or explicitly: +Start Pi with the Nemotron model: ```bash pi --model 'hyperstack1/cyankiwi/NVIDIA-Nemotron-3-Super-120B-A12B-AWQ-4bit' diff --git a/pi/plans/gt-plan.md b/pi/plans/gt-plan.md deleted file mode 100644 index 7cf5a38..0000000 --- a/pi/plans/gt-plan.md +++ /dev/null @@ -1,29 +0,0 @@ -# Project gt – Gap Analysis and Improvement Plan - -## Overall Picture & Goals - -- Provide a reliable, well‑documented command‑line percentage calculator with RPN and rational number support. -- Deliver a smooth developer experience: clear contribution guidelines, automated CI/CD, and proper versioning. -- Ensure the codebase follows Go best practices, has comprehensive tests, and ships a stable binary. - -Plan: - -1. 
**Fix CI build step** – Update GitHub Actions workflow to build the correct binary path (`./cmd/gt` instead of `./cmd/perc`). -2. **Update `go.mod` Go version** – Change the `go` directive to a supported version (e.g. `go 1.22`) to match the CI Go version. -3. **Add `CONTRIBUTING.md`** – Provide guidelines for building, testing, using `mage`, and submitting pull requests. -4. **Expand README** – Include concrete examples for rational‑mode (`rat on/off/toggle`) and hyper‑operators (`[+]`, `[*]`, etc.). -5. **Add badges to README** – CI status, test coverage, and Go Report Card badges. -6. **Add end‑to‑end CLI tests** – Test the built binary for commands like `gt version`, `gt 20% of 150`, and `gt help`. -7. **Add REPL command tests** – Cover built‑in commands (`help`, `clear`, `quit`, `rat`) and variable management (`vars`, `clear`, `name d`). -8. **Add `.goreleaser.yml`** – Set up automated release builds and GitHub releases. -9. **Implement version bump workflow** – Use the `increment-version-and-push` skill to bump the version, tag, and push. -10. **Document variable management** – Add a dedicated README section describing `vars`, `clear`, and variable deletion commands. -11. **Update Magefile** – Add shortcuts for build, test, lint, and release. -12. **Add missing Go documentation** – Ensure all exported functions in the REPL package (`NewREPL`, `RunREPL`, `executor`, `defaultExecutor`, `defaultCompleter`, `defaultGetCommandDescription`) have godoc comments. -13. **Add go vet step to CI workflow** – Include a `go vet ./...` step in the GitHub Actions CI configuration to catch static analysis issues. -14. **Add godoc comments for exported TTYChecker methods (IsTTY, EnsureTTY)**. -15. **Add godoc comments for exported SignalHandler methods (Start, Stop)**. -16. **Add SPDX license headers to all .go source files**. -17. **Wrap errors with %w where appropriate for better error chaining**. -18. 
**Design a nice logo for the gt project (e.g., stylized 'gt' with calculator motif)**. - |
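
The largest behavioral change in this diff is that `VmLifecycle#create` now takes over the `replace:` / `dry_run:` handling that previously lived in `Manager#create`, and it can resume a tracked VM whose provisioning never finished. The branch order is easy to lose in the diff, so here is a minimal sketch of it as a pure function. `create_action` and the symbols it returns are hypothetical names invented for illustration; only `resumable_state?` mirrors a helper that appears in the diff, and the real method performs API calls and prints output rather than returning symbols.

```ruby
# frozen_string_literal: true

# Mirrors the private helper added to VmLifecycle in the diff: a tracked VM
# is resumable when it has an ID but no 'provisioned_at' timestamp yet.
def resumable_state?(state)
  !state.nil? && !state['vm_id'].nil? && state['provisioned_at'].nil?
end

# Hypothetical summary of the decision order in VmLifecycle#create.
# The returned symbols are illustrative labels, not values used by the project.
def create_action(existing, replace:, dry_run:)
  tracked = existing && existing['vm_id']

  # No tracked VM: either print the plan (dry run) or create for real.
  return (dry_run ? :plan_create : :create) unless tracked

  if replace
    # A dry run only reports the pending deletion; otherwise the old VM is
    # deleted and creation continues.
    dry_run ? :plan_replace : :delete_then_create
  elsif resumable_state?(existing)
    # Unfinished provisioning is resumed instead of raising an error.
    dry_run ? :plan_resume : :resume
  else
    # Fully provisioned VM without --replace: the method raises Error.
    :error_already_tracked
  end
end
```

Read this way, the only path that still raises is a fully provisioned tracked VM without `--replace`; every dry-run path returns early, which matches the `return if state.nil?` guard added to `Manager#create`.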
