| Age | Commit message | Author |
|
vLLM 0.17.1 has no tool call parser for Nemotron's custom XML format
(<tool_call><function=...><parameter=...>). Setting llama3_json produced
garbage output. Reverted to tool_call_parser="" with a clear comment.
Added --reasoning-parser nemotron_v3 via extra_vllm_args so <think> tokens
are properly exposed as reasoning_content in the API response.
For agentic work requiring tool calls, switch to qwen3-coder-next or devstral.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
|
|
known_hosts
- hyperstack-vm.toml: set tool_call_parser=llama3_json for nemotron-super so vLLM
accepts tool_choice requests from opencode; the model does not call tools on its
own, so the vLLM 0.17.1 token_ids crash in llama3_json is not expected to trigger
- hyperstack.rb: wait_for_ssh now also removes the WireGuard hostname
(hyperstack.wg1) from known_hosts alongside the IP, preventing
StrictHostKeyChecking failures across VM recreates
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
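The known_hosts cleanup described in this commit could look roughly like the sketch below. It is not the actual wait_for_ssh code: the helper name, the placeholder IP, and the known_hosts path are assumptions for illustration. The only real tool involved is `ssh-keygen -R`, which removes all keys for a host from a known_hosts file.

```ruby
# Hypothetical helper: build one `ssh-keygen -R` command per host entry
# (IP and WireGuard hostname) so stale keys are dropped after a VM recreate.
def known_hosts_cleanup_commands(hosts, known_hosts_path:)
  hosts.uniq.map do |host|
    ["ssh-keygen", "-R", host, "-f", known_hosts_path]
  end
end

# 192.0.2.10 is a documentation-range placeholder, not the real VM address.
commands = known_hosts_cleanup_commands(
  ["192.0.2.10", "hyperstack.wg1"],
  known_hosts_path: "/tmp/known_hosts"
)
commands.each { |cmd| puts cmd.join(" ") }
# To run them for real: commands.each { |cmd| system(*cmd) }
```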
|
|
deepseek-r1-32b, qwen3-32b, devstral presets
- hyperstack.rb: add extra_vllm_args array field to preset resolver and
vllm_install_script; flags are appended verbatim to the docker run command,
enabling per-preset vLLM flags (reasoning parsers, Mistral loader)
- hyperstack.rb: show extra_args in dry-run model switch output
- hyperstack-vm.toml: fix nemotron-super to use actual NVIDIA Nemotron-3-Super-120B-A12B
AWQ (cyankiwi) with trust_remote_code=true; previous preset incorrectly used llama-3.3-70b
- hyperstack-vm.toml: add deepseek-r1-32b (--reasoning-parser deepseek_r1, ~18 GB)
- hyperstack-vm.toml: add qwen3-32b (--reasoning-parser deepseek_r1, ~18 GB)
- hyperstack-vm.toml: add devstral (Mistral tokenizer+config format, ~15 GB); --load_format
mistral omitted because AWQ weights are in standard HF safetensors format
All 6 new/updated presets end-to-end tested on A100 80GB (vLLM 0.17.1).
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
|
|
WireGuard
New vLLM model presets (all end-to-end tested on A100 80GB):
- gpt-oss-20b: openai/gpt-oss-20b — MoE 20B, ~14 GB MXFP4, ultra-fast (3.6B active)
- gpt-oss-120b: openai/gpt-oss-120b — MoE 120B, ~65 GB MXFP4, powerful reasoning
- qwen25-coder-32b: Qwen/Qwen2.5-Coder-32B-Instruct-AWQ — ~18 GB, best 32B coder
- qwen3-coder-30b: QuantTrio/Qwen3-Coder-30B-A3B-Instruct-AWQ — ~18 GB Qwen3 coder
gpt-oss models disable --enable-auto-tool-choice (tool_call_parser = ""): vLLM 0.17.1's
llama3_json parser crashes on gpt-oss responses because the new token_ids field in the
response is passed as an unexpected keyword argument to extract_tool_calls().
gpt-oss-120b max_model_len raised to 40960: Claude Code's system prompt alone is ~33K
tokens, so 16K was insufficient. 40K allows Claude Code to connect with headroom.
Use wireguard_gateway_hostname (hyperstack.wg1) instead of raw 192.168.3.1 IP for all
connection URLs (tests, ready message, dry-run output). The hostname is derived from the
wg interface name and resolves via /etc/hosts.
Fix test max_tokens: raised from 50 to 500 so reasoning models (e.g. gpt-oss) have
enough tokens to complete chain-of-thought before producing content.
Fix qwen25-coder-32b max_model_len: model config has max_position_embeddings=32768,
not 128K as assumed. Using 65536 caused a vLLM pydantic validation error.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
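The two max_model_len fixes in this commit come down to simple arithmetic, sketched below. The helper names are hypothetical, "16K" is assumed to mean 16384, and the ~33K prompt size is the approximate figure from the message. The `<=` check is a simplification of vLLM's own validation.

```ruby
# Tokens left for conversation and output once the system prompt is loaded.
def headroom(max_model_len, prompt_tokens)
  max_model_len - prompt_tokens
end

# True when the requested context length fits the model's positional limit.
def context_fits?(max_model_len, max_position_embeddings)
  max_model_len <= max_position_embeddings
end

prompt_tokens = 33_000 # approximate Claude Code system prompt size

puts headroom(16_384, prompt_tokens) # negative: the prompt alone does not fit
puts headroom(40_960, prompt_tokens) # positive: room left after the prompt
puts context_fits?(65_536, 32_768)   # qwen25-coder-32b before the fix
puts context_fits?(32_768, 32_768)   # a value within the model's limit
```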
|
|
Replace cyankiwi/Llama-3_3-Nemotron-Super-49B-v1_5-AWQ-4bit with
casperhansen/llama-3.3-70b-instruct-awq for the nemotron-super preset.
The NAS model's config.json has num_key_value_heads=null by design for
its heterogeneous per-layer attention architecture, which is incompatible
with vLLM's pydantic ModelConfig validation (requires int). No working
AWQ quant for this architecture exists; Llama-3.3-70B-Instruct AWQ is
a proven drop-in for the extended-analysis use case.
Also fix test_vllm to use the model reported by /v1/models instead of
the static config default, so tests pass after a model switch.
Add trust_remote_code support to vllm_install_script for future models
that require custom HuggingFace model code.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
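The test_vllm change in this commit can be sketched as follows. This is a minimal illustration, not the actual test code: the function name and the fallback parameter are assumptions. The response shape (a `data` array of objects with an `id`) is the OpenAI-compatible format that /v1/models returns.

```ruby
require "json"

# Hypothetical helper: prefer the model the server reports over the
# static config default, falling back only when the list is empty.
def served_model_id(models_json, fallback:)
  first = JSON.parse(models_json).fetch("data", []).first
  first && first["id"] ? first["id"] : fallback
end

body = '{"object":"list","data":[{"id":"casperhansen/llama-3.3-70b-instruct-awq","object":"model"}]}'
puts served_model_id(body, fallback: "config-default-model")
```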
|
|
- New [vllm.presets.*] TOML section with two presets:
    qwen3-coder-next  bullpoint/Qwen3-Coder-Next-AWQ-4bit (256k ctx, coding)
    nemotron-super    solidrust/Llama-3.3-Nemotron-Super-49B-v1-AWQ (131k ctx, analysis)
- New CLI subcommand: `model list` — show presets, mark the active one
- New CLI subcommand: `model switch PRESET [--dry-run]` — switch the
running VM to a different preset without redeploying:
    1. stops old Docker container (if container_name differs)
    2. starts new container and waits for model readiness
    3. hot-reloads LiteLLM config via litellm_reload_script (no venv reinstall)
    4. updates state file with new vllm_model / vllm_container_name / vllm_preset
- New `create --model PRESET` flag — deploy with a non-default preset
- vllm_install_script and litellm_install_script now accept preset_config: and
model_override: arguments so callers can override individual fields without
duplicating the full config
- State file now tracks vllm_container_name and vllm_preset for clean
container lifecycle management across switches
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
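The four `model switch` steps listed in this commit can be sketched as a pure planning function. This is not the code in hyperstack.rb: the function name, the hash keys on the preset, and the container names are assumptions. Only the state keys vllm_model, vllm_container_name, and vllm_preset come from the message.

```ruby
# Hypothetical planner: returns the ordered steps plus the state to persist.
def switch_plan(state, preset)
  steps = []
  # Step 1 only applies when the container name actually changes.
  if state[:vllm_container_name] != preset[:container_name]
    steps << "stop container #{state[:vllm_container_name]}"
  end
  steps << "start container #{preset[:container_name]} and wait for readiness"
  steps << "reload LiteLLM config"
  steps << "write state file"
  new_state = state.merge(
    vllm_model: preset[:model],
    vllm_container_name: preset[:container_name],
    vllm_preset: preset[:name]
  )
  [steps, new_state]
end

state = { vllm_model: "bullpoint/Qwen3-Coder-Next-AWQ-4bit",
          vllm_container_name: "vllm-qwen3", vllm_preset: "qwen3-coder-next" }
preset = { name: "nemotron-super", container_name: "vllm-nemotron",
           model: "solidrust/Llama-3.3-Nemotron-Super-49B-v1-AWQ" }

steps, new_state = switch_plan(state, preset)
steps.each_with_index { |step, i| puts "#{i + 1}. #{step}" }
```

Keeping the plan separate from execution is one way a `--dry-run` flag could print the steps without touching the VM.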
|
|
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
|
|
- Replace Ollama (disabled by default) with vLLM Docker container +
LiteLLM Anthropic-API proxy as the default inference backend
- vLLM setup: pulls vllm/vllm-openai, starts container on port 11434,
polls until model is loaded (up to 10 min for first 45 GB download)
- LiteLLM setup: installs in Python venv, writes config mapping Claude
model aliases to the vLLM model, runs as a systemd service on port 4000
- New CLI flags on `create`: --vllm/--no-vllm, --ollama/--no-ollama to
override config at runtime
- New `test` command: end-to-end inference test over WireGuard against
vLLM (/v1/models + /v1/chat/completions) and LiteLLM (/v1/messages)
- UFW rules now open ports 11434 (inference) and 4000 (LiteLLM)
to the WireGuard subnet
- Rename hyperstack_vm.rb → hyperstack.rb
- Add README.md with quickstart, Claude Code / OpenCode usage, CLI
reference, monitoring commands, and VRAM sizing notes
- Add vllm-setup.txt: detailed manual setup notes and architecture docs
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
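The "poll until the model is loaded" behaviour in this commit could be structured like the sketch below. It is an assumption about the shape of such a loop, not the actual setup script: the function name, the injectable clock and sleeper, and the probe block are all hypothetical.

```ruby
# Hypothetical poller: calls the block until it returns truthy or the
# timeout (in seconds) elapses. Clock and sleeper are injectable for tests.
def wait_until_ready(timeout:, interval:,
                     clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) },
                     sleeper: ->(seconds) { sleep(seconds) })
  deadline = clock.call + timeout
  loop do
    return true if yield
    return false if clock.call >= deadline
    sleeper.call(interval)
  end
end

# Example with a fake probe that succeeds on the third attempt.
attempts = 0
ready = wait_until_ready(timeout: 5, interval: 0) do
  attempts += 1
  attempts >= 3
end
puts "ready=#{ready} after #{attempts} attempts"

# A real call might look like this (model_loaded? is hypothetical):
#   wait_until_ready(timeout: 600, interval: 5) { model_loaded? }
```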
|
|
retries, apt lock waits, and model verification
|
|
Changed deployment strategy from RollingUpdate to Recreate to prevent
file lock conflicts when using RWO volumes. Syncthing uses file locks
in the config directory, so only one pod can access it at a time.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
|
Added startup, liveness, and readiness probes to the syncthing deployment.
The liveness probe will automatically restart the pod when it becomes unresponsive
due to stale NFS file handles, preventing filesystem errors and service disruptions.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
|
Removed AGENTS.md as it is no longer referenced in the project.
Added .serena/ directory with project configuration for Serena AI agent.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
|
Added startup, liveness, and readiness probes to the docker registry deployment.
The liveness probe will automatically restart the pod when it returns 503 errors
(which happens when NFS storage becomes stale), preventing prolonged ImagePullBackOff
issues for dependent services like radicale and git-server.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
|
Added quick toggle script section to README with usage examples. Reorganized
client configuration into quick toggle (recommended) and manual sections for
better clarity. The toggle script provides an easy way to enable/disable
Pi-hole DNS on Fedora laptops without remembering NetworkManager commands.
Co-authored-by: Cursor <cursoragent@cursor.com>
|
|
Updated immich-server and immich-machine-learning images to v2.5.5.
This release includes major features:
- Free Up Space functionality
- Non-destructive photo editing
- Database backup and restore via web
- Upload improvements and visual refresh
- Progressive JPEG support
- Additional fine-grained API key permissions
Release notes: https://github.com/immich-app/immich/releases/tag/v2.5.5
|
|
- Created custom ingress-lan.yaml for immich.f3s.lan.buetow.org with TLS
- Removed unsupported 'lan' ingress config from ArgoCD app values
- The Immich Helm chart doesn't support multiple named ingresses,
so we create the LAN ingress as a custom resource instead
This aligns immich with other services that have both regular and
LAN ingress endpoints.
|
|
Added liveness and readiness probes to the PostgreSQL deployment to ensure
it's ready to accept connections before immich-server attempts to connect.
This fixes the race condition causing ECONNREFUSED errors and pod restarts.
The readiness probe prevents services from routing traffic until PostgreSQL
is fully initialized, while the liveness probe ensures the container is
restarted if PostgreSQL becomes unresponsive.
|
|
Co-authored-by: Cursor <cursoragent@cursor.com>
|
|
OpenSSH refuses to load host keys from NFS for security reasons.
The solution is to store keys in persistent NFS (so they survive
restarts) but copy them to a local emptyDir at startup (so sshd
can read them).
This ensures:
- SSH host keys persist across pod restarts
- sshd can successfully load the keys from local storage
- Clients don't see "host key changed" warnings
Co-authored-by: Cursor <cursoragent@cursor.com>
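The copy-at-startup approach described in this commit can be sketched as below. The real deployment most likely does this in the container's startup shell, so treat this purely as an illustration of the logic: the function name, the file-name pattern, and the example paths are assumptions.

```ruby
require "fileutils"

# Hypothetical helper: copy host keys from persistent storage into a local
# directory and tighten permissions so sshd accepts the private keys.
def stage_host_keys(persistent_dir, local_dir)
  FileUtils.mkdir_p(local_dir)
  Dir.glob(File.join(persistent_dir, "ssh_host_*_key*")).sort.map do |src|
    dest = File.join(local_dir, File.basename(src))
    FileUtils.cp(src, dest)
    # Private keys must not be group/world readable; public keys may be.
    File.chmod(src.end_with?(".pub") ? 0o644 : 0o600, dest)
    dest
  end
end

# Example call (paths are placeholders, not the actual mount points):
#   stage_host_keys("/persistent/ssh", "/local/ssh")
```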
|
|
The sshd_config file needs to be in the persistent SSH directory
for the git-server container to start properly. Added ConfigMap
and updated initContainer to copy it on first deployment.
Co-authored-by: Cursor <cursoragent@cursor.com>
|
|
SSH host keys are now stored in persistent NFS storage instead of
ephemeral emptyDir. Keys are only generated once on first deployment,
preventing known_hosts updates on every pod restart.
Co-authored-by: Cursor <cursoragent@cursor.com>
|
|
Add *.f3s.lan.buetow.org ingress resources for all services to enable
LAN access with TLS termination. This allows direct access from the
192.168.1.0/24 network through the FreeBSD CARP/relayd setup.
Services updated:
- argocd: argocd.f3s.lan.buetow.org
- cgit: cgit.f3s.lan.buetow.org
- grafana: grafana.f3s.lan.buetow.org
- anki-sync-server: anki.f3s.lan.buetow.org
- apache: f3s.lan.buetow.org, www.f3s.lan.buetow.org, standby.f3s.lan.buetow.org
- audiobookshelf: audiobookshelf.f3s.lan.buetow.org
- filebrowser: filebrowser.f3s.lan.buetow.org
- immich: immich.f3s.lan.buetow.org
- ipv6test: ipv6test.f3s.lan.buetow.org (+ ipv4/ipv6 subdomains)
- keybr: keybr.f3s.lan.buetow.org
- koreader-sync-server: koreader.f3s.lan.buetow.org
- miniflux: flux.f3s.lan.buetow.org
- opodsync: gpodder.f3s.lan.buetow.org
- radicale: radicale.f3s.lan.buetow.org
- syncthing: syncthing.f3s.lan.buetow.org
- tracing-demo: tracing-demo.f3s.lan.buetow.org
- wallabag: bag.f3s.lan.buetow.org
- webdav: webdav.f3s.lan.buetow.org
All LAN ingresses use:
- TLS with f3s-lan-tls certificate (cert-manager)
- Traefik entrypoints: web,websecure
- Same backend services as external ingresses
Also fixed koreader-sync-server ingress to use modern annotations.
Co-authored-by: Cursor <cursoragent@cursor.com>
|
|
Add reminder to push changes to r0 for ArgoCD sync and note about
accepting new SSH host keys.
Co-authored-by: Cursor <cursoragent@cursor.com>
|
|
Document how to configure clients to use Pi-hole DNS:
- NetworkManager configuration for Linux/Fedora
- Multiple DNS servers with automatic failover
- Firefox DoH configuration notes
- Verification steps
Co-authored-by: Cursor <cursoragent@cursor.com>
|
|
Configure Pi-hole DNS service to bind to 192.168.1.120 for LAN access.
This allows clients on the 192.168.1.0/24 network to use Pi-hole as
their DNS server without needing to be on the WireGuard mesh.
Co-authored-by: Cursor <cursoragent@cursor.com>
|
|
The pihole Helm chart uses 'admin.existingSecret', not 'adminPasswordSecret'.
This ensures the deployment uses the pihole-admin-password secret instead
of creating a default 'pihole-password' secret with the password 'admin'.
Co-authored-by: Cursor <cursoragent@cursor.com>
|
|
Pi-hole's web interface returns 403 Forbidden when accessed via the
root path. Add a Traefik middleware that redirects requests for the
root URL to the /admin/ path, where the web interface is accessible.
Also add the pihole ArgoCD application manifest.
Co-authored-by: Cursor <cursoragent@cursor.com>
|
|
Co-authored-by: Cursor <cursoragent@cursor.com>
|
|
Co-authored-by: Cursor <cursoragent@cursor.com>
|
|
Co-authored-by: Cursor <cursoragent@cursor.com>
|
|
- Add cert-manager for self-signed TLS certificates
- Create wildcard cert for *.f3s.lan.buetow.org
- Add LAN ingress to Navidrome (navidrome.f3s.lan.buetow.org)
- Document FreeBSD relayd configuration for LAN access
- Add comprehensive setup guide
LAN access uses existing CARP VIP (192.168.1.138) on f0/f1
with relayd forwarding HTTP/HTTPS to k3s Traefik NodePorts.
External access via OpenBSD relayd continues unchanged.
|
|
Adds Navidrome music streaming server with:
- Helm chart with deployment, service, ingress, and persistent volumes
- Two PVs: data (10Gi) and music library (200Gi)
- ArgoCD application for automated deployment
- Ingress at navidrome.f3s.buetow.org
- Justfile for operational commands
|
|
Fix Apache PidFile and cgid ScriptSock paths for non-root user.
|
|
Switch the container to an unprivileged UID/GID and update probes for port 8080.
|
|