conf - Configuration files for the automation of my personal infrastructure (servers, laptops, workstations, phones)!

Age	Commit message	Author
2026-03-20	fix wireguard setup ssh host pinning	Paul Buetow

2026-03-20	task 301: extract provisioning collaborators	Paul Buetow

2026-03-20	task 300: persist effective service mode	Paul Buetow

2026-03-20	task 299: clean up local state on delete	Paul Buetow

2026-03-20	task 298: pin SSH host keys per VM state	Paul Buetow

2026-03-20	task 297: lock down default ingress rules	Paul Buetow

2026-03-20	Remove peers by allowed IPs from local WireGuard config	Paul Buetow

2026-03-20	Initial commit: add hyperstack-vm1.toml, hyperstack-vm2.toml, update hyperstack.rb and wg1-setup.sh for multi-VM WireGuard support	Paul Buetow

2026-03-18	vllm: skip docker pull on model switch, persist torch compile cache	Paul Buetow
	- Model switch now passes pull_image: false to avoid surprise multi-GB image downloads when the vLLM image has been updated upstream; docker pull still runs on initial install (pull_image: true is the default)
	- Mount /ephemeral/vllm_cache → /root/.cache/vllm so torch.compile artifacts survive container restarts; saves ~30-60 s on warm switches
	- Add vllm_compile_cache_dir helper (sibling of hug_cache_dir)
	Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-03-18	gpt-oss-120b: revert to 131072 — hard architecture limit	Paul Buetow
	max_position_embeddings=131072 in the model's config.json; exceeding it causes NaN/CUDA out-of-bounds errors. 163840 was rejected by vLLM at startup. Conversations that hit the 135K-token limit have to be continued in a fresh opencode conversation instead.
	Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-03-18	gpt-oss-120b: raise max_model_len to 163840 (160K)	Paul Buetow
	131K was still too small: 135K-token conversations were observed in practice. Physical KV cache capacity is ~168K tokens (10560 blocks × 16 tokens), so 160K fits without OOM.
	Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-03-18	gpt-oss-120b: raise max_model_len to 131072	Paul Buetow
	MXFP4 KV cache is compact enough that vLLM allocated ~168K tokens of KV cache (10560 blocks × 16 tokens) at 0.92 GPU memory utilization. The 40K limit was too conservative and caused negative max_tokens errors in long Claude Code sessions.
	Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
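The capacity numbers in this and the neighbouring commits follow from block arithmetic: vLLM's paged KV cache is allocated in fixed-size blocks, so its token capacity is the block count times the tokens per block. The sketch below assumes the 10560 blocks and 16-token block size reported in this commit and takes max_position_embeddings=131072 from the later revert commit listed above; the function names are hypothetical and not part of hyperstack.rb.

```python
def kv_token_capacity(num_blocks: int, block_size: int = 16) -> int:
    """Tokens the paged KV cache can hold: number of blocks x tokens per block."""
    return num_blocks * block_size


def max_usable_len(num_blocks: int, max_position_embeddings: int, block_size: int = 16) -> int:
    """Upper bound for max_model_len: limited by KV capacity AND by the model config."""
    return min(kv_token_capacity(num_blocks, block_size), max_position_embeddings)


capacity = kv_token_capacity(10560)  # 10560 blocks x 16 tokens = 168960 tokens (~168K)
limit = max_usable_len(10560, max_position_embeddings=131072)
print(capacity, limit)
```

The `min()` mirrors the sequence of commits: 163840 fits in the ~168K-token cache, but it still exceeds the model's 131072-position limit, so 131072 is the effective ceiling.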
2026-03-18	fix: handle bundler self_manager.rb error with Errno::ENOENT	Paul Buetow

2026-03-18	fix: refactor CLI help to DRY up duplicated code	Paul Buetow

2026-03-18	cli: show help when called without arguments	Paul Buetow

2026-03-18	refactor: Split Config class per SRP	Paul Buetow
	- Created ConfigLoader for TOML loading and validation
	- Kept Config for configuration value access only
	- Reduced Config from 489 lines to ~200 lines
	- Fixed CLI to use ConfigLoader and pass @path to Config
2026-03-18	hyperstack status: display active vLLM model	Paul Buetow
	Show the currently loaded model (from state file, or config default) so it's immediately visible without running `model list`. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-03-18	nemotron-super: set max_model_len=262144 (256K); document NoPE and OOM risk	Paul Buetow
	Tested 1M context (NoPE allows arbitrary max_position_embeddings without YaRN): it OOMs on an A100 80GB because too little VRAM remains for the KV cache after the ~60 GB of model weights. 256K (262144) is the practical ceiling on this hardware.
	Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-03-18	nemotron-super: use qwen3_xml tool call parser — same XML format, works	Paul Buetow
	Both Nemotron and Qwen3-XML use identical <tool_call><function=name> <parameter=p>value</parameter></function></tool_call> format. qwen3_xml correctly parses Nemotron's output; tool calling now works with opencode and other API clients. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
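The shared layout is easy to see with a small parser. This is a hypothetical regex-based sketch of how a `<tool_call>` block in that format maps to a function name and its arguments; it is not vLLM's qwen3_xml implementation and ignores streaming and malformed output.

```python
import re

# Non-greedy, DOTALL patterns so multi-line model output is handled.
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)
FUNCTION_RE = re.compile(r"<function=([^>\s]+)>(.*?)</function>", re.DOTALL)
PARAM_RE = re.compile(r"<parameter=([^>\s]+)>(.*?)</parameter>", re.DOTALL)


def parse_tool_calls(text: str) -> list[dict]:
    """Extract {name, arguments} dicts from <tool_call> blocks in model output."""
    calls = []
    for block in TOOL_CALL_RE.findall(text):
        for name, body in FUNCTION_RE.findall(block):
            # Parameter values are kept as stripped strings; no type coercion.
            args = {key: value.strip() for key, value in PARAM_RE.findall(body)}
            calls.append({"name": name, "arguments": args})
    return calls


sample = (
    "<tool_call><function=read_file> "
    "<parameter=path>README.md</parameter>"
    "</function></tool_call>"
)
print(parse_tool_calls(sample))
```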
2026-03-18	nemotron-super: revert to no tool calling; add nemotron_v3 reasoning parser	Paul Buetow
	vLLM 0.17.1 has no tool call parser for Nemotron's custom XML format (<tool_call><function=...><parameter=...>). Setting llama3_json produced garbage output. Reverted to tool_call_parser="" with a clear comment. Added --reasoning-parser nemotron_v3 via extra_vllm_args so <think> tokens are properly exposed as reasoning_content in the API response. For agentic work requiring tool calls, switch to qwen3-coder-next or devstral. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-03-18	Fix nemotron-super tool_call_parser; auto-clear WireGuard hostname from known_hosts	Paul Buetow
	- hyperstack-vm.toml: set tool_call_parser=llama3_json for nemotron-super so vLLM accepts tool_choice requests from opencode; the model won't spontaneously call tools, so the vLLM 0.17.1 token_ids crash in llama3_json won't trigger
	- hyperstack.rb: wait_for_ssh now also removes the WireGuard hostname (hyperstack.wg1) from known_hosts alongside the IP, preventing StrictHostKeyChecking failures across VM recreates
	Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
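On the command line, `ssh-keygen -R <host>` removes all keys for one host from known_hosts. The following is a minimal pure-Python sketch of the same idea applied to both the IP and the WireGuard hostname at once; the function name is hypothetical, it only matches plain-text entries (hashed `|1|...` entries and `[host]:port` forms are left alone), and 203.0.113.10 is a documentation address standing in for the VM's public IP.

```python
import tempfile
from pathlib import Path


def remove_known_hosts(path: Path, hosts: set[str]) -> int:
    """Drop plain-text known_hosts lines naming any of `hosts`; return how many."""
    kept, removed = [], 0
    for line in path.read_text().splitlines(keepends=True):
        fields = line.split()
        # Keep blank lines and comments as they are.
        if not fields or fields[0].startswith("#"):
            kept.append(line)
            continue
        # Lines starting with a marker such as @cert-authority list hosts in field 2.
        host_field = fields[1] if fields[0].startswith("@") and len(fields) > 1 else fields[0]
        # The host field may hold several comma-separated names, e.g. "name,ip".
        if set(host_field.split(",")) & hosts:
            removed += 1
        else:
            kept.append(line)
    path.write_text("".join(kept))
    return removed


with tempfile.TemporaryDirectory() as tmp:
    known_hosts = Path(tmp) / "known_hosts"
    known_hosts.write_text(
        "hyperstack.wg1 ssh-ed25519 AAAAexample1\n"
        "203.0.113.10 ssh-ed25519 AAAAexample2\n"
        "github.com ssh-ed25519 AAAAexample3\n"
    )
    removed = remove_known_hosts(known_hosts, {"hyperstack.wg1", "203.0.113.10"})
    remaining = known_hosts.read_text()
print(removed, remaining, end="")
```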
2026-03-18	Add extra_vllm_args support; fix nemotron-super to real 120B; add deepseek-r1-32b, qwen3-32b, devstral presets	Paul Buetow
	- hyperstack.rb: add extra_vllm_args array field to the preset resolver and vllm_install_script; flags are appended verbatim to the docker run command, enabling per-preset vLLM flags (reasoning parsers, Mistral loader)
	- hyperstack.rb: show extra_args in dry-run model switch output
	- hyperstack-vm.toml: fix nemotron-super to use the actual NVIDIA Nemotron-3-Super-120B-A12B AWQ (cyankiwi) with trust_remote_code=true; the previous preset incorrectly used llama-3.3-70b
	- hyperstack-vm.toml: add deepseek-r1-32b (--reasoning-parser deepseek_r1, ~18 GB)
	- hyperstack-vm.toml: add qwen3-32b (--reasoning-parser deepseek_r1, ~18 GB)
	- hyperstack-vm.toml: add devstral (Mistral tokenizer+config format, ~15 GB); --load_format mistral is omitted because the AWQ weights are in standard HF safetensors format
	All 6 new/updated presets end-to-end tested on A100 80GB (vLLM 0.17.1).
	Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
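Appending per-preset flags verbatim means the command builder only has to concatenate lists. A hypothetical Python sketch of that idea (hyperstack.rb itself is Ruby): the function name, the preset keys other than extra_vllm_args and tool_call_parser, the placeholder model id, and the `11434:8000` port mapping (assuming vLLM's default port 8000) are all illustrative assumptions, not taken from the script.

```python
def vllm_docker_command(preset: dict, image: str = "vllm/vllm-openai") -> list[str]:
    """Build a docker run argv for a preset; extra_vllm_args are appended verbatim."""
    cmd = [
        "docker", "run", "-d", "--gpus", "all",
        "--name", preset["container_name"],
        # Assumption: host port 11434 mapped to vLLM's default port 8000.
        "-p", "11434:8000",
        image,
        "--model", preset["model"],
        "--max-model-len", str(preset["max_model_len"]),
    ]
    if preset.get("trust_remote_code"):
        cmd.append("--trust-remote-code")
    # An empty tool_call_parser disables automatic tool choice entirely.
    parser = preset.get("tool_call_parser", "")
    if parser:
        cmd += ["--enable-auto-tool-choice", "--tool-call-parser", parser]
    # Per-preset flags go last, unmodified, so any vLLM option can be passed through.
    cmd += list(preset.get("extra_vllm_args", []))
    return cmd


# Hypothetical preset; model id, container name and length are placeholders.
example = {
    "container_name": "vllm-deepseek-r1-32b",
    "model": "example-org/deepseek-r1-32b-awq",
    "max_model_len": 32768,
    "tool_call_parser": "",
    "extra_vllm_args": ["--reasoning-parser", "deepseek_r1"],
}
print(" ".join(vllm_docker_command(example)))
```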
2026-03-18	Add gpt-oss, qwen25-coder-32b, qwen3-coder-30b presets; use hostname for WireGuard	Paul Buetow
	New vLLM model presets (all end-to-end tested on A100 80GB):
	- gpt-oss-20b: openai/gpt-oss-20b, MoE 20B, ~14 GB MXFP4, ultra-fast (3.6B active)
	- gpt-oss-120b: openai/gpt-oss-120b, MoE 120B, ~65 GB MXFP4, powerful reasoning
	- qwen25-coder-32b: Qwen/Qwen2.5-Coder-32B-Instruct-AWQ, ~18 GB, best 32B coder
	- qwen3-coder-30b: QuantTrio/Qwen3-Coder-30B-A3B-Instruct-AWQ, ~18 GB Qwen3 coder
	gpt-oss models disable --enable-auto-tool-choice (tool_call_parser = ""): vLLM 0.17.1's llama3_json parser crashes on gpt-oss responses because the new token_ids field in the response is passed as an unexpected keyword argument to extract_tool_calls().
	gpt-oss-120b max_model_len raised to 40960: Claude Code's system prompt alone is ~33K tokens, so 16K was insufficient. 40K lets Claude Code connect with headroom.
	Use wireguard_gateway_hostname (hyperstack.wg1) instead of the raw 192.168.3.1 IP for all connection URLs (tests, ready message, dry-run output). The hostname is derived from the wg interface name and resolves via /etc/hosts.
	Fix test max_tokens: raised from 50 to 500 so reasoning models (e.g. gpt-oss) have enough tokens to finish their chain of thought before producing content.
	Fix qwen25-coder-32b max_model_len: the model config has max_position_embeddings=32768, not 128K as assumed. Using 65536 caused a vLLM pydantic validation error.
	Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
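The max_model_len reasoning here is a token budget: prompt and completion share one context window, so whatever the prompt uses is no longer available for the completion. A small sketch with the ~33K-token Claude Code system prompt quoted in this commit, assuming the previous "16K" limit was exactly 16384; the helper name is hypothetical.

```python
def completion_budget(max_model_len: int, prompt_tokens: int) -> int:
    """Tokens left for the completion; a negative value means the prompt overflows."""
    return max_model_len - prompt_tokens


SYSTEM_PROMPT_TOKENS = 33_000  # approximate figure quoted in the commit message

for limit in (16_384, 40_960):
    print(limit, completion_budget(limit, SYSTEM_PROMPT_TOKENS))
```

A negative budget is the same situation as the negative max_tokens errors described in the later 131072 commit listed above: the prompt alone no longer fits.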
2026-03-18	Fix nemotron-super preset and test_vllm model detection	Paul Buetow
	Replace cyankiwi/Llama-3_3-Nemotron-Super-49B-v1_5-AWQ-4bit with casperhansen/llama-3.3-70b-instruct-awq for the nemotron-super preset. The NAS model's config.json has num_key_value_heads=null by design for its heterogeneous per-layer attention architecture, which is incompatible with vLLM's pydantic ModelConfig validation (requires int). No working AWQ quant for this architecture exists; Llama-3.3-70B-Instruct AWQ is a proven drop-in for the extended-analysis use case.
	Also fix test_vllm to use the model reported by /v1/models instead of the static config default, so tests pass after a model switch.
	Add trust_remote_code support to vllm_install_script for future models that require custom HuggingFace model code.
	Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
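OpenAI-compatible servers such as vLLM answer `GET /v1/models` with a JSON object whose `data` list holds entries with an `id` field. A minimal sketch of picking the served model from that response instead of the config default; the function name is hypothetical, and the HTTP request itself is left out so the parsing can be exercised offline.

```python
import json


def served_model_id(models_json: str, fallback: str) -> str:
    """Return the first model id from a /v1/models response, else `fallback`."""
    try:
        parsed = json.loads(models_json)
    except json.JSONDecodeError:
        return fallback
    entries = parsed.get("data", []) if isinstance(parsed, dict) else []
    for entry in entries:
        if isinstance(entry, dict) and "id" in entry:
            return entry["id"]
    # Nothing usable reported: fall back to the static config default.
    return fallback


response = '{"object": "list", "data": [{"id": "openai/gpt-oss-20b", "object": "model"}]}'
print(served_model_id(response, fallback="config-default"))
```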
2026-03-18	Add vLLM model presets and live model switching	Paul Buetow
	- New [vllm.presets.*] TOML section with two presets:
	    qwen3-coder-next   bullpoint/Qwen3-Coder-Next-AWQ-4bit (256k ctx, coding)
	    nemotron-super     solidrust/Llama-3.3-Nemotron-Super-49B-v1-AWQ (131k ctx, analysis)
	- New CLI subcommand: `model list` shows the presets and marks the active one
	- New CLI subcommand: `model switch PRESET [--dry-run]` switches the running VM to a different preset without redeploying:
	    1. stops the old Docker container (if container_name differs)
	    2. starts the new container and waits for model readiness
	    3. hot-reloads the LiteLLM config via litellm_reload_script (no venv reinstall)
	    4. updates the state file with the new vllm_model / vllm_container_name / vllm_preset
	- New `create --model PRESET` flag: deploy with a non-default preset
	- vllm_install_script and litellm_install_script now accept preset_config: / model_override: so callers can override individual fields without duplicating the full config
	- State file now tracks vllm_container_name and vllm_preset for clean container lifecycle management across switches
	Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
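The four switch steps can be expressed as a pure planning function, which keeps the ordering easy to check before anything touches the VM. A hypothetical sketch: the state keys mirror the fields named in this commit (vllm_model, vllm_container_name, vllm_preset), while the function name, action strings, and container names are illustrative assumptions.

```python
def plan_model_switch(state: dict, preset_name: str, preset: dict) -> tuple[list[str], dict]:
    """Return (ordered actions, new state) for switching to `preset_name`."""
    actions = []
    old_container = state.get("vllm_container_name")
    new_container = preset["container_name"]
    # 1. Stop the old container only if the new preset uses a different name.
    if old_container and old_container != new_container:
        actions.append(f"stop container {old_container}")
    # 2. Start the new container and wait until the model is ready.
    actions.append(f"start container {new_container}")
    actions.append(f"wait for model {preset['model']}")
    # 3. Hot-reload the LiteLLM config instead of reinstalling its venv.
    actions.append("reload litellm config")
    # 4. Record the new model, container and preset in the state file.
    new_state = {**state, "vllm_model": preset["model"],
                 "vllm_container_name": new_container, "vllm_preset": preset_name}
    return actions, new_state


state = {"vllm_model": "bullpoint/Qwen3-Coder-Next-AWQ-4bit",
         "vllm_container_name": "vllm-qwen3-coder-next", "vllm_preset": "qwen3-coder-next"}
preset = {"model": "solidrust/Llama-3.3-Nemotron-Super-49B-v1-AWQ",
          "container_name": "vllm-nemotron-super"}
actions, new_state = plan_model_switch(state, "nemotron-super", preset)
print(actions)
```

With `--dry-run`, a plan like this could simply be printed instead of executed.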
2026-03-18	Use hyperstack.wg1 hostname instead of hardcoded IP in README	Paul Buetow
	Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-03-18	Add vLLM + LiteLLM support; rename script; add README	Paul Buetow
	- Replace Ollama (now disabled by default) with a vLLM Docker container + LiteLLM Anthropic-API proxy as the default inference backend
	- vLLM setup: pulls vllm/vllm-openai, starts the container on port 11434, polls until the model is loaded (up to 10 min for the first 45 GB download)
	- LiteLLM setup: installs in a Python venv, writes a config mapping Claude model aliases to the vLLM model, runs as a systemd service on port 4000
	- New CLI flags on `create`: --vllm/--no-vllm, --ollama/--no-ollama to override the config at runtime
	- New `test` command: end-to-end inference test over WireGuard against vLLM (/v1/models + /v1/chat/completions) and LiteLLM (/v1/messages)
	- UFW rules now open both port 11434 (inference) and port 4000 (LiteLLM) from the WireGuard subnet
	- Rename hyperstack_vm.rb → hyperstack.rb
	- Add README.md with quickstart, Claude Code / OpenCode usage, CLI reference, monitoring commands, and VRAM sizing notes
	- Add vllm-setup.txt: detailed manual setup notes and architecture docs
	Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-03-16	Update hyperstack VM bootstrap, WireGuard, and Ollama setup logic; add retries, apt lock waits, and model verification	Paul Buetow

2026-03-15	cleanup	Paul Buetow

2026-03-15	Add Hyperstack VM automation and model defaults	Paul Buetow

2026-03-14	Update project config, remove CLAUDE.md, add Gemfile and Rakefile for rcm	Paul Buetow

2026-02-15	cleanup of old zone template files and Rexfile	Paul Buetow

2026-02-15	Use Recreate strategy for syncthing to avoid file lock conflicts	Paul Buetow
	Changed deployment strategy from RollingUpdate to Recreate to prevent file lock conflicts when using RWO volumes. Syncthing uses file locks in the config directory, so only one pod can access it at a time. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-15	Add health probes to syncthing deployment to auto-recover from stale NFS mounts	Paul Buetow
	Added startup, liveness, and readiness probes to the syncthing deployment. The liveness probe will automatically restart the pod when it becomes unresponsive due to stale NFS file handles, preventing filesystem errors and service disruptions. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-15	Remove AGENTS.md and add Serena project configuration	Paul Buetow
	Removed AGENTS.md as it is no longer referenced in the project. Added .serena/ directory with project configuration for Serena AI agent. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-15	Add health probes to registry deployment to auto-recover from stale NFS mounts	Paul Buetow
	Added startup, liveness, and readiness probes to the docker registry deployment. The liveness probe will automatically restart the pod when it returns 503 errors (which happens when NFS storage becomes stale), preventing prolonged ImagePullBackOff issues for dependent services like radicale and git-server. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-08	docs(pihole): add DNS toggle script documentation and improve setup instructions	Paul Buetow
	Added quick toggle script section to README with usage examples. Reorganized client configuration into quick toggle (recommended) and manual sections for better clarity. The toggle script provides an easy way to enable/disable Pi-hole DNS on Fedora laptops without remembering NetworkManager commands. Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-08	jo	Paul Buetow

2026-02-08	Upgrade immich to v2.5.5 latest stable release	Paul Buetow
	Updated immich-server and immich-machine-learning images to v2.5.5. This release includes major features:
	- Free Up Space functionality
	- Non-destructive photo editing
	- Database backup and restore via web
	- Upload improvements and visual refresh
	- Progressive JPEG support
	- Additional fine-grained API key permissions
	Release notes: https://github.com/immich-app/immich/releases/tag/v2.5.5
2026-02-08	Add immich LAN ingress and remove unsupported helm config	Paul Buetow
	- Created custom ingress-lan.yaml for immich.f3s.lan.buetow.org with TLS
	- Removed unsupported 'lan' ingress config from ArgoCD app values
	- The Immich Helm chart doesn't support multiple named ingresses, so the LAN ingress is created as a custom resource instead
	This aligns immich with other services that have both regular and LAN ingress endpoints.
2026-02-08	doesn't belong here	Paul Buetow

2026-02-08	Merge remote changes and keep PostgreSQL health checks	Paul Buetow

2026-02-08	Add PostgreSQL health checks to fix immich pod restarts	Paul Buetow
	Added liveness and readiness probes to the PostgreSQL deployment to ensure it's ready to accept connections before immich-server attempts to connect. This fixes the race condition causing ECONNREFUSED errors and pod restarts. The readiness probe prevents services from routing traffic until PostgreSQL is fully initialized, while the liveness probe ensures the container is restarted if PostgreSQL becomes unresponsive.
2026-02-07	docs(git-server): update README with persistent SSH keys info	Paul Buetow
	Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-07	fix(git-server): copy SSH keys from NFS to local emptyDir	Paul Buetow
	OpenSSH refuses to load host keys from NFS for security reasons. The solution is to store keys in persistent NFS (so they survive restarts) but copy them to a local emptyDir at startup (so sshd can read them). This ensures:
	- SSH host keys persist across pod restarts
	- sshd can successfully load the keys from local storage
	- Clients don't see "host key changed" warnings
	Co-authored-by: Cursor <cursoragent@cursor.com>
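The copy step amounts to: copy each host key from the persistent NFS directory into the local emptyDir, then set strict permissions, since sshd ignores private key files that other users can read. A hypothetical Python sketch of that step, assuming POSIX file modes; the directory names and the `ssh_host_*` glob are assumptions, and the deployment's actual startup logic is not shown in this log.

```python
import shutil
import stat
import tempfile
from pathlib import Path


def stage_host_keys(src_dir: Path, dst_dir: Path) -> list[Path]:
    """Copy ssh_host_* files from persistent storage into local storage.

    Private keys get mode 0600 and public keys 0644 so sshd will accept them.
    """
    dst_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for key in sorted(src_dir.glob("ssh_host_*")):
        target = dst_dir / key.name
        shutil.copyfile(key, target)  # copies contents only, not NFS ownership/mode
        target.chmod(0o644 if key.suffix == ".pub" else 0o600)
        copied.append(target)
    return copied


with tempfile.TemporaryDirectory() as tmp:
    nfs, local = Path(tmp, "nfs"), Path(tmp, "local")
    nfs.mkdir()
    (nfs / "ssh_host_ed25519_key").write_text("private\n")
    (nfs / "ssh_host_ed25519_key.pub").write_text("public\n")
    modes = {p.name: stat.S_IMODE(p.stat().st_mode) for p in stage_host_keys(nfs, local)}
print({name: oct(mode) for name, mode in modes.items()})
```

Keeping the keys on NFS preserves their identity across restarts, while the local copy gives sshd files whose ownership and mode it controls.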
2026-02-07	fix(git-server): add sshd_config to persistent storage	Paul Buetow
	The sshd_config file needs to be in the persistent SSH directory for the git-server container to start properly. Added ConfigMap and updated initContainer to copy it on first deployment. Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-07	fix(git-server): persist SSH host keys across pod restarts	Paul Buetow
	SSH host keys are now stored in persistent NFS storage instead of ephemeral emptyDir. Keys are only generated once on first deployment, preventing known_hosts updates on every pod restart. Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-07	feat: add LAN ingresses for all services	Paul Buetow
	Add *.f3s.lan.buetow.org ingress resources for all services to enable LAN access with TLS termination. This allows direct access from the 192.168.1.0/24 network through the FreeBSD CARP/relayd setup.
	Services updated:
	- argocd: argocd.f3s.lan.buetow.org
	- cgit: cgit.f3s.lan.buetow.org
	- grafana: grafana.f3s.lan.buetow.org
	- anki-sync-server: anki.f3s.lan.buetow.org
	- apache: f3s.lan.buetow.org, www.f3s.lan.buetow.org, standby.f3s.lan.buetow.org
	- audiobookshelf: audiobookshelf.f3s.lan.buetow.org
	- filebrowser: filebrowser.f3s.lan.buetow.org
	- immich: immich.f3s.lan.buetow.org
	- ipv6test: ipv6test.f3s.lan.buetow.org (+ ipv4/ipv6 subdomains)
	- keybr: keybr.f3s.lan.buetow.org
	- koreader-sync-server: koreader.f3s.lan.buetow.org
	- miniflux: flux.f3s.lan.buetow.org
	- opodsync: gpodder.f3s.lan.buetow.org
	- radicale: radicale.f3s.lan.buetow.org
	- syncthing: syncthing.f3s.lan.buetow.org
	- tracing-demo: tracing-demo.f3s.lan.buetow.org
	- wallabag: bag.f3s.lan.buetow.org
	- webdav: webdav.f3s.lan.buetow.org
	All LAN ingresses use:
	- TLS with f3s-lan-tls certificate (cert-manager)
	- Traefik entrypoints: web,websecure
	- Same backend services as external ingresses
	Also fixed koreader-sync-server ingress to use modern annotations.
	Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-07	docs(agents): add note about pushing to internal git server	Paul Buetow
	Add reminder to push changes to r0 for ArgoCD sync and note about accepting new SSH host keys. Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-07	docs(pihole): add DNS client configuration guide	Paul Buetow
	Document how to configure clients to use Pi-hole DNS:
	- NetworkManager configuration for Linux/Fedora
	- Multiple DNS servers with automatic failover
	- Firefox DoH configuration notes
	- Verification steps
	Co-authored-by: Cursor <cursoragent@cursor.com>