path: root/snippets/hyperstack/README.md
Age | Commit message | Author
2026-03-21 | moved | Paul Buetow
2026-03-20 | Add Pi VM launcher scripts | Paul Buetow
2026-03-20 | task 299: clean up local state on delete | Paul Buetow
2026-03-20 | task 298: pin SSH host keys per VM state | Paul Buetow
2026-03-20 | task 297: lock down default ingress rules | Paul Buetow
2026-03-18 | Use hyperstack.wg1 hostname instead of hardcoded IP in README | Paul Buetow
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 | Add vLLM + LiteLLM support; rename script; add README | Paul Buetow
- Replace Ollama (now disabled by default) with a vLLM Docker container + LiteLLM Anthropic-API proxy as the default inference backend
- vLLM setup: pulls vllm/vllm-openai, starts the container on port 11434, and polls until the model is loaded (up to 10 min for the first 45 GB download)
- LiteLLM setup: installs in a Python venv, writes a config mapping Claude model aliases to the vLLM model, and runs as a systemd service on port 4000
- New CLI flags on `create`: --vllm/--no-vllm and --ollama/--no-ollama to override the config at runtime
- New `test` command: end-to-end inference test over WireGuard against vLLM (/v1/models + /v1/chat/completions) and LiteLLM (/v1/messages)
- UFW rules now open both port 11434 (inference) and port 4000 (LiteLLM) from the WireGuard subnet
- Rename hyperstack_vm.rb → hyperstack.rb
- Add README.md with quickstart, Claude Code / OpenCode usage, CLI reference, monitoring commands, and VRAM sizing notes
- Add vllm-setup.txt: detailed manual setup notes and architecture docs

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
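The vLLM setup step above polls until the model is loaded, with a deadline of about 10 minutes to cover the first download. A minimal Python sketch of such a readiness loop follows. It assumes vLLM's OpenAI-compatible `GET /v1/models` endpoint returns `{"data": [{"id": ...}, ...]}`; the function names `model_listed` and `wait_for_model` are hypothetical, and this is not the Ruby code in hyperstack.rb.

```python
import http.client
import json
import time
import urllib.request
from typing import Callable


def model_listed(base_url: str, model: str, timeout: float = 5.0) -> bool:
    """Return True if GET {base_url}/v1/models lists `model` by id.

    Assumes the OpenAI-compatible response shape {"data": [{"id": ...}, ...]}.
    """
    try:
        with urllib.request.urlopen(f"{base_url}/v1/models", timeout=timeout) as resp:
            body = json.load(resp)
    except (OSError, ValueError, http.client.HTTPException):
        # Connection refused, timeout, HTTP error (URLError and HTTPError are
        # OSError subclasses), truncated response, or a non-JSON body:
        # treat the server as not ready yet.
        return False
    if not isinstance(body, dict):
        return False
    data = body.get("data")
    if not isinstance(data, list):
        return False
    return any(isinstance(m, dict) and m.get("id") == model for m in data)


def wait_for_model(
    probe: Callable[[], bool],
    deadline_s: float = 600.0,  # assumed 10-minute budget for the first model download
    interval_s: float = 10.0,   # assumed polling interval
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `probe` until it returns True or `deadline_s` has elapsed.

    Returns True if the model became ready, False on timeout.
    """
    start = clock()
    while True:
        if probe():
            return True
        if clock() - start >= deadline_s:
            return False
        sleep(interval_s)
```

A call could look like `wait_for_model(lambda: model_listed("http://hyperstack.wg1:11434", MODEL_ID))`, where `MODEL_ID` stands for whatever model name the vLLM container serves. Injecting `clock` and `sleep` keeps the deadline logic testable without actually waiting ten minutes.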