summaryrefslogtreecommitdiff
path: root/snippets/general-coding
diff options
context:
space:
mode:
authorPaul Buetow <paul@buetow.org>2026-03-18 09:10:14 +0200
committerPaul Buetow <paul@buetow.org>2026-03-18 09:10:14 +0200
commitd8575832ae0022f94cd786b15f8b88de0bf18672 (patch)
tree75872514846cfddb1434281a59b6673344023ff7 /snippets/general-coding
parent8dca92ea40b191b9de367197aac7e1f882ed3d43 (diff)
Add vLLM + LiteLLM support; rename script; add README
- Replace Ollama (disabled by default) with vLLM Docker container + LiteLLM Anthropic-API proxy as the default inference backend - vLLM setup: pulls vllm/vllm-openai, starts container on port 11434, polls until model is loaded (up to 10 min for first 45 GB download) - LiteLLM setup: installs in Python venv, writes config mapping Claude model aliases to the vLLM model, runs as a systemd service on port 4000 - New CLI flags on `create`: --vllm/--no-vllm, --ollama/--no-ollama to override config at runtime - New `test` command: end-to-end inference test over WireGuard against vLLM (/v1/models + /v1/chat/completions) and LiteLLM (/v1/messages) - UFW rules now open both port 11434 (inference) and 4000 (LiteLLM) from the WireGuard subnet - Rename hyperstack_vm.rb → hyperstack.rb - Add README.md with quickstart, Claude Code / OpenCode usage, CLI reference, monitoring commands, and VRAM sizing notes - Add vllm-setup.txt: detailed manual setup notes and architecture docs Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Diffstat (limited to 'snippets/general-coding')
0 files changed, 0 insertions, 0 deletions