diff options
| author | Paul Buetow <paul@buetow.org> | 2026-03-18 09:10:14 +0200 |
|---|---|---|
| committer | Paul Buetow <paul@buetow.org> | 2026-03-18 09:10:14 +0200 |
| commit | d8575832ae0022f94cd786b15f8b88de0bf18672 (patch) | |
| tree | 75872514846cfddb1434281a59b6673344023ff7 /snippets/general-coding | |
| parent | 8dca92ea40b191b9de367197aac7e1f882ed3d43 (diff) | |
Add vLLM + LiteLLM support; rename script; add README
- Replace Ollama (disabled by default) with vLLM Docker container +
LiteLLM Anthropic-API proxy as the default inference backend
- vLLM setup: pulls vllm/vllm-openai, starts container on port 11434,
polls until model is loaded (up to 10 min for first 45 GB download)
- LiteLLM setup: installs in Python venv, writes config mapping Claude
model aliases to the vLLM model, runs as a systemd service on port 4000
- New CLI flags on `create`: --vllm/--no-vllm, --ollama/--no-ollama to
override config at runtime
- New `test` command: end-to-end inference test over WireGuard against
vLLM (/v1/models + /v1/chat/completions) and LiteLLM (/v1/messages)
- UFW rules now open both port 11434 (inference) and 4000 (LiteLLM)
from the WireGuard subnet
- Rename hyperstack_vm.rb → hyperstack.rb
- Add README.md with quickstart, Claude Code / OpenCode usage, CLI
reference, monitoring commands, and VRAM sizing notes
- Add vllm-setup.txt: detailed manual setup notes and architecture docs
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Diffstat (limited to 'snippets/general-coding')
0 files changed, 0 insertions, 0 deletions
