provisioner: support docker_image and pre_start_cmd for Gemma 4 startup

Adds docker_image and pre_start_cmd config fields to config.rb and
provisioning.rb so the Gemma 4 31B workarounds are baked in:

- docker_image = "vllm/vllm-openai:nightly" (stable lacks Gemma 4 support)
- pre_start_cmd = "pip install -q transformers==5.5.0" (stable pins <5)
- extra_docker_env = ["CUDA_VISIBLE_DEVICES=0"] (required with --entrypoint bash)

When pre_start_cmd is set, the provisioner switches to --entrypoint bash
and chains the patch command before launching vLLM, so create-both works
end-to-end without manual container replacement.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
author: Paul Buetow <paul@buetow.org> 2026-04-06 20:47:39 +0300
committer: Paul Buetow <paul@buetow.org> 2026-04-06 20:47:39 +0300
commit: eb800cdf31176584ee0b604f5bda65f0d2880909 (patch)
tree: 0bf9ef9491137e9e5e6600f1819b1b8d048a24af /lib/hyperstack/config.rb
parent: 0664ffcc62b2fb240286fde463635e510a41df84 (diff)
1 files changed, 16 insertions, 0 deletions
diff --git a/lib/hyperstack/config.rb b/lib/hyperstack/config.rb
index 402f45d..178429d 100644
--- a/lib/hyperstack/config.rb
+++ b/lib/hyperstack/config.rb
@@ -445,6 +445,19 @@ module HyperstackVM
       Array(fetch('vllm', 'extra_docker_env')).map(&:to_s)
     end
 
+    # Docker image for vLLM. Defaults to the stable release.
+    # Override to 'vllm/vllm-openai:nightly' for models not yet supported by stable vLLM.
+    def vllm_docker_image
+      fetch('vllm', 'docker_image') || 'vllm/vllm-openai:latest'
+    end
+
+    # Shell command to run inside the container before starting vLLM (via --entrypoint bash).
+    # Used to patch dependencies at startup, e.g. upgrading transformers for new model architectures.
+    # nil means no pre-start command — vLLM is started directly (default entrypoint).
+    def vllm_pre_start_cmd
+      fetch('vllm', 'pre_start_cmd')
+    end
+
     # Whether to pass --enable-prefix-caching to vLLM. Defaults to true.
     # Disable for hybrid Mamba models (NemotronH): prefix caching forces Mamba into "all" cache
     # mode which pre-allocates states for all sequences, consuming extra VRAM on startup.
@@ -477,6 +490,9 @@ module HyperstackVM
         'trust_remote_code' => raw.key?('trust_remote_code') ? raw['trust_remote_code'] : false,
         'extra_vllm_args' => raw.key?('extra_vllm_args') ? Array(raw['extra_vllm_args']) : [],
         'extra_docker_env' => raw.key?('extra_docker_env') ? Array(raw['extra_docker_env']) : [],
+        # docker_image / pre_start_cmd: nil means "not set in preset" — fall back to [vllm] defaults.
+        'docker_image' => raw.key?('docker_image') ? raw['docker_image'] : nil,
+        'pre_start_cmd' => raw.key?('pre_start_cmd') ? raw['pre_start_cmd'] : nil,
         # nil means "not set in preset" — fall back to the top-level [vllm] value in the script.
         'enable_prefix_caching' => raw.key?('enable_prefix_caching') ? raw['enable_prefix_caching'] : nil
       }
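
The provisioning.rb side of this change is not shown in this path-limited view. As a rough illustration of the behaviour the commit message describes (switch to --entrypoint bash and chain the patch command before launching vLLM), a minimal sketch might look like the following. The helper name build_docker_run, the --gpus flag, and the python3 -m vllm.entrypoints.openai.api_server launch line are assumptions made for illustration, not the project's actual code.

```ruby
require 'shellwords'

# Hypothetical helper (not taken from provisioning.rb): builds the argv for
# `docker run` depending on whether a pre-start command is configured.
def build_docker_run(image:, vllm_args:, pre_start_cmd: nil, env: [])
  cmd = ['docker', 'run', '--gpus', 'all']
  env.each { |kv| cmd.push('-e', kv) }

  if pre_start_cmd.nil? || pre_start_cmd.strip.empty?
    # No patch step: keep the image's default entrypoint and pass the args through.
    cmd.push(image, *vllm_args)
  else
    # Patch step: override the entrypoint with bash, run the patch command,
    # then exec the server so it replaces the shell as the container's main process.
    # The server launch line below is an assumption about the image.
    serve = (%w[python3 -m vllm.entrypoints.openai.api_server] + vllm_args).shelljoin
    cmd.push('--entrypoint', 'bash', image, '-c', "#{pre_start_cmd} && exec #{serve}")
  end

  cmd
end

# Example with placeholder values mirroring the commit message.
argv = build_docker_run(
  image: 'vllm/vllm-openai:nightly',
  vllm_args: ['--model', 'example/model'],
  pre_start_cmd: 'pip install -q transformers==5.5.0',
  env: ['CUDA_VISIBLE_DEVICES=0']
)
puts argv.shelljoin
```

Chaining with && means a failed patch command stops the container from starting vLLM against unpatched dependencies, which is probably the safer failure mode for this kind of workaround.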