odysseus

MrSphay/odysseus

Fork 0

Commit Graph

Author	SHA1	Message	Date
Carlos Arroyo	00320972dc	fix: CUDA/GPU detection for vLLM and llama.cpp in Docker (#479 ) Two bugs caused GPU inference to silently fall back to CPU inside the Odysseus Docker container even when the GPU was correctly passed through. ## entrypoint.sh — CUDA_HOME detection only covered CUDA 13.x wheels The nvcc glob only searched vidia/cu13, which matches the vidia-nvcc-cu13 pip wheel layout. CUDA 12.x wheels install nvcc to vidia/cuda_nvcc/bin/nvcc (nvidia-cuda-nvcc-cu12) or vidia/cu12 (nvidia-nvcc-cu12) — completely different paths. The glob found nothing, so CUDA_HOME was never set. Worse, VLLM_USE_FLASHINFER_SAMPLER=0 was inside the same if-block, so it was never set either. vLLM then tried to JIT-compile the FlashInfer sampler at startup, failed with 'Could not find nvcc', and crashed — even though the GPU was fully visible to the container. Fix: expand the search to also check nvidia/cu12 and nvidia/cuda_nvcc. Move VLLM_USE_FLASHINFER_SAMPLER=0 to an unconditional export after the loop (it is sampler-only, no impact on the attention path, and the correct setting for any container where CUDA headers may be incomplete). ## cookbook_routes.py — llama.cpp Linux source build silently fell back to CPU The cmake invocation was: cmake -B build -DGGML_CUDA=ON 2>/dev/null \|\| cmake -B build 2>/dev/null suppressed all configure errors. When nvcc is absent (the slim base image has no CUDA toolkit — intentional), cmake fails silently, then the \|\| fallback re-runs without -DGGML_CUDA=ON. A CPU-only binary is produced with no warning. Additionally, a stale CMakeCache.txt from the failed CUDA attempt was reused (no rm -rf build), poisoning the next configure run. The macOS branch already did rm -rf build for exactly this reason; the Linux branch did not. Fix: before cmake, detect pip-installed nvcc across the same three path patterns as entrypoint.sh and expose it via CUDA_HOME/PATH. If nvcc is found, run a clean CUDA build with full error visibility. If not, fall back to a CPU build with an explicit warning telling the user how to get a GPU build (install vLLM via Cookbook -> Dependencies, which brings the CUDA wheels including nvcc, then re-launch). ## .env.example — document Windows COMPOSE_FILE separator Added a comment showing the semicolon separator required on Windows Docker Desktop alongside the existing colon-separator (Linux) example.	2026-06-01 22:30:51 +09:00
Sirsyorrz	b4a1d88beb	docker: set CUDA_HOME for pip-installed vllm in Cookbook (#228 ) When Cookbook installs vllm via `pip install --user vllm`, pip pulls in nvidia-cuda-* wheels under /app/.local but doesn't set CUDA_HOME or create /usr/local/cuda. vllm 0.22+ then crashes during engine init: RuntimeError: Could not find nvcc and default cuda_home='/usr/local/cuda' doesn't exist After that, the mixed cuda-nvcc 13.3 / cuda-runtime 13.0 wheel combo fails FlashInfer's JIT sampler with: error: "CUDA compiler and CUDA toolkit headers are incompatible" Detect the pip-installed nvcc on startup, point CUDA_HOME at it, and default VLLM_USE_FLASHINFER_SAMPLER=0 (sampler only, no attention impact) so the engine boots. No-op when vllm isn't installed. Fixes #214. Co-authored-by: sirs <sirs@local>	2026-06-01 02:48:25 +00:00
pewdiepie-archdaemon	e5c99a5eee	Odysseus v1.0	2026-05-31 23:58:26 +09:00

Author

SHA1

Message

Date

Carlos Arroyo

00320972dc

fix: CUDA/GPU detection for vLLM and llama.cpp in Docker (#479 )

Two bugs caused GPU inference to silently fall back to CPU inside the
Odysseus Docker container even when the GPU was correctly passed through.

## entrypoint.sh — CUDA_HOME detection only covered CUDA 13.x wheels

The nvcc glob only searched
vidia/cu13, which matches the

vidia-nvcc-cu13 pip wheel layout. CUDA 12.x wheels install nvcc to

vidia/cuda_nvcc/bin/nvcc (nvidia-cuda-nvcc-cu12) or
vidia/cu12
(nvidia-nvcc-cu12) — completely different paths. The glob found nothing,
so CUDA_HOME was never set.

Worse, VLLM_USE_FLASHINFER_SAMPLER=0 was inside the same if-block, so it
was never set either. vLLM then tried to JIT-compile the FlashInfer
sampler at startup, failed with 'Could not find nvcc', and crashed — even
though the GPU was fully visible to the container.

Fix: expand the search to also check nvidia/cu12 and nvidia/cuda_nvcc.
Move VLLM_USE_FLASHINFER_SAMPLER=0 to an unconditional export after the
loop (it is sampler-only, no impact on the attention path, and the correct
setting for any container where CUDA headers may be incomplete).

## cookbook_routes.py — llama.cpp Linux source build silently fell back to CPU

The cmake invocation was:
  cmake -B build -DGGML_CUDA=ON 2>/dev/null || cmake -B build

2>/dev/null suppressed all configure errors. When nvcc is absent (the
slim base image has no CUDA toolkit — intentional), cmake fails silently,
then the || fallback re-runs without -DGGML_CUDA=ON. A CPU-only binary is
produced with no warning. Additionally, a stale CMakeCache.txt from the
failed CUDA attempt was reused (no rm -rf build), poisoning the next
configure run. The macOS branch already did rm -rf build for exactly this
reason; the Linux branch did not.

Fix: before cmake, detect pip-installed nvcc across the same three path
patterns as entrypoint.sh and expose it via CUDA_HOME/PATH. If nvcc is
found, run a clean CUDA build with full error visibility. If not, fall
back to a CPU build with an explicit warning telling the user how to get
a GPU build (install vLLM via Cookbook -> Dependencies, which brings the
CUDA wheels including nvcc, then re-launch).

## .env.example — document Windows COMPOSE_FILE separator

Added a comment showing the semicolon separator required on Windows
Docker Desktop alongside the existing colon-separator (Linux) example.

2026-06-01 22:30:51 +09:00

Sirsyorrz

b4a1d88beb

docker: set CUDA_HOME for pip-installed vllm in Cookbook (#228 )

When Cookbook installs vllm via `pip install --user vllm`, pip pulls in
nvidia-cuda-* wheels under /app/.local but doesn't set CUDA_HOME or
create /usr/local/cuda. vllm 0.22+ then crashes during engine init:

  RuntimeError: Could not find nvcc and default cuda_home='/usr/local/cuda' doesn't exist

After that, the mixed cuda-nvcc 13.3 / cuda-runtime 13.0 wheel combo
fails FlashInfer's JIT sampler with:

  error: "CUDA compiler and CUDA toolkit headers are incompatible"

Detect the pip-installed nvcc on startup, point CUDA_HOME at it, and
default VLLM_USE_FLASHINFER_SAMPLER=0 (sampler only, no attention
impact) so the engine boots. No-op when vllm isn't installed.

Fixes #214.

Co-authored-by: sirs <sirs@local>

2026-06-01 02:48:25 +00:00

pewdiepie-archdaemon

e5c99a5eee

Odysseus v1.0

2026-05-31 23:58:26 +09:00

3 Commits