When Cookbook installs vllm via `pip install --user vllm`, pip pulls in
nvidia-cuda-* wheels under /app/.local but doesn't set CUDA_HOME or
create /usr/local/cuda. vllm 0.22+ then crashes during engine init:

    RuntimeError: Could not find nvcc and default cuda_home='/usr/local/cuda' doesn't exist

After that, the mixed cuda-nvcc 13.3 / cuda-runtime 13.0 wheel combo
fails FlashInfer's JIT sampler with:

    error: "CUDA compiler and CUDA toolkit headers are incompatible"

Detect the pip-installed nvcc on startup, point CUDA_HOME at it, and
default VLLM_USE_FLASHINFER_SAMPLER=0 (sampler only, no attention
impact) so the engine boots. No-op when vllm isn't installed.

Fixes #214.

Co-authored-by: sirs <sirs@local>
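
For reviewers, the startup detection described above could look roughly like the sketch below. It is an illustration under stated assumptions, not the code in this change: the function names (`find_pip_nvcc`, `configure_cuda_env`, `vllm_installed`) are hypothetical, and it assumes the pip wheels place nvcc at `nvidia/<package>/bin/nvcc` somewhere below a site-packages directory. It uses `setdefault` so values the user has already exported are left alone, which is a design choice of the sketch.

```python
import importlib.util
import os
import site
from pathlib import Path


def vllm_installed():
    """Return True when the vllm package can be located without importing it."""
    return importlib.util.find_spec("vllm") is not None


def find_pip_nvcc(search_roots):
    """Return the toolkit dir that holds bin/nvcc under a pip 'nvidia' tree, or None.

    Assumption: wheels install nvcc as <root>/nvidia/<package>/bin/nvcc.
    """
    for root in search_roots:
        nvidia_dir = Path(root) / "nvidia"
        if not nvidia_dir.is_dir():
            continue
        # Sort so the choice is deterministic if several toolkits are present.
        for nvcc in sorted(nvidia_dir.glob("**/bin/nvcc")):
            if nvcc.is_file():
                return nvcc.parent.parent
    return None


def configure_cuda_env(env, search_roots):
    """Fill in CUDA_HOME and the sampler default; return True if nvcc was found."""
    cuda_home = find_pip_nvcc(search_roots)
    if cuda_home is None:
        return False
    # setdefault keeps any value the user has already set.
    env.setdefault("CUDA_HOME", str(cuda_home))
    env.setdefault("VLLM_USE_FLASHINFER_SAMPLER", "0")
    return True


def apply_on_startup():
    """Hypothetical startup hook: no-op when vllm isn't installed."""
    if not vllm_installed():
        return False
    roots = [site.getusersitepackages(), *site.getsitepackages()]
    return configure_cuda_env(os.environ, roots)
```

Calling `apply_on_startup()` before the engine is created would be enough in this sketch, since both variables only need to be present in the process environment by the time vllm reads them.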