odysseus/tests/test_cookbook_cpu_only_serve.py at dev

lekt8 0e6cbd8315 Drop GPU-only flags from the CPU-only (-ngl 0) serve command (#1433)

A CPU-only llama.cpp serve config still emitted --flash-attn on and exported
GGML_CUDA_ENABLE_UNIFIED_MEMORY=1 (independent toggles, often left on by an Auto
profile), so the command mixed "zero GPU layers" with CUDA/flash-attn and failed
to start (issue #1291). Gate both on a _cpuOnly check (ngl == 0). GPU serving is
unchanged — the gate only affects the ngl=0 path.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
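
The gate described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `build_serve_command`, its parameters, and the `llama-server` argument layout are hypothetical names chosen for the example. Only `-ngl`, `--flash-attn on`, and `GGML_CUDA_ENABLE_UNIFIED_MEMORY=1` come from the commit message itself.

```python
from typing import Dict, List, Tuple


def build_serve_command(
    model_path: str,
    ngl: int,
    flash_attn: bool = True,
    unified_memory: bool = True,
) -> Tuple[Dict[str, str], List[str]]:
    """Return (extra_env, argv) for a hypothetical llama.cpp serve command.

    flash_attn and unified_memory stand in for the independent toggles
    that an Auto profile may leave switched on.
    """
    # Mirrors the _cpuOnly check from the commit message: ngl == 0.
    cpu_only = ngl == 0

    env: Dict[str, str] = {}
    argv: List[str] = ["llama-server", "-m", model_path, "-ngl", str(ngl)]

    # GPU-only settings are emitted only when at least one layer is offloaded.
    if flash_attn and not cpu_only:
        argv += ["--flash-attn", "on"]
    if unified_memory and not cpu_only:
        env["GGML_CUDA_ENABLE_UNIFIED_MEMORY"] = "1"

    return env, argv
```

With both toggles left on, `build_serve_command("model.gguf", ngl=0)` yields a command with no `--flash-attn` flag and no CUDA environment variable, while any `ngl` greater than zero keeps both, so the GPU path behaves as before.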

2026-06-03 04:26:15 +09:00

1.4 KiB