odysseus/tests
lekt8 0e6cbd8315 Drop GPU-only flags from the CPU-only (-ngl 0) serve command (#1433)
A CPU-only llama.cpp serve config still emitted --flash-attn on and exported
GGML_CUDA_ENABLE_UNIFIED_MEMORY=1. Both are independent toggles, often left on
by an Auto profile, so the command combined "zero GPU layers" with CUDA and
flash-attn settings and failed to start (issue #1291). Gate both on a _cpuOnly
check (ngl == 0). GPU serving is unchanged: the gate only affects the ngl=0
path.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-03 04:26:15 +09:00
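
The gate described in the commit message can be sketched roughly as follows. This is a minimal illustration in Python, not the project's actual code: the function name `build_serve_command`, its parameters, and the `llama-server` argv layout are assumptions made for the example. Only the `--flash-attn on` flag, the `GGML_CUDA_ENABLE_UNIFIED_MEMORY=1` variable, and the `ngl == 0` condition come from the commit message.

```python
def build_serve_command(model_path, ngl, flash_attn=True, unified_memory=True):
    """Return (env, argv) for a hypothetical llama.cpp server launch.

    flash_attn and unified_memory stand in for the independent profile
    toggles; both are ignored when no layers are offloaded to the GPU.
    """
    cpu_only = ngl == 0  # mirrors the _cpuOnly check named in the commit

    env = {}
    argv = ["llama-server", "-m", model_path, "-ngl", str(ngl)]

    # GPU-only options are emitted only when at least one layer is offloaded.
    if flash_attn and not cpu_only:
        argv += ["--flash-attn", "on"]
    if unified_memory and not cpu_only:
        env["GGML_CUDA_ENABLE_UNIFIED_MEMORY"] = "1"

    return env, argv


if __name__ == "__main__":
    for layers in (0, 32):
        env, argv = build_serve_command("model.gguf", layers)
        print(layers, env, " ".join(argv))
```

With both toggles left on, the `ngl=0` call yields a command with no flash-attn flag and an empty environment, while any non-zero `ngl` produces the same output as before the gate was added.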