_detect_nvidia parsed the output of nvidia-smi --query-gpu=memory.total,name and called float() on memory.total for each row, dropping the row on ValueError. Grace Blackwell GB10 (DGX Spark, sm_121) reports memory.total as '[N/A]'/'Not Supported' because the GPU shares the system LPDDR pool rather than carrying discrete VRAM. As a result the only GPU row was dropped and a real GB10 (even with vLLM running on it) was reported as 'No GPU', breaking Cookbook recommendations and model switching.

Keep a named device whose memory.total is non-numeric: when there are no discrete-VRAM rows but such unified devices exist, report a unified-memory CUDA GPU backed by the system RAM pool (has_gpu, name, backend=cuda, count, unified_memory=True), mirroring how Apple Silicon and AMD APUs are already handled. Discrete GPUs are unchanged, and a box with a real discrete GPU keeps the discrete path.

Adds tests/test_hwfit_unified_nvidia.py with a GB10 nvidia-smi fixture, covering four cases: the device is detected (not dropped), it surfaces through detect_system with unified_memory propagated, discrete GPUs stay non-unified, and a discrete GPU takes precedence over an N/A-memory row.

Co-authored-by: NubsCarson <nubs@nubs.site>
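
For illustration only, the row handling described above might look roughly like the sketch below. It is not the code from this change: the function names parse_rows and summarize_gpus, the memory_gb field, the system_ram_gb parameter, and the assumption that nvidia-smi is run with --format=csv,noheader,nounits (so numeric rows are bare MiB values) are all hypothetical. Only has_gpu, name, backend, count, and unified_memory come from the description.

```python
def parse_rows(output):
    """Split 'memory.total, name' CSV rows into discrete and unified devices.

    Assumes nvidia-smi was run with --format=csv,noheader,nounits, so a
    discrete GPU reports a bare number of MiB in the first column.
    """
    discrete = []  # (vram_mib, name) for rows with numeric memory.total
    unified = []   # names of rows whose memory.total is '[N/A]' or similar
    for line in output.splitlines():
        if not line.strip():
            continue
        mem, _, name = line.partition(",")
        mem, name = mem.strip(), name.strip()
        if not name:
            continue  # an unnamed row is not a usable device
        try:
            discrete.append((float(mem), name))
        except ValueError:
            # Previously the row was dropped here; keep the named device.
            unified.append(name)
    return discrete, unified


def summarize_gpus(output, system_ram_gb):
    """Prefer discrete GPUs; fall back to unified-memory devices."""
    discrete, unified = parse_rows(output)
    if discrete:
        vram_mib, name = max(discrete)  # largest-VRAM device
        return {
            "has_gpu": True,
            "name": name,
            "backend": "cuda",
            "count": len(discrete),
            "memory_gb": round(vram_mib / 1024, 1),  # hypothetical field
            "unified_memory": False,
        }
    if unified:
        return {
            "has_gpu": True,
            "name": unified[0],
            "backend": "cuda",
            "count": len(unified),
            "memory_gb": system_ram_gb,  # hypothetical: backed by system RAM
            "unified_memory": True,
        }
    return {"has_gpu": False}
```

With an illustrative GB10-style row such as '[N/A], NVIDIA GB10', summarize_gpus reports a unified-memory CUDA device instead of no GPU; adding a numeric row for a discrete card switches the result back to the discrete path.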