Improve Docker GPU setup diagnostics (#705)

* Improve Docker GPU setup diagnostics

Add a Docker GPU preflight script for NVIDIA users. The script is
read-only by default, checks host NVIDIA drivers, Docker availability,
and container GPU passthrough, and prints actionable next steps.

Add explicit opt-in modes to print install commands, install NVIDIA
Container Toolkit on Ubuntu/Debian, and enable the NVIDIA Compose overlay
in .env after passthrough is verified.

Document common NVIDIA Docker failure modes, ignore generated .env
backups, and clarify that Cookbook can only detect GPUs exposed to the
Odysseus container.

* Clarify Docker GPU diagnostic limits
This commit is contained in:
Alexandre Teixeira
2026-06-02 04:30:40 +01:00
committed by GitHub
parent 517aa593e0
commit 8455b88643
5 changed files with 635 additions and 5 deletions

View File

@@ -1,6 +1,11 @@
# NVIDIA GPU overlay. Enable by setting COMPOSE_FILE in .env:
# COMPOSE_FILE=docker-compose.yml:docker/gpu.nvidia.yml
#
# Use scripts/check-docker-gpu.sh to diagnose GPU passthrough, optionally
# install the NVIDIA Container Toolkit (Ubuntu/Debian), and write COMPOSE_FILE
# to .env. The script is read-only by default — it installs nothing and never
# edits .env unless explicitly asked.
#
# Requires the NVIDIA Container Toolkit on the host.
# Arch: sudo pacman -S nvidia-container-toolkit
# Debian: sudo apt install nvidia-container-toolkit