Add macOS Apple Silicon Cookbook support
* Add Apple Silicon (Metal) GPU detection and unified-memory fit tuning

hardware.py detects Apple Silicon locally and over SSH, reporting backend=metal, the chip name, and a RAM-scaled fraction of unified memory as the usable GPU budget. fit.py gains an M1-M4 memory-bandwidth table for realistic tok/s and drops vLLM-only formats (AWQ/GPTQ/FP8) that can't be served on Metal.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
(cherry picked from commit 32ac81dbc680361463a088dae867d555d5a79c3b)

* Generate macOS/Metal serve commands and surface the Metal GPU

cookbook_routes.py adds a macOS serve path (Ollama, Metal-aware llama.cpp build using `sysctl hw.ncpu` instead of `nproc`, and a clear error if vLLM is attempted). The frontend defaults Metal serving to llama.cpp and offers llama.cpp/Ollama instead of vLLM/SGLang. The odysseus-cookbook CLI's `gpus` command reports the Metal GPU via sysctl/vm_stat.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
(cherry picked from commit 4ba01ce25d256ae032029898f361c824a34fcd4b)

* Add launchd LaunchAgent for macOS (systemd equivalent)

com.odysseus.ui.plist + install-service-macos.sh run Odysseus at login and restart on crash, the macOS counterpart to odysseus-ui.service. The installer auto-fills paths from the venv, so there's no hand-editing.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
(cherry picked from commit 3d4b6b2c7b8b31af32201ed278115df9a559dea9)

* Document macOS install (brew, Ollama, AirPlay port, launchd)

README + setup.py cover the Homebrew / Apple Silicon path: brew install python@3.11 tmux ollama, Metal serving via Ollama/llama.cpp, the launchd service, and the macOS AirPlay Receiver conflict on ports 7000/5000.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
(cherry picked from commit 8dc9a3578a1726f070ed9f75c0958ae291a6d966)

* Add downloadable macOS launcher app builder

build-macos-app.sh generates dist/Odysseus.app and a drag-to-Applications dist/Odysseus.dmg. The app starts the local server from this repo's venv and opens the UI in a chrome-less app window (Chromium --app mode, falling back to the default browser). It's a launcher wrapper — it drives the venv rather than bundling Python — so the install path is baked in at build time.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
(cherry picked from commit 7927940c3810ee34640803b198d334a6ac93474d)

* Harden macOS Cookbook support: hide MLX, fix Metal build cache

Builds on the adopted PR #213 macOS/Metal work with two fixes and tests:

- fit.py: always drop MLX-quantized models. Odysseus only generates serve commands for llama.cpp/Ollama (Metal) and vLLM/SGLang (CUDA); MLX needs the mlx_lm runtime and the catalog's MLX repos ship no GGUF alternative, so they were surfaced on Apple Silicon but could never be served.
- cookbook_routes.py (macOS branch only): `rm -rf build` before configure so a poisoned CMakeCache from a prior failed CUDA attempt can't make every later build fail; explicit -DCMAKE_BUILD_TYPE=Release; a clear "brew install cmake" hint if cmake is missing. Linux/CUDA path unchanged.
- tests/test_hwfit_macos.py: MLX hidden on metal, MLX still hidden on CUDA (regression guard), Metal detection on Apple Silicon, and skipped on Linux/Intel (proves non-macOS detection is untouched).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Propagate unified_memory flag and document macOS GPU/Docker caveat

- hardware.py: detect_system now carries the unified_memory flag from GPU detection into the system dict (it was set by _detect_apple_silicon / AMD-APU detection but dropped during result assembly, so the API always reported null). Lets callers distinguish unified from discrete VRAM.
- README: prominent warning that Docker on Apple Silicon can't reach the Metal GPU (runs a Linux VM) — Cookbook must run natively for GPU serving; fix stale text that said Cookbook recommends MLX models (now hidden as unservable).
- test: detect_system propagates unified_memory.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Put Odysseus's venv bin on PATH for cookbook runners

Native (non-Docker) installs run from a virtualenv whose bin holds the `hf` CLI and `python3` the cookbook download/serve tmux scripts shell out to. Those scripts start in a fresh login shell with the venv NOT activated, so on a native macOS install `hf download` failed with "hf: command not found" — and the `pip --user` self-heal missed because macOS has no bare `pip` command.

- cookbook_helpers.py: _local_tooling_path_export() — pure helper returning a PATH export for the running interpreter's bin dir (escaped for double quotes).
- cookbook_routes.py: download + serve runners prepend that dir on local runs (gated off SSH/Windows); swap the `pip` install fallbacks to `python3 -m pip`.
- tests: helper output for normal and spaced paths.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Document macOS llama.cpp serving prerequisites

Clarify the two serving paths on Apple Silicon: the recommended zero-build route (brew install llama.cpp ships a Metal llama-server Cookbook finds on PATH), and the from-source fallback, which requires cmake + Xcode Command Line Tools. Without those the build is skipped and serving silently degrades to a slow CPU build, so new users now know to install them (or use the prebuilt) up front.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Recommend only GGUF-servable models on Metal

Apple Silicon's only serving engines are llama.cpp and Ollama, both GGUF-only (vLLM/SGLang are CUDA/ROCm and don't run on macOS). The catalog tags raw safetensors repos with a default Q4_K_M quant, so the fit-ranking was recommending ~397/501 models that have no GGUF and fail to serve on Metal with "No GGUF found" (e.g. microsoft/Phi-mini-MoE-instruct).

Drop any model without a real GGUF (is_gguf/gguf_sources) on Apple Silicon — subsumes the previous AWQ/GPTQ/FP8 special-case into one rule. On CUDA these stay visible since vLLM serves safetensors directly. Metal recommendations go 501 -> 104, all actually servable.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Remove macOS launchd LaunchAgent (cherry-picked extra)

Drop the launchd service from the PR #213 cherry-picks: the install-service-macos.sh installer, the com.odysseus.ui.plist template, and the README section documenting them. Tangential to the core Cookbook/Metal support and not wanted. The build-macos-app.sh launcher is kept.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Add one-command macOS quick start (start-macos.sh)

Running Odysseus natively on a Mac previously meant ~7 manual terminal steps (brew deps, venv, activate, pip, setup.py, uvicorn with the right port) — not friendly for a generic macOS user, and the native run is required because Docker on macOS can't reach the Metal GPU.

- start-macos.sh: installs Homebrew deps (python@3.11, tmux, prebuilt Metal llama.cpp), creates the venv, installs requirements, runs setup, and launches on a non-AirPlay port (7860). Idempotent; re-run to start again.
- README: the Apple Silicon section now leads with this one-command quick start and the clickable .app, with engine/port/manual details folded into a collapsible block. Added a pointer at the top of the manual-install section.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* macOS quick start: auto-open browser when ready

The "open this URL" line scrolled out of view as uvicorn kept logging after it, so users missed it. Now start-macos.sh waits (in the background) until the server accepts connections, prints a boxed "ready" banner at that point (i.e. after the startup burst, not before), and opens the URL in the default browser automatically. Skippable with ODYSSEUS_NO_OPEN=1 for headless/SSH use.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Don't assume/force a specific Python version on macOS

The README claimed "system Python is 3.9" — a machine-specific generalization that's often wrong (macOS ships no recent Python by default; many users already have 3.11+). Make it generic, and make start-macos.sh detect an existing Python 3.11+ and use it, only installing python@3.11 when none is found instead of forcing it on top of the user's Python.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Align start-macos.sh venv path with build-macos-app.sh

start-macos.sh created the environment in .venv/, but build-macos-app.sh and the manual install steps use venv/ — so the clickable .app wouldn't reuse the quick-start's environment and would rebuild a second one. Use venv/ everywhere.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* README: state clearly that MLX is unsupported on Apple Silicon

Odysseus has no mlx_lm runtime; it serves GGUF (llama.cpp/Ollama) and CUDA (vLLM/SGLang) only. MLX-only models can't run on a Mac and are hidden from Cookbook — make that explicit in both the quick start and the details.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* start-macos.sh: build the venv with an arm64 Python on Apple Silicon

A clean-room run surfaced this: with a universal2/x86 Python (e.g. the python.org installer under /usr/local), the venv's compiled extensions install as arm64 but get loaded as x86_64 when launched from the .app bundle, so it crashes with "incompatible architecture (have arm64, need x86_64)". The terminal run happened to work only because a universal binary defaults to arm64 there.

On Apple Silicon, look only under /opt/homebrew (arm64-only) for the build Python, and install Homebrew's python@3.11 if none is present — so the venv is arm64-only and launches correctly from both the terminal and the .app. Intel and non-mac paths are unchanged.

Verified end-to-end in a clean clone: .app now boots on Metal with no arch error.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Address dev-exp review: macOS setup robustness + doc/UX fixes

From the voltagent dev-exp review of the branch:

- README: fix broken anchor links (the em-dash heading produced a slug the links didn't match); simplify the heading to a stable slug.
- cookbook_routes.py: add /opt/homebrew/bin and /usr/local/bin to the serve PATH so a brew-installed llama-server/ollama is found instead of falling back to a slow source build.
- start-macos.sh: guard against an empty Python path; fail fast with a clear message on port-in-use; ERR trap with a "safe to re-run" message; show pip progress (drop --quiet on the slow requirements install); stop the background browser-opener cleanly on exit/Ctrl+C (no orphaned poller).
- setup.py: bind hint to 127.0.0.1; suppress the manual run-hint when launched by start-macos.sh (ODYSSEUS_SKIP_RUN_HINT) so the URL isn't contradictory.
- build-macos-app.sh: the .app only opens the browser once the server is actually ready (not after the readiness timeout).
- cookbookServe.js: drop "Diffusers" from the Metal backend picker — diffusion_server.py is CUDA-only, so it was an unservable option on macOS.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: yunggilja <yunggilja@gmail.com>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
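The two fit.py rules described above can be sketched in Python: recommend only models with a real GGUF on Metal, and estimate decode speed from memory bandwidth. This is an illustration under assumptions, not fit.py's code. The `CatalogModel` shape, the `size_gb` field, and the "bandwidth divided by weight size" estimate are hypothetical. Only the `is_gguf` / `gguf_sources` names come from the commit message. The per-chip bandwidth table isn't shown in this diff, so bandwidth is passed in as a parameter.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogModel:
    # Hypothetical catalog entry; only is_gguf / gguf_sources are named in the commit.
    repo_id: str
    size_gb: float  # weight file size in GB (assumed field)
    is_gguf: bool = False
    gguf_sources: list[str] = field(default_factory=list)


def servable_on_metal(model: CatalogModel) -> bool:
    """llama.cpp and Ollama are GGUF-only, so a model needs a real GGUF."""
    return model.is_gguf or bool(model.gguf_sources)


def est_decode_tok_s(bandwidth_gb_s: float, model_gb: float) -> float:
    """Rough upper bound: decoding one token reads every weight once."""
    if model_gb <= 0:
        raise ValueError("model_gb must be positive")
    return bandwidth_gb_s / model_gb


def recommend_for_metal(models, bandwidth_gb_s: float, budget_gb: float):
    """Keep GGUF-servable models that fit the budget, fastest first."""
    picks = [
        (m.repo_id, round(est_decode_tok_s(bandwidth_gb_s, m.size_gb), 1))
        for m in models
        if servable_on_metal(m) and m.size_gb <= budget_gb
    ]
    return sorted(picks, key=lambda pick: pick[1], reverse=True)
```

Because decoding on unified memory is largely bandwidth-bound, a per-chip bandwidth figure is a reasonable first-order input for a tok/s estimate. A safetensors-only repo is dropped regardless of how well it would fit.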
76
README.md
@@ -59,6 +59,10 @@ image build. Open `http://localhost:7000` after the containers are healthy.
If port `7000` is already taken, set `APP_PORT=7001` (or another free port)
in `.env`, recreate the container, and open `http://localhost:7001`.

> **On Apple Silicon, Docker can't use the Metal GPU** (it runs a Linux VM), so
> Cookbook will serve models on the CPU only. For GPU-accelerated Cookbook,
> run the app natively — see [Apple Silicon](#apple-silicon-m-series).

Cookbook remote servers use an Odysseus-owned SSH key from `./data/ssh`
inside Docker. In **Cookbook -> Settings -> Servers**, generate/copy the
public key and add it to the remote server's `~/.ssh/authorized_keys`.
@@ -111,8 +115,12 @@ The Cookbook model catalog check should print a non-zero count. If it prints
`0`, rebuild the Odysseus image with `docker compose build --no-cache odysseus`.

### Option 2: Manual install — Linux / macOS
**Requirements:** Python 3.11+. On Linux/Termux, Cookbook also requires `tmux`
for background model downloads and serves.
**Requirements:** Python 3.11+. Cookbook also requires `tmux` for background
model downloads and serves.

> **On macOS (Apple Silicon)?** Skip the manual steps below — run
> `./start-macos.sh` for a one-command setup. See
> [Apple Silicon](#apple-silicon-m-series).

Install system packages first:
```bash
@@ -124,19 +132,81 @@ sudo pacman -S tmux

# Fedora
sudo dnf install tmux

# macOS (Homebrew). macOS ships no recent Python by default — install 3.11+
# (skip the python line if you already have Python 3.11 or newer):
brew install python@3.11 tmux
```

Then install Odysseus:
```bash
git clone https://github.com/pewdiepie-archdaemon/odysseus.git
cd odysseus
python3 -m venv venv
python3 -m venv venv # on macOS use: python3.11 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python setup.py # creates data dirs and prints an initial admin password
python -m uvicorn app:app --host 0.0.0.0 --port 7000
```

#### Apple Silicon (M-series)

> **On a Mac, run Odysseus natively (not in Docker) so Cookbook can use the
> Metal GPU.** Cookbook serves models on whatever machine Odysseus runs on, and
> Docker on macOS is a Linux VM with **no access to the GPU** — in a container
> your Mac looks like a CPU-only Linux box.

**Quick start — one command.** From a fresh clone:
```bash
git clone https://github.com/pewdiepie-archdaemon/odysseus.git
cd odysseus
./start-macos.sh
```
That installs what's needed via Homebrew (Python 3.11+, `tmux`, and a prebuilt
Metal `llama-server`), sets everything up, and launches Odysseus at
**http://127.0.0.1:7860**. Log in with the admin password it prints, open
**Cookbook**, and it detects your GPU (`backend: metal`) and recommends GGUF
models that fit your Mac. (MLX models aren't supported on macOS and are hidden —
see below.) Re-run `./start-macos.sh` any time to start it again (use another
port with `ODYSSEUS_PORT=7900 ./start-macos.sh`).

**Prefer a clickable app?** After your first `./start-macos.sh`, build a
launcher `Odysseus.app` (+ a drag-to-Applications `.dmg`) that starts the server
and opens the UI in its own window:
```bash
./build-macos-app.sh # → dist/Odysseus.app and dist/Odysseus.dmg
```

<details>
<summary>What <code>start-macos.sh</code> does, serving engines, and manual steps</summary>

`start-macos.sh` is just the manual steps wrapped up: Homebrew deps → a Python
`venv` → `pip install -r requirements.txt` → `python setup.py` → `uvicorn` on a
non-AirPlay port. Run them by hand if you prefer (the Linux steps above, but use
`python3.11 -m venv` and `--port 7860`).

**Serving engines on Metal** — Cookbook only recommends models it can serve here:
- **llama.cpp** — `brew install llama.cpp` (done by `start-macos.sh`) provides a
prebuilt Metal `llama-server`, no compile. Without it, Cookbook builds it from
source on first serve, which needs `cmake` + Xcode Command Line Tools
(`brew install cmake && xcode-select --install`).
- **Ollama** — `brew install ollama` is another simple Metal-accelerated option.
- vLLM/SGLang are CUDA/ROCm-only and do **not** run on macOS.

**MLX models are not supported on Apple Silicon.** Odysseus serves models via
llama.cpp/Ollama (GGUF) and vLLM/SGLang (CUDA) — it has no MLX (`mlx_lm`)
runtime. So MLX-only models can't be served on a Mac and are deliberately
**hidden** from Cookbook's recommendations there; pick a GGUF build instead.

**Port 7000 & AirPlay** — macOS AirPlay Receiver holds ports 7000/5000, so
`start-macos.sh` defaults to **7860**. To use 7000, turn AirPlay Receiver off in
System Settings → General → AirDrop & Handoff.

**Build prerequisites baked in** — the `.app` wraps this repo's `venv` (it
doesn't bundle Python), so the path is fixed at build time — rebuild if you move
the repo.
</details>

### Option 3: Manual install — Windows (PowerShell)
Windows support is not actively tested. Use it with caution; Docker on Linux
or a Linux/macOS manual install is the safer path for now.
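The commit message says start-macos.sh now fails fast with a clear message when the port is in use. That script isn't part of this diff. The Python sketch below shows the same idea and is not the script's implementation: probe whether something already accepts TCP connections on the chosen port. Such a probe would flag both an already-running Odysseus and a port held by AirPlay Receiver. `port_in_use` and `choose_port` are hypothetical names; `ODYSSEUS_PORT` and `7860` come from the README text above.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """True if something already accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising on failure.
        return probe.connect_ex((host, port)) == 0


def choose_port(preferred: int = 7860) -> int:
    """Fail fast with a readable message instead of letting uvicorn crash later."""
    if port_in_use(preferred):
        raise SystemExit(
            f"Port {preferred} is already in use. "
            "Pick another one, e.g. ODYSSEUS_PORT=7900 ./start-macos.sh"
        )
    return preferred
```

A connect probe answers "is anyone listening?", which is the question that matters here. A bind probe can give misleading answers depending on socket-reuse options.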
169
build-macos-app.sh
Executable file
@@ -0,0 +1,169 @@
#!/bin/bash
# Build a downloadable macOS launcher app + .dmg for Odysseus.
#
# ./build-macos-app.sh
#
# Produces:
# dist/Odysseus.app — double-click: starts the local server (using this
# repo's venv) and opens the UI in an app-style window.
# dist/Odysseus.dmg — drag-to-Applications disk image (the downloadable).
#
# This is a *launcher* wrapper: it drives the venv we set up in this repo, it
# does not bundle Python. The install path is baked into the app at build time,
# so rebuild if you move the repo. Override the port with ODYSSEUS_PORT.
set -e

REPO_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
APP_NAME="Odysseus"
INSTALL_DIR="$REPO_DIR"
PORT="${ODYSSEUS_PORT:-7860}"
DIST="$REPO_DIR/dist"
APP="$DIST/$APP_NAME.app"

echo "Building $APP_NAME.app"
echo " install dir: $INSTALL_DIR"
echo " port: $PORT"

rm -rf "$APP"
mkdir -p "$APP/Contents/MacOS" "$APP/Contents/Resources"

# ── Icon (best effort) — center-crop docs/odysseus.jpg to a square .icns ──
if [ -f "$REPO_DIR/docs/odysseus.jpg" ] && command -v sips >/dev/null 2>&1; then
TMPIMG="$(mktemp -d)"
# Center-crop to a square, scale to 512 (sips' icns encoder caps at 512), and
# let sips emit the .icns directly — more robust across macOS versions than
# building an .iconset by hand.
sips -c 720 720 "$REPO_DIR/docs/odysseus.jpg" --out "$TMPIMG/sq.png" >/dev/null 2>&1 || cp "$REPO_DIR/docs/odysseus.jpg" "$TMPIMG/sq.png"
sips -z 512 512 "$TMPIMG/sq.png" --out "$TMPIMG/icon.png" >/dev/null 2>&1
if sips -s format icns "$TMPIMG/icon.png" --out "$APP/Contents/Resources/odysseus.icns" >/dev/null 2>&1; then
echo " icon: odysseus.icns"
else
echo " icon: (skipped — conversion failed)"
fi
rm -rf "$TMPIMG"
else
echo " icon: (skipped — no docs/odysseus.jpg)"
fi

# ── Info.plist ──
cat > "$APP/Contents/Info.plist" <<PLIST
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>CFBundleName</key> <string>$APP_NAME</string>
<key>CFBundleDisplayName</key> <string>$APP_NAME</string>
<key>CFBundleIdentifier</key> <string>com.odysseus.launcher</string>
<key>CFBundleVersion</key> <string>1.0</string>
<key>CFBundleShortVersionString</key><string>1.0</string>
<key>CFBundlePackageType</key> <string>APPL</string>
<key>CFBundleExecutable</key> <string>$APP_NAME</string>
<key>CFBundleIconFile</key> <string>odysseus</string>
<key>LSMinimumSystemVersion</key> <string>11.0</string>
<key>NSHighResolutionCapable</key> <true/>
<key>LSUIElement</key> <false/>
</dict>
</plist>
PLIST

# ── Launcher executable (placeholders filled below) ──
cat > "$APP/Contents/MacOS/$APP_NAME.tmpl" <<'LAUNCHER'
#!/bin/bash
# Odysseus.app — start the local server and open the UI in an app window.
INSTALL_DIR="__INSTALL_DIR__"
PORT="__PORT__"
URL="http://127.0.0.1:${PORT}"
export PATH="/opt/homebrew/bin:/usr/local/bin:/usr/bin:/bin:$PATH"

UVICORN="$INSTALL_DIR/venv/bin/uvicorn"
LOG="$INSTALL_DIR/logs/odysseus-app.log"

notify() { /usr/bin/osascript -e "display notification \"$1\" with title \"Odysseus\"" >/dev/null 2>&1; }
die_gui() {
/usr/bin/osascript -e "display dialog \"$1\" with title \"Odysseus\" buttons {\"OK\"} default button 1 with icon stop" >/dev/null 2>&1
exit 1
}

[ -x "$UVICORN" ] || die_gui "Odysseus isn't set up yet. Open Terminal and run:

cd $INSTALL_DIR
python3.11 -m venv venv
./venv/bin/pip install -r requirements.txt
./venv/bin/python setup.py"

# Open the UI in a chrome-less app window (Chromium browsers), else default browser.
open_ui() {
local b base exe bin
for b in "Google Chrome" "Microsoft Edge" "Brave Browser" "Chromium"; do
for base in "/Applications" "$HOME/Applications"; do
if [ -d "$base/$b.app" ]; then
exe="$(/usr/bin/defaults read "$base/$b.app/Contents/Info" CFBundleExecutable 2>/dev/null)"
bin="$base/$b.app/Contents/MacOS/$exe"
if [ -x "$bin" ]; then
"$bin" --app="$URL" --new-window >/dev/null 2>&1 &
return 0
fi
fi
done
done
/usr/bin/open "$URL"
}

mkdir -p "$INSTALL_DIR/logs"

# Already running? Just open the UI.
if /usr/bin/curl -s -o /dev/null --max-time 2 "$URL"; then
open_ui
exit 0
fi

notify "Starting…"
cd "$INSTALL_DIR" || die_gui "Install folder not found: $INSTALL_DIR"
"$UVICORN" app:app --host 127.0.0.1 --port "$PORT" >>"$LOG" 2>&1 &
SERVER_PID=$!

# Quitting the app stops the server it started.
trap 'kill $SERVER_PID 2>/dev/null; exit 0' TERM INT

# Wait for readiness (first run downloads an embedding model — allow ~2 min).
READY=0
for i in $(seq 1 120); do
/usr/bin/curl -s -o /dev/null --max-time 2 "$URL" && { READY=1; break; }
kill -0 "$SERVER_PID" 2>/dev/null || die_gui "Odysseus failed to start. Log:
$LOG"
sleep 1
done

if [ "$READY" = "1" ]; then
open_ui
else
notify "Odysseus is taking a while — open $URL once it finishes starting."
fi
wait "$SERVER_PID"
LAUNCHER

sed -e "s|__INSTALL_DIR__|$INSTALL_DIR|g" -e "s|__PORT__|$PORT|g" \
"$APP/Contents/MacOS/$APP_NAME.tmpl" > "$APP/Contents/MacOS/$APP_NAME"
rm -f "$APP/Contents/MacOS/$APP_NAME.tmpl"
chmod +x "$APP/Contents/MacOS/$APP_NAME"

# Refresh Finder's icon cache for the new bundle.
touch "$APP"

# ── .dmg (drag-to-Applications) ──
echo "Packaging dist/$APP_NAME.dmg"
STAGE="$(mktemp -d)/dmg"
mkdir -p "$STAGE"
cp -R "$APP" "$STAGE/"
ln -s /Applications "$STAGE/Applications"
rm -f "$DIST/$APP_NAME.dmg"
hdiutil create -volname "$APP_NAME" -srcfolder "$STAGE" -ov -format UDZO "$DIST/$APP_NAME.dmg" >/dev/null
rm -rf "$STAGE"

echo ""
echo "Done:"
echo " $APP"
echo " $DIST/$APP_NAME.dmg"
echo ""
echo "Run it: open '$APP'"
echo "Install: open '$DIST/$APP_NAME.dmg' (drag Odysseus to Applications)"
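The launcher's readiness loop polls the URL up to 120 times, one second apart, and bails out early if the server process has already died. A rough Python translation follows. It is a sketch with hypothetical names (`wait_until_ready`, `server_alive`), not code from this commit. Like `curl -s -o /dev/null` without `-f`, it counts any HTTP response as "ready", including an error status.

```python
import time
import urllib.error
import urllib.request
from typing import Callable

# Probe localhost directly, ignoring any HTTP(S)_PROXY environment variables.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def wait_until_ready(
    url: str,
    attempts: int = 120,
    interval: float = 1.0,
    server_alive: Callable[[], bool] = lambda: True,
) -> bool:
    """Poll url until the server answers, mirroring the launcher's curl loop.

    Returns True on the first HTTP response (any status), False if every
    attempt fails, and raises RuntimeError if server_alive() reports the
    server process has exited (the launcher's `kill -0` check).
    """
    for _ in range(attempts):
        try:
            with _OPENER.open(url, timeout=2):
                return True
        except urllib.error.HTTPError:
            return True  # the server answered, just not with a 2xx status
        except OSError:
            pass  # connection refused or timed out: not up yet
        if not server_alive():
            raise RuntimeError("server exited before it became ready")
        time.sleep(interval)
    return False
```

With a `subprocess.Popen` handle, pass `server_alive=lambda: proc.poll() is None`. That is roughly equivalent to `kill -0 "$SERVER_PID"`. `HTTPError` must be caught before the generic `OSError` handler because it is a subclass of `URLError`, which is itself an `OSError`.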
@@ -102,6 +102,28 @@ def _shell_path(p: str) -> str:
return '"' + p + '"'


def _local_tooling_path_export(executable: str) -> str:
"""Bash line prepending the running interpreter's bin dir to PATH.

When Odysseus runs from a virtualenv, that bin dir holds the tools the
cookbook runners shell out to (`hf`, `python`). tmux runners start from a
fresh login shell with the venv NOT activated, so without this they can't
find `hf` and downloads fail with "hf: command not found" — notably on
macOS, where the `pip --user` self-heal also misses (`pip` isn't a command,
only `pip3`/`python3 -m pip`). Local runs only; meaningless over SSH.
"""
bin_dir = os.path.dirname(os.path.abspath(executable))
# Escape for a double-quoted context: $PATH must still expand, but spaces
# and shell metacharacters in the path must be preserved literally.
esc = (
bin_dir.replace("\\", "\\\\")
.replace('"', '\\"')
.replace("$", "\\$")
.replace("`", "\\`")
)
return f'export PATH="{esc}:$PATH"'


def _ps_squote(v: str) -> str:
"""Escape a value for PowerShell single-quoted string interpolation.
Belt-and-suspenders on top of _validate_token's regex — if the regex
@@ -7,6 +7,7 @@ import os
import re
import shlex
import shutil
import sys
import uuid
from pathlib import Path

@@ -25,7 +26,7 @@ from routes.cookbook_helpers import (
_validate_repo_id, _validate_include, _validate_remote_host, _validate_token,
_validate_local_dir, _validate_ssh_port, _validate_gpus, _shell_path,
_ps_squote, _bash_squote, _validate_serve_cmd, _parse_serve_phase,
_safe_env_prefix,
_safe_env_prefix, _local_tooling_path_export,
ModelDownloadRequest, ServeRequest,
)

@@ -357,16 +358,22 @@ def setup_cookbook_routes() -> APIRouter:
lines.append(f"export HF_TOKEN='{_bash_squote(req.hf_token)}'")
# Ensure pip-user scripts (e.g. hf CLI installed via --user) are on PATH
lines.append('export PATH="$HOME/.local/bin:$PATH"')
# When Odysseus runs from a venv (e.g. native macOS install), put its bin
# on PATH so the tmux shell finds the bundled `hf`/`python3` without an
# activated venv. Local bash runs only — meaningless over SSH/Windows.
if not req.remote_host and req.platform != "windows":
lines.append(_local_tooling_path_export(sys.executable))
# Best-effort install hf CLI (always). hf_transfer (Rust parallel downloader)
# is fast but flaky on large files — it tends to crash near the end at high
# throughput. Retries set disable_hf_transfer to fall back to the plain,
# slower-but-reliable downloader (resumes cleanly from the .incomplete files).
lines.append("command -v hf >/dev/null 2>&1 || pip install --user --break-system-packages -q -U huggingface_hub 2>/dev/null || pip install -q -U huggingface_hub 2>/dev/null")
# Use `python3 -m pip` not `pip` — macOS has no bare `pip` command.
lines.append("command -v hf >/dev/null 2>&1 || python3 -m pip install --user --break-system-packages -q -U huggingface_hub 2>/dev/null || python3 -m pip install -q -U huggingface_hub 2>/dev/null")
if req.disable_hf_transfer:
lines.append("export HF_HUB_ENABLE_HF_TRANSFER=0")
lines.append("export HF_HUB_DOWNLOAD_MAX_WORKERS=4")
else:
lines.append("python3 -c 'import hf_transfer' 2>/dev/null || pip install --user --break-system-packages -q hf_transfer 2>/dev/null || pip install -q hf_transfer 2>/dev/null")
lines.append("python3 -c 'import hf_transfer' 2>/dev/null || python3 -m pip install --user --break-system-packages -q hf_transfer 2>/dev/null || python3 -m pip install -q hf_transfer 2>/dev/null")
lines.append("python3 -c 'import hf_transfer' 2>/dev/null && export HF_HUB_ENABLE_HF_TRANSFER=1")
lines.append("export HF_HUB_DOWNLOAD_MAX_WORKERS=8")
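The download runner's hf_transfer branch is easier to follow as a pure function. This is a sketch under stated assumptions. `hf_download_env` and `hf_transfer_installed` are hypothetical names, and returning a dict instead of emitting `export` lines is a simplification. The variable names and values (`0`/`1`, 4 versus 8 workers) are the ones the runner above sets. Retries pass `disable_hf_transfer=True` to get the slower, plain downloader.

```python
import importlib.util


def hf_transfer_installed() -> bool:
    """Rough stand-in for the runner's `python3 -c 'import hf_transfer'` probe."""
    return importlib.util.find_spec("hf_transfer") is not None


def hf_download_env(disable_hf_transfer: bool, have_hf_transfer: bool) -> dict[str, str]:
    """Environment the runner exports for one download attempt."""
    if disable_hf_transfer:
        # Retry path: plain downloader, fewer parallel workers.
        return {"HF_HUB_ENABLE_HF_TRANSFER": "0", "HF_HUB_DOWNLOAD_MAX_WORKERS": "4"}
    env = {"HF_HUB_DOWNLOAD_MAX_WORKERS": "8"}
    if have_hf_transfer:
        # Only opt in to the Rust downloader when it actually imports.
        env["HF_HUB_ENABLE_HF_TRANSFER"] = "1"
    return env
```

`find_spec` only checks that the module can be located. A broken install could pass this check and still fail to import, which the runner's real `import` probe would catch.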
@@ -845,6 +852,10 @@ def setup_cookbook_routes() -> APIRouter:
# ── Linux/Termux: bash + tmux (existing flow) ──
runner_lines = ["#!/bin/bash"]
runner_lines.extend(_user_shell_path_bootstrap())
# Put Odysseus's own venv bin on PATH (local runs only) so the serve
# shell resolves the bundled python3/hf, mirroring the download flow.
if not remote:
runner_lines.append(_local_tooling_path_export(sys.executable))
runner_lines.append("export FLASHINFER_DISABLE_VERSION_CHECK=1")
if req.hf_token:
runner_lines.append(f"export HF_TOKEN='{_bash_squote(req.hf_token)}'")
@@ -864,7 +875,10 @@ def setup_cookbook_routes() -> APIRouter:
# Jinja2 rejects (do_tojson ensure_ascii). Build it once from
# source if missing; keep llama-cpp-python only as a fallback.
runner_lines.append('# Ensure a llama.cpp server (prefer native llama-server)')
runner_lines.append('export PATH="$HOME/.local/bin:$HOME/bin:$HOME/llama.cpp/build/bin:$PATH"')
# Include the Homebrew bin dirs so a brew-installed llama-server /
# ollama is found (otherwise macOS falls back to a slow source build).
# /opt/homebrew = Apple Silicon, /usr/local = Intel; harmless on Linux.
runner_lines.append('export PATH="$HOME/.local/bin:$HOME/bin:$HOME/llama.cpp/build/bin:/opt/homebrew/bin:/usr/local/bin:$PATH"')
runner_lines.append('if [ -d /data/data/com.termux ]; then')
runner_lines.append(' # Termux: no native build — use the Python bindings (CPU).')
runner_lines.append(' if ! python3 -c "import llama_cpp" 2>/dev/null; then')
@@ -876,17 +890,50 @@ def setup_cookbook_routes() -> APIRouter:
runner_lines.append(' echo "Native llama-server not found — building from source (one-time, may take a few minutes)..."')
runner_lines.append(' mkdir -p ~/bin')
runner_lines.append(' cd ~ && [ -d llama.cpp ] || git clone --depth 1 https://github.com/ggml-org/llama.cpp')
# GPU build if CUDA is present; fall back to a plain (CPU) build.
runner_lines.append(' cd ~/llama.cpp && { cmake -B build -DGGML_CUDA=ON 2>/dev/null || cmake -B build; } \\')
runner_lines.append(' && cmake --build build -j"$(nproc)" --target llama-server \\')
runner_lines.append(' && ln -sf ~/llama.cpp/build/bin/llama-server ~/bin/llama-server')
# Build with the right accelerator: Metal on macOS (llama.cpp
# enables it automatically, no flag), CUDA on Linux when present,
# else a plain CPU build. nproc is Linux-only — fall back to
# `sysctl hw.ncpu` on macOS. (Tip: `brew install llama.cpp` ships
# a prebuilt llama-server and skips this whole source build.)
runner_lines.append(' NPROC="$(nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo 4)"')
runner_lines.append(' if [ "$(uname -s)" = "Darwin" ]; then')
runner_lines.append(' command -v cmake >/dev/null 2>&1 || echo "WARNING: cmake not found — install it with: brew install cmake (or: brew install llama.cpp for a prebuilt llama-server)."')
# Start from a clean cache: a prior failed configure (e.g. a CUDA
# attempt) poisons build/CMakeCache.txt, so a plain `cmake -B build`
# would reuse the bad settings and fail again. CMAKE_BUILD_TYPE is
# explicit so the binary is optimized (Metal auto-enables on macOS).
runner_lines.append(' cd ~/llama.cpp && rm -rf build && cmake -B build -DCMAKE_BUILD_TYPE=Release \\')
runner_lines.append(' && cmake --build build -j"$NPROC" --target llama-server \\')
runner_lines.append(' && ln -sf ~/llama.cpp/build/bin/llama-server ~/bin/llama-server')
runner_lines.append(' else')
runner_lines.append(' cd ~/llama.cpp && { cmake -B build -DGGML_CUDA=ON 2>/dev/null || cmake -B build; } \\')
runner_lines.append(' && cmake --build build -j"$NPROC" --target llama-server \\')
runner_lines.append(' && ln -sf ~/llama.cpp/build/bin/llama-server ~/bin/llama-server')
runner_lines.append(' fi')
runner_lines.append(' # If the native build failed, fall back to the Python bindings.')
runner_lines.append(' if ! command -v llama-server &>/dev/null && ! python3 -c "import llama_cpp" 2>/dev/null; then')
runner_lines.append(' echo "llama-server build failed — installing Python bindings as fallback..."')
runner_lines.append(' pip install --user --break-system-packages -q llama-cpp-python 2>/dev/null || pip install -q llama-cpp-python 2>/dev/null || true')
runner_lines.append(' fi')
runner_lines.append('fi')
elif "ollama" in req.cmd:
# Ollama manages its own model store and HTTP server. Just make
# sure the binary exists and the daemon is up before running the
# command (the natural serving engine on Apple Silicon / Metal).
runner_lines.append('if ! command -v ollama &>/dev/null; then')
runner_lines.append(' echo "ERROR: Ollama not found. Install it (macOS: brew install ollama, or https://ollama.com/download), then launch again."')
runner_lines.append(' exit 127')
runner_lines.append('fi')
runner_lines.append('if ! curl -sf http://localhost:11434/api/tags >/dev/null 2>&1; then')
runner_lines.append(' echo "Starting ollama server..."; (ollama serve >/dev/null 2>&1 &)')
runner_lines.append(' for _ in 1 2 3 4 5 6 7 8 9 10; do curl -sf http://localhost:11434/api/tags >/dev/null 2>&1 && break; sleep 1; done')
runner_lines.append('fi')
elif "vllm serve" in req.cmd:
# vLLM is CUDA/ROCm-only and does not run on macOS at all.
runner_lines.append('if [ "$(uname -s)" = "Darwin" ]; then')
runner_lines.append(' echo "ERROR: vLLM does not run on macOS. Use Ollama or llama.cpp (Metal) instead."')
runner_lines.append(' exit 1')
runner_lines.append('fi')
# Put ~/.local/bin on PATH first — without a venv, vllm installs
# there via --user and the non-login serve shell otherwise can't
|
||||
# find the `vllm` CLI ("command not found"). Mirrors llama.cpp above.
|
||||
|
||||
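The runner above picks the llama.cpp build at script run time in bash. If you wanted to make the same decision from Python instead, a minimal sketch could look like this. `build_jobs`, `cmake_configure_args` and `build_plan` are hypothetical names, not part of Odysseus, and treating a visible `nvcc` as "CUDA is present" is an assumption: the generated script instead tries the CUDA configure and falls back to a CPU build when it fails, and it also wipes `build/` on macOS, which this sketch leaves out.

```python
from __future__ import annotations

import os
import platform
import shutil


def build_jobs() -> int:
    # Python equivalent of `nproc || sysctl -n hw.ncpu || echo 4`:
    # os.cpu_count() works on Linux and macOS and may return None.
    return os.cpu_count() or 4


def cmake_configure_args(system_name: str, cuda_available: bool) -> list[str]:
    args = ["cmake", "-B", "build"]
    if system_name == "Darwin":
        # llama.cpp enables Metal by itself on macOS; only request Release.
        args.append("-DCMAKE_BUILD_TYPE=Release")
    elif cuda_available:
        args.append("-DGGML_CUDA=ON")
    # Otherwise: plain CPU build with no accelerator flag.
    return args


def build_plan(system_name: str | None = None,
               cuda_available: bool | None = None) -> list[list[str]]:
    system_name = system_name or platform.system()
    if cuda_available is None:
        # Assumption: a visible nvcc means a CUDA build is worth attempting.
        cuda_available = shutil.which("nvcc") is not None
    return [
        cmake_configure_args(system_name, cuda_available),
        ["cmake", "--build", "build", f"-j{build_jobs()}", "--target", "llama-server"],
    ]
```

Each inner list is an argv, so it can be passed straight to `subprocess.run(cmd, cwd=..., check=True)` without going through a shell.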
@@ -95,21 +95,89 @@ def cmd_list(args) -> None:

 # ─── gpus ────────────────────────────────────────────────────────────

+def _macos_metal_gpu() -> list | None:
+    """Apple Silicon has no discrete VRAM — report total unified memory as the
+    GPU budget so the web UI's picker shows the Mac's Metal GPU instead of
+    'no GPU'. `free` is approximated from vm_stat (page-granular); macOS doesn't
+    expose Metal utilization to the shell, so util is 0. Returns None off macOS."""
+    if sys.platform != "darwin":
+        return None
+
+    def _sysctl(key: str) -> str | None:
+        try:
+            r = subprocess.run(["sysctl", "-n", key], capture_output=True, text=True, timeout=5)
+            return r.stdout.strip() if r.returncode == 0 else None
+        except Exception:
+            return None
+
+    memsize = _sysctl("hw.memsize")
+    if not memsize or not memsize.isdigit():
+        return None
+    total_mb = int(memsize) // (1024 * 1024)
+    name = _sysctl("machdep.cpu.brand_string") or "Apple Silicon"
+
+    free_mb = total_mb
+    try:
+        vm = subprocess.run(["vm_stat"], capture_output=True, text=True, timeout=5)
+        if vm.returncode == 0:
+            page_size, pages = 4096, {}
+            for line in vm.stdout.splitlines():
+                if "page size of" in line:
+                    m = re.search(r"page size of (\d+)", line)
+                    if m:
+                        page_size = int(m.group(1))
+                elif ":" in line:
+                    k, v = line.split(":", 1)
+                    v = v.strip().rstrip(".")
+                    if v.isdigit():
+                        pages[k.strip()] = int(v)
+            free_pages = (pages.get("Pages free", 0) + pages.get("Pages inactive", 0)
+                          + pages.get("Pages speculative", 0))
+            if free_pages:
+                free_mb = (free_pages * page_size) // (1024 * 1024)
+    except Exception:
+        pass
+
+    return [{
+        "index": 0,
+        "name": name,
+        "free_mb": free_mb,
+        "total_mb": total_mb,
+        "used_mb": max(0, total_mb - free_mb),
+        "util_pct": 0,
+        "uuid": "apple-metal-0",
+        "unified_memory": True,
+        "busy": (free_mb / total_mb) < 0.5 if total_mb else False,
+    }]
+
+
 def cmd_gpus(args) -> None:
     """Same shape the web UI gets — index/name/free_mb/total_mb/used_mb/
-    util_pct/uuid. Returns `[]` with an `error` field if nvidia-smi is
-    missing (laptop / CPU-only box). Pass `--host user@box` to run over
-    SSH against a remote machine."""
+    util_pct/uuid. On Apple Silicon (no nvidia-smi) reports the Metal GPU's
+    unified memory instead. Returns `[]` with an `error` field only on a
+    CPU-only non-Mac box. Pass `--host user@box` to run over SSH."""
     query = "nvidia-smi --query-gpu=index,name,memory.free,memory.total,memory.used,utilization.gpu,uuid --format=csv,noheader,nounits"
     prefix = _ssh_prefix(args.host, args.ssh_port)
     cmd = prefix + (query.split() if not prefix else [query])
     try:
         out = subprocess.run(cmd, capture_output=True, text=True, timeout=15)
     except FileNotFoundError:
+        # No nvidia-smi locally → try the Metal fallback before giving up.
+        if not prefix:
+            mac = _macos_metal_gpu()
+            if mac is not None:
+                emit({"ok": True, "gpus": mac, "backend": "metal"}, args)
+                return
         msg = "ssh not found" if prefix else "nvidia-smi not found"
         emit({"ok": False, "error": msg, "gpus": []}, args)
         return
     if out.returncode != 0:
+        # nvidia-smi present but errored (or no NVIDIA GPU) — fall back to Metal.
+        if not prefix:
+            mac = _macos_metal_gpu()
+            if mac is not None:
+                emit({"ok": True, "gpus": mac, "backend": "metal"}, args)
+                return
         emit({"ok": False, "error": out.stderr.strip()[:200], "gpus": []}, args)
         return
     gpus = []
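`_macos_metal_gpu` estimates free unified memory from `vm_stat` by adding the free, inactive and speculative page counts and multiplying by the page size printed in the header. The parsing step can be isolated into a pure function that is easy to test without a Mac. `free_mb_from_vm_stat` is a hypothetical name, and the sample text in the test is abbreviated by hand rather than captured from a real machine.

```python
from __future__ import annotations

import re

# Page categories counted as reclaimable, as in _macos_metal_gpu.
_FREE_KEYS = ("Pages free", "Pages inactive", "Pages speculative")


def free_mb_from_vm_stat(text: str, default_page_size: int = 4096) -> int | None:
    page_size = default_page_size
    pages = {}
    for line in text.splitlines():
        if "page size of" in line:
            # Header, e.g. "Mach Virtual Memory Statistics: (page size of 16384 bytes)"
            m = re.search(r"page size of (\d+)", line)
            if m:
                page_size = int(m.group(1))
        elif ":" in line:
            key, value = line.split(":", 1)
            value = value.strip().rstrip(".")  # vm_stat ends each count with "."
            if value.isdigit():
                pages[key.strip()] = int(value)
    free_pages = sum(pages.get(k, 0) for k in _FREE_KEYS)
    if not free_pages:
        return None  # the caller keeps its own default (total memory)
    return (free_pages * page_size) // (1024 * 1024)
```

Returning `None` instead of `0` when nothing parses matches the original's behaviour of keeping `free_mb = total_mb` in that case, so the `busy` flag (`free_mb / total_mb < 0.5`) stays `False` rather than flipping to `True` on a parse failure.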
@@ -19,12 +19,22 @@ GPU_BANDWIDTH = {
     "6950 xt": 576, "6900 xt": 512, "6800 xt": 512, "6800": 512, "6700 xt": 384, "6600 xt": 256, "6600": 224,
     "mi300x": 5300, "mi300": 5300, "mi250x": 3277, "mi250": 3277, "mi210": 1638, "mi100": 1229,
     "9070 xt": 624, "9070": 488,
+    # Apple Silicon unified-memory bandwidth (GB/s). Keyed off the chip name
+    # reported by sysctl machdep.cpu.brand_string (e.g. "Apple M4 Max"). Entry
+    # order within the dict matters less than the length-sorting done below,
+    # which guarantees "m4 max" is tried before "m4".
+    "m1 ultra": 800, "m1 max": 400, "m1 pro": 200, "m1": 68,
+    "m2 ultra": 800, "m2 max": 400, "m2 pro": 200, "m2": 100,
+    "m3 ultra": 800, "m3 max": 300, "m3 pro": 150, "m3": 100,
+    "m4 max": 410, "m4 pro": 273, "m4": 120,
 }

 # Pre-sort keys by length descending for correct substring matching
 _BW_KEYS_SORTED = sorted(GPU_BANDWIDTH.keys(), key=len, reverse=True)

-FALLBACK_K = {"cuda": 220, "rocm": 180, "cpu_x86": 70, "cpu_arm": 90}
+# metal: backstop for Apple Silicon chips not in GPU_BANDWIDTH (e.g. a future
+# M5) — the named chips above take the accurate bandwidth path instead.
+FALLBACK_K = {"cuda": 220, "rocm": 180, "metal": 150, "cpu_x86": 70, "cpu_arm": 90}

 USE_CASE_WEIGHTS = {
     "general": (0.45, 0.30, 0.15, 0.10),
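The bandwidth table is matched by substring against the lowercased chip name, longest key first, so "Apple M4 Max" resolves to `m4 max` rather than to its prefix `m4`. The sketch below shows that lookup on a small subset of the table, plus a rough memory-bound decode estimate (tokens/s ≈ bandwidth × efficiency ÷ model size). The estimate and its `efficiency` default are assumptions for illustration only; the diff does not show how `fit.py` turns bandwidth or `FALLBACK_K` into tok/s.

```python
from __future__ import annotations

# Illustrative subset of GPU_BANDWIDTH (GB/s), copied from the table above.
BANDWIDTH_GBPS = {
    "m4 max": 410, "m4 pro": 273, "m4": 120,
    "m3 max": 300, "m3": 100,
    "mi300x": 5300,
}
# Longest keys first, so "m4 max" is tried before its prefix "m4".
_KEYS_LONGEST_FIRST = sorted(BANDWIDTH_GBPS, key=len, reverse=True)


def lookup_bandwidth(gpu_name: str) -> int | None:
    name = gpu_name.lower()
    for key in _KEYS_LONGEST_FIRST:
        if key in name:
            return BANDWIDTH_GBPS[key]
    return None  # unknown chip: a caller would fall back to a per-backend constant


def rough_decode_tok_s(bandwidth_gbps: float, model_gb: float, efficiency: float = 0.6) -> float:
    # Assumption: single-stream decoding is memory-bound, so every generated
    # token reads all weights once; `efficiency` absorbs real-world overheads.
    return bandwidth_gbps * efficiency / model_gb
```

Without the length sort, iterating the dict in insertion order would let `m4` win for any "M4 Max" string, because `"m4" in "apple m4 max"` is also true.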
@@ -411,17 +421,28 @@ def rank_models(system, use_case=None, limit=50, search=None, sort="score", quan
     # If user picked a prequantized format (AWQ/FP8/GPTQ), filter to only those models
     filter_native = quant and any(quant.startswith(p) for p in ("AWQ-", "GPTQ-", "FP8"))

-    # MLX-quantized models only run on Apple Silicon (Metal). Exclude them on
-    # every other backend (CUDA / ROCm / CPU) so Linux/Windows users don't see
-    # unrunnable suggestions.
     system_backend = (system.get("backend") or "").lower()
     apple_silicon = system_backend in ("mps", "metal", "apple")

     for m in models:
         native_q = m.get("quantization", "")

-        # Drop MLX models on non-Apple hardware
-        if not apple_silicon and native_q.startswith("mlx-"):
+        # MLX-quantized models need the MLX runtime (mlx_lm), which Odysseus
+        # doesn't generate serve commands for — only llama.cpp/Ollama (Metal)
+        # and vLLM/SGLang (CUDA). MLX repos ship no GGUF alternative, so they're
+        # unrunnable on every backend we support. Always drop them, on Apple
+        # Silicon too, so the Cookbook never recommends a model it can't serve.
+        if native_q.startswith("mlx-"):
             continue
+
+        # On Apple Silicon the only serving engines are llama.cpp and Ollama,
+        # both GGUF-only (vLLM/SGLang are CUDA/ROCm and don't run on macOS). So
+        # a model is Metal-servable ONLY if it ships a real GGUF. Drop everything
+        # else — raw safetensors repos (which the catalog still tags with a
+        # default GGUF quant) and vLLM-only AWQ/GPTQ/FP8 builds alike. Without
+        # this the Cookbook recommends models the Mac can't run; on CUDA these
+        # stay visible because vLLM serves safetensors directly.
+        if apple_silicon and not (m.get("is_gguf") or m.get("gguf_sources")):
+            continue

         # Format filter: AWQ tab → only AWQ models, FP8 tab → only FP8 models
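The two new checks in `rank_models` reduce to a per-model predicate: MLX quantizations are dropped on every backend, and on a Metal-family backend only models carrying `is_gguf` or `gguf_sources` survive. A minimal sketch, where `servable_on` is a hypothetical helper mirroring the loop body rather than code taken from `fit.py`:

```python
from __future__ import annotations

# Backend names treated as Apple Silicon, as in rank_models.
METAL_BACKENDS = ("mps", "metal", "apple")


def servable_on(system_backend: str | None, model: dict) -> bool:
    backend = (system_backend or "").lower()
    quant = model.get("quantization", "")
    # MLX builds need mlx_lm, which no generated serve command uses: drop everywhere.
    if quant.startswith("mlx-"):
        return False
    # On Metal only llama.cpp / Ollama are available, and both need a GGUF.
    if backend in METAL_BACKENDS:
        return bool(model.get("is_gguf") or model.get("gguf_sources"))
    # Other backends: no new filtering in this change.
    return True
```

Usage: `candidates = [m for m in models if servable_on(system.get("backend"), m)]`.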
@@ -204,6 +204,82 @@ def _detect_amd():
     return None


+def _detect_apple_silicon():
+    """Detect Apple Silicon (M-series) GPUs.
+
+    Macs have no discrete VRAM — the GPU shares the system's unified memory.
+    We report a fraction of total RAM as the usable GPU budget (matching macOS's
+    default Metal working-set limit) so the Cookbook recommends models that
+    actually run on the GPU instead of classifying the machine as CPU-only.
+
+    backend="metal" is what services.hwfit.fit and the serve-command generation
+    key off of (they already understand MLX / llama.cpp-Metal). Works locally
+    (platform.system()=="Darwin") and over SSH (uname -s == Darwin).
+    """
+    # Gate to macOS — locally via platform, remotely via uname.
+    if _remote_host:
+        if "darwin" not in (_run(["uname", "-s"]) or "").lower():
+            return None
+        arch = (_run(["uname", "-m"]) or "").lower()
+    else:
+        if platform.system() != "Darwin":
+            return None
+        arch = platform.machine().lower()
+
+    # Only Apple Silicon (arm64) has a Metal GPU worth serving LLMs on; Intel
+    # Macs fall through to the CPU path.
+    if "arm" not in arch and "aarch64" not in arch:
+        return None
+
+    # Chip name, e.g. "Apple M4 Max" — carries the Pro/Max/Ultra variant that
+    # the fit bandwidth table keys off of.
+    brand = (_run(["sysctl", "-n", "machdep.cpu.brand_string"]) or "Apple Silicon").strip()
+
+    # Total unified memory in bytes.
+    memsize = _run(["sysctl", "-n", "hw.memsize"])
+    try:
+        total_gb = int(memsize) / (1024**3) if memsize else 0.0
+    except ValueError:
+        total_gb = 0.0
+    if total_gb <= 0:
+        return None
+
+    # Usable GPU budget. macOS lets Metal use most of unified memory, but the
+    # default working-set limit scales with RAM: small machines have to keep
+    # more back for the OS + app. These fractions track Apple's
+    # recommendedMaxWorkingSetSize defaults across the lineup. Honour an
+    # explicit override if the user raised it with
+    # `sudo sysctl iogpu.wired_limit_mb=…`.
+    if total_gb <= 16:
+        frac = 0.67
+    elif total_gb <= 64:
+        frac = 0.75
+    else:
+        frac = 0.80
+    vram_gb = round(total_gb * frac, 1)
+    wired = _run(["sysctl", "-n", "iogpu.wired_limit_mb"])
+    try:
+        wired_mb = int(wired) if wired else 0
+        if wired_mb > 0:
+            vram_gb = round(wired_mb / 1024.0, 1)
+    except ValueError:
+        pass
+
+    gpu = {"index": 0, "name": brand, "vram_gb": vram_gb}
+    return {
+        "gpu_name": brand,
+        "gpu_vram_gb": vram_gb,
+        "gpu_count": 1,
+        "gpus": [gpu],
+        "gpu_groups": _group_gpus([gpu]),
+        "homogeneous": True,
+        "backend": "metal",
+        # Unified memory: the "VRAM" above is carved out of system RAM, not a
+        # separate pool — downstream fit logic uses this to avoid double-budgeting.
+        "unified_memory": True,
+    }
+
+
 def _read_file(path):
     """Read a file, locally or via SSH."""
     if _remote_host:
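The unified-memory budget in `_detect_apple_silicon` is a step function of total RAM, replaced outright by `iogpu.wired_limit_mb` whenever that sysctl reports a positive value. Pulled out as a pure function it looks like this; the name `metal_budget_gb` is hypothetical, while the thresholds and fractions are the ones in the diff. `total_gb` is assumed to already be `hw.memsize` divided by `1024**3`.

```python
from __future__ import annotations


def metal_budget_gb(total_gb: float, wired_limit_mb: int | None = None) -> float:
    # An explicit `sysctl iogpu.wired_limit_mb=…` override wins when positive.
    if wired_limit_mb and wired_limit_mb > 0:
        return round(wired_limit_mb / 1024.0, 1)
    # Smaller machines keep proportionally more RAM back for macOS and apps.
    if total_gb <= 16:
        frac = 0.67
    elif total_gb <= 64:
        frac = 0.75
    else:
        frac = 0.80
    return round(total_gb * frac, 1)
```

The 32 GB case reproduces the `24.0` that `test_apple_silicon_detected_as_metal` expects from the real detector.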
@@ -246,6 +322,15 @@ def _get_ram_gb():
             return (pages * page_size) / (1024**3)
         except Exception:
             pass

+    # macOS has no /proc/meminfo — fall back to sysctl (works locally and over
+    # SSH to a remote Mac, where the sysconf path above isn't taken).
+    memsize = _run(["sysctl", "-n", "hw.memsize"])
+    if memsize:
+        try:
+            return int(memsize.strip()) / (1024**3)
+        except ValueError:
+            pass
     return 0.0

@@ -263,6 +348,12 @@ def _get_cpu_name():
             if line.startswith("model name"):
                 return line.split(":", 1)[1].strip()

+    # macOS has no /proc/cpuinfo — sysctl gives the chip name (e.g. "Apple M4").
+    # Harmlessly returns nothing on Linux, so it's safe to try unconditionally.
+    brand = _run(["sysctl", "-n", "machdep.cpu.brand_string"])
+    if brand and brand.strip():
+        return brand.strip()
+
     if not _remote_host:
         return platform.processor() or "unknown"
     return "unknown"
@@ -270,7 +361,8 @@ def _get_cpu_name():

 def _get_cpu_count():
     if _remote_host:
-        out = _run(["nproc"])
+        # nproc on Linux; hw.ncpu via sysctl on a remote Mac (no nproc there).
+        out = _run(["nproc"]) or _run(["sysctl", "-n", "hw.ncpu"])
         if out:
             try:
                 return int(out.strip())
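The `_get_ram_gb`, `_get_cpu_name` and `_get_cpu_count` changes share one pattern: try the Linux source first, then fall back to a macOS `sysctl` key. A sketch of that pattern with an injectable command runner, so it can be exercised on either OS without touching the real machine. `first_output`, `cpu_count` and `ram_gb` are hypothetical names, and the runner is assumed to return `None` when a command fails, the way the tests' `_fake_sysctl` does.

```python
from __future__ import annotations

from typing import Callable, Optional, Sequence

# A runner takes an argv and returns stdout, or None if the command failed.
Runner = Callable[[Sequence[str]], Optional[str]]


def first_output(run: Runner, *commands: Sequence[str]) -> str | None:
    # Return the first non-empty output, trying commands in order.
    for cmd in commands:
        out = run(cmd)
        if out and out.strip():
            return out.strip()
    return None


def cpu_count(run: Runner) -> int | None:
    # nproc on Linux; sysctl hw.ncpu on a Mac, which has no nproc.
    out = first_output(run, ["nproc"], ["sysctl", "-n", "hw.ncpu"])
    try:
        return int(out) if out else None
    except ValueError:
        return None


def ram_gb(run: Runner) -> float:
    # hw.memsize reports bytes; convert to GiB like the diff does.
    out = first_output(run, ["sysctl", "-n", "hw.memsize"])
    try:
        return int(out) / (1024**3) if out else 0.0
    except ValueError:
        return 0.0
```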
@@ -411,7 +503,7 @@ def detect_system(host="", ssh_port="", platform="", fresh=False):
     cpu_cores = _get_cpu_count()
     cpu_name = _get_cpu_name()

-    gpu_info = _detect_nvidia() or _detect_amd()
+    gpu_info = _detect_apple_silicon() or _detect_nvidia() or _detect_amd()

     if gpu_info:
         result = {
@@ -427,6 +519,9 @@ def detect_system(host="", ssh_port="", platform="", fresh=False):
             "gpu_groups": gpu_info.get("gpu_groups", []),
             "homogeneous": gpu_info.get("homogeneous", True),
             "backend": gpu_info["backend"],
+            # Apple Silicon / AMD APUs share system RAM with the GPU — carry the
+            # flag through so callers can tell unified from discrete VRAM.
+            "unified_memory": gpu_info.get("unified_memory", False),
         }
     else:
         if _remote_host:
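`detect_system` now probes Apple Silicon first and relies on `or` short-circuiting, so the NVIDIA and AMD probes never run once a Metal GPU is found, and the `unified_memory` flag defaults to `False` for probes that don't set it. A minimal sketch of that ordering and propagation, using hypothetical names (`detect_gpu`, `summarize`) and dummy probes in place of the real ones:

```python
from __future__ import annotations

from typing import Callable, Optional

# A probe returns a GPU-info dict, or None when its hardware isn't present.
Probe = Callable[[], Optional[dict]]


def detect_gpu(probes: list[Probe]) -> dict | None:
    # Same semantics as `a() or b() or c()`: stop at the first truthy result.
    for probe in probes:
        info = probe()
        if info:
            return info
    return None


def summarize(info: dict) -> dict:
    return {
        "gpu_name": info.get("gpu_name"),
        "backend": info["backend"],
        # Discrete-GPU probes don't set the flag, so it defaults to False.
        "unified_memory": info.get("unified_memory", False),
    }
```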
18
setup.py
@@ -109,9 +109,12 @@ def check_deps():
         print("\n [warn] tmux not found")
         print(" Cookbook uses tmux for background downloads and model serves.")
         print(" Install it with your OS package manager, for example:")
-        print(" sudo apt install tmux")
-        print(" sudo pacman -S tmux")
-        print(" sudo dnf install tmux")
+        if sys.platform == "darwin":
+            print(" brew install tmux")
+        else:
+            print(" sudo apt install tmux")
+            print(" sudo pacman -S tmux")
+            print(" sudo dnf install tmux")
     elif os.name != "nt":
         print(" [ok] tmux installed")

@@ -142,9 +145,12 @@ def main():
             print(f" [warn] Admin creation failed: {e}")

     print("\n=== Setup complete ===")
-    print(f"\nStart the server with:")
-    print(f" python -m uvicorn app:app --host 0.0.0.0 --port 7000")
-    print(f"\nThen open http://localhost:7000")
+    # start-macos.sh launches the server itself (on its own port) right after
+    # this, so suppress the manual hint there to avoid a contradictory URL.
+    if not os.getenv("ODYSSEUS_SKIP_RUN_HINT"):
+        print(f"\nStart the server with:")
+        print(f" python -m uvicorn app:app --host 127.0.0.1 --port 7000")
+        print(f"\nThen open http://localhost:7000")
     print(f"Login with the admin username and temporary password printed above.\n")


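The `setup.py` changes choose install hints by `sys.platform` and drop the run hint whenever `ODYSSEUS_SKIP_RUN_HINT` is set to a non-empty value. As a testable sketch with hypothetical function names (the printed wording is simplified, and passing the environment in as a mapping is an assumption made for testability):

```python
from __future__ import annotations

import os
import sys
from typing import Mapping


def tmux_install_hints(platform: str = sys.platform) -> list[str]:
    if platform == "darwin":
        return ["brew install tmux"]
    return ["sudo apt install tmux", "sudo pacman -S tmux", "sudo dnf install tmux"]


def run_hint(env: Mapping[str, str] = os.environ, port: int = 7000) -> list[str]:
    # Same truthiness test as os.getenv(...): unset or empty shows the hint.
    if env.get("ODYSSEUS_SKIP_RUN_HINT"):
        return []
    return [
        f"python -m uvicorn app:app --host 127.0.0.1 --port {port}",
        f"http://localhost:{port}",
    ]
```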
139
start-macos.sh
Executable file
@@ -0,0 +1,139 @@
+#!/bin/bash
+# Odysseus — one-command quick start for macOS (Apple Silicon).
+#
+# ./start-macos.sh
+#
+# Installs everything Odysseus needs via Homebrew, sets up a local Python
+# environment, and launches the app — so a generic Mac user can run it without
+# knowing anything about venvs, pip, or uvicorn. Safe to re-run; it skips work
+# that's already done.
+#
+# Why native (not Docker): Cookbook serves models on whatever machine Odysseus
+# runs on, and Docker on macOS is a Linux VM with no access to the Metal GPU.
+# Running natively lets Cookbook detect and use your Mac's GPU.
+set -e
+
+REPO_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+cd "$REPO_DIR"
+
+PORT="${ODYSSEUS_PORT:-7860}" # 7860, not 7000 — macOS AirPlay Receiver holds 7000.
+
+# Friendly message on any failure — re-running is safe (every step is idempotent).
+trap 'echo; echo "✗ Setup failed above. It is safe to re-run ./start-macos.sh."; exit 1' ERR
+
+echo "▶ Odysseus quick start for macOS"
+
+# Fail fast if the port is already taken (e.g. a previous run still running).
+if (exec 3<>"/dev/tcp/127.0.0.1/$PORT") 2>/dev/null; then
+  echo "✗ Port $PORT is already in use. Stop what's using it, or pick another port:"
+  echo " ODYSSEUS_PORT=7900 ./start-macos.sh"
+  exit 1
+fi
+
+# 1. Homebrew — the macOS package manager. We can't safely auto-install it
+# (it wants its own interactive confirmation), so point the user at it.
+if ! command -v brew >/dev/null 2>&1; then
+  echo
+  echo "Homebrew is required but not installed. Install it (one command), then re-run this script:"
+  echo ' /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"'
+  echo
+  echo "More info: https://brew.sh"
+  exit 1
+fi
+
+# 2. Find a Python 3.11+ to build the environment with.
+# On Apple Silicon we require an *arm64* interpreter (Homebrew's, under
+# /opt/homebrew). A universal2 or x86 Python — e.g. the python.org installer
+# at /usr/local — produces a venv whose compiled extensions get loaded as the
+# wrong architecture when launched from the .app bundle (Cookbook then dies
+# with "incompatible architecture"). So on arm64 we only look under
+# /opt/homebrew and install Homebrew's python@3.11 if it's missing. On Intel
+# (or non-mac) we just use whatever Python 3.11+ is on PATH.
+PY=""
+if [ "$(uname -m)" = "arm64" ]; then
+  cands="/opt/homebrew/bin/python3.13 /opt/homebrew/bin/python3.12 /opt/homebrew/bin/python3.11"
+else
+  cands="python3 python3.13 python3.12 python3.11"
+fi
+for cand in $cands; do
+  p="$(command -v "$cand" 2>/dev/null)" || continue
+  if "$p" -c 'import sys; raise SystemExit(0 if sys.version_info[:2] >= (3, 11) else 1)' 2>/dev/null; then
+    PY="$p"; break
+  fi
+done
+
+# System dependencies:
+# - tmux : Cookbook runs model downloads/serves in the background
+# - llama.cpp : a prebuilt, Metal-enabled llama-server so Cookbook can serve
+# GGUF models on the GPU with no compile step
+# - python@3.11 : installed only if no suitable (arm64) Python was found above
+echo "▶ Installing dependencies (Homebrew)…"
+if [ -n "$PY" ]; then
+  echo " (using $("$PY" --version 2>&1) at $PY)"
+  brew install tmux llama.cpp
+else
+  brew install python@3.11 tmux llama.cpp
+  PY="$(command -v /opt/homebrew/bin/python3.11 || command -v python3.11 || true)"
+fi
+
+if [ -z "$PY" ] || [ ! -x "$PY" ]; then
+  echo "✗ Couldn't find a Python 3.11+ to build the environment with."
+  echo " Check: ls /opt/homebrew/bin/python3* (or install one: brew install python@3.11)"
+  exit 1
+fi
+
+# 3. Python environment + dependencies (kept inside the repo, in venv/).
+# Named `venv` to match the manual steps and build-macos-app.sh, so the
+# clickable .app reuses this same environment.
+if [ ! -d venv ]; then
+  echo "▶ Creating Python environment…"
+  "$PY" -m venv venv
+fi
+echo "▶ Installing Python packages (first run downloads a few — can take a few minutes)…"
+./venv/bin/python -m pip install --quiet --upgrade pip
+# Not --quiet: this is the slow step, so show progress (and any real errors).
+./venv/bin/python -m pip install -r requirements.txt
+
+# 4. First-run setup: creates data dirs and prints an initial admin password
+# the first time (idempotent — does nothing if already set up). Suppress its
+# manual run hint — we launch the server ourselves just below.
+echo "▶ Preparing Odysseus…"
+ODYSSEUS_SKIP_RUN_HINT=1 ./venv/bin/python setup.py
+
+# 5. Launch. Bind to loopback only (safe default).
+URL="http://127.0.0.1:$PORT"
+
+# Open the browser automatically once the server is accepting connections — so
+# the URL isn't lost in the startup logs that keep scrolling. Runs in the
+# background and is cleaned up when the server stops. Skip with
+# ODYSSEUS_NO_OPEN=1 (e.g. over SSH / headless).
+POLLER_PID=""
+if [ -z "$ODYSSEUS_NO_OPEN" ] && command -v open >/dev/null 2>&1; then
+  (
+    for _ in $(seq 1 90); do
+      if (exec 3<>"/dev/tcp/127.0.0.1/$PORT") 2>/dev/null; then
+        printf '\n'
+        printf ' ┌────────────────────────────────────────────┐\n'
+        printf ' │ ✓ Odysseus is ready — opening your browser │\n'
+        printf ' │ %-40s │\n' "$URL"
+        printf ' │ (Press Ctrl+C in this window to stop) │\n'
+        printf ' └────────────────────────────────────────────┘\n\n'
+        open "$URL"
+        break
+      fi
+      sleep 1
+    done
+  ) &
+  POLLER_PID=$!
+fi
+
+# Setup is done — drop the setup-failure handler, and clean up the background
+# opener when the server exits or the user presses Ctrl+C.
+trap - ERR
+trap '[ -n "$POLLER_PID" ] && kill "$POLLER_PID" 2>/dev/null' EXIT INT TERM
+
+echo
+echo "▶ Starting Odysseus — it will open in your browser at $URL"
+echo " (this takes a few seconds; press Ctrl+C here to stop)"
+echo
+./venv/bin/python -m uvicorn app:app --host 127.0.0.1 --port "$PORT"
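`start-macos.sh` uses bash's `/dev/tcp` pseudo-device twice: once to refuse to start if the port is already taken, and once in a background loop that waits up to 90 seconds for uvicorn to accept connections before calling `open`. The equivalent check in Python, sketched with hypothetical names (`port_in_use`, `wait_for_port`), uses `socket.create_connection`: a successful connect means something is listening.

```python
from __future__ import annotations

import socket
import time


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    # Like `(exec 3<>/dev/tcp/$host/$port)`: success means a listener exists.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def wait_for_port(port: int, host: str = "127.0.0.1",
                  attempts: int = 90, interval: float = 1.0) -> bool:
    # Poll until the server accepts connections, giving up after `attempts` tries.
    for _ in range(attempts):
        if port_in_use(port, host):
            return True
        time.sleep(interval)
    return False
```

Checking by connecting rather than by binding means the probe never briefly holds the port itself, which matches what the script does.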
@@ -171,6 +171,13 @@ export function _isWindows(hostOrTask) {
   return _getPlatform(hostOrTask) === 'windows';
 }

+/** Check if the detected (local) hardware is Apple Silicon / Metal. Keys off the
+ * hardware probe's backend rather than a platform string, since a local Mac
+ * reports no platform but does report backend: "metal". */
+export function _isMetal() {
+  return ['metal', 'mps', 'apple'].includes(String(_hwfitCache?.system?.backend || '').toLowerCase());
+}
+
 /** Detect model-specific vLLM optimizations */
 function _detectModelOptimizations(modelName) {
   const n = (modelName || '').toLowerCase();
@@ -252,6 +259,13 @@ export function _detectBackend(model) {
     return { backend: 'llamacpp', label: 'llama.cpp' };
   }

+  // Apple Silicon (Metal) → llama.cpp (GGUF). vLLM/SGLang are CUDA/ROCm-only and
+  // don't run on macOS; AWQ/GPTQ/FP8 (vLLM-only) models are already filtered out
+  // of metal Cookbook results, so llama.cpp is always the right engine here.
+  if (['metal', 'mps', 'apple'].includes(sysBackend)) {
+    return { backend: 'llamacpp', label: 'llama.cpp' };
+  }
+
   // AWQ / GPTQ / FP8 → vLLM
   if (/^AWQ|^GPTQ/.test(q) || q === 'FP8') {
     return { backend: 'vllm', label: 'vLLM' };
@@ -1764,6 +1778,7 @@ const shared = {
   _sshPrefix,
   _getPlatform,
   _isWindows,
+  _isMetal,
   _buildEnvPrefix,
   _buildServeCmd,
   _shellQuote,
@@ -16,6 +16,7 @@ let _getPort;
 let _sshPrefix;
 let _getPlatform;
 let _isWindows;
+let _isMetal;
 let _buildEnvPrefix;
 let _buildServeCmd;
 let _shellQuote;
@@ -382,6 +383,9 @@ function _rerenderCachedModels() {
     panelHtml += `<div class="hwfit-serve-row">`;
     const _backendChoices = _isWindows()
       ? [['llamacpp','llama.cpp']]
+      : _isMetal()
+        // Diffusers (diffusion_server.py) is CUDA-only — omit it on Metal.
+        ? [['llamacpp','llama.cpp'],['ollama','Ollama']]
       : [['vllm','vLLM'],['sglang','SGLang'],['llamacpp','llama.cpp'],['diffusers','Diffusers']];
     const backendOpts = _backendChoices.map(([v,l]) => `<option value="${v}"${defaultBackend===v?' selected':''}>${l}</option>`).join('');
     panelHtml += `<label>${_l('Backend','Inference engine: vLLM, SGLang, llama.cpp, or Diffusers')}<select class="hwfit-sf" data-field="backend">${backendOpts}</select></label>`;
@@ -1592,6 +1596,7 @@ export function initServe(shared) {
   _sshPrefix = shared._sshPrefix;
   _getPlatform = shared._getPlatform;
   _isWindows = shared._isWindows;
+  _isMetal = shared._isMetal;
   _buildEnvPrefix = shared._buildEnvPrefix;
   _buildServeCmd = shared._buildServeCmd;
   _shellQuote = shared._shellQuote;
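The frontend changes make the serve panel's engine list depend on platform: Windows gets llama.cpp only, a Metal-family backend (`metal`, `mps` or `apple`) gets llama.cpp and Ollama, and everything else keeps vLLM, SGLang, llama.cpp and Diffusers. The same decision table is sketched in Python below for illustration only. The function names are hypothetical, and the final `llamacpp` default in `default_backend` is an assumption, because the diff shows only part of `_detectBackend`.

```python
from __future__ import annotations

METAL_BACKENDS = {"metal", "mps", "apple"}


def is_metal(system_backend: str | None) -> bool:
    # Mirrors _isMetal(): case-insensitive match on the probe's backend.
    return str(system_backend or "").lower() in METAL_BACKENDS


def backend_choices(is_windows: bool, system_backend: str | None) -> list[tuple[str, str]]:
    if is_windows:
        return [("llamacpp", "llama.cpp")]
    if is_metal(system_backend):
        # Diffusers is CUDA-only, so it is omitted on Metal as well.
        return [("llamacpp", "llama.cpp"), ("ollama", "Ollama")]
    return [("vllm", "vLLM"), ("sglang", "SGLang"),
            ("llamacpp", "llama.cpp"), ("diffusers", "Diffusers")]


def default_backend(quant: str, system_backend: str | None) -> str:
    # Metal is checked before the AWQ/GPTQ/FP8 rule, as in _detectBackend.
    if is_metal(system_backend):
        return "llamacpp"
    if quant.startswith(("AWQ", "GPTQ")) or quant == "FP8":
        return "vllm"
    return "llamacpp"  # assumption: the remaining branches are not shown in the diff
```

Ordering matters in `default_backend`: because the Metal check runs first, a stray AWQ model on a Mac would still be routed to llama.cpp instead of to vLLM, which cannot run there.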
@@ -1,7 +1,12 @@
 import pytest
 from fastapi import HTTPException

-from routes.cookbook_helpers import _safe_env_prefix, _validate_gpus, _validate_ssh_port
+from routes.cookbook_helpers import (
+    _local_tooling_path_export,
+    _safe_env_prefix,
+    _validate_gpus,
+    _validate_ssh_port,
+)


 def test_safe_env_prefix_accepts_quoted_venv_path():
@@ -38,3 +43,18 @@ def test_validate_gpus_accepts_indexes_only():
     assert _validate_gpus("0,1,2") == "0,1,2"
     with pytest.raises(HTTPException):
         _validate_gpus("0; rm -rf /")
+
+
+def test_local_tooling_path_export_prepends_interpreter_bin():
+    """The cookbook runners must see the venv's bin (where `hf`/`python` live)
+    so tmux shells can find them without an activated venv."""
+    assert (
+        _local_tooling_path_export("/opt/venv/bin/python")
+        == 'export PATH="/opt/venv/bin:$PATH"'
+    )
+
+
+def test_local_tooling_path_export_preserves_spaces_and_expands_path():
+    line = _local_tooling_path_export("/Users/John Smith/.venv/bin/python3")
+    assert line == 'export PATH="/Users/John Smith/.venv/bin:$PATH"'
+    assert line.endswith(':$PATH"')  # $PATH stays expandable in double quotes
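The new tests pin the output of `_local_tooling_path_export`, but the diff does not include its implementation. A minimal sketch that satisfies both tests, assuming the helper simply takes the interpreter's directory; `local_tooling_path_export` is a hypothetical stand-in, and escaping backslashes, double quotes, dollar signs and backticks in the directory is an extra safeguard the tests do not require.

```python
import posixpath


def _escape_for_double_quotes(text: str) -> str:
    # Backslash first, so the escapes added for the other characters aren't doubled.
    for ch in ("\\", '"', "$", "`"):
        text = text.replace(ch, "\\" + ch)
    return text


def local_tooling_path_export(python_path: str) -> str:
    # The venv's bin/ is simply the directory holding the interpreter.
    bin_dir = posixpath.dirname(python_path)
    # $PATH is left unescaped on purpose so the shell still expands it.
    return f'export PATH="{_escape_for_double_quotes(bin_dir)}:$PATH"'
```

Double quotes keep spaces (as in `/Users/John Smith/...`) inside one word while still letting `$PATH` expand, which is exactly what the second test checks.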
129
tests/test_hwfit_macos.py
Normal file
@@ -0,0 +1,129 @@
+"""macOS / Apple Silicon (Metal) support for Cookbook hardware-fit.
+
+Covers the Metal-specific behavior added for Apple Silicon and locks in the
+guarantee that non-macOS (Linux/Windows) detection is unchanged.
+"""
+
+from services.hwfit import hardware
+from services.hwfit.fit import rank_models
+from services.hwfit.models import get_models
+
+
+def _metal_system(ram_gb=16.0, vram_gb=10.7):
+    return {
+        "has_gpu": True,
+        "backend": "metal",
+        "gpu_name": "Apple M2",
+        "gpu_vram_gb": vram_gb,
+        "gpu_count": 1,
+        "available_ram_gb": ram_gb * 0.7,
+        "total_ram_gb": ram_gb,
+        "unified_memory": True,
+    }
+
+
+def _fake_sysctl(brand="Apple M2 Pro", memsize_gb=32, wired_mb=None):
+    def run(cmd):
+        joined = " ".join(cmd)
+        if "machdep.cpu.brand_string" in joined:
+            return brand
+        if "hw.memsize" in joined:
+            return str(int(memsize_gb * 1024**3))
+        if "iogpu.wired_limit_mb" in joined:
+            return str(wired_mb) if wired_mb is not None else None
+        return None
+    return run
+
+
+def test_mlx_models_hidden_on_metal():
+    """MLX-quantized models can't be served by llama.cpp or Ollama (the only
+    Metal-capable engines Odysseus generates), so they must never be recommended
+    on Apple Silicon — even though the catalog tags them as Apple-only."""
+    results = rank_models(_metal_system(), limit=900)
+    mlx = [m for m in results if str(m.get("quant", "")).startswith("mlx-")]
+    assert mlx == [], f"MLX models surfaced but cannot be served: {[m['name'] for m in mlx]}"
+
+
+def _cuda_system():
+    return {
+        "has_gpu": True, "backend": "cuda", "gpu_name": "NVIDIA RTX 4090",
+        "gpu_vram_gb": 24.0, "gpu_count": 1, "available_ram_gb": 32.0, "total_ram_gb": 64.0,
+    }
+
+
+def test_mlx_hidden_on_cuda_backend_unchanged():
+    """Regression guard: Linux/CUDA users never saw MLX before and still don't."""
+    mlx = [m for m in rank_models(_cuda_system(), limit=900) if str(m.get("quant", "")).startswith("mlx-")]
+    assert mlx == []
+
+
+def test_only_gguf_models_recommended_on_metal():
+    """llama.cpp and Ollama (the only Metal engines) need GGUF. Safetensors-only
+    repos — incl. vLLM-only AWQ/GPTQ/FP8 — can't be served on Metal, so every
+    model recommended on Apple Silicon must ship a servable GGUF."""
+    catalog = {m["name"]: m for m in get_models()}
+    unservable = [
+        r["name"] for r in rank_models(_metal_system(), limit=900)
+        if not (catalog.get(r["name"], {}).get("is_gguf")
+                or catalog.get(r["name"], {}).get("gguf_sources"))
+    ]
+    assert unservable == [], f"{len(unservable)} non-GGUF models on Metal, e.g. {unservable[:3]}"
+
+
+def test_safetensors_models_still_recommended_on_cuda():
+    """Regression guard: vLLM serves safetensors on CUDA, so non-GGUF repos must
+    NOT be filtered there — the GGUF-only rule is Metal-specific."""
+    names = {r["name"] for r in rank_models(_cuda_system(), limit=900)}
+    assert "microsoft/Phi-mini-MoE-instruct" in names
+
+
+def test_apple_silicon_detected_as_metal(monkeypatch):
+    """On local Apple Silicon, detection reports a Metal GPU with a RAM-scaled
+    unified-memory budget."""
+    monkeypatch.setattr(hardware, "_remote_host", None)
+    monkeypatch.setattr(hardware.platform, "system", lambda: "Darwin")
+    monkeypatch.setattr(hardware.platform, "machine", lambda: "arm64")
+    monkeypatch.setattr(hardware, "_run", _fake_sysctl(memsize_gb=32))
+
+    info = hardware._detect_apple_silicon()
+    assert info is not None
+    assert info["backend"] == "metal"
+    assert info["gpu_name"] == "Apple M2 Pro"
+    assert info["unified_memory"] is True
+    assert info["gpu_vram_gb"] == 24.0  # 32GB * 0.75
+
+
+def test_apple_silicon_skipped_on_linux(monkeypatch):
+    """Guarantee Linux detection is untouched: the Metal probe bails immediately."""
+    monkeypatch.setattr(hardware, "_remote_host", None)
+    monkeypatch.setattr(hardware.platform, "system", lambda: "Linux")
+    monkeypatch.setattr(hardware.platform, "machine", lambda: "x86_64")
+    monkeypatch.setattr(hardware, "_run", _fake_sysctl())
+    assert hardware._detect_apple_silicon() is None
+
+
+def test_intel_mac_skipped(monkeypatch):
+    """Intel Macs have no Metal GPU worth serving LLMs on — fall through to CPU."""
+    monkeypatch.setattr(hardware, "_remote_host", None)
+    monkeypatch.setattr(hardware.platform, "system", lambda: "Darwin")
+    monkeypatch.setattr(hardware.platform, "machine", lambda: "x86_64")
+    monkeypatch.setattr(hardware, "_run", _fake_sysctl())
+    assert hardware._detect_apple_silicon() is None
+
+
+def test_detect_system_propagates_unified_memory(monkeypatch):
+    """The unified_memory flag set by GPU detection must survive into the
+    system dict so the API and UI can report it (it was being dropped)."""
+    monkeypatch.setattr(hardware, "_detect_apple_silicon", lambda: {
+        "gpu_name": "Apple M4", "gpu_vram_gb": 10.7, "gpu_count": 1,
+        "gpus": [], "gpu_groups": [], "homogeneous": True,
+        "backend": "metal", "unified_memory": True,
+    })
+    monkeypatch.setattr(hardware, "_get_ram_gb", lambda: 16.0)
+    monkeypatch.setattr(hardware, "_get_available_ram_gb", lambda: 11.0)
+    monkeypatch.setattr(hardware, "_get_cpu_count", lambda: 10)
+    monkeypatch.setattr(hardware, "_get_cpu_name", lambda: "Apple M4")
+
+    s = hardware.detect_system(fresh=True)
+    assert s["backend"] == "metal"
+    assert s.get("unified_memory") is True