Commit Graph

12 Commits

Author SHA1 Message Date
pewdiepie-archdaemon
562bc4dedc Cookbook polish: auto-reconnect, ctx slider fixes, scoring, lots of UI
Backend (services/hwfit + routes):
- VRAM column sort now shows global highest first (was special-cased to
  ascending then truncated top-N, which made "highest VRAM" mathematically
  unreachable). Every column path uses reverse=True for the truncation.
- Hardware probe cache TTL 30min -> 24h so changing filters doesn't keep
  re-probing the rig during a session; Rescan button still forces fresh.
- Multi-GPU rigs filter GGUF Q*/IQ quants (vLLM/SGLang can't serve them);
  default non-prequantized to BF16 on 2+ GPUs.
- AWQ / AWQ-8bit / GPTQ-8bit get a -1.0 quality penalty so FP8 wins ties.
- Version-aware tiebreaker (parse Mn.n / Vn) — MiniMax-M2.7 ranks above M2.5.
- hf_models.json: zai-org/GLM-5.1 added; zai-org/GLM-5 quantization flipped
  Q4_K_M -> BF16. DeepSeek-V4-Flash / -Pro + their -Base variants registered
  with new FP4-MoE-Mixed / FP8-Mixed quant keys (calibrated BPP from the
  actual 156 GB / 284 GB disk footprints).
- New FP4-MoE-Mixed + FP8-Mixed entries in QUANT_BPP / QUANT_SPEED_MULT /
  QUANT_QUALITY_PENALTY / QUANT_BYTES_PER_PARAM / PREQUANTIZED_PREFIXES.

Frontend — Scan/Download:
- Engine + Quant swapped in the toolbar; Quant defaults to "All".
- Ctx (range slider) ported from origin/main: 8k/16k/32k/50k/128k/Max. Drag
  re-sorts by vram ascending (smallest fitting first); back to Max → score.
- Ctx slider rail now visible — was background:transparent in a duplicate
  later-cascade rule. Hardcoded grey + !important.
- Search input moved to the far right of the toolbar.
- Type/Standard default; "Context" not uppercased; Search placeholder dimmed.
- Engine "?" + Quant "?" inline help chips inside their dropdown boxes.
- Fit-column dot toggles fit-only filter; un-toggling re-sorts by VRAM desc.
- Quant column truncates to 9 chars + ellipsis ("FP4-MoE-M..."), full in
  tooltip. Smart title-suffix strips the parts already in the repo name
  (QuantTrio/MiniMax-M2-AWQ + quant AWQ-4bit -> just "(4bit)").
- Conditional warning for safetensors models on non-GPU rigs only.
- Dependency Install / Installed / Installed▾ / N/A all 75.85px wide.
- Rebuild llama.cpp moved into the llama_cpp dep row, styled as a tag.
- Foldable Download admin-card (h2 chevron); line under h2 only when folded.
- HF token save gets a green ✓ + "Saved" flash.
- Cached scan no longer counts stalled rows as downloaded.
- Footer: "Request it →" link with GitHub mark to the public discussion
  (#1962) for model-add requests.

Frontend — Running tab:
- Strict download-finish check (DOWNLOAD_OK or /snapshots/, not bare
  "Download complete"). True overall % for multi-shard downloads:
  ((N-1)+frac)/total instead of hf_transfer's per-shard aggregate.
- ETA in the uptime ticker: "downloading: 12m 34s · ETA 1h 23m".
- Clear button kills the tmux session too; if the output still shows a
  live shard line, the pill is hidden + relabels as "reconnect" + revives
  on click.
- Self-heal: on cookbook open AND every bg-monitor cycle (10s, throttled
  to 8s), scan persisted done/error/crashed downloads and probe their
  tmux session — if alive, flip status back to running and reattach.
- Per-launch zombie probe: clicking Download on a model whose persisted
  state is done but tmux is still alive revives the existing task and
  refuses to start a duplicate.
- Pre-launch GPU probe: vllm / sglang / diffusers serve check
  /api/cookbook/gpus first; warns + confirms if no GPU is visible.
- Server-side state guard: rejects "done" POSTs for downloads lacking
  DOWNLOAD_OK / DOWNLOAD_FAILED / /snapshots/ when the last-mentioned
  shard is N<total — stale tabs can't poison persisted state any more.
- Running count includes tasks whose output looks active even if persisted
  status got stuck. Dir text on the running row, font matched to uptime.

Serve panel:
- Ctx text input always resets to model max on open (default 20000 when
  metadata is missing).
- Max Seqs default 8 -> 4. KV Cache dtype select 32px tall.
- Lightning icon on Launch (same as Action toggle).
- Diagnosis card simplified (no fold/copy/dismiss), suggestion font
  matches body; action buttons get icons on the left (Retry/Copy/Edit/
  Install/Kill/Switch/etc.).
- Incomplete-download serve warning when model status is
  downloading / stalled / has_incomplete.
- MTP "?" tooltip ("supported on a few model families … up to ~3× faster").
2026-06-03 20:25:25 +09:00
pewdiepie-archdaemon
3706d756f3 Merge remote-tracking branch 'origin/main' into visual-pr-playground
# Conflicts:
#	routes/cookbook_routes.py
#	routes/hwfit_routes.py
#	services/hwfit/fit.py
#	services/hwfit/models.py
#	static/js/cookbook-diagnosis.js
#	static/js/cookbook-hwfit.js
#	static/js/cookbook.js
#	static/js/cookbookRunning.js
2026-06-03 16:49:10 +09:00
pewdiepie-archdaemon
eb79b76432 Cookbook: scoring fixes, UI polish, false-finished + stale-state bug fixes
Backend (services/hwfit + routes):
- rank_models picks visible set by REQUESTED column, not always score —
  sorting by Param now shows highest-param models PERIOD (incl. too_tight).
- New fit_only param. Multi-GPU rigs filter GGUF Q*/IQ quants (vLLM/SGLang
  cannot serve them); default non-prequantized to BF16 on 2+ GPUs.
- AWQ / GPTQ-8bit get a -1.0 quality penalty (was 0.0, tied with FP8), so
  FP8 wins when both fit.
- Version-aware tiebreaker (parse Mn.n / Vn) — MiniMax-M2.7 ranks above
  M2.5 on equal composite score; >=100B integers not misread as versions.
- /api/cookbook/hf-latest no longer drops models without an "NB" pattern in
  the repo id (MiniMax-M2.7, DeepSeek-V4-Pro etc. were silently filtered).
- Cached-model scan: atexit flushes models JSON even if the script is
  killed mid-walk; each scan_dir wrapped in try/except; timeout 60s -> 180s.
- KB granularity for sub-MB sizes (was "0 MB" for 12 KB shells). New
  "stalled" status for shells <1 MB with no .incomplete files.
- /api/cookbook/state POST guard: rejects "done" download tasks lacking
  DOWNLOAD_OK / DOWNLOAD_FAILED / /snapshots/ when the last-mentioned
  shard is N<total — stops stale tabs from poisoning persisted state.
- hf_models.json: add zai-org/GLM-5.1; flip zai-org/GLM-5 quantization
  Q4_K_M -> BF16 (it is the native base, not a quant).

Frontend (static/js):
- Scan/Download toolbar: quant defaults to All; ctx slider (8k/16k/32k/
  50k/128k/Max) ported from origin/main with sort=fit on drag, sort=score
  on Max. GPU toggle commits _activeCount to maxGpu on initial render. Fit
  column header tagged with active budget (RAM / GPU / N GPU).
- Foldable Download admin-card: the Download h2 is the chevron trigger;
  state persists in localStorage.
- Download card surfaces destination dir (Dir: <path>). Same dir on running
  task row, font/color matched to uptime (9px Fira Code muted, opacity .4).
- Serve panel ctx text input always resets to model max on open. Sub-MB
  cached models show with red "download stalled" badge.
- Bulk-select Cancel + Delete reset the Select button label on exit.
- Cookbook running: false-finished bug fixed — DOWNLOAD_OK or /snapshots/
  required; bare "Download complete" no longer marks the task done after
  the first config file. Clear button now sends tmux kill-session too.
  True overall % for multi-shard downloads: ((N-1)+frac)/total instead of
  hf_transfer per-shard aggregate.
- Diagnosis card simplified: removed fold toggle, copy button, dismiss X.
  Suggestion font matches message body (12px).
- HF token field flashes green check + "Saved" on save.
- Cached scan no longer counts stalled rows as downloaded in Scan/Download.

CSS:
- dep Install button width pinned to 76px to match Installed split.
- task-sub row +1px; task-status badge gets margin-right 8px.
- Ctx slider styled like gallery editor sliders (thin pill rail, red thumb).
- Bulk-select cancel button top -3px -> -5px.
2026-06-03 16:32:20 +09:00
Afonso Coutinho
3d00c85636 fix: hwfit native quant labels miss the cost maps and over-estimate VRAM (#1690) 2026-06-03 14:22:42 +09:00
Afonso Coutinho
04f8aa1833 fix: _lookup_bandwidth crashes on a truthy non-string gpu_name (#1641) 2026-06-03 14:11:10 +09:00
pewdiepie-archdaemon
ff93a6c63b Polish email and cookbook flows 2026-06-02 22:42:07 +09:00
Leo
de92bbe47a Cookbook fit: steer consumer AMD to GGUF recommendations
* Cookbook fit: consumer-AMD GGUF recommendations + accurate estimates (core logic)

Split of #746 — the estimate/ranking MATH only, so it can be reviewed with tests
first (UI changes follow separately). Backend files only: no static/js here.

services/hwfit/fit.py, services/hwfit/hardware.py:
- Recommend GGUF/llama.cpp on consumer AMD (RDNA, gfx10/11/12) instead of
  formats that don't run on consumer Radeon — vLLM-only AWQ/GPTQ/FP8 AND
  vendor-specific NVFP4 (NVIDIA) / MLX (Apple). Datacenter Instinct (CDNA) and
  CUDA are left untouched.
- More accurate speed estimates across more GPUs (adds RDNA bandwidth data).
- Detect AMD/RDNA GPUs (gpu_family from rocminfo) so fit/serve can branch on it.

tests/test_hwfit_amd.py: AMD recommendation path, quant/bit matching, estimate
realism, gfx RDNA-vs-CDNA classification.

Rebased onto current main (analyze_model gained a scoring_use_case param there;
kept it). Vision detection intentionally NOT added here — main already ships a
"Vision" type filter + multimodal use-case handling; duplicating it was dropped.

Checks: py_compile clean; pytest tests/test_hwfit_amd.py + hwfit/serve suites
= 28 passed; full suite 0 new failures vs main.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Tests: assert NVFP4/MLX/FP8 formats are filtered on consumer RDNA

Backs the #972 claim with an explicit regression: no NVIDIA NVFP4, Apple MLX,
or vLLM-only FP8/AWQ/GPTQ repos are recommended on a consumer Radeon, and guards
against vacuity by asserting such repos exist in the catalog.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-02 21:01:42 +09:00
spooky
cd4f496cb4 Fix native Cookbook quant classification 2026-06-02 13:07:20 +09:00
pewdiepie-archdaemon
966b53df77 Improve Cookbook serve diagnostics and recommendations 2026-06-02 12:15:47 +09:00
elijaheck
c303a29670 Fix native macOS tailnet launch and Metal GPU probe (#756)
* macOS/Apple Silicon: detect Metal backend, surface MLX models, brew tmux hint

- hardware.py: add _detect_macos() via sysctl/system_profiler; report
  backend=metal + unified_memory on Apple Silicon instead of cpu_arm
- fit.py: add Apple Silicon (M1-M5) unified-memory bandwidths + metal
  FALLBACK_K so throughput estimates use the real bandwidth formula
- setup.py: Mac-specific 'brew install tmux' hint

Verified on M5 Pro 48GB: backend=metal, 273GB/s matched, 6 MLX models now
visible (were hidden), cuda still hides MLX, no new test failures.

* Fix native macOS tailnet launch and Metal GPU probe

---------

Co-authored-by: Elijah (Hermes) <hermes@local>
2026-06-02 11:41:04 +09:00
John Chaplin
f1817fd560 Add macOS Apple Silicon Cookbook support
* Add Apple Silicon (Metal) GPU detection and unified-memory fit tuning

hardware.py detects Apple Silicon locally and over SSH, reporting
backend=metal, the chip name, and a RAM-scaled fraction of unified
memory as the usable GPU budget. fit.py gains an M1-M4 memory-bandwidth
table for realistic tok/s and drops vLLM-only formats (AWQ/GPTQ/FP8)
that can't be served on Metal.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
(cherry picked from commit 32ac81dbc680361463a088dae867d555d5a79c3b)

* Generate macOS/Metal serve commands and surface the Metal GPU

cookbook_routes.py adds a macOS serve path (Ollama, Metal-aware
llama.cpp build using `sysctl hw.ncpu` instead of `nproc`, and a clear
error if vLLM is attempted). The frontend defaults Metal serving to
llama.cpp and offers llama.cpp/Ollama instead of vLLM/SGLang. The
odysseus-cookbook CLI's `gpus` command reports the Metal GPU via
sysctl/vm_stat.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
(cherry picked from commit 4ba01ce25d256ae032029898f361c824a34fcd4b)

* Add launchd LaunchAgent for macOS (systemd equivalent)

com.odysseus.ui.plist + install-service-macos.sh run Odysseus at login
and restart on crash, the macOS counterpart to odysseus-ui.service. The
installer auto-fills paths from the venv, so there's no hand-editing.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
(cherry picked from commit 3d4b6b2c7b8b31af32201ed278115df9a559dea9)

* Document macOS install (brew, Ollama, AirPlay port, launchd)

README + setup.py cover the Homebrew / Apple Silicon path: brew install
python@3.11 tmux ollama, Metal serving via Ollama/llama.cpp, the launchd
service, and the macOS AirPlay Receiver conflict on ports 7000/5000.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
(cherry picked from commit 8dc9a3578a1726f070ed9f75c0958ae291a6d966)

* Add downloadable macOS launcher app builder

build-macos-app.sh generates dist/Odysseus.app and a drag-to-Applications
dist/Odysseus.dmg. The app starts the local server from this repo's venv and
opens the UI in a chrome-less app window (Chromium --app mode, falling back to
the default browser). It's a launcher wrapper — it drives the venv rather than
bundling Python — so the install path is baked in at build time.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
(cherry picked from commit 7927940c3810ee34640803b198d334a6ac93474d)

* Harden macOS Cookbook support: hide MLX, fix Metal build cache

Builds on the adopted PR #213 macOS/Metal work with two fixes and tests:

- fit.py: always drop MLX-quantized models. Odysseus only generates serve
  commands for llama.cpp/Ollama (Metal) and vLLM/SGLang (CUDA); MLX needs the
  mlx_lm runtime and the catalog's MLX repos ship no GGUF alternative, so they
  were surfaced on Apple Silicon but could never be served.
- cookbook_routes.py (macOS branch only): `rm -rf build` before configure so a
  poisoned CMakeCache from a prior failed CUDA attempt can't make every later
  build fail; explicit -DCMAKE_BUILD_TYPE=Release; a clear "brew install cmake"
  hint if cmake is missing. Linux/CUDA path unchanged.
- tests/test_hwfit_macos.py: MLX hidden on metal, MLX still hidden on CUDA
  (regression guard), Metal detection on Apple Silicon, and skipped on
  Linux/Intel (proves non-macOS detection is untouched).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Propagate unified_memory flag and document macOS GPU/Docker caveat

- hardware.py: detect_system now carries the unified_memory flag from GPU
  detection into the system dict (it was set by _detect_apple_silicon / AMD-APU
  detection but dropped during result assembly, so the API always reported
  null). Lets callers distinguish unified from discrete VRAM.
- README: prominent warning that Docker on Apple Silicon can't reach the Metal
  GPU (runs a Linux VM) — Cookbook must run natively for GPU serving; fix stale
  text that said Cookbook recommends MLX models (now hidden as unservable).
- test: detect_system propagates unified_memory.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Put Odysseus's venv bin on PATH for cookbook runners

Native (non-Docker) installs run from a virtualenv whose bin holds the `hf` CLI
and `python3` the cookbook download/serve tmux scripts shell out to. Those
scripts start in a fresh login shell with the venv NOT activated, so on a native
macOS install `hf download` failed with "hf: command not found" — and the
`pip --user` self-heal missed because macOS has no bare `pip` command.

- cookbook_helpers.py: _local_tooling_path_export() — pure helper returning a
  PATH export for the running interpreter's bin dir (escaped for double quotes).
- cookbook_routes.py: download + serve runners prepend that dir on local runs
  (gated off SSH/Windows); swap the `pip` install fallbacks to `python3 -m pip`.
- tests: helper output for normal and spaced paths.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Document macOS llama.cpp serving prerequisites

Clarify the two serving paths on Apple Silicon: the recommended zero-build
route (brew install llama.cpp ships a Metal llama-server Cookbook finds on PATH),
and the from-source fallback, which requires cmake + Xcode Command Line Tools.
Without those the build is skipped and serving silently degrades to a slow CPU
build, so new users now know to install them (or use the prebuilt) up front.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Recommend only GGUF-servable models on Metal

Apple Silicon's only serving engines are llama.cpp and Ollama, both GGUF-only
(vLLM/SGLang are CUDA/ROCm and don't run on macOS). The catalog tags raw
safetensors repos with a default Q4_K_M quant, so the fit-ranking was
recommending ~397/501 models that have no GGUF and fail to serve on Metal with
"No GGUF found" (e.g. microsoft/Phi-mini-MoE-instruct).

Drop any model without a real GGUF (is_gguf/gguf_sources) on Apple Silicon —
subsumes the previous AWQ/GPTQ/FP8 special-case into one rule. On CUDA these
stay visible since vLLM serves safetensors directly. Metal recommendations go
501 -> 104, all actually servable.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Remove macOS launchd LaunchAgent (cherry-picked extra)

Drop the launchd service from the PR #213 cherry-picks: the
install-service-macos.sh installer, the com.odysseus.ui.plist template, and the
README section documenting them. Tangential to the core Cookbook/Metal support
and not wanted. The build-macos-app.sh launcher is kept.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Add one-command macOS quick start (start-macos.sh)

Running Odysseus natively on a Mac previously meant ~7 manual terminal steps
(brew deps, venv, activate, pip, setup.py, uvicorn with the right port) — not
friendly for a generic macOS user, and the native run is required because Docker
on macOS can't reach the Metal GPU.

- start-macos.sh: installs Homebrew deps (python@3.11, tmux, prebuilt Metal
  llama.cpp), creates the venv, installs requirements, runs setup, and launches
  on a non-AirPlay port (7860). Idempotent; re-run to start again.
- README: the Apple Silicon section now leads with this one-command quick start
  and the clickable .app, with engine/port/manual details folded into a
  collapsible block. Added a pointer at the top of the manual-install section.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* macOS quick start: auto-open browser when ready

The "open this URL" line scrolled out of view as uvicorn kept logging after it,
so users missed it. Now start-macos.sh waits (in the background) until the
server accepts connections, prints a boxed "ready" banner at that point (i.e.
after the startup burst, not before), and opens the URL in the default browser
automatically. Skippable with ODYSSEUS_NO_OPEN=1 for headless/SSH use.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Don't assume/force a specific Python version on macOS

The README claimed "system Python is 3.9" — a machine-specific generalization
that's often wrong (macOS ships no recent Python by default; many users already
have 3.11+). Make it generic, and make start-macos.sh detect an existing
Python 3.11+ and use it, only installing python@3.11 when none is found instead
of forcing it on top of the user's Python.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Align start-macos.sh venv path with build-macos-app.sh

start-macos.sh created the environment in .venv/, but build-macos-app.sh and
the manual install steps use venv/ — so the clickable .app wouldn't reuse the
quick-start's environment and would rebuild a second one. Use venv/ everywhere.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* README: state clearly that MLX is unsupported on Apple Silicon

Odysseus has no mlx_lm runtime; it serves GGUF (llama.cpp/Ollama) and CUDA
(vLLM/SGLang) only. MLX-only models can't run on a Mac and are hidden from
Cookbook — make that explicit in both the quick start and the details.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* start-macos.sh: build the venv with an arm64 Python on Apple Silicon

A clean-room run surfaced this: with a universal2/x86 Python (e.g. the
python.org installer under /usr/local), the venv's compiled extensions install
as arm64 but get loaded as x86_64 when launched from the .app bundle, so it
crashes with "incompatible architecture (have arm64, need x86_64)". The terminal
run happened to work only because a universal binary defaults to arm64 there.

On Apple Silicon, look only under /opt/homebrew (arm64-only) for the build
Python, and install Homebrew's python@3.11 if none is present — so the venv is
arm64-only and launches correctly from both the terminal and the .app. Intel
and non-mac paths are unchanged. Verified end-to-end in a clean clone: .app now
boots on Metal with no arch error.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Address dev-exp review: macOS setup robustness + doc/UX fixes

From the voltagent dev-exp review of the branch:
- README: fix broken anchor links (the em-dash heading produced a slug the links
  didn't match); simplify the heading to a stable slug.
- cookbook_routes.py: add /opt/homebrew/bin and /usr/local/bin to the serve PATH
  so a brew-installed llama-server/ollama is found instead of falling back to a
  slow source build.
- start-macos.sh: guard against an empty Python path; fail fast with a clear
  message on port-in-use; ERR trap with a "safe to re-run" message; show pip
  progress (drop --quiet on the slow requirements install); stop the background
  browser-opener cleanly on exit/Ctrl+C (no orphaned poller).
- setup.py: bind hint to 127.0.0.1; suppress the manual run-hint when launched
  by start-macos.sh (ODYSSEUS_SKIP_RUN_HINT) so the URL isn't contradictory.
- build-macos-app.sh: the .app only opens the browser once the server is
  actually ready (not after the readiness timeout).
- cookbookServe.js: drop "Diffusers" from the Metal backend picker —
  diffusion_server.py is CUDA-only, so it was an unservable option on macOS.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: yunggilja <yunggilja@gmail.com>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-01 14:59:19 +09:00
pewdiepie-archdaemon
e5c99a5eee Odysseus v1.0 2026-05-31 23:58:26 +09:00