odysseus

Author	SHA1	Message	Date
Afonso Coutinho	c5bc39de88	fix: _extract_entities crashes on a non-string query (#1724 )	2026-06-03 13:30:28 +09:00
Afonso Coutinho	0c37943267	fix: search service crashes on a non-dict result row (#1725 )	2026-06-03 13:30:19 +09:00
Afonso Coutinho	6e38d3f2ef	fix: youtube (services) comment formatter crashes on a non-dict comment (#1746 )	2026-06-03 13:29:01 +09:00
Afonso Coutinho	d2f6e8068d	fix: is_youtube_url (services) crashes on a non-string url (#1753 )	2026-06-03 13:24:24 +09:00
Ethan	33bf975597	Stop GET /api/search/config from leaking the Brave API key (#1661 ) (#1750 ) get_search_config returned SEARCH_CONFIG.copy(), and update_search_config cached the decrypted Brave key into that shared global at startup (app_initializer), so the unauthenticated /api/search/config route exposed the operator's key. The cache was dead weight: brave_search reads its key via _get_provider_key (settings/env), never SEARCH_CONFIG. - update_search_config: no longer stores the api_key in the shared global (accepted for backward compat; provider keys are read on demand). - get_search_config: scrub any string-valued credential field before returning, preserving the has_api_key presence flag. No schema change; brave_search/_get_provider_key untouched. Adds regression tests. Fixes #1661 Co-authored-by: Ethan <23321960+0xLeathery@users.noreply.github.com>	2026-06-03 13:24:17 +09:00
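The scrub-on-read step can be sketched as below, assuming a flat dict config. `SEARCH_CONFIG` is a stand-in for the shared global, and the key names in `CREDENTIAL_FIELDS` are hypothetical because the commit does not list the real field names.

```python
import copy

# Stand-in for the shared module-level config; field names are illustrative.
SEARCH_CONFIG = {
    "provider": "brave",
    "result_count": 5,
    "api_key": "sk-live-secret",
    "has_api_key": True,
}

# Hypothetical set of credential-bearing keys to strip from API responses.
CREDENTIAL_FIELDS = ("api_key", "brave_api_key", "secret", "token")


def get_search_config() -> dict:
    """Return a copy of the config with string-valued credentials removed."""
    config = copy.deepcopy(SEARCH_CONFIG)
    for field in CREDENTIAL_FIELDS:
        if isinstance(config.get(field), str):
            del config[field]
    # The boolean presence flag survives, so the UI can still show "key configured".
    return config
```

Scrubbing at the read boundary keeps the route safe even if some other code path writes a key into the global again.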
pewdiepie-archdaemon	8e2b9baf19	Rebuild memory vector index from the full saved set, not just the audited owner (#1747 ) audit_memories saves final_entries merged with other owners' entries (correct), but then rebuilt the shared vector collection from final_entries alone — wiping every other owner from semantic search until they happened to run their own audit. Keyword fallback masked it, so it degraded silently. Capture saved_entries once and rebuild from that. Caught by #1747.	2026-06-03 11:36:24 +09:00
red person	0ad5cd783b	Skip invalid research service sources (#1583 )	2026-06-03 08:57:09 +09:00
Afonso Coutinho	77313170c6	fix: search query helpers crash on a non-string query (#1604 )	2026-06-03 08:36:01 +09:00
Shaw	b54468291e	fix(hwfit): detect unified-memory NVIDIA (Grace Blackwell GB10 / DGX Spark) instead of 'No GPU' (#1340 ) (#1372 ) _detect_nvidia parsed nvidia-smi --query-gpu=memory.total,name and did float(memory.total) per row, dropping the row on ValueError. Grace Blackwell GB10 (DGX Spark, sm_121) reports memory.total as '[N/A]'/'Not Supported' because the GPU shares the system LPDDR pool rather than carrying discrete VRAM — so the only GPU row was dropped and a real GB10 (even with vLLM running on it) was reported as 'No GPU', breaking Cookbook recommendations and model switching. Keep a named device whose memory.total is non-numeric: when there are no discrete-VRAM rows but such unified devices exist, report a unified-memory CUDA GPU backed by the system RAM pool (has_gpu, name, backend=cuda, count, unified_memory=True) — mirroring how Apple Silicon and AMD APUs are already handled. Discrete GPUs are unchanged, and a box with a real discrete GPU keeps the discrete path. Adds tests/test_hwfit_unified_nvidia.py with a GB10 nvidia-smi fixture: the device is detected (not dropped), surfaces through detect_system with unified_memory propagated, discrete GPUs stay non-unified, and a discrete GPU takes precedence over an N/A-memory row. Co-authored-by: NubsCarson <nubs@nubs.site>	2026-06-03 03:19:39 +09:00
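A sketch of the row-classification rule, assuming the output of `nvidia-smi --query-gpu=memory.total,name --format=csv,noheader,nounits`. `detect_nvidia` and its return dict are illustrative stand-ins for `_detect_nvidia`; sizing the unified pool from system RAM is omitted here.

```python
def detect_nvidia(smi_output: str) -> dict:
    """Classify rows of `memory.total, name` CSV output from nvidia-smi."""
    discrete, unified = [], []
    for line in smi_output.strip().splitlines():
        memory_total, _, name = (part.strip() for part in line.partition(","))
        if not name:
            continue  # malformed row, no device name
        try:
            discrete.append((name, float(memory_total)))
        except ValueError:
            # '[N/A]' / 'Not Supported': no discrete VRAM, the GPU shares system RAM.
            unified.append(name)
    if discrete:
        # A real discrete GPU takes precedence over any N/A-memory row.
        return {"has_gpu": True, "backend": "cuda", "name": discrete[0][0],
                "count": len(discrete), "vram_mb": sum(mb for _, mb in discrete),
                "unified_memory": False}
    if unified:
        return {"has_gpu": True, "backend": "cuda", "name": unified[0],
                "count": len(unified), "unified_memory": True}
    return {"has_gpu": False}
```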
Vykos	5ee30cc144	Scope skills usage by owner (#1312 )	2026-06-03 02:27:43 +09:00
Afonso Coutinho	f62d6ea3d7	fix: research query misclassifies 'whatsapp'/'however' as questions (#1247 ) * fix: detect question words as whole words, not prefixes * fix: same question-word prefix bug in the services search copy * test: question-word detection rejects prefix lookalikes	2026-06-03 01:10:06 +09:00
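The whole-word check can be sketched with a regex word boundary, in contrast to a prefix test such as `query.lower().startswith(word)`. The word list and the choice to inspect only the leading word are assumptions, not taken from the project.

```python
import re

# Illustrative list; the project's actual question-word set may differ.
QUESTION_WORDS = ("what", "how", "why", "when", "where", "who", "which")

# \b after the alternation requires the word to end there, so "whatsapp"
# and "however" no longer match "what" and "how".
_LEADING_QUESTION_WORD = re.compile(
    r"^\s*(?:" + "|".join(QUESTION_WORDS) + r")\b", re.IGNORECASE
)


def starts_with_question_word(query: str) -> bool:
    return bool(_LEADING_QUESTION_WORD.match(query))
```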
red person	cc6e43da44	Report provider-specific search API keys correctly (#1202 ) * fix(search): report provider-specific API keys * fix(search): include provider env keys in status	2026-06-02 23:37:15 +09:00
pewdiepie-archdaemon	ff93a6c63b	Polish email and cookbook flows	2026-06-02 22:42:07 +09:00
Afonso Coutinho	2e2da2aefe	fix: extract_statistics drops large numbers and trailing % signs (#1153 ) * fix: extract_statistics misses comma-less numbers and drops trailing % * fix: same extract_statistics number/percent bug in services copy * test: extract_statistics captures full numbers and percent signs	2026-06-02 22:35:30 +09:00
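One way to capture both comma-grouped and comma-less numbers with their trailing `%` is an ordered alternation. `find_numeric_tokens` is a hypothetical, simplified helper; the real `extract_statistics` likely also keeps surrounding context.

```python
import re

# Comma-grouped numbers are tried first so "1,234,567" stays one token;
# the second branch catches comma-less numbers like "12000". Both keep a trailing %.
_NUMBER = re.compile(r"\d{1,3}(?:,\d{3})+(?:\.\d+)?%?|\d+(?:\.\d+)?%?")


def find_numeric_tokens(text: str) -> list:
    return _NUMBER.findall(text)


print(find_numeric_tokens("Revenue grew 45% to 12000 units, or 1,234,567 total at 3.5%"))
# ['45%', '12000', '1,234,567', '3.5%']
```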
Afonso Coutinho	2b2943a7b7	fix: extract_quotes accepts mismatched opening/closing quotes (#1113 ) * fix: only extract quotes whose closing quote matches the opening one * fix: same mismatched-quote bug in the services search copy * test: extract_quotes requires matching open/close quotes	2026-06-02 22:34:52 +09:00
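Pairing each opening quote with its own closing quote can be expressed as one alternation per quote style. `find_quoted_spans` is a hypothetical helper, and only straight and curly double quotes are handled; single quotes are skipped because the right single quote doubles as an apostrophe.

```python
import re

# Each branch can only close with its own partner character, so a curly
# opening quote never pairs with a straight closing quote.
_QUOTED = re.compile(
    r'"([^"]+)"'                        # straight double quotes
    r"|\u201c([^\u201c\u201d]+)\u201d"  # curly double quotes
)


def find_quoted_spans(text: str) -> list:
    # findall yields one tuple per match; exactly one group is non-empty.
    return [straight or curly for straight, curly in _QUOTED.findall(text)]
```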
Leo	de92bbe47a	Cookbook fit: steer consumer AMD to GGUF recommendations * Cookbook fit: consumer-AMD GGUF recommendations + accurate estimates (core logic) Split of #746 — the estimate/ranking MATH only, so it can be reviewed with tests first (UI changes follow separately). Backend files only: no static/js here. services/hwfit/fit.py, services/hwfit/hardware.py: - Recommend GGUF/llama.cpp on consumer AMD (RDNA, gfx10/11/12) instead of formats that don't run on consumer Radeon — vLLM-only AWQ/GPTQ/FP8 AND vendor-specific NVFP4 (NVIDIA) / MLX (Apple). Datacenter Instinct (CDNA) and CUDA are left untouched. - More accurate speed estimates across more GPUs (adds RDNA bandwidth data). - Detect AMD/RDNA GPUs (gpu_family from rocminfo) so fit/serve can branch on it. tests/test_hwfit_amd.py: AMD recommendation path, quant/bit matching, estimate realism, gfx RDNA-vs-CDNA classification. Rebased onto current main (analyze_model gained a scoring_use_case param there; kept it). Vision detection intentionally NOT added here — main already ships a "Vision" type filter + multimodal use-case handling; duplicating it was dropped. Checks: py_compile clean; pytest tests/test_hwfit_amd.py + hwfit/serve suites = 28 passed; full suite 0 new failures vs main. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * Tests: assert NVFP4/MLX/FP8 formats are filtered on consumer RDNA Backs the #972 claim with an explicit regression: no NVIDIA NVFP4, Apple MLX, or vLLM-only FP8/AWQ/GPTQ repos are recommended on a consumer Radeon, and guards against vacuity by asserting such repos exist in the catalog. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-02 21:01:42 +09:00
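The consumer-RDNA gate can be sketched from the gfx target alone. The regex encodes the commit's gfx10/11/12 rule; the format tag spellings and both function names are assumptions, and the real code reads `gpu_family` from rocminfo rather than taking a string argument.

```python
import re

# Formats the commit lists as unservable on consumer Radeon; tag spellings are assumed.
UNSUPPORTED_ON_CONSUMER_RDNA = {"AWQ", "GPTQ", "FP8", "NVFP4", "MLX"}


def is_consumer_rdna(gfx_target: str) -> bool:
    """True for RDNA targets such as gfx1030, gfx1100, gfx1201 (gfx10/11/12)."""
    return re.fullmatch(r"gfx1[0-2][0-9a-f]{2}", gfx_target.strip().lower()) is not None


def is_recommendable(quant_format: str, gfx_target: str) -> bool:
    if is_consumer_rdna(gfx_target):
        return quant_format.upper() not in UNSUPPORTED_ON_CONSUMER_RDNA
    # CDNA (e.g. gfx90a, gfx942) and non-AMD paths are left as they were.
    return True
```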
Tushar-Projects	c3228f8b59	Background tasks: respect active session model fallback	2026-06-02 20:57:42 +09:00
ghreprimand	aa0a9e8b5a	Search: align service content extraction Co-authored-by: ghreprimand <203024559+ghreprimand@users.noreply.github.com>	2026-06-02 20:53:07 +09:00
ghreprimand	eddb9ce6db	Search: align service provider guards Co-authored-by: ghreprimand <203024559+ghreprimand@users.noreply.github.com>	2026-06-02 20:52:13 +09:00
Tatlatat	51cf63009e	TTS: include mp3 files in cache stats TTSService._put_cache writes .mp3 for MP3 audio (ID3/MPEG-framed bytes) and .wav otherwise, and the rest of the class treats both as cache entries (_get_cache iterates (".mp3", ".wav"); eviction globs "."). But get_stats() enumerated the cache with `glob("*.wav")` only, so both cache_entries and cache_size_mb undercounted — reporting 0 whenever the cache held MP3 files, which is the common case for most TTS providers. Glob both extensions so the reported stats match what's actually cached. tests/test_tts_cache_stats.py writes an MP3-headed blob via _put_cache and asserts get_stats() reports one entry with non-zero size. Fails before this change.	2026-06-02 20:43:29 +09:00
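Counting both cache extensions can be sketched like this. `cache_stats` is a hypothetical standalone version of `get_stats()`; it assumes the cache is a flat directory of `.mp3`/`.wav` files.

```python
import tempfile
from pathlib import Path

CACHE_EXTENSIONS = (".mp3", ".wav")


def cache_stats(cache_dir: Path) -> dict:
    # Glob every extension the writer can produce, not just ".wav".
    files = [p for ext in CACHE_EXTENSIONS for p in cache_dir.glob(f"*{ext}")]
    total_bytes = sum(p.stat().st_size for p in files)
    return {"cache_entries": len(files),
            "cache_size_mb": round(total_bytes / (1024 * 1024), 3)}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cache = Path(tmp)
        (cache / "a.mp3").write_bytes(b"ID3" + b"\x00" * 2048)
        (cache / "b.wav").write_bytes(b"RIFF" + b"\x00" * 2048)
        print(cache_stats(cache))
```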
Tatlatat	3885f9fa90	STT: clean temp audio files on transcription failure STTService._transcribe_local writes the audio to a NamedTemporaryFile (delete=False) and only unlinks it on the success path, before the except. If model.transcribe() raises (corrupt audio, model/runtime error, etc.) the function logs, returns None, and leaves the .webm temp file behind — so every failed local transcription leaks a file in the system temp dir. Initialize tmp_path = None up front and move the unlink into a finally block so the temp file is cleaned up whether transcription succeeds or raises. tests/test_stt_leak.py stubs the whisper model to raise during transcribe, runs _transcribe_local, and asserts it returns None and leaves no new .webm file in the temp dir. Fails before this change.	2026-06-02 20:43:24 +09:00
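The `tmp_path = None` plus `finally` pattern looks roughly like this. `transcribe_local` and its injected `transcribe` callable are hypothetical stand-ins for `_transcribe_local` and the whisper model call.

```python
import logging
import os
import tempfile
from typing import Callable, Optional

log = logging.getLogger(__name__)


def transcribe_local(audio: bytes, transcribe: Callable[[str], str]) -> Optional[str]:
    tmp_path = None  # set before the try so finally can always see it
    try:
        with tempfile.NamedTemporaryFile(suffix=".webm", delete=False) as tmp:
            tmp.write(audio)
            tmp_path = tmp.name
        return transcribe(tmp_path)
    except Exception:
        log.exception("local transcription failed")
        return None
    finally:
        # Runs on success and on failure, so no .webm file is left behind.
        if tmp_path and os.path.exists(tmp_path):
            os.unlink(tmp_path)
```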
ghidras	6ea8fec896	Cookbook: fix Windows NVIDIA VRAM detection Co-authored-by: ghidras <ghidras@users.noreply.github.com>	2026-06-02 20:32:53 +09:00
spooky	cd4f496cb4	Fix native Cookbook quant classification	2026-06-02 13:07:20 +09:00
Leo	6fca7e86b7	Cookbook serve profiles and engine filter * Cookbook: Engine filter + intelligent hardware-computed serve profiles Two related Cookbook serving improvements for accurate, hardware-aware model serving (especially on consumer GPUs that can only run GGUF/llama.cpp). Engine filter - New "Engine" dropdown (All / llama.cpp / vLLM / SGLang) beside the quant picker. Pure client-side view filter over the fetched list via the same _detectBackend() the serve commands use, so what you filter to is exactly what would launch. Re-renders from cache (no refetch). Empty-state message + the instant-cache-paint path account for it too. Intelligent serve profiles (Quality / Balanced / Speed) - services/hwfit/profiles.py: compute_serve_profiles() turns detected VRAM + model size into concrete llama.cpp flags (n_gpu_layers, n_cpu_moe, cache-type, context). Encodes the by-hand tuning: a too-big MoE offloads experts to CPU instead of failing; a model that fits stays fully on GPU; quant tracks profile intent; vision models keep image-encoder headroom. Reuses models.py VRAM math so filtering and serving agree on what fits. Pure/deterministic (no t/s claims — partial-offload speed isn't reliably predictable; fit is what's computed). - /api/hwfit/profiles endpoint returns the profiles + the model's trained context limit, with loose name matching (strips org/ prefix, -GGUF suffix, quant tag) so a local GGUF folder name resolves to its catalog entry. - _buildServeCmd (llama.cpp) now emits --n-cpu-moe / --flash-attn / --cache-type-k/v when set, with llama-cpp-python fallback equivalents. It previously only set -ngl/-c, which is why it OOM'd or ran slow. - Serve panel: profile chips that fill the fields on click, plus CPU-MoE / KV Cache / Flash Attn fields. Context is clamped to the model's trained limit (and an absolute 1M sanity ceiling) on type/blur/profile-load and at launch — fixes a crash where a stale 256k/16M preset + quantized KV cache caused an amdgpu ErrorDeviceLost. 
Tests: tests/test_serve_profiles.py (7) — offload vs full-GPU fit, never exceed VRAM, context cap, launchable flags, vision headroom, no-GPU empty. Checks: py_compile + node --check pass; pytest test_serve_profiles + test_hwfit_amd green; verified live on an RDNA4 box (gfx1200) — Balanced lands ~ncm18 q4 128k, matching hand-tuning. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * Cookbook: make column-header sorting discoverable (incl. Newest) Sorting in Cookbook is via clickable column headers (pewds' design), but the headers had no visual cue that they're interactive — so sorting in general, and the Newest sort on the Model header specifically, was undiscoverable. - Style sortable headers as interactive: pointer cursor, hover underline, and the active sort column bolded/highlighted. There was no CSS for .hwfit-sortable / .hwfit-sort-active at all; this helps every existing sort, not just Newest. - The Model column header sorts by release_date (newest first), reusing the existing header-click sort wiring and the "newest" SORT_KEY. No new sort control — uses the existing column-header paradigm. Checks: node --check passes. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * Cookbook serve profiles: keep the on-disk file's quant fixed (don't propose Q6/Q2) In the Serve tab the model is a specific GGUF file already on disk, so its quant can't change — but the profiles were suggesting "Quality · Q6_K" / "Speed · Q2_K" as if you could re-quantize it. That's meaningless when serving a fixed file. - compute_serve_profiles gains serve_weights_gb / serve_quant. When set (SERVE mode), the quant is locked to the file's and profiles differ only in the real serving knobs — n_cpu_moe, KV-cache type, context. _weights_gb / _cpu_moe_for_budget use the file's actual size instead of a quant-derived estimate. DOWNLOAD mode (no override) still varies the quant to show download options. 
- /api/hwfit/profiles accepts serve_weights_gb & serve_quant. - The Serve panel parses the file's size (from m.size "20.6 GB") and quant (from the repo/file name) and passes them, so profiles match what's actually served. Result for a 20.6 GB Q4_K_M file: all three profiles stay Q4_K_M and differ by KV/ctx/offload (Quality q8 KV 128k ncm21, Balanced q4 128k ncm17, Speed q4 32k ncm15) — no nonsensical quant changes. Tests: test_serve_mode_keeps_fixed_quant. Full serve-profile suite green (9). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * Cookbook serve: Vision toggle (auto-find mmproj) + live VRAM/RAM-spillover monitor Two serve-panel additions: 1. Vision toggle. A "Vision" checkbox that serves the model with its multimodal projector so it can read images. The mmproj path is resolved at runtime (find mmproj-.gguf next to the model), so dropping an mmproj file in the model folder makes the toggle just work; `--mmproj … --image-max-tokens 1024` (native) / `--clip_model_path` (llama-cpp-python) only when on + found. 2. Live GPU-memory monitor. A readout that polls /api/cookbook/gpus every 4s while the panel is open and shows VRAM used/total/%, free, and — crucially on a discrete card — RAM spillover (AMD gtt_used_mb), with a plain-language health hint: green/healthy, amber/tight, red/"spilled to RAM — slow (raise CPU MoE or lower context)". Surfaces gtt_used_mb from the gpus endpoint (previously read for total only and discarded for 'used'). Lets you see at a glance whether a config fits VRAM (fast) or is paging to system RAM over PCIe (slow) instead of guessing. Checks: node --check + py_compile pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-02 12:34:42 +09:00
pewdiepie-archdaemon	966b53df77	Improve Cookbook serve diagnostics and recommendations	2026-06-02 12:15:47 +09:00
elijaheck	c303a29670	Fix native macOS tailnet launch and Metal GPU probe (#756 ) * macOS/Apple Silicon: detect Metal backend, surface MLX models, brew tmux hint - hardware.py: add _detect_macos() via sysctl/system_profiler; report backend=metal + unified_memory on Apple Silicon instead of cpu_arm - fit.py: add Apple Silicon (M1-M5) unified-memory bandwidths + metal FALLBACK_K so throughput estimates use the real bandwidth formula - setup.py: Mac-specific 'brew install tmux' hint Verified on M5 Pro 48GB: backend=metal, 273GB/s matched, 6 MLX models now visible (were hidden), cuda still hides MLX, no new test failures. * Fix native macOS tailnet launch and Metal GPU probe --------- Co-authored-by: Elijah (Hermes) <hermes@local>	2026-06-02 11:41:04 +09:00
David Anderson	610968f91e	fix: data integrity — deep-research result parsing + memory-extraction durability (#808 ) Two independent data-integrity bugs: - services/research/service.py: ResearchService.research() (the public deep-research API, re-exported from services/__init__) treated the handler return value as a dict (result.get("sources"/"summary"/...)), but call_research_service() returns a formatted markdown STRING -> AttributeError: str has no attribute get on EVERY successful call, making the API unusable for any non-error result. Now uses the string report as the summary and parses sources from the "### Sources" markdown section (section-bounded, URL-deduped), with a defensive dict branch for back-compat. - services/memory/memory_extractor.py: extract_and_store guarded the vector-store find_similar/add calls only with the .healthy flag set ONCE at init. If the embedding/ChromaDB backend degraded LATER (OOM, evicted model, remote endpoint down), those calls raised, the exception escaped the dedup loop, skipped memory_manager.save(), and was swallowed by the outer try/except -> EVERY validated fact from the session was silently lost (the function docstring promises "never raised"). Now falls back to the existing text/fuzzy dedup so facts are still saved when the vector index is unavailable at runtime. Tests: test_research_service.py, test_memory_extractor_vector_degraded.py.	2026-06-02 11:27:31 +09:00
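A section-bounded, URL-deduplicated parse of the markdown report can be sketched as follows. `parse_sources` is a hypothetical helper; it assumes the section starts at a `### Sources` heading and ends at the next heading of any level.

```python
import re

# Stop at whitespace or markdown link/autolink delimiters.
_URL_RE = re.compile(r"https?://[^\s)\]>]+")


def parse_sources(report: str) -> list:
    sources, seen, in_section = [], set(), False
    for line in report.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            # Any heading opens or closes the section.
            in_section = stripped.lower() == "### sources"
            continue
        if not in_section:
            continue
        for url in _URL_RE.findall(stripped):
            url = url.rstrip(".,;")  # drop trailing sentence punctuation
            if url not in seen:
                seen.add(url)
                sources.append(url)
    return sources
```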
Ernest Hysa	f4aef0dcf7	fix(skills): scope skill reads to caller owner (#777 ) read_skill_md and read_skill_reference walk all skill files via _iter_skill_files and return the first match by slug, regardless of owner. In a multi-user deployment where two users have skills with the same slug under different categories, a caller scoped to owner='alice' can read Bob's skill content. This is the same cross-tenant leak class as the update_skill / delete_skill fix (PR #755, merged), but on the read path. Changes: - read_skill_md / read_skill_reference accept owner= param (default None = match ownerless only, matching the write-path convention). - 7 callers updated: tool_implementations.py (view, view_ref, patch), builtin_actions.py (test_skills), skills_routes.py (audit, source, test routes). - Tests: read scoping (alice reads hers, not bob's), positive update scoping (alice can mutate her own), ownerless-match default.	2026-06-02 11:21:27 +09:00
Abeelha	290cd7f1cd	fix(stt): make local microphone transcription work without torch (#801 ) faster-whisper runs on CTranslate2, not torch, but _get_whisper() imported torch (only to check cuda availability) inside the same try as the faster-whisper import. on a torch-less machine that raised ImportError and reported the misleading 'faster-whisper not installed' even when it was installed, so local mic transcription silently failed. probe torch separately and optionally: present -> cuda, absent -> cpu. also declare faster-whisper in requirements-optional.txt (torch stays an optional extra for gpu).	2026-06-02 11:16:54 +09:00
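Splitting the optional torch probe from the faster-whisper import might look like the sketch below. The function names and the `float16`/`int8` compute types are assumptions; `WhisperModel(model, device=..., compute_type=...)` is faster-whisper's constructor.

```python
def pick_whisper_device() -> str:
    """Probe torch only to decide on CUDA; its absence just means CPU."""
    try:
        import torch  # optional extra; faster-whisper itself does not need it
    except (ImportError, OSError):
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def load_whisper(model_size: str = "base"):
    # Import faster-whisper on its own, so the "not installed" error is accurate.
    try:
        from faster_whisper import WhisperModel
    except ImportError as exc:
        raise RuntimeError("faster-whisper not installed") from exc
    device = pick_whisper_device()
    compute_type = "float16" if device == "cuda" else "int8"
    return WhisperModel(model_size, device=device, compute_type=compute_type)
```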
mist	5ebe9ee67a	Fix invalidate_search_cache using a key that never matches stored entries (#852 ) invalidate_search_cache(query) built its cache key as generate_cache_key(f"{query}|10|None"), but the write path (searxng_search_results) replaces the caller's default count of 10 with the admin-configured _get_result_count() (default 5) before building the key. So a default search for "X" is cached under "X|5|None", while invalidation looked for "X|10|None" — they never match, and invalidate_search_cache silently failed to remove anything in the default configuration, violating its docstring ("invalidate ... just the given query"). Derive the count from _get_result_count() so invalidation matches the default-search entry the write path actually stores. The same bug (and fix) applies to both the src/search and services/search copies. Note: time-filtered variants (e.g. "X|5|day") still aren't reachable from a query-only signature, since cache keys are opaque SHA-256 hashes with no stored query; clearing those would need a broader cache-index redesign and is out of scope here. Adds tests/test_search_cache_invalidation.py covering the default-count case.	2026-06-02 10:53:33 +09:00
ghreprimand	d44f40b724	Honor disabled speech service toggles (#814 ) Co-authored-by: ghreprimand <203024559+ghreprimand@users.noreply.github.com>	2026-06-02 10:44:39 +09:00
BSG-Walter	c0466274ed	fix: resolve DuckDuckGo redirect URLs in HTML fallback search The DuckDuckGo HTML fallback returns redirect URLs (//duckduckgo.com/l/?uddg=...) instead of actual page URLs. This caused fetch_webpage_content() to reject them instantly because _public_http_url() requires an http/https scheme, making search results unfetchable in deep research mode. Added _resolve_url() to: - Convert protocol-relative URLs to absolute (https:) - Convert path-relative URLs to absolute - Extract the real URL from DuckDuckGo's /l/?uddg= redirect parameters	2026-06-01 19:42:01 -03:00
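The three resolution steps can be sketched with `urllib.parse`. `resolve_url` and the `https://duckduckgo.com/` base used for root-relative links are assumptions standing in for `_resolve_url`.

```python
from urllib.parse import parse_qs, urljoin, urlparse

DDG_BASE = "https://duckduckgo.com/"  # assumed base for root-relative links


def resolve_url(href: str, base: str = DDG_BASE) -> str:
    if href.startswith("//"):
        href = "https:" + href          # protocol-relative -> absolute
    elif href.startswith("/"):
        href = urljoin(base, href)      # path-relative -> absolute
    parsed = urlparse(href)
    is_ddg = parsed.netloc == "duckduckgo.com" or parsed.netloc.endswith(".duckduckgo.com")
    if is_ddg and parsed.path.startswith("/l/"):
        # parse_qs also percent-decodes the target URL.
        target = parse_qs(parsed.query).get("uddg")
        if target:
            return target[0]
    return href
```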
Ernest Hysa	d42e6a7acc	Scope skill mutations to caller owner SkillsManager.update_skill walks every SKILL.md on disk and matches by slug only; the 'owner' key in its scalar_keys whitelist meant a caller could pass updates={'owner': 'attacker', 'description': 'pwned'} and the first matching file on disk got silently re-owned. Two users with the same slug under different category directories (which is supported by the on-disk layout <category>/<name>/SKILL.md) could each stomp the other's skill via the manage_skills tool or the in-process callers in tool_implementations.py (edit, patch, publish, delete). update_skill and delete_skill now require the caller's owner and only match a file whose parsed owner field matches. The default of None means 'no scope' and only matches ownerless skills, so an unsafe call without an explicit owner is now a no-op. 'owner' is also removed from scalar_keys so the updates dict cannot be used to reassign ownership even when the manager is called from an in-process path that didn't supply the owner argument. The in-process callers in tool_implementations.py are updated to pass owner=owner (which was already in scope at every call site) so the HTTP and agent paths both go through the scoped check. The HTTP route at routes/skills_routes.py:1499 was already owner-scoped via sm.load(owner=user); the fix brings the in-process path up to the same standard.	2026-06-02 05:59:43 +09:00
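The owner-scoped match can be modelled on an in-memory list. `Skill`, `find_skill`, and `SCALAR_KEYS` are hypothetical simplifications of `SkillsManager`, which actually walks `SKILL.md` files on disk.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Skill:
    slug: str
    owner: Optional[str]
    description: str = ""


# 'owner' is deliberately absent, so updates cannot reassign ownership.
SCALAR_KEYS = {"description"}


def find_skill(skills: list, slug: str, owner: Optional[str] = None) -> Optional[Skill]:
    # owner=None matches only ownerless skills, never "any owner".
    for skill in skills:
        if skill.slug == slug and skill.owner == owner:
            return skill
    return None


def update_skill(skills: list, slug: str, updates: dict, owner: Optional[str] = None) -> bool:
    skill = find_skill(skills, slug, owner)
    if skill is None:
        return False  # unscoped or foreign-owner call is a no-op
    for key, value in updates.items():
        if key in SCALAR_KEYS:
            setattr(skill, key, value)
    return True
```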
Afonso Coutinho	9b1acf6612	Fix year extraction in research queries * fix: extract full year in research query entities, not just the century * fix: same year capture-group bug in the services search copy * test: research query extracts the full year	2026-06-01 23:09:41 +09:00
spooky	033852ab14	fix: require GGUF sources for llama downloads (#368 )	2026-06-01 22:47:47 +09:00
Sirsyorrz	9955f5bc95	Fix VRAM estimates for pre-quantized HF repos The Cookbook fit scanner was reporting impossibly low VRAM requirements for some pre-quantized models — e.g. cyankiwi/Qwen3-Coder-Next-REAM-AWQ-4bit shown as 7.1 GB ('perfect' on a 12 GB card) when the real load is ~40 GB. Root cause is in the catalog builder. When _entry_from_modelinfo falls back to safetensors metadata for the parameter count, it stored safetensors.total directly. For pre-quantized repos that figure reflects packed element counts: AWQ/GPTQ-Int4 pack 8x 4-bit weights into one I32, AWQ-8bit/GPTQ-Int8/FP8 pack 4x. The catalog therefore recorded ~1/8 of the real parameter count, and min_vram_gb = packed * bpp double-applied the quantization. Fix the safetensors fallback: * prefer the per-dtype parameters dict when available and unpack only the I32/I64 entries (the F16/BF16 scale/zero tensors and embeddings are already at their real element counts) * fall back to total * pack_factor when only total is exposed Patch the catalog entries that were affected by the old fallback so the fit ratings reflect reality without waiting for a full catalog rebuild: * cyankiwi/Qwen3-Coder-Next-REAM-AWQ-4bit 11.4B -> 79.7B (40.8 GB VRAM) * stelterlab/Qwen3-Coder-30B-A3B-Instruct-AWQ 4.6B -> 30.5B * stelterlab/NVIDIA-Nemotron-3-Nano-30B-A3B-AWQ 5.1B -> 30.5B * warshanks/Qwen3-8B-abliterated-AWQ 2.2B -> 8.2B * QuantTrio/sarvam-30b-AWQ 7B -> 30B * QuantTrio/sarvam-105b-AWQ 19B -> 105B Closes #377.	2026-06-01 18:32:58 +09:00
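The unpacking rule can be sketched from the per-dtype counts. `unpacked_param_count` is hypothetical; deriving the pack factor as storage bits divided by weight bits (8x for 4-bit in I32, 4x for 8-bit) matches the commit's figures, while the I64 handling is an assumption.

```python
from typing import Optional

# Bits per stored element for integer dtypes that quantizers pack weights into.
PACKED_STORAGE_BITS = {"I32": 32, "I64": 64}


def unpacked_param_count(weight_bits: int, parameters: Optional[dict] = None,
                         total: Optional[int] = None) -> Optional[int]:
    if parameters:
        count = 0
        for dtype, n in parameters.items():
            storage_bits = PACKED_STORAGE_BITS.get(dtype.upper())
            # Only packed integer tensors are expanded; F16/BF16 scales, zeros and
            # embeddings are already counted element-for-element.
            count += n * (storage_bits // weight_bits) if storage_bits else n
        return count
    if total is not None:
        # Coarser fallback: apply an I32 pack factor to the whole total.
        return total * (32 // weight_bits)
    return None
```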
Fernando Lazzarin	14e8cffa41	Fail closed on untrusted teacher draft confidence Follow-up to #275. get_relevant_skills() treats a missing/unparseable confidence as 1.0, so it always clears the injection threshold. For teacher-escalation drafts -- auto-written from a possibly untrusted trace and then injected as authoritative guidance -- that means a draft can be auto-injected regardless of the configured confidence bar. Require teacher-escalation drafts to carry an explicit, parseable confidence that meets min_confidence; fail closed otherwise. Hand-authored legacy drafts keep the lenient "unset -> keep" behavior so they don't silently vanish, and published skills are unaffected. Ran: python -m py_compile services/memory/skills.py + a get_relevant_skills unit check (teacher drafts with None/garbage/0.8 excluded at min=0.85; 0.9 included; legacy + published unaffected; gate-off control unchanged). Co-authored-by: Fernando Lazzarin <263019791+waitdeadai@users.noreply.github.com> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-01 15:20:29 +09:00
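The fail-closed gate can be sketched as a single predicate. The dict keys `source` and `confidence` and the `"teacher_escalation"` tag are hypothetical; only the behaviour (strict for teacher drafts, lenient default of 1.0 for everything else) comes from the commit.

```python
from typing import Optional


def _parse_confidence(value) -> Optional[float]:
    try:
        conf = float(value)
    except (TypeError, ValueError):
        return None
    return conf if conf == conf else None  # NaN is treated as unparseable


def should_inject(skill: dict, min_confidence: float) -> bool:
    conf = _parse_confidence(skill.get("confidence"))
    if skill.get("source") == "teacher_escalation":
        # Auto-written from a possibly untrusted trace: require an explicit score.
        return conf is not None and conf >= min_confidence
    # Legacy and published skills keep the lenient "unset -> 1.0" behaviour.
    return (1.0 if conf is None else conf) >= min_confidence
```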
pewdiepie-archdaemon	0888a3b3e6	Add native Windows compatibility layer	2026-06-01 15:09:47 +09:00
John Chaplin	f1817fd560	Add macOS Apple Silicon Cookbook support * Add Apple Silicon (Metal) GPU detection and unified-memory fit tuning hardware.py detects Apple Silicon locally and over SSH, reporting backend=metal, the chip name, and a RAM-scaled fraction of unified memory as the usable GPU budget. fit.py gains an M1-M4 memory-bandwidth table for realistic tok/s and drops vLLM-only formats (AWQ/GPTQ/FP8) that can't be served on Metal. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> (cherry picked from commit 32ac81dbc680361463a088dae867d555d5a79c3b) * Generate macOS/Metal serve commands and surface the Metal GPU cookbook_routes.py adds a macOS serve path (Ollama, Metal-aware llama.cpp build using `sysctl hw.ncpu` instead of `nproc`, and a clear error if vLLM is attempted). The frontend defaults Metal serving to llama.cpp and offers llama.cpp/Ollama instead of vLLM/SGLang. The odysseus-cookbook CLI's `gpus` command reports the Metal GPU via sysctl/vm_stat. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> (cherry picked from commit 4ba01ce25d256ae032029898f361c824a34fcd4b) * Add launchd LaunchAgent for macOS (systemd equivalent) com.odysseus.ui.plist + install-service-macos.sh run Odysseus at login and restart on crash, the macOS counterpart to odysseus-ui.service. The installer auto-fills paths from the venv, so there's no hand-editing. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> (cherry picked from commit 3d4b6b2c7b8b31af32201ed278115df9a559dea9) * Document macOS install (brew, Ollama, AirPlay port, launchd) README + setup.py cover the Homebrew / Apple Silicon path: brew install python@3.11 tmux ollama, Metal serving via Ollama/llama.cpp, the launchd service, and the macOS AirPlay Receiver conflict on ports 7000/5000. 
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> (cherry picked from commit 8dc9a3578a1726f070ed9f75c0958ae291a6d966) * Add downloadable macOS launcher app builder build-macos-app.sh generates dist/Odysseus.app and a drag-to-Applications dist/Odysseus.dmg. The app starts the local server from this repo's venv and opens the UI in a chrome-less app window (Chromium --app mode, falling back to the default browser). It's a launcher wrapper — it drives the venv rather than bundling Python — so the install path is baked in at build time. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> (cherry picked from commit 7927940c3810ee34640803b198d334a6ac93474d) * Harden macOS Cookbook support: hide MLX, fix Metal build cache Builds on the adopted PR #213 macOS/Metal work with two fixes and tests: - fit.py: always drop MLX-quantized models. Odysseus only generates serve commands for llama.cpp/Ollama (Metal) and vLLM/SGLang (CUDA); MLX needs the mlx_lm runtime and the catalog's MLX repos ship no GGUF alternative, so they were surfaced on Apple Silicon but could never be served. - cookbook_routes.py (macOS branch only): `rm -rf build` before configure so a poisoned CMakeCache from a prior failed CUDA attempt can't make every later build fail; explicit -DCMAKE_BUILD_TYPE=Release; a clear "brew install cmake" hint if cmake is missing. Linux/CUDA path unchanged. - tests/test_hwfit_macos.py: MLX hidden on metal, MLX still hidden on CUDA (regression guard), Metal detection on Apple Silicon, and skipped on Linux/Intel (proves non-macOS detection is untouched). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * Propagate unified_memory flag and document macOS GPU/Docker caveat - hardware.py: detect_system now carries the unified_memory flag from GPU detection into the system dict (it was set by _detect_apple_silicon / AMD-APU detection but dropped during result assembly, so the API always reported null). Lets callers distinguish unified from discrete VRAM. 
- README: prominent warning that Docker on Apple Silicon can't reach the Metal GPU (runs a Linux VM) — Cookbook must run natively for GPU serving; fix stale text that said Cookbook recommends MLX models (now hidden as unservable). - test: detect_system propagates unified_memory. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * Put Odysseus's venv bin on PATH for cookbook runners Native (non-Docker) installs run from a virtualenv whose bin holds the `hf` CLI and `python3` the cookbook download/serve tmux scripts shell out to. Those scripts start in a fresh login shell with the venv NOT activated, so on a native macOS install `hf download` failed with "hf: command not found" — and the `pip --user` self-heal missed because macOS has no bare `pip` command. - cookbook_helpers.py: _local_tooling_path_export() — pure helper returning a PATH export for the running interpreter's bin dir (escaped for double quotes). - cookbook_routes.py: download + serve runners prepend that dir on local runs (gated off SSH/Windows); swap the `pip` install fallbacks to `python3 -m pip`. - tests: helper output for normal and spaced paths. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * Document macOS llama.cpp serving prerequisites Clarify the two serving paths on Apple Silicon: the recommended zero-build route (brew install llama.cpp ships a Metal llama-server Cookbook finds on PATH), and the from-source fallback, which requires cmake + Xcode Command Line Tools. Without those the build is skipped and serving silently degrades to a slow CPU build, so new users now know to install them (or use the prebuilt) up front. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * Recommend only GGUF-servable models on Metal Apple Silicon's only serving engines are llama.cpp and Ollama, both GGUF-only (vLLM/SGLang are CUDA/ROCm and don't run on macOS). 
The catalog tags raw safetensors repos with a default Q4_K_M quant, so the fit-ranking was recommending ~397/501 models that have no GGUF and fail to serve on Metal with "No GGUF found" (e.g. microsoft/Phi-mini-MoE-instruct). Drop any model without a real GGUF (is_gguf/gguf_sources) on Apple Silicon — subsumes the previous AWQ/GPTQ/FP8 special-case into one rule. On CUDA these stay visible since vLLM serves safetensors directly. Metal recommendations go 501 -> 104, all actually servable. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * Remove macOS launchd LaunchAgent (cherry-picked extra) Drop the launchd service from the PR #213 cherry-picks: the install-service-macos.sh installer, the com.odysseus.ui.plist template, and the README section documenting them. Tangential to the core Cookbook/Metal support and not wanted. The build-macos-app.sh launcher is kept. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * Add one-command macOS quick start (start-macos.sh) Running Odysseus natively on a Mac previously meant ~7 manual terminal steps (brew deps, venv, activate, pip, setup.py, uvicorn with the right port) — not friendly for a generic macOS user, and the native run is required because Docker on macOS can't reach the Metal GPU. - start-macos.sh: installs Homebrew deps (python@3.11, tmux, prebuilt Metal llama.cpp), creates the venv, installs requirements, runs setup, and launches on a non-AirPlay port (7860). Idempotent; re-run to start again. - README: the Apple Silicon section now leads with this one-command quick start and the clickable .app, with engine/port/manual details folded into a collapsible block. Added a pointer at the top of the manual-install section. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * macOS quick start: auto-open browser when ready The "open this URL" line scrolled out of view as uvicorn kept logging after it, so users missed it. 
Now start-macos.sh waits (in the background) until the server accepts connections, prints a boxed "ready" banner at that point (i.e. after the startup burst, not before), and opens the URL in the default browser automatically. Skippable with ODYSSEUS_NO_OPEN=1 for headless/SSH use. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * Don't assume/force a specific Python version on macOS The README claimed "system Python is 3.9" — a machine-specific generalization that's often wrong (macOS ships no recent Python by default; many users already have 3.11+). Make it generic, and make start-macos.sh detect an existing Python 3.11+ and use it, only installing python@3.11 when none is found instead of forcing it on top of the user's Python. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * Align start-macos.sh venv path with build-macos-app.sh start-macos.sh created the environment in .venv/, but build-macos-app.sh and the manual install steps use venv/ — so the clickable .app wouldn't reuse the quick-start's environment and would rebuild a second one. Use venv/ everywhere. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * README: state clearly that MLX is unsupported on Apple Silicon Odysseus has no mlx_lm runtime; it serves GGUF (llama.cpp/Ollama) and CUDA (vLLM/SGLang) only. MLX-only models can't run on a Mac and are hidden from Cookbook — make that explicit in both the quick start and the details. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * start-macos.sh: build the venv with an arm64 Python on Apple Silicon A clean-room run surfaced this: with a universal2/x86 Python (e.g. the python.org installer under /usr/local), the venv's compiled extensions install as arm64 but get loaded as x86_64 when launched from the .app bundle, so it crashes with "incompatible architecture (have arm64, need x86_64)". The terminal run happened to work only because a universal binary defaults to arm64 there. 
On Apple Silicon, look only under /opt/homebrew (arm64-only) for the build Python, and install Homebrew's python@3.11 if none is present — so the venv is arm64-only and launches correctly from both the terminal and the .app. Intel and non-mac paths are unchanged. Verified end-to-end in a clean clone: .app now boots on Metal with no arch error. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * Address dev-exp review: macOS setup robustness + doc/UX fixes From the voltagent dev-exp review of the branch: - README: fix broken anchor links (the em-dash heading produced a slug the links didn't match); simplify the heading to a stable slug. - cookbook_routes.py: add /opt/homebrew/bin and /usr/local/bin to the serve PATH so a brew-installed llama-server/ollama is found instead of falling back to a slow source build. - start-macos.sh: guard against an empty Python path; fail fast with a clear message on port-in-use; ERR trap with a "safe to re-run" message; show pip progress (drop --quiet on the slow requirements install); stop the background browser-opener cleanly on exit/Ctrl+C (no orphaned poller). - setup.py: bind hint to 127.0.0.1; suppress the manual run-hint when launched by start-macos.sh (ODYSSEUS_SKIP_RUN_HINT) so the URL isn't contradictory. - build-macos-app.sh: the .app only opens the browser once the server is actually ready (not after the readiness timeout). - cookbookServe.js: drop "Diffusers" from the Metal backend picker — diffusion_server.py is CUDA-only, so it was an unservable option on macOS. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> --------- Co-authored-by: yunggilja <yunggilja@gmail.com> Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-01 14:59:19 +09:00
pewdiepie-archdaemon	e5c99a5eee	Odysseus v1.0	2026-05-31 23:58:26 +09:00

40 Commits
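The macOS support merge describes `_local_tooling_path_export()` only as a "pure helper returning a PATH export for the running interpreter's bin dir (escaped for double quotes)". The Python sketch below shows that idea. It is not the code in cookbook_helpers.py. The name `tooling_path_export`, its optional `executable` parameter, and the exact `export PATH="...:$PATH"` output are assumptions for illustration.

```python
import os
import sys
from typing import Optional

# Characters that keep a special meaning inside a POSIX double-quoted string.
# The backslash must be handled first so later steps don't re-escape it.
_DOUBLE_QUOTE_SPECIALS = ("\\", '"', "$", "`")


def _escape_for_double_quotes(text: str) -> str:
    """Backslash-escape text so it survives inside "..." in a shell script."""
    for ch in _DOUBLE_QUOTE_SPECIALS:
        text = text.replace(ch, "\\" + ch)
    return text


def tooling_path_export(executable: Optional[str] = None) -> str:
    """Return a shell line that puts the interpreter's bin dir first on PATH.

    Hypothetical helper modeled on the commit's description. It deliberately
    does NOT resolve symlinks: in a venv, bin/python3 is usually a symlink to
    the base interpreter, and following it would point PATH at the wrong
    bin dir (one without the venv's `hf` CLI).
    """
    exe = executable or sys.executable
    bin_dir = os.path.dirname(os.path.abspath(exe))
    # $PATH is left unescaped on purpose so the shell expands it at run time.
    return f'export PATH="{_escape_for_double_quotes(bin_dir)}:$PATH"'


if __name__ == "__main__":
    print(tooling_path_export())
```

The double quotes keep a path with spaces (for example a venv under `~/My Projects/`) as a single word. The escaping keeps a stray `$` or `"` in a directory name from being interpreted by the shell.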
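The "Recommend only GGUF-servable models on Metal" change replaces the AWQ/GPTQ/FP8 special-case with a single rule: on Apple Silicon, keep a model only if it has a real GGUF. The following is a minimal sketch of that rule. It assumes catalog entries are plain dicts carrying the `is_gguf` flag and `gguf_sources` list named in the commit. The `quant` field, the `"metal"`/`"cuda"` backend strings, the placeholder repo ids, and both function names are hypothetical.

```python
from typing import Any, Dict, Iterable, List

Model = Dict[str, Any]


def has_real_gguf(model: Model) -> bool:
    """True if the repo is itself GGUF or points at GGUF sources.

    A default quant label such as "Q4_K_M" on a raw safetensors repo is
    ignored: the label alone gives llama.cpp/Ollama nothing to load.
    """
    return bool(model.get("is_gguf")) or bool(model.get("gguf_sources"))


def servable_models(catalog: Iterable[Model], backend: str) -> List[Model]:
    """Filter a catalog down to models the given backend can actually serve."""
    models = list(catalog)
    if backend == "metal":
        # llama.cpp and Ollama are the only Metal engines, and both need GGUF.
        return [m for m in models if has_real_gguf(m)]
    # On CUDA, vLLM serves safetensors directly, so nothing is dropped.
    return models


catalog = [
    {"id": "org/raw-safetensors", "quant": "Q4_K_M"},
    {"id": "org/model-GGUF", "is_gguf": True},
    {"id": "org/base", "gguf_sources": ["org/base-GGUF"]},
]
print([m["id"] for m in servable_models(catalog, "metal")])
# ['org/model-GGUF', 'org/base']
```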
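start-macos.sh is described as waiting in the background until the server accepts connections, and only then printing its "ready" banner and opening the browser (skippable with ODYSSEUS_NO_OPEN=1). The script itself is shell. Purely as an illustration, the same TCP readiness probe is sketched here in Python with `socket.create_connection`. The function names, the 127.0.0.1 host, the timeout/interval values, and the exact way the environment variable is checked are assumptions. Port 7860 comes from the commit.

```python
import os
import socket
import time
import webbrowser


def wait_until_listening(port: int, host: str = "127.0.0.1",
                         timeout: float = 60.0, interval: float = 0.25) -> bool:
    """Poll until host:port accepts a TCP connection or the timeout expires.

    Returns True as soon as a connection succeeds, False if the deadline
    passes first.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A completed TCP connect means the server is accepting connections.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused or timed out: the server is not up yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def open_when_ready(port: int = 7860) -> bool:
    """Hypothetical caller: announce the URL and open it only once it is live."""
    url = f"http://127.0.0.1:{port}"
    if not wait_until_listening(port):
        return False  # never open a browser on a server that never came up
    print(f"Odysseus is ready: {url}")
    if not os.environ.get("ODYSSEUS_NO_OPEN"):
        webbrowser.open(url)
    return True
```

Returning False on timeout, rather than opening the browser anyway, matches the build-macos-app.sh fix in the same merge. That fix makes the .app open the browser only once the server is actually ready, not after the readiness timeout.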