odysseus

Leo 6fca7e86b7 Cookbook serve profiles and engine filter

* Cookbook: Engine filter + intelligent hardware-computed serve profiles

Two related Cookbook serving improvements for accurate, hardware-aware model
serving (especially on consumer GPUs that can only run GGUF/llama.cpp).

Engine filter
- New "Engine" dropdown (All / llama.cpp / vLLM / SGLang) beside the quant
  picker. Pure client-side view filter over the fetched list via the same
  _detectBackend() the serve commands use, so what you filter to is exactly what
  would launch. Re-renders from cache (no refetch). The empty-state message and
  the instant-cache-paint path both account for the filter too.

Intelligent serve profiles (Quality / Balanced / Speed)
- services/hwfit/profiles.py: compute_serve_profiles() turns detected VRAM +
  model size into concrete llama.cpp flags (n_gpu_layers, n_cpu_moe, cache-type,
  context). Encodes the by-hand tuning: a too-big MoE offloads experts to CPU
  instead of failing; a model that fits stays fully on GPU; quant tracks profile
  intent; vision models keep image-encoder headroom. Reuses models.py VRAM math
  so filtering and serving agree on what fits. Pure/deterministic: it makes no
  tokens/s claims, because partial-offload speed isn't reliably predictable;
  only fit is computed.
- /api/hwfit/profiles endpoint returns the profiles + the model's trained
  context limit, with loose name matching (strips org/ prefix, -GGUF suffix,
  quant tag) so a local GGUF folder name resolves to its catalog entry.
- _buildServeCmd (llama.cpp) now emits --n-cpu-moe / --flash-attn /
  --cache-type-k/v when set, with llama-cpp-python fallback equivalents. It
  previously only set -ngl/-c, which is why it OOM'd or ran slow.
- Serve panel: profile chips that fill the fields on click, plus CPU-MoE / KV
  Cache / Flash Attn fields. Context is clamped to the model's trained limit
  (and an absolute 1M sanity ceiling) on type/blur/profile-load and at launch —
  fixes a crash where a stale 256k/16M preset + quantized KV cache caused an
  amdgpu ErrorDeviceLost.

Tests: tests/test_serve_profiles.py (7) — offload vs full-GPU fit, never exceed
VRAM, context cap, launchable flags, vision headroom, no-GPU empty.
Checks: py_compile + node --check pass; pytest test_serve_profiles + test_hwfit_amd
green; verified live on an RDNA4 box (gfx1200) — Balanced lands ~ncm18 q4 128k,
matching hand-tuning.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Cookbook: make column-header sorting discoverable (incl. Newest)

Sorting in Cookbook is via clickable column headers (pewds' design), but the
headers had no visual cue that they're interactive — so sorting in general, and
the Newest sort on the Model header specifically, was undiscoverable.

- Style sortable headers as interactive: pointer cursor, hover underline, and
  the active sort column bolded/highlighted. There was no CSS for
  .hwfit-sortable / .hwfit-sort-active at all; this helps every existing sort,
  not just Newest.
- The Model column header sorts by release_date (newest first), reusing the
  existing header-click sort wiring and the "newest" SORT_KEY.
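
The Newest sort amounts to ordering rows by release date, descending. A small sketch of that ordering, assuming each catalog row is a dict with an ISO-format `release_date` string (the row shape and the `sort_newest_first` name are hypothetical; the real sort lives in the client-side header-click handler):

```python
from datetime import date


def sort_newest_first(models: list[dict]) -> list[dict]:
    """Return rows ordered by release_date, newest first; undated rows go last."""

    def key(row: dict):
        raw = row.get("release_date")
        try:
            return (1, date.fromisoformat(raw))
        except (TypeError, ValueError):
            # Missing or malformed date: rank below every dated row.
            return (0, date.min)

    return sorted(models, key=key, reverse=True)
```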

No new sort control — uses the existing column-header paradigm.

Checks: node --check passes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Cookbook serve profiles: keep the on-disk file's quant fixed (don't propose Q6/Q2)

In the Serve tab the model is a specific GGUF file already on disk, so its quant
can't change — but the profiles were suggesting "Quality · Q6_K" / "Speed · Q2_K"
as if you could re-quantize it. That's meaningless when serving a fixed file.

- compute_serve_profiles gains serve_weights_gb / serve_quant. When set (SERVE
  mode), the quant is locked to the file's and profiles differ only in the real
  serving knobs — n_cpu_moe, KV-cache type, context. _weights_gb / _cpu_moe_for_budget
  use the file's actual size instead of a quant-derived estimate. DOWNLOAD mode
  (no override) still varies the quant to show download options.
- /api/hwfit/profiles accepts serve_weights_gb & serve_quant.
- The Serve panel parses the file's size (from m.size "20.6 GB") and quant (from
  the repo/file name) and passes them, so profiles match what's actually served.

Result for a 20.6 GB Q4_K_M file: all three profiles stay Q4_K_M and differ by
KV/ctx/offload (Quality q8 KV 128k ncm21, Balanced q4 128k ncm17, Speed q4 32k
ncm15) — no nonsensical quant changes.
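
To show the kind of arithmetic behind an n_cpu_moe value, here is a deliberately simplified sketch. It is not the commit's `_cpu_moe_for_budget`: the function name, the even per-layer split, and the `expert_fraction` parameter are assumptions, and a real calculation would also reserve VRAM for the KV cache and, for vision models, the image encoder.

```python
import math


def cpu_moe_layers_for_budget(
    weights_gb: float,
    n_layers: int,
    budget_gb: float,
    expert_fraction: float = 0.8,  # assumed share of the weights held in MoE experts
) -> int:
    """Number of layers whose experts must move to CPU so the rest fits the budget."""
    if weights_gb <= budget_gb:
        return 0  # the model fits: keep everything on the GPU
    # Assume expert weights are spread evenly across layers.
    per_layer_expert_gb = weights_gb * expert_fraction / n_layers
    overflow_gb = weights_gb - budget_gb
    # Offload whole layers' experts until the overflow is covered, capped at n_layers.
    return min(n_layers, math.ceil(overflow_gb / per_layer_expert_gb))
```

For example, under these assumptions a 20.6 GB file with 40 layers and a 16 GB weight budget needs the experts of 12 layers on the CPU; a tighter budget raises that count.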

Tests: test_serve_mode_keeps_fixed_quant. Full serve-profile suite green (9).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Cookbook serve: Vision toggle (auto-find mmproj) + live VRAM/RAM-spillover monitor

Two serve-panel additions:

1. **Vision toggle.** A "Vision" checkbox that serves the model with its
   multimodal projector so it can read images. The mmproj path is resolved at
   runtime (find mmproj-*.gguf next to the model), so dropping an mmproj file in
   the model folder makes the toggle just work. `--mmproj … --image-max-tokens
   1024` (native) / `--clip_model_path` (llama-cpp-python) are emitted only when
   the toggle is on and an mmproj file is found.

2. **Live GPU-memory monitor.** A readout that polls /api/cookbook/gpus every 4s
   while the panel is open and shows VRAM used/total/%, free, and — crucially on
   a discrete card — **RAM spillover** (AMD gtt_used_mb), with a plain-language
   health hint: green/healthy, amber/tight, red/"spilled to RAM — slow (raise
   CPU MoE or lower context)". Surfaces gtt_used_mb from the gpus endpoint
   (previously read for total only and discarded for 'used').

Lets you see at a glance whether a config fits VRAM (fast) or is paging to system
RAM over PCIe (slow) instead of guessing.
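
A sketch of how the health hint could be derived from one poll of the GPU readout. The thresholds, the function name, and the `vram_used_mb` / `vram_total_mb` parameter names are assumptions; only `gtt_used_mb` is named in this change.

```python
def vram_health(
    vram_used_mb: float,
    vram_total_mb: float,
    gtt_used_mb: float = 0.0,
    tight_pct: float = 90.0,  # assumed threshold for "tight"
    spill_mb: float = 256.0,  # assumed GTT usage that counts as real spillover
) -> str:
    """Classify a reading as "healthy" (green), "tight" (amber) or "spilled" (red)."""
    if gtt_used_mb >= spill_mb:
        # Weights or KV cache are paging to system RAM over PCIe.
        return "spilled"
    pct = 100.0 * vram_used_mb / vram_total_mb if vram_total_mb else 0.0
    return "tight" if pct >= tight_pct else "healthy"
```

A small non-zero GTT figure is treated as noise here rather than spillover, which is why the sketch uses a threshold instead of a strict zero check.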

Checks: node --check + py_compile pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-06-02 12:34:42 +09:00

bombadil-spec.ts

Odysseus v1.0

2026-05-31 23:58:26 +09:00

conftest.py

Odysseus v1.0

2026-05-31 23:58:26 +09:00

test_action_intents.py

Route calendar action requests to tools

2026-06-01 14:32:41 +09:00

test_agent_loop.py

Fix native tool-calling follow-up round on Gemini and Ollama (#867)

2026-06-02 11:39:40 +09:00

test_app_static_mime.py

fix: normalize JS static MIME types on Windows

2026-06-02 01:32:00 +02:00

test_app.py

Fix fresh checkout test failures

2026-06-01 02:22:17 +00:00

test_auth_event_loop.py

Stabilize security regression tests

2026-06-02 05:48:59 +09:00

test_auth_regressions.py

fix(research): gate /api/research/spinoff on session ownership (#878)

2026-06-02 12:26:12 +09:00

test_auth_session_revocation.py

Stabilize auth session revocation tests

2026-06-02 06:02:49 +09:00

test_backup_cli_security.py

Harden backup restore tar extraction

2026-06-02 05:55:03 +09:00

test_calendar_owner_scope.py

Stabilize security regression tests

2026-06-02 05:48:59 +09:00

test_calendar_recurrence.py

Fix YEARLY recurring CalDAV events only showing on DTSTART year (#179)

2026-06-01 13:42:44 +09:00

test_chat_stream_scope.py

Fix chat stream recovery and PDF library indexing (#468)

2026-06-01 22:33:35 +09:00

test_chroma_client.py

fix: ChromaDB unreachable blocks app startup for 30-60s (#326) (#476)

2026-06-01 22:22:41 +09:00

test_companion_readonly.py

Add read-only companion endpoints (ping/info/owner-scoped models) (#863)

2026-06-02 11:20:53 +09:00

test_compare_js.py

Fix duplicate compare modal on repeated clicks (#491)

2026-06-01 22:24:27 +09:00

test_context_compactor.py

Preserve large pasted messages in context

2026-06-01 12:38:35 +09:00

test_cookbook_helpers.py

feat: select cached gguf artifacts for serve (#891)

2026-06-02 12:32:40 +09:00

test_ddg_redirect_resolution.py

Match host, not substring, when resolving DuckDuckGo redirects (#886)

2026-06-02 12:25:56 +09:00

test_deep_research_extraction_controls.py

Allow longer deep research extraction timeouts (#651)

2026-06-02 11:50:03 +09:00

test_document_deeplink.py

fix: open #document deep-links on refresh and surface load errors (#631)

2026-06-02 11:48:54 +09:00

test_document_tool_owner_scope.py

Scope document tools to caller owner

2026-06-02 06:00:02 +09:00

test_endpoint_resolver.py

Never resolve to a disabled endpoint model (#861)

2026-06-02 11:10:43 +09:00

test_esc_menu_stack_js.py

fix: make transient dropdown/popup menus close on Escape

2026-06-01 14:23:22 -04:00

test_gallery_image_privileges.py

Gate image editor AI endpoints by privilege (#447)

2026-06-01 22:35:24 +09:00

test_history_topics_owner_scope.py

fix(history): scope topic analysis to authenticated owner only (#744)

2026-06-02 11:36:01 +09:00

test_hwfit_macos.py

fix: require GGUF sources for llama downloads (#368)

2026-06-01 22:47:47 +09:00

test_keybind_altgr_js.py

Ignore AltGr keystrokes in Ctrl+Alt keyboard shortcuts (#825)

2026-06-02 11:12:54 +09:00

test_llm_core_anthropic_cache.py

Add Anthropic prompt caching to the agent loop (#812)

2026-06-02 11:14:31 +09:00

test_llm_core_concurrency.py

Make LLM host health maps thread-safe

2026-06-02 05:54:23 +09:00

test_llm_core_fallback.py

Surface silent model fallback instead of masking it (#868)

2026-06-02 11:37:25 +09:00

test_llm_core_ollama.py

Fix native tool-calling follow-up round on Gemini and Ollama (#867)

2026-06-02 11:39:40 +09:00

test_llm_core_reasoning.py

Support vLLM 0.20.2 / NIM reasoning-parser output end-to-end (surface + agent context + render) (#602)

2026-06-02 11:48:17 +09:00

test_llm_core_sanitize_tool_calls.py

Keep no-prose assistant tool-call messages through _sanitize_llm_messages (#862)

2026-06-02 11:17:22 +09:00

test_llm_core_streaming.py

Fix native tool-calling follow-up round on Gemini and Ollama (#867)

2026-06-02 11:39:40 +09:00

test_local_endpoint_js.py

fix: don't bill self-hosted models reached by a container/service hostname (#596)

2026-06-02 11:47:58 +09:00

test_markdown_rendering_js.py

Fix ordered list rendering in markdown preview (#645)

2026-06-02 11:49:44 +09:00

test_markitdown_runtime.py

Add optional markitdown extraction for Office/EPUB documents (#766)

2026-06-02 11:28:52 +09:00

test_mcp_manager.py

fix: add Browser MCP connection diagnostics (#662)

2026-06-02 11:50:17 +09:00

test_memory_bullet_extraction.py

Fix AttributeError on bullet lines in extract_memory_from_chat (#873)

2026-06-02 11:46:06 +09:00

test_memory_extractor_vector_degraded.py

fix: data integrity — deep-research result parsing + memory-extraction durability (#808)

2026-06-02 11:27:31 +09:00

test_model_context.py

Refresh local model context after restart

2026-06-02 05:54:06 +09:00

test_model_routes.py

Improve Ollama endpoint error messages

2026-06-02 05:53:50 +09:00

test_null_owner_gates.py

fix(security): fail closed on null-owner session in sync-chat endpoint (#870)

2026-06-02 11:38:05 +09:00

test_og_image_extraction.py

fix: source thumbnails dropped for http-only og:image URLs (#667)

2026-06-02 11:41:33 +09:00

test_ollama_port_detection.py

Add Ollama port path detection regressions (#883)

2026-06-02 12:24:18 +09:00

test_pdf_runtime.py

Show a clear message when PyMuPDF is missing

2026-06-01 18:27:17 +09:00

test_personal_docs_office_index.py

Add optional markitdown extraction for Office/EPUB documents (#766)

2026-06-02 11:28:52 +09:00

test_personal_docs_pdf_index.py

Fix chat stream recovery and PDF library indexing (#468)

2026-06-01 22:33:35 +09:00

test_personal_upload_isolation.py

Scope personal RAG uploads by owner (#446)

2026-06-01 22:36:53 +09:00

test_provider_detection.py

Provider detection: match by hostname instead of substring (re #768) (#815)

2026-06-02 11:11:17 +09:00

test_rate_limiter.py

Odysseus v1.0

2026-05-31 23:58:26 +09:00

test_reply_recipients_js.py

fix: reply-all Cc's the user's own other addresses (multi-account) (#672)

2026-06-02 11:42:20 +09:00

test_research_query_fallback.py

Deep research: don't treat a bare 'yes' as the research topic (#858)

2026-06-02 11:30:53 +09:00

test_research_service.py

fix: data integrity — deep-research result parsing + memory-extraction durability (#808)

2026-06-02 11:27:31 +09:00

test_research_session_id_validation.py

fix(research): validate session_id to block path traversal

2026-06-01 23:25:38 +01:00

test_research_utils.py

fix: deep research discards valid sources mentioning cookies/copyright (#481)

2026-06-01 22:26:37 +09:00

test_reserved_username_admin_escalation.py

Reserve internal sentinel usernames

2026-06-02 05:58:58 +09:00

test_resolve_endpoint_fallbacks.py

Add resolve_endpoint fallback chain regressions (#890)

2026-06-02 12:24:50 +09:00

test_review_regressions.py

Restrict provider discovery to admins

2026-06-02 05:54:40 +09:00

test_scheduler_restart_doublefire.py

fix(scheduler): push next_run forward on startup to stop restart double-fire (#708)

2026-06-02 11:43:30 +09:00

test_search_cache_invalidation.py

Fix invalidate_search_cache using a key that never matches stored entries (#852)

2026-06-02 10:53:33 +09:00

test_search_query.py

Fix year extraction in research queries

2026-06-01 23:09:41 +09:00

test_search_ranking.py

Odysseus v1.0

2026-05-31 23:58:26 +09:00

test_security_regressions.py

fix(cookbook): default Ollama serve to loopback (#872)

2026-06-02 12:27:04 +09:00

test_serve_profiles.py

Cookbook serve profiles and engine filter

2026-06-02 12:34:42 +09:00

test_session_mode_helpers.py

Fix database stubs in regression tests (#301)

2026-06-01 16:55:09 +09:00

test_session_owner_attribution.py

Attribute API-token sessions to the token owner (effective_user) (#871)

2026-06-02 11:39:01 +09:00

test_settings_scrub.py

Deep-scrub secrets from public settings

2026-06-01 23:11:50 +09:00

test_setup_admin_user.py

Normalize setup admin username (#448)

2026-06-01 22:38:56 +09:00

test_shell_routes.py

Expose Cookbook user-install CLIs in Docker (#887)

2026-06-02 12:23:29 +09:00

test_skill_index_prompt_injection.py

fix(agent-loop): wrap matched skills + skill index in untrusted user-role message (#788)

2026-06-02 11:15:45 +09:00

test_skills_manager_owner_isolation.py

fix(skills): scope skill reads to caller owner (#777)

2026-06-02 11:21:27 +09:00

test_speech_service_toggles.py

Honor disabled speech service toggles (#814)

2026-06-02 10:44:39 +09:00

test_task_scheduler_session_delivery.py

Fix test suite: ESM module loading and stub isolation (#844)

2026-06-02 11:29:29 +09:00

test_topic_analyzer.py

fix: topic analysis false-matches keywords as substrings (e.g. 'ai' in 'email') (#687)

2026-06-02 11:42:04 +09:00

test_vault_password_not_in_argv.py

fix(security): stop leaking the vault master password via process argv (#879)

2026-06-02 12:25:43 +09:00

test_vision_model_detection.py

Recognize local vision models so their images aren't dropped (#185)

2026-06-01 13:09:21 +09:00

test_visual_report.py

Fix visual report chapter navigation (#505)

2026-06-01 22:26:13 +09:00

test_webhook_trigger_auth_exempt.py

Exempt task webhook trigger from session auth (#784)

2026-06-02 11:23:40 +09:00