Leo 6fca7e86b7 Cookbook serve profiles and engine filter
* Cookbook: Engine filter + intelligent hardware-computed serve profiles

Two related Cookbook serving improvements for accurate, hardware-aware model
serving (especially on consumer GPUs that can only run GGUF/llama.cpp).

Engine filter
- New "Engine" dropdown (All / llama.cpp / vLLM / SGLang) beside the quant
  picker. Pure client-side view filter over the fetched list via the same
  _detectBackend() the serve commands use, so what you filter to is exactly what
  would launch. Re-renders from cache (no refetch). Empty-state message + the
  instant-cache-paint path account for it too.
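
A minimal sketch of the idea behind the filter, in Python for illustration only (the real filter is client-side JavaScript). The `detect_backend` rules, the `format`/`engine` fields, and the function names are hypothetical stand-ins, not the actual `_detectBackend()` logic. The point it shows is that the view filter and the launch path call the same detector.

```python
def detect_backend(model):
    """Hypothetical stand-in for _detectBackend(); the real rules may differ."""
    if str(model.get("format", "")).lower() == "gguf":
        return "llama.cpp"
    # Assumed: non-GGUF entries carry an engine hint and default to vLLM.
    return model.get("engine", "vLLM")


def filter_by_engine(cached_models, engine, detect=detect_backend):
    """Filter an already-fetched list in memory; no refetch is involved."""
    if engine == "All":
        return list(cached_models)
    return [m for m in cached_models if detect(m) == engine]


cached = [
    {"name": "model-a", "format": "gguf"},
    {"name": "model-b", "format": "awq"},
    {"name": "model-c", "format": "fp8", "engine": "SGLang"},
]
visible = filter_by_engine(cached, "llama.cpp")
```

Because the serve command would call the same `detect` function, an entry shown under "llama.cpp" is one that would launch with llama.cpp.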

Intelligent serve profiles (Quality / Balanced / Speed)
- services/hwfit/profiles.py: compute_serve_profiles() turns detected VRAM +
  model size into concrete llama.cpp flags (n_gpu_layers, n_cpu_moe, cache-type,
  context). Encodes the by-hand tuning: a too-big MoE offloads experts to CPU
  instead of failing; a model that fits stays fully on GPU; quant tracks profile
  intent; vision models keep image-encoder headroom. Reuses models.py VRAM math
  so filtering and serving agree on what fits. Pure/deterministic (no t/s claims
  — partial-offload speed isn't reliably predictable; fit is what's computed).
- /api/hwfit/profiles endpoint returns the profiles + the model's trained
  context limit, with loose name matching (strips org/ prefix, -GGUF suffix,
  quant tag) so a local GGUF folder name resolves to its catalog entry.
- _buildServeCmd (llama.cpp) now emits --n-cpu-moe / --flash-attn /
  --cache-type-k/v when set, with llama-cpp-python fallback equivalents. It
  previously only set -ngl/-c, which is why it OOM'd or ran slow.
- Serve panel: profile chips that fill the fields on click, plus CPU-MoE / KV
  Cache / Flash Attn fields. Context is clamped to the model's trained limit
  (and an absolute 1M sanity ceiling) on type/blur/profile-load and at launch —
  fixes a crash where a stale 256k/16M preset + quantized KV cache caused an
  amdgpu ErrorDeviceLost.
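
Two of the rules above can be sketched in a few lines: the loose name matching and the context clamp. This is an illustration under assumptions, not the code in the commit. The function names, the quant-tag regex, and the exact ceiling value (1,000,000 here) are hypothetical.

```python
import re

# Assumed value for the "absolute 1M sanity ceiling"; the real constant may differ.
ABSOLUTE_CTX_CEILING = 1_000_000

_GGUF_SUFFIX = re.compile(r"[-._]gguf$", re.IGNORECASE)
_QUANT_TAG = re.compile(r"[-._](?:I?Q\d\w*|BF16|F16|F32)$", re.IGNORECASE)


def normalize_model_name(name):
    """Strip org/ prefix, -GGUF suffix, and a trailing quant tag, then lowercase."""
    base = name.strip().rsplit("/", 1)[-1]
    base = _GGUF_SUFFIX.sub("", base)
    base = _QUANT_TAG.sub("", base)
    base = _GGUF_SUFFIX.sub("", base)  # handles "Name-GGUF-Q4_K_M" ordering
    return base.lower()


def clamp_context(requested, trained_limit=None):
    """Clamp a requested context to the trained limit and the absolute ceiling."""
    ceiling = ABSOLUTE_CTX_CEILING
    if trained_limit and trained_limit > 0:
        ceiling = min(ceiling, trained_limit)
    return max(1, min(int(requested), ceiling))
```

With these definitions, a stale 16M preset on a model trained to 262,144 tokens is cut back to 262,144 before launch.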

Tests: tests/test_serve_profiles.py (7) — offload vs full-GPU fit, never exceed
VRAM, context cap, launchable flags, vision headroom, no-GPU empty.
Checks: py_compile + node --check pass; pytest test_serve_profiles + test_hwfit_amd
green; verified live on an RDNA4 box (gfx1200) — Balanced lands ~ncm18 q4 128k,
matching hand-tuning.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Cookbook: make column-header sorting discoverable (incl. Newest)

Sorting in Cookbook is via clickable column headers (pewds' design), but the
headers had no visual cue that they're interactive — so sorting in general, and
the Newest sort on the Model header specifically, was undiscoverable.

- Style sortable headers as interactive: pointer cursor, hover underline, and
  the active sort column bolded/highlighted. There was no CSS for
  .hwfit-sortable / .hwfit-sort-active at all; this helps every existing sort,
  not just Newest.
- The Model column header sorts by release_date (newest first), reusing the
  existing header-click sort wiring and the "newest" SORT_KEY.

No new sort control — uses the existing column-header paradigm.

Checks: node --check passes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Cookbook serve profiles: keep the on-disk file's quant fixed (don't propose Q6/Q2)

In the Serve tab the model is a specific GGUF file already on disk, so its quant
can't change — but the profiles were suggesting "Quality · Q6_K" / "Speed · Q2_K"
as if you could re-quantize it. That's meaningless when serving a fixed file.

- compute_serve_profiles gains serve_weights_gb / serve_quant. When set (SERVE
  mode), the quant is locked to the file's and profiles differ only in the real
  serving knobs — n_cpu_moe, KV-cache type, context. _weights_gb / _cpu_moe_for_budget
  use the file's actual size instead of a quant-derived estimate. DOWNLOAD mode
  (no override) still varies the quant to show download options.
- /api/hwfit/profiles accepts serve_weights_gb & serve_quant.
- The Serve panel parses the file's size (from m.size "20.6 GB") and quant (from
  the repo/file name) and passes them, so profiles match what's actually served.
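
A rough sketch of the parsing step described above, written in Python for illustration (the Serve panel itself does this in the front end). The function names and regexes are assumptions; only the input shapes ("20.6 GB", a quant tag embedded in a file name) come from the text.

```python
import re

_SIZE = re.compile(r"^\s*([\d.]+)\s*([KMGT])i?B\s*$", re.IGNORECASE)
_TO_GB = {"K": 1e-6, "M": 1e-3, "G": 1.0, "T": 1e3}

_QUANT = re.compile(
    r"(?<![A-Za-z0-9])(I?Q\d(?:_[A-Za-z0-9]+)*|BF16|F16|F32)(?![A-Za-z0-9])",
    re.IGNORECASE,
)


def parse_size_gb(text):
    """Turn a display string such as '20.6 GB' into a float in GB, or None."""
    match = _SIZE.match(text or "")
    if not match:
        return None
    try:
        value = float(match.group(1))
    except ValueError:
        return None
    return value * _TO_GB[match.group(2).upper()]


def parse_quant(name):
    """Return the last quant-looking tag in a repo or file name, or None."""
    tags = _QUANT.findall(name or "")
    return tags[-1].upper() if tags else None
```

The two results would then be passed as serve_weights_gb and serve_quant, so the profiles are computed for the file actually on disk.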

Result for a 20.6 GB Q4_K_M file: all three profiles stay Q4_K_M and differ by
KV/ctx/offload (Quality q8 KV 128k ncm21, Balanced q4 128k ncm17, Speed q4 32k
ncm15) — no nonsensical quant changes.

Tests: test_serve_mode_keeps_fixed_quant. Full serve-profile suite green (9).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Cookbook serve: Vision toggle (auto-find mmproj) + live VRAM/RAM-spillover monitor

Two serve-panel additions:

1. **Vision toggle.** A "Vision" checkbox that serves the model with its
   multimodal projector so it can read images. The mmproj path is resolved at
   runtime (find mmproj-*.gguf next to the model), so dropping an mmproj file in
   the model folder makes the toggle just work; `--mmproj … --image-max-tokens
   1024` (native) / `--clip_model_path` (llama-cpp-python) only when on + found.

2. **Live GPU-memory monitor.** A readout that polls /api/cookbook/gpus every 4s
   while the panel is open and shows VRAM used/total/%, free, and — crucially on
   a discrete card — **RAM spillover** (AMD gtt_used_mb), with a plain-language
   health hint: green/healthy, amber/tight, red/"spilled to RAM — slow (raise
   CPU MoE or lower context)". Surfaces gtt_used_mb from the gpus endpoint
   (previously read for total only and discarded for 'used').
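
For the Vision toggle in item 1, the runtime lookup can be sketched as follows. The function names are hypothetical and only the native llama.cpp flags quoted above are emitted; the llama-cpp-python path is left out.

```python
from pathlib import Path


def find_mmproj(model_path):
    """Return the first mmproj-*.gguf next to the model file, or None."""
    folder = Path(model_path).parent
    candidates = sorted(folder.glob("mmproj-*.gguf"))
    return candidates[0] if candidates else None


def vision_args(model_path, vision_on):
    """Extra serve flags, emitted only when the toggle is on and a projector exists."""
    if not vision_on:
        return []
    mmproj = find_mmproj(model_path)
    if mmproj is None:
        return []
    return ["--mmproj", str(mmproj), "--image-max-tokens", "1024"]
```

Resolving the path at launch time rather than storing it is what lets a newly dropped-in mmproj file work without any reconfiguration.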

Lets you see at a glance whether a config fits VRAM (fast) or is paging to system
RAM over PCIe (slow) instead of guessing.
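
The health hint could be derived along these lines. This is a sketch only: the 90% "tight" threshold and the 512 MB spillover threshold are assumptions chosen for illustration, not values taken from the commit, and the function name is hypothetical.

```python
def memory_health(vram_used_mb, vram_total_mb, gtt_used_mb=0.0,
                  tight_pct=90.0, spill_mb=512.0):
    """Classify GPU memory state as (level, hint). Thresholds are assumed."""
    pct = 100.0 * vram_used_mb / vram_total_mb if vram_total_mb > 0 else 0.0
    if gtt_used_mb >= spill_mb:
        # Spillover into system RAM dominates: the config no longer fits VRAM.
        return "red", "spilled to RAM: slow (raise CPU MoE or lower context)"
    if pct >= tight_pct:
        return "amber", "tight"
    return "green", "healthy"
```

A small nonzero GTT figure is tolerated here on the assumption that drivers keep some baseline allocation in system memory even when the model fits.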

Checks: node --check + py_compile pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-02 12:34:42 +09:00

Odysseus

─────────────────────────────────────────────── ⊹ ࣪ ˖ ૮( ˶ᵔ ᵕ ᵔ˶ )っ Odysseus vers. 1.0 ───────────────────────────────────────────────

A self-hosted AI workspace -- meant to be the self-hosted version of the UI experience you get from ChatGPT and Claude. But with more jank and fun. Running on your own hardware, with your own data -- local-first, privacy-first, and no trojan.

Features

  • Chat -- chat with any local model or API; adding them is super simple.
     vLLM · llama.cpp · Ollama · OpenRouter · OpenAI
  • Agent -- hand it tools and let it run the whole task itself.
     built on opencode · MCP · web · files · shell · skills · memory
  • Cookbook -- Scans your hardware and recommends models; click to download and serve. Easy!
     built on llmfit · VRAM-aware · GGUF / FP8 / AWQ · fit scoring · vLLM / llama.cpp serving
  • Deep Research -- multi-step runs that gather, read, and synthesize sources into a nice visual report.
     adapted from Tongyi DeepResearch
  • Compare -- a fun tool to compare models side by side. Test completely blind, no bias!
     multi-model · blind test · synthesis
  • Documents -- YOU write the text, AI is there to assist, not the opposite.
     multi-tab editor · markdown · HTML · CSV · syntax highlighting · AI edits · suggestions
  • Memory / Skills -- Persistent memory and skills: your agent evolves over time as it better understands you and your tasks!
     ChromaDB · fastembed (ONNX) · vector + keyword retrieval · import/export
  • Email -- IMAP/SMTP inbox with AI triage built in: urgency reminders, auto-tag, auto-summary, auto-reply drafts, auto-spam.
     IMAP · SMTP · per-account routing · CalDAV-aware
  • Notes & Tasks -- Quick notes with reminders, a todo list, and scheduled tasks the agent can act on.
     note pings · checklist · cron-style tasks · ntfy / browser / email channels
  • Calendar -- Local-first calendar with CalDAV sync to Radicale / Nextcloud / Apple / Fastmail.
     CalDAV pull · .ics import/export · per-calendar colors · agent-aware
  • Works on mobile -- looks and runs great on your phone, not just desktop.
     responsive · installable (PWA) · touch gestures
  • Extras -- more to explore, happy if you give it a go!
     image editor · theme editor · file uploads (vision + PDF) · web search · presets · sessions · 2FA

Demo

A full, hover-to-play tour lives on the landing page (docs/index.html).

Screenshots / clips

Chat & Agents

Deep Research

Compare

Documents

Notes & Tasks

Quick Start

Defaults work out of the box: clone, run, then configure models/search/email inside Settings. Only edit .env for deployment-level overrides like APP_BIND, APP_PORT, AUTH_ENABLED, DATABASE_URL, or a pre-seeded admin password.

On first setup, Odysseus creates an admin account (admin unless ODYSSEUS_ADMIN_USER is set) and prints a temporary password in the terminal. For Docker installs, the same line is in docker compose logs odysseus. Use that for the first login, then change it in Settings.

Contributing? See CONTRIBUTING.md for setup, testing, and pull request guidelines.

git clone https://github.com/pewdiepie-archdaemon/odysseus.git
cd odysseus
cp .env.example .env       # optional, but recommended for explicit defaults
docker compose up -d --build

Open http://localhost:7000 when the containers are healthy. Docker Compose binds the web UI to 127.0.0.1 by default. If the port is taken, set APP_PORT=7001 in .env and recreate the container. Set APP_BIND=0.0.0.0 only when you intentionally want LAN/reverse-proxy access.

Native Linux / macOS

git clone https://github.com/pewdiepie-archdaemon/odysseus.git
cd odysseus
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python setup.py
python -m uvicorn app:app --host 127.0.0.1 --port 7000

Requirements: Python 3.11+. Cookbook also needs tmux for background model downloads and serves. Use --host 0.0.0.0 only when you intentionally want LAN/reverse-proxy access.

Apple Silicon

Docker on macOS cannot use the Metal GPU. For GPU-accelerated Cookbook on an M-series Mac, run Odysseus natively:

git clone https://github.com/pewdiepie-archdaemon/odysseus.git
cd odysseus
./start-macos.sh

It launches at http://127.0.0.1:7860. To expose it to your phone over a trusted LAN/VPN such as Tailscale, bind all interfaces:

ODYSSEUS_HOST=0.0.0.0 ./start-macos.sh
# then open http://<tailscale-ip>:7860

Keep auth enabled when binding outside loopback, and do not expose this port directly to the public internet. To build a clickable app wrapper:

./build-macos-app.sh

Cookbook, GPU, Ollama, and troubleshooting notes

Docker bundled services. Compose starts Odysseus, ChromaDB, SearXNG, and ntfy. Odysseus and the bundled service ports bind to 127.0.0.1 by default, so they are reachable from the host but not exposed to your LAN/public internet unless you opt in.

Cookbook storage in Docker. Downloads live in ./data/huggingface (~/.cache/huggingface in the container). Cookbook-installed Python CLIs and serve engines live in ./data/local (~/.local in the container), so they survive container recreation.

Remote servers. In Cookbook -> Settings -> Servers, generate the Odysseus SSH key and add the public key to the remote server's ~/.ssh/authorized_keys. From the host you can also run:

ssh-copy-id -i data/ssh/id_ed25519.pub user@server

NVIDIA Docker GPU overlay. CPU-only users can skip this section. scripts/check-docker-gpu.sh diagnoses GPU passthrough and can optionally install the host runtime or update .env. Cookbook can only detect GPUs that Docker exposes to the container — if the host runtime is not configured, Cookbook sees the iGPU, another card, or CPU instead of your NVIDIA GPU.

# Read-only diagnostic (default — installs nothing, never edits .env):
scripts/check-docker-gpu.sh

# Print OS-specific install commands without running them:
scripts/check-docker-gpu.sh --print-install-commands

# Install NVIDIA Container Toolkit on Ubuntu/Debian (requires sudo):
scripts/check-docker-gpu.sh --install-nvidia-toolkit

# Write COMPOSE_FILE to .env (only when GPU passthrough is confirmed working):
scripts/check-docker-gpu.sh --enable-nvidia-overlay

# Full assisted setup — install toolkit, then enable overlay if passthrough works:
scripts/check-docker-gpu.sh --install-nvidia-toolkit --enable-nvidia-overlay

Safety notes:

  • The app never installs host GPU runtime automatically.
  • The app never edits .env automatically.
  • .env is only modified when --enable-nvidia-overlay is explicitly passed, and only after GPU passthrough succeeds. --yes skips prompts but does not bypass the passthrough gate.
  • .env.bak.* backups created by --enable-nvidia-overlay are ignored by Git and the Docker build context.

To enable manually without the script, add this to .env:

COMPOSE_FILE=docker-compose.yml:docker/gpu.nvidia.yml

AMD / ROCm. AMD GPU passthrough is not automated. Add manually:

COMPOSE_FILE=docker-compose.yml:docker/gpu.amd.yml

Verify after enabling either overlay:

docker compose exec odysseus nvidia-smi -L   # NVIDIA
docker compose exec odysseus rocm-smi        # AMD

GPU passthrough ≠ llama.cpp CUDA. nvidia-smi passing inside the container confirms Docker GPU access, but llama.cpp also needs cudart and the CUDA Toolkit at runtime. If Cookbook logs show Unable to find cudart library, Could NOT find CUDAToolkit, CUDA Toolkit not found, or tensors/layers assigned to CPU, that is a Cookbook/llama.cpp build issue — not a Docker passthrough failure. Re-install the serve engine via Cookbook → Dependencies to get a CUDA-enabled build.
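
To tell the two failure modes apart quickly, a log scan like the one below can help. It is a sketch, not part of Odysseus: the function name is hypothetical, and it matches only the three literal messages quoted above. The "tensors/layers assigned to CPU" symptom is a description rather than an exact log string, so it is not matched here.

```python
CUDA_BUILD_MARKERS = (
    "Unable to find cudart library",
    "Could NOT find CUDAToolkit",
    "CUDA Toolkit not found",
)


def cuda_build_markers_in(log_text):
    """Return the known build-issue messages that appear in a Cookbook log."""
    return [marker for marker in CUDA_BUILD_MARKERS if marker in log_text]


sample_log = "cmake: Could NOT find CUDAToolkit (missing: CUDA_CUDART)\nbuild continues"
found = cuda_build_markers_in(sample_log)
```

A non-empty result points at the serve engine build, in which case re-installing it via Cookbook → Dependencies is the relevant fix rather than changing the Docker GPU overlay.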

Ollama with Docker. If Ollama runs on the host, add this endpoint in Settings:

http://host.docker.internal:11434/v1

Ollama must listen outside its own loopback interface:

OLLAMA_HOST=0.0.0.0:11434 ollama serve
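
To confirm the endpoint answers before adding it in Settings, a small check such as this sketch can be run. It assumes Ollama's OpenAI-compatible API exposes a model list at /v1/models; the function names are illustrative. From the host, use http://localhost:11434/v1; from inside the container, use the host.docker.internal URL above.

```python
import json
import urllib.request


def models_url(base_url):
    """Build the model-list URL from an OpenAI-compatible base URL."""
    return base_url.rstrip("/") + "/models"


def model_ids(payload):
    """Extract model ids from an OpenAI-style {"data": [{"id": ...}]} payload."""
    return [item["id"] for item in payload.get("data", []) if "id" in item]


def list_ollama_models(base_url="http://localhost:11434/v1", timeout=5.0):
    """Query the endpoint; raises URLError if Ollama is not reachable."""
    with urllib.request.urlopen(models_url(base_url), timeout=timeout) as resp:
        return model_ids(json.load(resp))
```

A connection error from inside the container while the host check succeeds usually means Ollama is still bound to its loopback interface only.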

Useful checks.

docker compose ps
docker compose logs --tail=120 odysseus
docker compose logs odysseus | grep -E 'ChromaDB|MemoryVectorStore|DEGRADED'

macOS details. start-macos.sh installs Homebrew deps, creates the venv, runs setup, and starts uvicorn on port 7860 because AirPlay often holds 7000. It uses llama.cpp/Ollama for Metal. vLLM/SGLang are CUDA/ROCm-only and do not run on macOS. MLX-only models are not served by Odysseus.

Native Windows

One-command launcher (creates the venv, installs deps, runs setup, starts the server; safe to re-run):

git clone https://github.com/pewdiepie-archdaemon/odysseus.git
cd odysseus
powershell -ExecutionPolicy Bypass -File .\launch-windows.ps1

Or do it by hand:

git clone https://github.com/pewdiepie-archdaemon/odysseus.git
cd odysseus
python -m venv venv
venv\Scripts\Activate.ps1
pip install -r requirements.txt
python setup.py
python -m uvicorn app:app --host 127.0.0.1 --port 7000

Requirements: Python 3.11+. The core app (chat, agent, memory, documents, email, calendar, deep research) runs fully native. For full Cookbook background model downloads and the agent shell tool, also install Git for Windows (provides bash.exe). Local GPU serving of vLLM/SGLang needs Linux/WSL2; for a local model on Windows, Ollama is the easiest path — point Odysseus at http://localhost:11434/v1 in Settings.

Open http://localhost:7000, log in with the generated admin password, and configure everything else inside Settings.

Security Notes

Odysseus is a self-hosted workspace with powerful local tools: shell access, file uploads, model downloads, web research, email/calendar integrations, and API tokens. Treat it like an admin console.

  • Keep AUTH_ENABLED=true for any network-accessible deployment.
  • Do not expose it directly to the public internet without HTTPS and a trusted reverse proxy.
  • Keep data/, .env, logs, databases, and uploaded/generated media out of Git. They are ignored by default.
  • Review data/auth.json after first boot: disable open signup unless you intentionally want it, make only your own account admin, and keep demo/test accounts non-admin.
  • Non-admin users do not get shell/Python/file read/write by default, and admin-only routes/tools such as MCP management, API tokens, webhooks, model/cookbook serving, backup/vault, and app settings are admin-gated. Other features are controlled by per-user privileges, so review each user's privileges before exposing a deployment.
  • Rotate any API keys or tokens that were ever pasted into a shared chat, demo, screenshot, or log.
  • If you enable API tokens or webhooks, create separate tokens per integration and delete unused ones.
  • Prefer binding manual development runs to 127.0.0.1; bind to 0.0.0.0 only when you intentionally want LAN/reverse-proxy access.
  • Before publishing a fork, run git status --short and confirm no private files from .env, data/, logs/, uploads, backups, or local databases are staged.

Putting it behind HTTPS

Odysseus serves plain HTTP on its port. That's fine for localhost and trusted LAN/VPN use, but browsers will warn ("Password fields present on an insecure page") and the login + API tokens travel in cleartext. For anything reachable outside your machine — including a Tailscale IP shared with other devices — put a TLS-terminating reverse proxy in front.

Shortest path with Caddy (auto-renews Let's Encrypt certs):

odysseus.example.com {
  reverse_proxy localhost:7000
}

For a LAN-only Tailscale deployment, Caddy + tailscale-cert or the built-in MagicDNS HTTPS feature both work. nginx/Traefik configs are similar — proxy localhost:7000, terminate TLS at the proxy. Once that's in place, the browser warning goes away and your login is encrypted.

Contributing

Help is welcome. The best entry points are fresh-install testing, provider setup bugs, mobile/editor polish, docs, and small focused refactors. See ROADMAP.md for the current help-wanted list.

Configuration

Most setup is done inside the app with /setup or Settings. Use .env for deployment-level defaults and secrets you want present before first boot. Key settings:

Variable         | Default                        | Description
-----------------+--------------------------------+------------
LLM_HOST         | localhost                      | Your LLM server (e.g. llm-host.local:8000)
LLM_HOSTS        | --                             | Comma-separated list for model discovery
OPENAI_API_KEY   | --                             | Optional OpenAI key. Prefer adding providers in the app unless pre-seeding.
SEARXNG_INSTANCE | http://localhost:8080          | SearXNG URL. Docker overrides this to http://searxng:8080.
SEARXNG_SECRET   | generated on first Docker boot | Optional SearXNG cookie/CSRF secret. Leave blank unless you need to pin it.
APP_BIND         | 127.0.0.1                      | Docker Compose host bind address for the web UI. Use 0.0.0.0 only for intentional LAN/reverse-proxy access.
APP_PORT         | 7000                           | Docker Compose host port for the web UI.
AUTH_ENABLED     | true                           | Enable/disable login
LOCALHOST_BYPASS | false                          | Development-only auth bypass for loopback requests. Keep false for shared/network deployments.
DATABASE_URL     | sqlite:///./data/app.db        | Database connection string
CHROMADB_HOST    | localhost                      | ChromaDB host for vector memory. Docker overrides this to chromadb.
CHROMADB_PORT    | 8100                           | ChromaDB port for manual host runs. Docker overrides this to 8000.
EMBEDDING_URL    | --                             | OpenAI-compatible embeddings endpoint
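
As an illustration of how these variables and defaults fit together, here is a sketch of a loader that applies the defaults listed above. It is not the app's actual configuration code: `load_settings` and its type coercions are assumptions, and secrets such as OPENAI_API_KEY and SEARXNG_SECRET are deliberately left out.

```python
import os

# Defaults copied from the table above; None stands for "--" (unset).
_DEFAULTS = {
    "LLM_HOST": "localhost",
    "LLM_HOSTS": None,
    "SEARXNG_INSTANCE": "http://localhost:8080",
    "APP_BIND": "127.0.0.1",
    "APP_PORT": "7000",
    "AUTH_ENABLED": "true",
    "LOCALHOST_BYPASS": "false",
    "DATABASE_URL": "sqlite:///./data/app.db",
    "CHROMADB_HOST": "localhost",
    "CHROMADB_PORT": "8100",
    "EMBEDDING_URL": None,
}
_BOOLS = {"AUTH_ENABLED", "LOCALHOST_BYPASS"}
_INTS = {"APP_PORT", "CHROMADB_PORT"}
_LISTS = {"LLM_HOSTS"}


def load_settings(env=None):
    """Merge environment values over the documented defaults (illustrative)."""
    env = os.environ if env is None else env
    settings = {}
    for key, default in _DEFAULTS.items():
        raw = env.get(key) or default  # empty string falls back to the default
        if key in _LISTS:
            settings[key] = [h.strip() for h in (raw or "").split(",") if h.strip()]
        elif raw is None:
            settings[key] = None
        elif key in _BOOLS:
            settings[key] = raw.strip().lower() in {"1", "true", "yes", "on"}
        elif key in _INTS:
            settings[key] = int(raw)
        else:
            settings[key] = raw
    return settings
```

Passing a plain dict instead of `os.environ` makes it easy to check what a given .env would resolve to, for example `load_settings({"APP_PORT": "7001"})`.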

Built-in MCP servers (optional setup)

Odysseus auto-registers a few built-in MCP servers at startup. The npx-based ones (currently the browser server, @playwright/mcp) only start when their npm package is already in the local npx cache. If a package isn't cached, that server is skipped with a startup log message explaining what to do, so a fresh install does not block on a multi-minute npm download or hang if Playwright system deps are missing.

To enable the browser MCP (page navigation, screenshots, vision), run once:

npx -y @playwright/mcp@latest --version

That installs @playwright/mcp plus Playwright (~300MB total). Restart Odysseus and the server will register at startup.

Architecture

app.py     FastAPI entry point
core/      auth, database, middleware, constants
src/       llm_core, agent_loop, agent_tools, chat_processor, search/
routes/    chat, session, document, memory, model … endpoints
services/  docs, memory, search, hwfit (Cookbook) …
static/    index.html + app.js + style.css + js/ (modular front-end)
docs/      landing page (index.html) + preview clips

Data

All user data lives in data/ (gitignored): app.db (sessions, messages, documents), memory.json, presets.json, uploads/, personal_docs/, chroma/, settings.json.

License

MIT -- see LICENSE and ACKNOWLEDGMENTS.md.

                                  |
                                 |||
                                |||||
                  |    |    |   |||||||
                 )_)  )_)  )_)   ~|~
                )___))___))___)\  |
               )____)____)_____)\\|
             _____|____|____|_____\\\__
             \                       /
       ~^~^~~^~^~~^~^~~^~^~~^~^~~^~^~~^~^~~^~^~
               ~^~  all aboard!  ~^~
       ~^~^~~^~^~~^~^~~^~^~~^~^~~^~^~~^~^~~^~^~
Readme MIT 28 MiB
Languages
Python 44.7%
JavaScript 43.6%
CSS 9.3%
HTML 1.8%
Shell 0.5%