Odysseus
─────────────────────────────────────────────── ⊹ ࣪ ˖ ૮( ˶ᵔ ᵕ ᵔ˶ )っ Odysseus vers. 1.0 ───────────────────────────────────────────────
A self-hosted AI workspace -- meant to be the self-hosted version of the UI experience you get from ChatGPT and Claude. But with more jank and fun. Running on your own hardware, with your own data -- local-first, privacy-first, and no trojan.
Features
- Chat -- chat with any local model or API; adding them is super simple.
  vLLM · llama.cpp · Ollama · OpenRouter · OpenAI
- Agent -- hand it tools and let it run the whole task itself.
  built on opencode · MCP · web · files · shell · skills · memory
- Cookbook -- Scans your hardware, recommends models, click to download and serve.. easy!
  built on llmfit · VRAM-aware · GGUF / FP8 / AWQ · fit scoring · vLLM / llama.cpp serving
- Deep Research -- multi-step runs that gather, read, and synthesize sources into a nice visual report.
  adapted from Tongyi DeepResearch
- Compare -- a fun tool to compare models side by side. Test completely blind, no bias!
  multi-model · blind test · synthesis
- Documents -- YOU write the text, AI is there to assist, not the opposite.
  multi-tab editor · markdown · HTML · CSV · syntax highlighting · AI edits · suggestions
- Memory / Skills -- Persistent memory and skills, your agent evolves over time as it better understands you and your tasks!
  ChromaDB · fastembed (ONNX) · vector + keyword retrieval · import/export
- Email -- IMAP/SMTP inbox with AI triage built in: urgency reminders, auto-tag, auto-summary, auto-reply drafts, auto-spam.
  IMAP · SMTP · per-account routing · CalDAV-aware
- Notes & Tasks -- Quick notes with reminders, a todo list, and scheduled tasks the agent can act on.
  note pings · checklist · cron-style tasks · ntfy / browser / email channels
- Calendar -- Local-first calendar with CalDAV sync to Radicale / Nextcloud / Apple / Fastmail.
  CalDAV pull · .ics import/export · per-calendar colors · agent-aware
- Works on mobile -- looks and runs great on your phone, not just desktop.
  responsive · installable (PWA) · touch gestures
- Extras -- more to explore, happy if you give it a go!
  image editor · theme editor · file uploads (vision + PDF) · web search · presets · sessions · 2FA
Demo
A full, hover-to-play tour lives on the landing page (docs/index.html).
Quick Start
Defaults work out of the box: clone, run, then configure models/search/email
inside Settings. Only edit .env for deployment-level overrides like
APP_BIND, APP_PORT, AUTH_ENABLED, DATABASE_URL, or a pre-seeded admin password.
On first setup, Odysseus creates an admin account (admin unless
ODYSSEUS_ADMIN_USER is set) and prints a temporary password in the terminal.
For Docker installs, the same line is in docker compose logs odysseus.
Use that for the first login, then change it in Settings.
Contributing? See CONTRIBUTING.md for setup, testing, and pull request guidelines.
Docker (recommended)
git clone https://github.com/pewdiepie-archdaemon/odysseus.git
cd odysseus
cp .env.example .env # optional, but recommended for explicit defaults
docker compose up -d --build
Open http://localhost:7000 when the containers are healthy. Docker Compose
binds the web UI to 127.0.0.1 by default. If the port is taken, set
APP_PORT=7001 in .env and recreate the container. Set APP_BIND=0.0.0.0
only when you intentionally want LAN/reverse-proxy access.
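If you script the install, it can help to wait for the web UI before opening it. The following is a minimal sketch, not part of Odysseus: the helper names web_ui_url and wait_until_up are hypothetical, and it assumes APP_PORT is visible in the process environment (it does not parse .env).

```python
import os
import time
import urllib.error
import urllib.request


def web_ui_url(env=None):
    """Build the local URL from APP_PORT, defaulting to 7000 as described above."""
    env = os.environ if env is None else env
    port = env.get("APP_PORT", "7000")
    return f"http://localhost:{port}"


def wait_until_up(url, timeout_s=120.0, interval_s=2.0):
    """Poll `url` until any HTTP response arrives or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5):
                return True
        except urllib.error.HTTPError:
            # The server answered (for example 401 or 404), so it is up.
            return True
        except (urllib.error.URLError, OSError):
            # Connection refused or reset: the container is not ready yet.
            time.sleep(interval_s)
    return False
```

Calling wait_until_up(web_ui_url()) after docker compose up returns True once the server answers, and False if nothing responds within the timeout.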
Native Linux / macOS
git clone https://github.com/pewdiepie-archdaemon/odysseus.git
cd odysseus
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python setup.py
python -m uvicorn app:app --host 127.0.0.1 --port 7000
Requirements: Python 3.11+. Cookbook also needs tmux for background model
downloads and serves. Use --host 0.0.0.0 only when you intentionally want
LAN/reverse-proxy access.
Apple Silicon
Docker on macOS cannot use the Metal GPU. For GPU-accelerated Cookbook on an M-series Mac, run Odysseus natively:
git clone https://github.com/pewdiepie-archdaemon/odysseus.git
cd odysseus
./start-macos.sh
It launches at http://127.0.0.1:7860. To expose it to your phone over a trusted LAN/VPN such as Tailscale, bind all interfaces:
ODYSSEUS_HOST=0.0.0.0 ./start-macos.sh
# then open http://<tailscale-ip>:7860
Keep auth enabled when binding outside loopback, and do not expose this port directly to the public internet. To build a clickable app wrapper:
./build-macos-app.sh
Cookbook, GPU, Ollama, and troubleshooting notes
Docker bundled services. Compose starts Odysseus, ChromaDB, SearXNG, and
ntfy. Odysseus and the bundled service ports bind to 127.0.0.1 by default, so
they are reachable from the host but not exposed to your LAN/public internet
unless you opt in.
Cookbook storage in Docker. Downloads live in ./data/huggingface
(~/.cache/huggingface in the container). Cookbook-installed Python CLIs and
serve engines live in ./data/local (~/.local in the container), so they
survive container recreation.
Remote servers. In Cookbook -> Settings -> Servers, generate the
Odysseus SSH key and add the public key to the remote server's
~/.ssh/authorized_keys. From the host you can also run:
ssh-copy-id -i data/ssh/id_ed25519.pub user@server
NVIDIA Docker GPU overlay. CPU-only users can skip this section.
scripts/check-docker-gpu.sh diagnoses GPU passthrough and can optionally
install the host runtime or update .env. Cookbook can only detect GPUs that
Docker exposes to the container — if the host runtime is not configured,
Cookbook sees the iGPU, another card, or CPU instead of your NVIDIA GPU.
# Read-only diagnostic (default — installs nothing, never edits .env):
scripts/check-docker-gpu.sh
# Print OS-specific install commands without running them:
scripts/check-docker-gpu.sh --print-install-commands
# Install NVIDIA Container Toolkit on Ubuntu/Debian (requires sudo):
scripts/check-docker-gpu.sh --install-nvidia-toolkit
# Write COMPOSE_FILE to .env (only when GPU passthrough is confirmed working):
scripts/check-docker-gpu.sh --enable-nvidia-overlay
# Full assisted setup — install toolkit, then enable overlay if passthrough works:
scripts/check-docker-gpu.sh --install-nvidia-toolkit --enable-nvidia-overlay
Safety notes:
- The app never installs host GPU runtime automatically.
- The app never edits .env automatically.
- .env is only modified when --enable-nvidia-overlay is explicitly passed, and only after GPU passthrough succeeds.
- --yes skips prompts but does not bypass the passthrough gate.
- .env.bak.* backups created by --enable-nvidia-overlay are ignored by Git and the Docker build context.
To enable manually without the script, add this to .env:
COMPOSE_FILE=docker-compose.yml:docker/gpu.nvidia.yml
AMD / ROCm. AMD GPU passthrough is not automated. Add manually:
COMPOSE_FILE=docker-compose.yml:docker/gpu.amd.yml
Verify after enabling either overlay:
docker compose exec odysseus nvidia-smi -L # NVIDIA
docker compose exec odysseus rocm-smi # AMD
GPU passthrough ≠ llama.cpp CUDA.
nvidia-smi passing inside the container confirms Docker GPU access, but llama.cpp also needs cudart and the CUDA Toolkit at runtime. If Cookbook logs show "Unable to find cudart library", "Could NOT find CUDAToolkit", "CUDA Toolkit not found", or tensors/layers assigned to CPU, that is a Cookbook/llama.cpp build issue, not a Docker passthrough failure. Re-install the serve engine via Cookbook → Dependencies to get a CUDA-enabled build.
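To tell the two failure modes apart quickly, you can scan a saved Cookbook log for the three messages quoted above. This is a small sketch under assumptions: the function names are hypothetical, it only matches those exact strings, and it does not detect the "layers assigned to CPU" case.

```python
# Log strings quoted in the note above; each one points at the llama.cpp build.
CUDA_BUILD_MARKERS = (
    "Unable to find cudart library",
    "Could NOT find CUDAToolkit",
    "CUDA Toolkit not found",
)


def cuda_build_markers(log_text):
    """Return the known CUDA-build error strings present in `log_text`."""
    return [marker for marker in CUDA_BUILD_MARKERS if marker in log_text]


def diagnose(log_text):
    """Give a one-line hint based on which markers were found."""
    if cuda_build_markers(log_text):
        return "llama.cpp build issue: reinstall the serve engine via Cookbook -> Dependencies"
    return "no CUDA build markers found"
```

If diagnose reports a build issue while nvidia-smi works inside the container, the passthrough is fine and only the serve engine needs reinstalling.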
Ollama with Docker. If Ollama runs on the host, add this endpoint in Settings:
http://host.docker.internal:11434/v1
Ollama must listen outside its own loopback interface:
OLLAMA_HOST=0.0.0.0:11434 ollama serve
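Before adding the endpoint in Settings, you can confirm it answers. The sketch below assumes the endpoint exposes the OpenAI-style models listing at /v1/models with a {"data": [{"id": ...}]} body; the helper names are hypothetical and not part of Odysseus.

```python
import json
import urllib.request


def models_url(base_url):
    """Append the OpenAI-style models path to a base such as http://host:11434/v1."""
    return base_url.rstrip("/") + "/models"


def model_ids(payload):
    """Extract model ids from an OpenAI-style {"data": [{"id": ...}]} body."""
    return [item["id"] for item in payload.get("data", [])]


def list_models(base_url, timeout_s=5.0):
    """Fetch the models listing and return the ids; raises on connection errors."""
    with urllib.request.urlopen(models_url(base_url), timeout=timeout_s) as resp:
        return model_ids(json.load(resp))
```

Run list_models("http://host.docker.internal:11434/v1") from inside the container, or use http://localhost:11434/v1 from the host. A connection error from the container usually means Ollama is still bound to loopback only.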
Useful checks.
docker compose ps
docker compose logs --tail=120 odysseus
docker compose logs odysseus | grep -E 'ChromaDB|MemoryVectorStore|DEGRADED'
macOS details. start-macos.sh installs Homebrew deps, creates the venv,
runs setup, and starts uvicorn on port 7860 because AirPlay often holds
7000. It uses llama.cpp/Ollama for Metal. vLLM/SGLang are CUDA/ROCm-only and
do not run on macOS. MLX-only models are not served by Odysseus.
Native Windows
One-command launcher (creates the venv, installs deps, runs setup, starts the server; safe to re-run):
git clone https://github.com/pewdiepie-archdaemon/odysseus.git
cd odysseus
powershell -ExecutionPolicy Bypass -File .\launch-windows.ps1
Or do it by hand:
git clone https://github.com/pewdiepie-archdaemon/odysseus.git
cd odysseus
python -m venv venv
venv\Scripts\Activate.ps1
pip install -r requirements.txt
python setup.py
python -m uvicorn app:app --host 127.0.0.1 --port 7000
Requirements: Python 3.11+. The core app (chat, agent, memory, documents,
email, calendar, deep research) runs fully native. For full Cookbook background
model downloads and the agent shell tool, also install
Git for Windows (provides bash.exe).
Local GPU serving of vLLM/SGLang needs Linux/WSL2; for a local model on Windows,
Ollama is the easiest path — point Odysseus at
http://localhost:11434/v1 in Settings.
Open http://localhost:7000, log in with the generated admin password,
and configure everything else inside Settings.
Security Notes
Odysseus is a self-hosted workspace with powerful local tools: shell access, file uploads, model downloads, web research, email/calendar integrations, and API tokens. Treat it like an admin console.
- Keep AUTH_ENABLED=true for any network-accessible deployment.
- Do not expose it directly to the public internet without HTTPS and a trusted reverse proxy.
- Keep data/, .env, logs, databases, and uploaded/generated media out of Git. They are ignored by default.
- Review data/auth.json after first boot: disable open signup unless you intentionally want it, make only your own account admin, and keep demo/test accounts non-admin.
- Non-admin users do not get shell/Python/file read/write by default, and admin-only routes/tools such as MCP management, API tokens, webhooks, model/cookbook serving, backup/vault, and app settings are admin-gated. Other features are controlled by per-user privileges, so review each user's privileges before exposing a deployment.
- Rotate any API keys or tokens that were ever pasted into a shared chat, demo, screenshot, or log.
- If you enable API tokens or webhooks, create separate tokens per integration and delete unused ones.
- Prefer binding manual development runs to 127.0.0.1; bind to 0.0.0.0 only when you intentionally want LAN/reverse-proxy access.
- Before publishing a fork, run git status --short and confirm no private files from .env, data/, logs/, uploads, backups, or local databases are staged.
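The pre-publish check in the last item can be automated. This is a rough sketch, not a tool shipped with Odysseus: the function names are hypothetical, and it only covers .env, data/, and logs/ because the locations of uploads, backups, and local databases depend on your setup.

```python
import subprocess

# Directories named in the checklist above; extend this for your own layout.
PRIVATE_DIRS = ("data/", "logs/")


def is_private(path):
    """True for .env itself or anything under a private directory."""
    return path == ".env" or path.startswith(PRIVATE_DIRS)


def staged_private_paths(status_output):
    """Return staged private paths found in `git status --short` output."""
    flagged = []
    for line in status_output.splitlines():
        # Format is "XY path"; X is the index (staged) status column.
        if len(line) < 4 or line[0] in " ?!":
            continue
        path = line[3:]
        if " -> " in path:
            # Renames appear as "old -> new"; check the new path.
            path = path.split(" -> ", 1)[1]
        path = path.strip('"')
        if is_private(path):
            flagged.append(path)
    return flagged


def check_repo():
    """Run git in the current directory and return any staged private paths."""
    result = subprocess.run(
        ["git", "status", "--short"], capture_output=True, text=True, check=True
    )
    return staged_private_paths(result.stdout)
```

An empty list from check_repo() means nothing under those paths is staged. Note that .env.example is deliberately not flagged, since it is meant to be committed.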
Putting it behind HTTPS
Odysseus serves plain HTTP on its port. That's fine for localhost and trusted LAN/VPN use, but browsers will warn ("Password fields present on an insecure page") and the login + API tokens travel in cleartext. For anything reachable outside your machine — including a Tailscale IP shared with other devices — put a TLS-terminating reverse proxy in front.
Shortest path with Caddy (auto-renews Let's Encrypt certs):
odysseus.example.com {
reverse_proxy localhost:7000
}
For a LAN-only Tailscale deployment, Caddy + tailscale-cert or the built-in MagicDNS HTTPS feature both work. nginx/Traefik configs are similar — proxy localhost:7000, terminate TLS at the proxy. Once that's in place, the browser warning goes away and your login is encrypted.
Contributing
Help is welcome. The best entry points are fresh-install testing, provider setup bugs, mobile/editor polish, docs, and small focused refactors. See ROADMAP.md for the current help-wanted list.
Configuration
Most setup is done inside the app with /setup or Settings. Use .env
for deployment-level defaults and secrets you want present before first boot.
Key settings:
| Variable | Default | Description |
|---|---|---|
| LLM_HOST | localhost | Your LLM server (e.g. llm-host.local:8000) |
| LLM_HOSTS | -- | Comma-separated list for model discovery |
| OPENAI_API_KEY | -- | Optional OpenAI key. Prefer adding providers in the app unless pre-seeding. |
| SEARXNG_INSTANCE | http://localhost:8080 | SearXNG URL. Docker overrides this to http://searxng:8080. |
| SEARXNG_SECRET | generated on first Docker boot | Optional SearXNG cookie/CSRF secret. Leave blank unless you need to pin it. |
| APP_BIND | 127.0.0.1 | Docker Compose host bind address for the web UI. Use 0.0.0.0 only for intentional LAN/reverse-proxy access. |
| APP_PORT | 7000 | Docker Compose host port for the web UI. |
| AUTH_ENABLED | true | Enable/disable login |
| LOCALHOST_BYPASS | false | Development-only auth bypass for loopback requests. Keep false for shared/network deployments. |
| DATABASE_URL | sqlite:///./data/app.db | Database connection string |
| CHROMADB_HOST | localhost | ChromaDB host for vector memory. Docker overrides this to chromadb. |
| CHROMADB_PORT | 8100 | ChromaDB port for manual host runs. Docker overrides this to 8000. |
| EMBEDDING_URL | -- | OpenAI-compatible embeddings endpoint |
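To illustrate how the defaults in the table combine with overrides, here is a hypothetical loader. It is a sketch only and not how Odysseus reads its configuration: the Settings class, load_settings, and the boolean parsing rule are all assumptions, and only a subset of the variables is shown.

```python
import os
from dataclasses import dataclass


def _as_bool(value):
    """Assumed parsing rule: common truthy spellings count as True."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    llm_host: str
    searxng_instance: str
    app_bind: str
    app_port: int
    auth_enabled: bool
    localhost_bypass: bool
    database_url: str
    chromadb_host: str
    chromadb_port: int


def load_settings(env=None):
    """Read variables from `env`, falling back to the defaults in the table."""
    env = os.environ if env is None else env
    return Settings(
        llm_host=env.get("LLM_HOST", "localhost"),
        searxng_instance=env.get("SEARXNG_INSTANCE", "http://localhost:8080"),
        app_bind=env.get("APP_BIND", "127.0.0.1"),
        app_port=int(env.get("APP_PORT", "7000")),
        auth_enabled=_as_bool(env.get("AUTH_ENABLED", "true")),
        localhost_bypass=_as_bool(env.get("LOCALHOST_BYPASS", "false")),
        database_url=env.get("DATABASE_URL", "sqlite:///./data/app.db"),
        chromadb_host=env.get("CHROMADB_HOST", "localhost"),
        chromadb_port=int(env.get("CHROMADB_PORT", "8100")),
    )
```

With an empty environment, load_settings({}) yields the table defaults; passing {"APP_PORT": "7001"} changes only the port.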
Built-in MCP servers (optional setup)
Odysseus auto-registers a few built-in MCP servers at startup. The npx-based ones (currently the browser server, @playwright/mcp) only start when their npm package is already in the local npx cache. If a package isn't cached, that server is skipped with a startup log message explaining what to do, so a fresh install does not block on a multi-minute npm download or hang if Playwright system deps are missing.
To enable the browser MCP (page navigation, screenshots, vision), run once:
npx -y @playwright/mcp@latest --version
That installs @playwright/mcp plus Playwright (~300MB total). Restart Odysseus and the server will register at startup.
Architecture
app.py # FastAPI entry point
core/ auth, database, middleware, constants
src/ llm_core, agent_loop, agent_tools, chat_processor, search/
routes/ chat, session, document, memory, model … endpoints
services/ docs, memory, search, hwfit (Cookbook) …
static/ index.html + app.js + style.css + js/ (modular front-end)
docs/ landing page (index.html) + preview clips
Data
All user data lives in data/ (gitignored): app.db (sessions, messages, documents),
memory.json, presets.json, uploads/, personal_docs/, chroma/, settings.json.
License
MIT -- see LICENSE and ACKNOWLEDGMENTS.md.
|
|||
|||||
| | | |||||||
)_) )_) )_) ~|~
)___))___))___)\ |
)____)____)_____)\\|
_____|____|____|_____\\\__
\ /
~^~^~~^~^~~^~^~~^~^~~^~^~~^~^~~^~^~~^~^~
~^~ all aboard! ~^~
~^~^~~^~^~~^~^~~^~^~~^~^~~^~^~~^~^~~^~^~





