This persists work that had been living only in the cookbook docker
container's writable layer — never committed to the host source. Brought
back to git intact, app.py registration re-applied surgically on top of
current main (not the older container copy, which would have regressed
the Windows MIME fix, asynccontextmanager lifespan, and webhook auth
exempts).
routes/codex_routes.py (new):
- GET /api/codex/capabilities — what this Odysseus exposes.
- GET /api/codex/plugin.zip — downloads integrations/codex as a zip.
- GET /api/codex/todos — scope-gated todos:read|write.
- POST /api/codex/todos — scope-gated todos:write.
- GET /api/codex/emails — scope-gated email:read|draft|send.
- GET /api/codex/emails/{uid} — single-message fetch.
- _scope_owner() enforces api_token scopes before touching user data.
routes/api_token_routes.py (+103 lines):
- Adds Codex-token-specific issuance + revocation paths.
integrations/codex/ (new bundle, shipped via /api/codex/plugin.zip):
- README.md — install instructions.
- .codex-plugin/plugin.json — Codex plugin manifest.
- scripts/odysseus_api.py — Python client used by the skill.
- skills/odysseus/SKILL.md — Codex skill definition.
static/js/settings.js (+253 lines):
- New "Codex Agent" option in the Integrations dropdown.
- Add / edit panel with plugin-bundle download link + curl-with-token
install instructions per agent.
app.py:
- 7-line surgical change: capture email_router = setup_email_routes()
and register setup_codex_routes(email_router=email_router) after the
email module so the Codex routes can borrow its helpers.
This persists work that had been living only in the cookbook docker
container's writable layer — never committed to the host source. Brought
back to git intact, app.py registration re-applied surgically on top of
current main (not the older container copy, which would have regressed
the Windows MIME fix, asynccontextmanager lifespan, and webhook auth
exempts).
routes/codex_routes.py (new):
- GET /api/codex/capabilities — what this Odysseus exposes.
- GET /api/codex/plugin.zip — downloads integrations/codex as a zip.
- GET /api/codex/todos — scope-gated todos:read|write.
- POST /api/codex/todos — scope-gated todos:write.
- GET /api/codex/emails — scope-gated email:read|draft|send.
- GET /api/codex/emails/{uid} — single-message fetch.
- _scope_owner() enforces api_token scopes before touching user data.
routes/api_token_routes.py (+103 lines):
- Adds Codex-token-specific issuance + revocation paths.
integrations/codex/ (new bundle, shipped via /api/codex/plugin.zip):
- README.md — install instructions.
- .codex-plugin/plugin.json — Codex plugin manifest.
- scripts/odysseus_api.py — Python client used by the skill.
- skills/odysseus/SKILL.md — Codex skill definition.
static/js/settings.js (+253 lines):
- New "Codex Agent" option in the Integrations dropdown.
- Add / edit panel with plugin-bundle download link + curl-with-token
install instructions per agent.
app.py:
- 7-line surgical change: capture email_router = setup_email_routes()
and register setup_codex_routes(email_router=email_router) after the
email module so the Codex routes can borrow its helpers.
Backend (services/hwfit + routes):
- VRAM column sort now shows global highest first (was special-cased to
ascending then truncated top-N, which made "highest VRAM" mathematically
unreachable). Every column path uses reverse=True for the truncation.
- Hardware probe cache TTL 30min -> 24h so changing filters doesn't keep
re-probing the rig during a session; Rescan button still forces fresh.
- Multi-GPU rigs filter GGUF Q*/IQ quants (vLLM/SGLang can't serve them);
default non-prequantized to BF16 on 2+ GPUs.
- AWQ / AWQ-8bit / GPTQ-8bit get a -1.0 quality penalty so FP8 wins ties.
- Version-aware tiebreaker (parse Mn.n / Vn) — MiniMax-M2.7 ranks above M2.5.
- hf_models.json: zai-org/GLM-5.1 added; zai-org/GLM-5 quantization flipped
Q4_K_M -> BF16. DeepSeek-V4-Flash / -Pro + their -Base variants registered
with new FP4-MoE-Mixed / FP8-Mixed quant keys (calibrated BPP from the
actual 156 GB / 284 GB disk footprints).
- New FP4-MoE-Mixed + FP8-Mixed entries in QUANT_BPP / QUANT_SPEED_MULT /
QUANT_QUALITY_PENALTY / QUANT_BYTES_PER_PARAM / PREQUANTIZED_PREFIXES.
Frontend — Scan/Download:
- Engine + Quant swapped in the toolbar; Quant defaults to "All".
- Ctx (range slider) ported from origin/main: 8k/16k/32k/50k/128k/Max. Drag
re-sorts by vram ascending (smallest fitting first); back to Max → score.
- Ctx slider rail now visible — was background:transparent in a duplicate
later-cascade rule. Hardcoded grey + !important.
- Search input moved to the far right of the toolbar.
- Type/Standard default; "Context" not uppercased; Search placeholder dimmed.
- Engine "?" + Quant "?" inline help chips inside their dropdown boxes.
- Fit-column dot toggles fit-only filter; un-toggling re-sorts by VRAM desc.
- Quant column truncates to 9 chars + ellipsis ("FP4-MoE-M..."), full in
tooltip. Smart title-suffix strips the parts already in the repo name
(QuantTrio/MiniMax-M2-AWQ + quant AWQ-4bit -> just "(4bit)").
- Conditional warning for safetensors models on non-GPU rigs only.
- Dependency Install / Installed / Installed▾ / N/A all 75.85px wide.
- Rebuild llama.cpp moved into the llama_cpp dep row, styled as a tag.
- Foldable Download admin-card (h2 chevron); line under h2 only when folded.
- HF token save gets a green ✓ + "Saved" flash.
- Cached scan no longer counts stalled rows as downloaded.
- Footer: "Request it →" link with GitHub mark to the public discussion
(#1962) for model-add requests.
Frontend — Running tab:
- Strict download-finish check (DOWNLOAD_OK or /snapshots/, not bare
"Download complete"). True overall % for multi-shard downloads:
((N-1)+frac)/total instead of hf_transfer's per-shard aggregate.
- ETA in the uptime ticker: "downloading: 12m 34s · ETA 1h 23m".
- Clear button kills the tmux session too; if the output still shows a
live shard line, the pill is hidden + relabels as "reconnect" + revives
on click.
- Self-heal: on cookbook open AND every bg-monitor cycle (10s, throttled
to 8s), scan persisted done/error/crashed downloads and probe their
tmux session — if alive, flip status back to running and reattach.
- Per-launch zombie probe: clicking Download on a model whose persisted
state is done but tmux is still alive revives the existing task and
refuses to start a duplicate.
- Pre-launch GPU probe: vllm / sglang / diffusers serve check
/api/cookbook/gpus first; warns + confirms if no GPU is visible.
- Server-side state guard: rejects "done" POSTs for downloads lacking
DOWNLOAD_OK / DOWNLOAD_FAILED / /snapshots/ when the last-mentioned
shard is N<total — stale tabs can't poison persisted state any more.
- Running count includes tasks whose output looks active even if persisted
status got stuck. Dir text on the running row, font matched to uptime.
Serve panel:
- Ctx text input always resets to model max on open (default 20000 when
metadata is missing).
- Max Seqs default 8 -> 4. KV Cache dtype select 32px tall.
- Lightning icon on Launch (same as Action toggle).
- Diagnosis card simplified (no fold/copy/dismiss), suggestion font
matches body; action buttons get icons on the left (Retry/Copy/Edit/
Install/Kill/Switch/etc.).
- Incomplete-download serve warning when model status is
downloading / stalled / has_incomplete.
- MTP "?" tooltip ("supported on a few model families … up to ~3× faster").
Backend (services/hwfit + routes):
- rank_models picks visible set by REQUESTED column, not always score —
sorting by Param now shows highest-param models PERIOD (incl. too_tight).
- New fit_only param. Multi-GPU rigs filter GGUF Q*/IQ quants (vLLM/SGLang
cannot serve them); default non-prequantized to BF16 on 2+ GPUs.
- AWQ / GPTQ-8bit get a -1.0 quality penalty (was 0.0, tied with FP8), so
FP8 wins when both fit.
- Version-aware tiebreaker (parse Mn.n / Vn) — MiniMax-M2.7 ranks above
M2.5 on equal composite score; >=100B integers not misread as versions.
- /api/cookbook/hf-latest no longer drops models without an "NB" pattern in
the repo id (MiniMax-M2.7, DeepSeek-V4-Pro etc. were silently filtered).
- Cached-model scan: atexit flushes models JSON even if the script is
killed mid-walk; each scan_dir wrapped in try/except; timeout 60s -> 180s.
- KB granularity for sub-MB sizes (was "0 MB" for 12 KB shells). New
"stalled" status for shells <1 MB with no .incomplete files.
- /api/cookbook/state POST guard: rejects "done" download tasks lacking
DOWNLOAD_OK / DOWNLOAD_FAILED / /snapshots/ when the last-mentioned
shard is N<total — stops stale tabs from poisoning persisted state.
- hf_models.json: add zai-org/GLM-5.1; flip zai-org/GLM-5 quantization
Q4_K_M -> BF16 (it is the native base, not a quant).
Frontend (static/js):
- Scan/Download toolbar: quant defaults to All; ctx slider (8k/16k/32k/
50k/128k/Max) ported from origin/main with sort=fit on drag, sort=score
on Max. GPU toggle commits _activeCount to maxGpu on initial render. Fit
column header tagged with active budget (RAM / GPU / N GPU).
- Foldable Download admin-card: the Download h2 is the chevron trigger;
state persists in localStorage.
- Download card surfaces destination dir (Dir: <path>). Same dir on running
task row, font/color matched to uptime (9px Fira Code muted, opacity .4).
- Serve panel ctx text input always resets to model max on open. Sub-MB
cached models show with red "download stalled" badge.
- Bulk-select Cancel + Delete reset the Select button label on exit.
- Cookbook running: false-finished bug fixed — DOWNLOAD_OK or /snapshots/
required; bare "Download complete" no longer marks the task done after
the first config file. Clear button now sends tmux kill-session too.
True overall % for multi-shard downloads: ((N-1)+frac)/total instead of
hf_transfer per-shard aggregate.
- Diagnosis card simplified: removed fold toggle, copy button, dismiss X.
Suggestion font matches message body (12px).
- HF token field flashes green check + "Saved" on save.
- Cached scan no longer counts stalled rows as downloaded in Scan/Download.
CSS:
- dep Install button width pinned to 76px to match Installed split.
- task-sub row +1px; task-status badge gets margin-right 8px.
- Ctx slider styled like gallery editor sliders (thin pill rail, red thumb).
- Bulk-select cancel button top -3px -> -5px.
Allow Gmail quote attribution parsing to handle standard US weekday/month/day/year comma patterns while preserving existing formats, with JS regression coverage.
The sidebar delete handler fired the DELETE API call without awaiting
it, then called loadSessions() which re-fetches the session list from
the server. If the server hadn't processed the deletion yet, the
session reappeared in the sidebar immediately after being removed.
Await the DELETE response before reloading so the server-side deletion
completes first.
Fixes#1358
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When submitting a message without a model/session configured, the
error path showed a help message but never cleared the textarea,
leaving the user's text stuck in the input field. Clear the input
and trigger autoResize on both the no-default-model and catch paths.
Fixes#1475
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
After an AI-written document is closed, its session_id is nulled (the detach
behaviour from #1238). Both Open controls in the Documents library — the card's
expanded Open button and the card dropdown's Open item — gated on
`doc.session_id`: they wired `libraryOpenInSession` (which early-returns with no
session) and DISABLED the control otherwise, so the user's own document showed a
grayed-out Open button and couldn't be reopened.
The module already has `libraryOpenDocument`, which explicitly handles the
orphaned case ("just open in editor without switching session" -> _loadDocument
by id). Route the no-session path there instead of disabling.
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The serve bootstrap builds llama-server from source only when it is missing
from PATH, so a host that first compiled CPU-only (no nvcc present at build
time) reuses that CPU-only binary on every later serve and never gets a GPU
build, even after a CUDA/ROCm toolkit is installed. There was no UI lever to
force a rebuild.
Adds a 'Rebuild llama.cpp' button to the Cookbook Dependencies tab. It clears
the cached ~/bin/llama-server symlink and ~/llama.cpp/build directory (locally
or on the selected remote server) so the next serve recompiles and picks up
CUDA/HIP if a toolchain is now present. It installs and downloads nothing.
- routes/cookbook_helpers.py: _llama_cpp_rebuild_cmd() (single source of truth)
- routes/shell_routes.py: POST /api/cookbook/rebuild-engine (admin-only, reuses
the existing SSH plumbing for remote hosts)
- static/js/cookbook.js: header button + handler honoring the deps server selector
- tests: cover the command shape and a clean run on a fresh HOME
Motivated by #831 (RTX 4070 user stuck on a CPU-only build with no way to
re-trigger the build).
Co-authored-by: ghreprimand <203024559+ghreprimand@users.noreply.github.com>
Installing a heavy dependency like vllm crashes in a "stale — restarting" loop:
it restarts mid-install, reuses the cached wheels, then stalls again.
The download/install watchdog (cookbookRunning.js) keyed its stall signal purely
off the downloaded-byte counter ("1.81G/2.49G"). A dependency install spends long
stretches with NO byte counter — pip dependency resolution and the native CUDA
build/compile — so the signal froze and after STALE_PROGRESS_MS the watchdog
declared it stale and auto-restarted it mid-build, looping forever.
Extract the signal into a pure computeProgressSignal (cookbookProgressSignal.js):
keep the byte counter for the download phase (so a genuinely stuck download is
still caught, and an animating-but-frozen ETA frame is NOT mistaken for progress),
and when there's no byte counter fall back to a fingerprint of the output tail so
resolver/compile lines count as progress. Only a truly frozen tail now reads as
stalled.
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
_getCharacterList() had two bugs that silently dropped every
user-created persona from the group participant picker:
1. The /api/presets/templates endpoint returns a JSON array directly,
but the code read `data.templates` (always undefined). The forEach
over `data.templates || []` iterated over an empty array every time,
so no user templates were ever added.
2. Even if the array had been read correctly, the `t.isCharacter` guard
would have filtered them all out — user templates are saved by
presets.js without that flag, which is only present on built-in
PROMPT_TEMPLATES entries.
Fix: accept both the direct-array and the {templates:[]} shapes, drop
the isCharacter guard (user_templates are personas by definition), and
use the correct field name (system_prompt, not prompt) so the character
prompt actually reaches the group chat.
Fixes#1656
Two related bugs in the Cookbook task lifecycle:
1. "Stop all" fired kills via .click() inside a synchronous forEach but
showed the success toast immediately after — the toast appeared before
any of the async kill requests had been sent, giving the user false
confidence the tasks were stopped.
2. The download auto-retry logic (triggered when DOWNLOAD_FAILED appears
in the task output) had no way to distinguish a network interruption
from a deliberate user stop. A download stopped via "Stop all" or the
individual Stop button could be silently restarted up to two times by
the background monitor.
Fix: persist _userStopped: true to localStorage at the moment the user
clicks Stop (individually) or Stop all. The auto-retry guard checks this
flag before relaunching the download. The flag is written BEFORE the
kill requests fire so there is no window where the monitor can race.
Fixes#1458
When the scheduled folder is opened with cached data, sp is null
(the loading spinner is skipped). _loadScheduled receives null and
calls sp.destroy() unconditionally, crashing with TypeError.
Both _getModels() and getAllModels() store the sorted copy in a cache
variable but return the original unsorted array on first invocation.
Subsequent calls return the cache (sorted), causing inconsistent
model picker ordering on first render.
A CPU-only llama.cpp serve config still emitted --flash-attn on and exported
GGML_CUDA_ENABLE_UNIFIED_MEMORY=1 (independent toggles, often left on by an Auto
profile), so the command mixed "zero GPU layers" with CUDA/flash-attn and failed
to start (issue #1291). Gate both on a _cpuOnly check (ngl == 0). GPU serving is
unchanged — the gate only affects the ngl=0 path.
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
deleteMessage() bailed at `if (!sessionId) return;`, so the "x" on an output
shown before a model/API was selected did nothing — there's no session yet
(issue #1428). The session id is only needed for the server-side delete; without
one (or with no persisted message ids) we now fall through to removing the DOM,
so the "x" always at least dismisses the bubble.
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
uploadPending() read `data.files` from /api/upload without checking `res.ok`, so
a non-OK response (429 rate limit, 413 too large, …) was swallowed: the pending
files vanished and the chat sent with no attachments and no feedback — part of
why the model "didn't even see them" in #1346.
Check res.ok; on failure show the server's reason via a toast and keep the
pending files so the attach strip re-renders for a retry (matching the existing
"restored on error" comment that the code never actually honored).
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Clicking "New chat" (the brand/welcome navigation path) left the previous
session's unsent draft in the composer (issue #1343). The direct model-picker
path (createDirectChat) already cleared it, but the welcome path did not.
Clear `#message` in chatRenderer.showWelcomeScreen() — the shared entry point
for that state — resetting its autosized height and dispatching an `input` event
so the send button / autosize listeners update. Switching between existing
sessions loads them directly and does not call showWelcomeScreen, so genuine
drafts are not erased.
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The Cookbook download path showed its error toasts with the default ~1.2s
duration, so an actionable message like "tmux is required for Cookbook
background downloads/serves … install it with your OS package manager" vanished
before it could be read (issue #1355). The serve path already uses multi-second
durations.
Give the three "Download failed" toasts a 9s duration to match.
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix: Cookbook local GGUF serving inside Docker
Cookbook’s in-container GGUF serve flow had multiple Docker-specific breakages that made local llama.cpp models fail or register against the wrong endpoint.
Fixes included here:
use the scanned model cache root when generating GGUF serve commands instead of hardcoding $HOME/.cache/huggingface/hub
fix malformed llama.cpp preflight build lines that generated invalid bash in serve runner scripts
preserve loopback model URLs inside Docker when the target port is already reachable from the Odysseus container, instead of rewriting them unconditionally to host.docker.internal
Before this change, Docker local serves could fail in several ways:
Cookbook pointed llama.cpp at the wrong GGUF path
generated serve runner scripts crashed before launch with a shell syntax error
successfully started in-container model servers were auto-registered as host.docker.internal: instead of localhost/127.0.0.1
This makes the Docker Cookbook path work as expected for: downloaded GGUF -> local llama.cpp serve -> endpoint registration
* test: add test for docker-local endpoint rewrites
* fix: markdown table renders separator row as visible data
The alignment separator (|---|---|) at row index 1 was rendered as a
<td> row with dashes as cell content. Skip it and only open <tbody>
at that point, so tables render as header + data without the garbage
separator row in between.
* test: add regression test for table separator row rendering
Verifies that the markdown table renderer skips the separator row
(|---|---|) instead of rendering it as a visible data row. Also
updates the test harness to handle the splitTableRow import.
_createChip called URL.createObjectURL directly, bypassing the
_getPreviewUrl/_revokePreviewUrl cache. Each re-render of the
attachment strip leaked blob URLs that were never revoked.