Add a calendar-driven scheduler so a user can pick a model in Cookbook, click "Schedule…" instead of "Launch", choose time windows + days of the week + (optional) end date, and have Odysseus auto-launch the serve when the window starts and hard-kill it when the window ends. The calendar IS the source of truth — events on a designated calendar are interpreted as serve schedules, so editing the event in the calendar UI immediately changes the schedule.
Whole feature is gated by setting `cookbook_scheduler_enabled` (default False). Disabling the setting silences the reconciler and the API refuses requests; setting + three new files = entire surface, easy to revert.
New files:
- src/cookbook_scheduler.py — background reconciler: ticks every 60s, reads next ±90s of calendar events on the designated calendar, launches/kills serves to match. Honors "refuse if GPUs busy" (skips with reason, no retry). Adopts pre-existing manual serves matching the event's model so window-end cleanup still applies. Tags scheduler-owned tasks with `_scheduledBy: <event_uid>` so it never kills serves it doesn't own.
- routes/cookbook_schedule_routes.py — POST /api/cookbook/schedule/from-cookbook builds RRULE+ICS events from the modal's input (model, slots[], days[], until). GET /upcoming returns the next 24h with per-event status (scheduled / running / adopted / skipped / failed / ended) for the UI. POST /reconcile-now manually kicks the reconciler.
- static/js/cookbookSchedule.js — Schedule button click handler + modal. Daily/hourly time slot picker, multi-slot ("+ add another time slot"), weekday chips with Weekdays/Weekend/Every-day quicksets, optional Until date. Calls /from-cookbook on save. Whole module is a single IIFE; deleting the file plus its <script> tag removes the UI surface.
Existing files touched (minimal):
- app.py: register the new router + add the reconcile loop as a startup task (~10 lines, all in one block). Reconcile loop checks the feature flag on every tick, so leaving it running with the flag off costs ~one settings lookup per minute.
- static/index.html: one new <script> tag for cookbookSchedule.js.
- static/js/cookbookServe.js: add a "Schedule…" button next to the existing Launch button. Hidden by default; cookbookSchedule.js reveals it after confirming the feature flag is on.
- static/style.css: ~80 lines for the modal styles (mobile-aware via @media).
User choices baked in:
- Calendar events are the source of truth.
- Refuse to launch if GPUs busy (skip + log reason in scheduler.events[uid].reason).
- Hard kill at event end.
- No retry on a skipped event within the window.
- Multi-slot per day supported (one calendar event per slot, shared RRULE).
- Pre-existing manual serves get adopted at window start so they're killed at end.
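The choices above can be read as a per-tick decision for one calendar event. The following is a minimal sketch only, not the code in src/cookbook_scheduler.py: `Event`, `decide`, `MANUAL`, and the returned action strings are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from datetime import datetime

MANUAL = "manual"  # hypothetical marker for a serve the user launched by hand


@dataclass
class Event:
    uid: str
    model: str
    start: datetime
    end: datetime


def decide(event, now, running_owner, gpus_busy, already_skipped):
    """Return the action for one event on one reconciler tick.

    running_owner is None when no serve for event.model is running,
    MANUAL for a hand-launched serve, or the uid stored in the serve's
    _scheduledBy tag.
    """
    in_window = event.start <= now < event.end
    if not in_window:
        # Hard kill at window end, but only for serves this event owns.
        if now >= event.end and running_owner == event.uid:
            return "kill"
        return "noop"
    if running_owner == event.uid:
        return "noop"    # already launched or adopted for this window
    if running_owner == MANUAL:
        return "adopt"   # tag it so window-end cleanup applies
    if running_owner is not None:
        return "noop"    # owned by a different event, leave it alone
    if already_skipped:
        return "noop"    # no retry within the same window
    if gpus_busy:
        return "skip"    # caller records the reason
    return "launch"
```

Keeping the decision a pure function of its inputs makes the "never kill what it does not own" rule easy to unit-test without a calendar or a GPU.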
Known follow-ups (not in this commit):
- Settings UI to pick the schedule calendar + toggle the feature flag.
- Calendar event color/badge for status (running/skipped/failed).
- "Lazy launch on first request" — currently launches at event start. Replacing _launch_serve with a proxy that defers vllm until the first chat request is a contained future change.
Three converging fixes so the chat agent + external Codex/Claude skills can actually debug a crashed serve instead of staring at a post-crash neofetch banner:
* Serves now `tee` to /tmp/odysseus-tmux/SESSION.log on the host running them. Runner saves fds 3/4 before the tee and restores them right before `exec ${SHELL}`, so the post-crash interactive zsh banner does NOT pollute the log file.
* `tail_serve_output` (chat agent) and `/api/codex/cookbook/output/{sid}` (Codex+Claude skills) both prefer the persistent log file over the tmux pane. Pane is fallback for sessions predating the tee runner. Default tail bumped 150 -> 400.
* `list_served_models` "recent log" snippet seeks to the Traceback line instead of showing the last 6 lines (which was always the bash prompt).
Cookbook auto-adoption sweep on `/api/cookbook/tasks/status`: every 20s (rate-limited) the cookbook SSHes each configured server, finds `serve-*` / `cookbook-*` tmux sessions running an actual model process (vllm/python/llama-server/etc., filtered via `pane_current_command`), and writes them into state.tasks. So when the agent falls back to raw ssh+tmux, the session appears in the Cookbook UI on the next poll.
`serve_model` error path now reads `data["detail"]` in addition to `data["error"]` so the FastAPI HTTPException message ("Invalid characters in cmd") actually reaches the agent instead of being swallowed as a generic "Serve failed". Tool description updated to warn against `cd …`/`source …`/`&&` prefixes.
Intent-without-action supervisor in agent_loop: when the model writes "Let me tail the output" / "I'll check the logs" / "Let me investigate" and ends the turn without emitting a tool call, the loop injects a sharp system nudge ("You said you would X — DO IT NOW") and continues. Capped at 2 nudges per chat so a model that genuinely cannot use the tool does not pin the loop.
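A rough sketch of the nudge check, under stated assumptions: the regex, `needs_nudge`, and `MAX_NUDGES` are illustrative names, and the real phrase detection in agent_loop may differ.

```python
import re

# Hypothetical pattern: a promise ("let me", "I'll") followed by an action verb.
INTENT_RE = re.compile(
    r"\b(let me|i'll|i will)\b.*\b(tail|check|investigate|look)\b",
    re.IGNORECASE,
)
MAX_NUDGES = 2  # cap per chat, as described above


def needs_nudge(assistant_text, tool_calls, nudges_sent):
    """True when the turn promised an action but emitted no tool call."""
    if tool_calls:
        return False  # the model acted, nothing to do
    if nudges_sent >= MAX_NUDGES:
        return False  # stop nudging so the loop cannot be pinned
    return bool(INTENT_RE.search(assistant_text))
```

The caller would append the system nudge and increment its counter whenever this returns True.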
Codex/Claude skill parity: adds `/cookbook/cached`, `/cookbook/presets`, `/cookbook/preset/{name}`, `/cookbook/adopt` so external agents have the same surface as the chat agent. SKILL.md docs + odysseus_api.py wrapper updated for both bundles.
`adopt_served_model` promoted to the always-on tool set so the agent has a documented fallback when serve_model rejects a cmd.
Also various cookbook UI tweaks accumulated alongside the above (cookbook.js, cookbookRunning.js, cookbookServe.js, cookbook-diagnosis.js, settings.js, style.css).
* fix(cookbook): stop-all no longer auto-retries interrupted HF downloads
When C-c was sent to a running download, the bash wrapper printed
DOWNLOAD_FAILED on non-zero exit (SIGINT = 130). The reconnect polling
loop was still running at that point, saw the failure marker, and
silently relaunched the download — making "Stop all" appear to have no
effect while the UI showed the toast as if it succeeded.
Fix: abort the reconnect controller immediately when the stop button is
clicked (before the kill command is dispatched), and guard the
auto-retry condition with !controller.signal.aborted so that any
in-flight poll that completes after abort cannot trigger a retry.
Fixes #1458
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* Fix Edge/Chromium sidebar section-title clipping (#1420)
Sidebar section titles were vertically clipped in Chromium/Edge (fine in
Firefox). Raise line-height 1 → 1.3, mirroring the existing .list-item fix.
The titles are flex-centred in a fixed-height (29px) header, so this adds
glyph headroom without any reflow.
* Drop GPU-only flags from the CPU-only (-ngl 0) serve command (#1433)
A CPU-only llama.cpp serve config still emitted --flash-attn on and exported
GGML_CUDA_ENABLE_UNIFIED_MEMORY=1 (independent toggles, often left on by an Auto
profile), so the command mixed "zero GPU layers" with CUDA/flash-attn and failed
to start (issue #1291). Gate both on a _cpuOnly check (ngl == 0). GPU serving is
unchanged — the gate only affects the ngl=0 path.
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix: APIKeyManager.load crashes app startup on a corrupt/wrong-shape api_keys.json (#1565)
* Don't lose deep-research findings when synthesis times out (#1551) (#1562)
Two problems made deep research report "No information could be gathered" even
after it had extracted findings, on slow local models (reporter served a 20B
via LM Studio):
- _synthesize hard-capped its LLM call at timeout=60, while extraction uses the
user's extraction_timeout (300s here) and the final report uses 180s. The slow
model needed >60s to synthesize the round's findings, so synthesis timed out
after 3 attempts. Raised it to 180s to match the final-report call.
- When synthesis produced no report (it returns the unchanged, still-empty
report on failure during round 1), the run hit
`if not report: return "No information could be gathered…"` and discarded the
findings it had already gathered. Now it falls back to a compiled report built
from those findings (_fallback_report) so the user keeps the gathered material.
Tests stub the LLM (no live model/DB) and pin three behaviours: the synthesis
timeout is >= 180, the fallback surfaces the findings rather than the give-up
message, and a failed synthesis preserves the previous report.
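A minimal sketch of the two behaviours described above, assuming a stubbed `call_llm` callable. The function names and the report layout are illustrative, and `GIVE_UP_MESSAGE` is a placeholder rather than the exact string in the code.

```python
GIVE_UP_MESSAGE = "No information could be gathered"  # placeholder text
SYNTHESIS_TIMEOUT = 180  # seconds, matching the final-report call


def synthesize(previous_report, findings, call_llm, timeout=SYNTHESIS_TIMEOUT):
    """Return the new report, or the unchanged previous one on timeout."""
    try:
        return call_llm(previous_report, findings, timeout=timeout)
    except TimeoutError:
        return previous_report


def fallback_report(findings):
    """Compile the raw findings into a plain report."""
    lines = ["Findings gathered before synthesis failed:"]
    lines += [f"- {finding}" for finding in findings]
    return "\n".join(lines)


def finalize(report, findings):
    """Prefer the synthesized report, then the findings, then give up."""
    if report:
        return report
    if findings:
        return fallback_report(findings)
    return GIVE_UP_MESSAGE
```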
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix: return sorted model list on first call in group chat (#1484)
Both _getModels() and getAllModels() store the sorted copy in a cache
variable but return the original unsorted array on first invocation.
Subsequent calls return the cache (sorted), causing inconsistent
model picker ordering on first render.
* fix: guard sp.destroy() in _loadScheduled against null spinner (#1495)
When the scheduled folder is opened with cached data, sp is null
(the loading spinner is skipped). _loadScheduled receives null and
calls sp.destroy() unconditionally, crashing with TypeError.
* fix: capture download exit code before test consumes it (#1497)
The shell pattern 'if [ $? -eq 0 ]; ... else ... echo DOWNLOAD_FAILED (exit $?)' always reports 'exit 1' because $? inside the else branch is the exit code of the [ test command, not the download. Capture into _ec first.
* fix: guard uid.decode() in auto-classify warning log against str UIDs (#1472)
Every other uid.decode() call in this function uses
'uid.decode() if isinstance(uid, bytes) else str(uid)' but the
warning at line 832 does bare uid.decode(), crashing with
AttributeError when uid is already a string.
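The guard quoted above, pulled out as a standalone helper for illustration (the helper name `uid_to_str` is hypothetical; the commit applies the expression inline):

```python
def uid_to_str(uid):
    """Decode bytes UIDs; pass through anything that is already text."""
    return uid.decode() if isinstance(uid, bytes) else str(uid)
```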
* fix: guard AI tidy verdict against non-string LLM output (#1486)
The AI document-tidy endpoint parses verdicts from LLM JSON output
and calls .lower().strip() directly. If the model returns null or a
non-string element, this crashes with AttributeError. Coerce to str
so malformed output is treated as 'keep' instead of crashing.
* fix: rename local url-quote import to avoid shadowing module-level _q (#1471)
The 'from urllib.parse import quote as _q' at line 734 shadows the
module-level _q (IMAP utility) imported from email_helpers, causing
UnboundLocalError at lines 191 and 278 where _q is used before the
local import executes. This silently breaks the entire auto-summarize
pass.
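A self-contained reproduction of the shadowing, assuming a stand-in `_q` in place of the email_helpers import. Any import inside a function binds a local name for the whole function body, so earlier uses of that name fail.

```python
def _q(value):
    """Stand-in for the module-level helper (hypothetical body)."""
    return '"' + value + '"'


def broken(name):
    # _q is local to this function because of the import below,
    # so this call raises UnboundLocalError.
    quoted = _q(name)
    from urllib.parse import quote as _q
    return quoted, _q(name)


def fixed(name):
    quoted = _q(name)  # resolves to the module-level helper
    from urllib.parse import quote as _url_quote  # distinct local name
    return quoted, _url_quote(name)
```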
* fix(ui): add missing Escape key handlers for email-lib-modal, model-picker-menu, and sort dropdowns (#1487)
CONTEXT: Several interactive elements lacked Escape key handlers: the email library modal was not in dynamicModals, the model-picker popup had no Escape close, and the session/model sort dropdowns only closed on outside click.
CHANGE: Adds email-lib-modal to the dynamicModals array in the Escape handler so it gets dismissed via dismissModal. Adds a check for model-picker-menu.open before the modal chain to close the dropdown on Escape. Adds checks for session-sort-dropdown and model-sort-dropdown display=block before the document panel minimize fallback.
WHY: Users expect consistent Escape-to-close behavior across all modals, overlays, and popups. These four were the only interactive containers in the app that ignored the Escape key entirely.
IMPACT: Pressing Escape now closes the email library modal, model picker popup, session sort dropdown, and model sort dropdown -- matching user expectations and the behavior of every other modal in the app.
* fix: mcp CLI _serialize crashes when stored env JSON is a list (#1609)
* fix: validate_caldav_url crashes with TypeError on a non-string URL (#1608)
* fix: _sanitize_export_filename crashes on a non-string session name (#1607)
* fix: shared MCP truncate() crashes on None/non-string tool output (#1605)
* fix: search query helpers crash on a non-string query (#1604)
* fix: rag_server add/remove_directory crashes on a non-string directory arg (#1614)
* fix: gallery CLI image serialization crashes on a non-string prompt (#1598)
* fix: research CLI summary crashes on a non-string query (#1596)
* fix: skills CLI summary crashes on a non-string description (#1595)
* fix(cookbook): set UTF-8 encoding for detached download/serve subprocesses (#1599)
On Windows, Python defaults to the active code page (cp1252) for
subprocess I/O. HuggingFace CLI outputs U+2713 (✓) when validating
tokens, which cp1252 cannot encode, crashing the download process.
Set PYTHONUTF8=1 and PYTHONIOENCODING=utf-8 in the subprocess
environment so Unicode output from hf/pip/llama-server is handled
correctly.
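A sketch of the environment setup, assuming a hypothetical helper name `utf8_subprocess_env`. The two variables are standard CPython settings; where the cookbook applies them is not shown here.

```python
import os
import subprocess
import sys


def utf8_subprocess_env(base=None):
    """Copy the environment and force UTF-8 for Python child processes."""
    env = dict(os.environ if base is None else base)
    env["PYTHONUTF8"] = "1"            # Python UTF-8 mode
    env["PYTHONIOENCODING"] = "utf-8"  # stdin/stdout/stderr encoding
    return env


if __name__ == "__main__":
    # The child prints U+2713, which cp1252 cannot encode.
    result = subprocess.run(
        [sys.executable, "-c", "print('\\u2713')"],
        env=utf8_subprocess_env(),
        capture_output=True,
    )
    print(result.stdout.decode("utf-8").strip())
```

Copying the environment instead of mutating os.environ keeps the setting scoped to the detached subprocess.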
Fixes #1543
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: clarify host Ollama with Docker (#1594)
* fix(ui): stop welcome-screen tip from clipping on narrow phones (#1612)
The empty-state tip ("Add an AI endpoint from Settings...") shares a 60px
max-height ceiling with the one-line .welcome-sub / .welcome-version. On
narrow phones the welcome block shrink-wraps and the tip wraps to 4-5 lines
(~67px), so the shared ceiling clipped its last line ("...key into the
chat.") - the only setup hint a first-run user gets.
Give .welcome-tip its own taller max-height (120px), placed above the
@media (max-height: 650px) block so that rule's max-height:0 still collapses
the tip on short viewports. .welcome-sub / .welcome-version are untouched,
and desktop is unchanged (the tip is ~50px there, well under the ceiling).
* Save only string personal doc paths (#1566)
* Reject backup output inside data dir (#1587)
* Parse all AMD GPU check args (#1586)
* Require runnable dispatcher subcommands (#1585)
* Require runnable dispatcher subcommands
* Use modern dispatcher test loader
* Remove duplicate update database body (#1584)
* Skip invalid research service sources (#1583)
* Reject CalDAV writeback events without uid (#1582)
* Reject empty mail CLI recipients (#1581)
* Reject empty mail CLI recipients
* Keep mail CLI test imports isolated
* Validate signature CLI PNG data (#1580)
* Validate signature CLI PNG data
* Keep signature CLI test imports isolated
* Reject invalid preset CLI entries (#1579)
* Reject invalid preset CLI entries
* Use modern preset CLI test loader
* Normalize session CLI counters (#1578)
* Normalize session CLI counters
* Keep sessions CLI test imports isolated
* fix: monthly schedule label shows 21th/22th/31th (ordinal suffix for days >20) (#1577)
* fix: split_chunks emits a duplicate trailing chunk for text over size-overlap (#1573)
* fix: builtin_actions heuristics crash on a truthy non-string input (#1639)
* fix: skill test-task / precision helpers crash on a non-dict skill (#1638)
* fix: logs CLI _resolve crashes on a non-string name (#1631)
* fix: _extract_skill_json crashes on a truthy non-string teacher response (#1630)
* fix: tool-block parsing crashes on a non-string input (#1628)
* fix: check_outbound_url crashes on a truthy non-string URL (#1623)
* fix: document_actions title/content helpers crash on non-string input (#1621)
* fix: inside_base_dir raises TypeError on a non-string path instead of failing closed (#1619)
* fix: is_markitdown_format crashes on a non-string path (#1618)
* Close app_api blocklist gap for bare /api/tokens and /api/users
The blocklist prefixes had trailing slashes, so path.startswith() only
matched /api/tokens/{id} but not /api/tokens itself — the bare GET (list)
and POST (mint) endpoints were reachable via app_api. Same gap on
/api/users (list/create/delete). Drop trailing slashes so both bare and
sub-resource forms are blocked. /api/auth and /api/admin had no bare
endpoints today but get the same treatment to prevent future drift.
Caught by #1462.
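The gap can be shown with plain `str.startswith`. This is an illustrative sketch: the tuple names and `is_blocked` are hypothetical, and the real blocklist may contain other entries.

```python
# Before: trailing slashes only match sub-resources.
BLOCKED_OLD = ("/api/tokens/", "/api/users/", "/api/auth/", "/api/admin/")
# After: bare prefixes match both the collection and its sub-resources.
BLOCKED_NEW = ("/api/tokens", "/api/users", "/api/auth", "/api/admin")


def is_blocked(path, prefixes):
    """str.startswith accepts a tuple and returns True if any prefix matches."""
    return path.startswith(prefixes)
```

A bare prefix also matches longer sibling paths such as /api/tokens-extra. That errs on the side of blocking, which is the safer direction for a blocklist.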
* Decrypt CalDAV password before write-back (#1731)
writeback_event read cfg["password"] (the encrypted blob) and passed it
straight to DAVClient, so every local create/edit/delete authenticated
with the literal ciphertext, the remote rejected it, and the change
never reached the server — the exact silent-write-loss this module was
built to prevent. The pull path src/caldav_sync.py already decrypts;
mirror that. decrypt() is a no-op on legacy plaintext.
Caught by #1731.
* Memory MCP delete: match exact id, not prefix (#1303)
The delete action looked up the target with startswith() to capture
full_id, but then re-applied startswith() to filter the list — so a
short or ambiguous memory_id silently deleted every memory whose id
shared the prefix, while the success message reported only the first
match. The edit action used the first match and stopped, so the two
actions disagreed on multi-match behaviour. Use full_id for both.
Caught by #1303.
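A small sketch of the difference, assuming memories are dicts with an "id" key. The function names are illustrative, not the MCP server's.

```python
def resolve_full_id(memories, memory_id):
    """Return the id of the first memory whose id starts with memory_id."""
    for memory in memories:
        if memory["id"].startswith(memory_id):
            return memory["id"]
    return None


def delete_by_prefix(memories, memory_id):
    """Old behaviour: drops every memory sharing the prefix."""
    return [m for m in memories if not m["id"].startswith(memory_id)]


def delete_exact(memories, memory_id):
    """New behaviour: resolve once, then drop only that exact id."""
    full_id = resolve_full_id(memories, memory_id)
    if full_id is None:
        return list(memories)
    return [m for m in memories if m["id"] != full_id]
```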
* Rebuild memory vector index from the full saved set, not just the audited owner (#1747)
audit_memories saves final_entries merged with other owners' entries
(correct), but then rebuilt the shared vector collection from
final_entries alone — wiping every other owner from semantic search
until they happened to run their own audit. Keyword fallback masked
it, so it degraded silently. Capture saved_entries once and rebuild
from that.
Caught by #1747.
* Owner-scope RAG doc ids so identical chunks across users don't collide (#1738, #1760)
_generate_doc_id hashed only text. add_document / add_documents_batch
early-return when the id exists, so the second owner indexing a
byte-identical chunk hit the first owner's id, was silently dropped,
and never stored under their owner — their owner-filtered search then
quietly omitted it. Hash owner + text; empty owner reproduces the
legacy id, so the unowned/base index keeps existing ids and isn't
re-churned. Same-owner identical chunks still dedupe.
Caught by #1738 and #1760 (independent reports of the same bug).
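A sketch of an owner-scoped id that keeps legacy ids stable. The hash function (SHA-256) and the NUL separator are assumptions for illustration; the commit only states that owner + text are hashed and that an empty owner reproduces the legacy id.

```python
import hashlib


def generate_doc_id(text, owner=""):
    """Hash owner + text; an empty owner yields the text-only legacy id."""
    payload = text if not owner else f"{owner}\x00{text}"
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

With this shape, two owners indexing the same chunk get different ids, while one owner indexing it twice still dedupes.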
* Removed duplicate definition of _preview_text()
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Zeus-Deus <100132710+Zeus-Deus@users.noreply.github.com>
Co-authored-by: lekt8 <lewistham9x@gmail.com>
Co-authored-by: Afonso Coutinho <afonso@omelhorsite.pt>
Co-authored-by: Paulo Victor Cordeiro <146781332+pvcordeiro@users.noreply.github.com>
Co-authored-by: Zarl-prog <asimjunaidi5u@gmail.com>
Co-authored-by: Wes Huber <wesleybaxterhuber@gmail.com>
Co-authored-by: .bulat <its.bulat@icloud.com>
Co-authored-by: Mahdi Salmanzade <mahdisalmanzadehasl@gmail.com>
Co-authored-by: red person <redpersoncoding@gmail.com>
Co-authored-by: pewdiepie-archdaemon <pewdiepie-archdaemon@users.noreply.github.com>
* fix: support large proxy model endpoint refresh
Large OpenAI-compatible proxy endpoints can expose hundreds of models and make /v1/models slow. Treating those endpoints like local model servers caused model picker opens and background probes to repeatedly hit /models, producing timeouts and making otherwise usable endpoints appear offline.
Make model endpoint discovery cached-first for normal UI usage, add explicit proxy/API classification and refresh policy fields, exclude proxy/API endpoints from aggressive local probing, and preserve cached models when refresh fails.
Manual Test/Add/Refresh actions still fetch the full model list with longer timeouts so users can intentionally import large proxy model lists without blocking normal model picker usage.
* fix: preserve endpoint ping status semantics
Blind Compare anonymized the pane headers, but each pane still created a helper chat session named "[CMP] <real-model>" and GET /api/sessions returned the session's model field. So the sidebar and the session-list API let a user map "Model A" back to its real model before voting, defeating the blind test.
- Frontend (static/js/compare/index.js, panes.js): in blind mode, name helper sessions by their neutral slot ("[CMP] Model A") instead of the model, matching the existing blind pane labels.
- Backend GET /api/sessions (routes/session_routes.py): blank the model field for [CMP]-prefixed helper sessions via a new _public_model helper.
- Backend /api/compare/start (routes/compare_routes.py): name blind sessions by slot and withhold model_left/model_right/mapping from the blind response (revealed at /vote).
- Tests: tests/test_blind_compare_redaction.py.
Fixes #1285.
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Claude Agent integration: AGENT_CONFIGS.claude, INTG_TYPES.claude,
setup_claude_routes + integrations/claude/ skill bundle. Wired in
app.py alongside the existing Codex integration; same scope-gated
/api/codex/* backend; agent form has new description so users know
it's setup for an external CLI, not an agent streamed inside Odysseus.
- Remove mark_email_boundaries action: not good enough yet. Stripped
from task UI, scheduler defaults, registry, tool schema, clear-cache
route. Added to RETIRED_HOUSEKEEPING_ACTIONS so existing rows + their
task_runs auto-purge on startup.
- Cookbook download reliability: "Reconnect" fix button in the crash
diagnosis runs _reconnectTask after probing has-session. 30s confirm
window before marking a download "done" — kills the Finished/Downloading
flicker when tmux briefly drops between captures.
- Mobile UX: tap anywhere on a note card body opens the editor;
Update button morphs to Archive when no text was edited; bell icon
accent-colored; chip-trashing notif pills fade so only the icon
rotates into the trash zone.
- Settings integrations: SVG-per-provider in email + API preset
dropdowns, custom drop-up-aware menus, accent sub-header icons
(IMAP/SMTP), consistent card styling between list + edit, contacts
Edit/Delete icons, agent form description copy.
This persists work that had been living only in the cookbook docker
container's writable layer — never committed to the host source. Brought
back to git intact, app.py registration re-applied surgically on top of
current main (not the older container copy, which would have regressed
the Windows MIME fix, asynccontextmanager lifespan, and webhook auth
exempts).
routes/codex_routes.py (new):
- GET /api/codex/capabilities — what this Odysseus exposes.
- GET /api/codex/plugin.zip — downloads integrations/codex as a zip.
- GET /api/codex/todos — scope-gated todos:read|write.
- POST /api/codex/todos — scope-gated todos:write.
- GET /api/codex/emails — scope-gated email:read|draft|send.
- GET /api/codex/emails/{uid} — single-message fetch.
- _scope_owner() enforces api_token scopes before touching user data.
routes/api_token_routes.py (+103 lines):
- Adds Codex-token-specific issuance + revocation paths.
integrations/codex/ (new bundle, shipped via /api/codex/plugin.zip):
- README.md — install instructions.
- .codex-plugin/plugin.json — Codex plugin manifest.
- scripts/odysseus_api.py — Python client used by the skill.
- skills/odysseus/SKILL.md — Codex skill definition.
static/js/settings.js (+253 lines):
- New "Codex Agent" option in the Integrations dropdown.
- Add / edit panel with plugin-bundle download link + curl-with-token
install instructions per agent.
app.py:
- 7-line surgical change: capture email_router = setup_email_routes()
and register setup_codex_routes(email_router=email_router) after the
email module so the Codex routes can borrow its helpers.
Backend (services/hwfit + routes):
- VRAM column sort now shows global highest first (was special-cased to
ascending then truncated top-N, which made "highest VRAM" mathematically
unreachable). Every column path uses reverse=True for the truncation.
- Hardware probe cache TTL 30min -> 24h so changing filters doesn't keep
re-probing the rig during a session; Rescan button still forces fresh.
- Multi-GPU rigs filter GGUF Q*/IQ quants (vLLM/SGLang can't serve them);
default non-prequantized to BF16 on 2+ GPUs.
- AWQ / AWQ-8bit / GPTQ-8bit get a -1.0 quality penalty so FP8 wins ties.
- Version-aware tiebreaker (parse Mn.n / Vn) — MiniMax-M2.7 ranks above M2.5.
- hf_models.json: zai-org/GLM-5.1 added; zai-org/GLM-5 quantization flipped
Q4_K_M -> BF16. DeepSeek-V4-Flash / -Pro + their -Base variants registered
with new FP4-MoE-Mixed / FP8-Mixed quant keys (calibrated BPP from the
actual 156 GB / 284 GB disk footprints).
- New FP4-MoE-Mixed + FP8-Mixed entries in QUANT_BPP / QUANT_SPEED_MULT /
QUANT_QUALITY_PENALTY / QUANT_BYTES_PER_PARAM / PREQUANTIZED_PREFIXES.
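One way the version-aware tiebreaker could work, sketched under assumptions: the regex and `parse_version` are hypothetical and may not match the parsing in services/hwfit.

```python
import re

# An "M" or "V" token after a separator, followed by digits (optionally n.n).
VERSION_RE = re.compile(r"(?:^|[-_/])[MV](\d+(?:\.\d+)?)(?=$|[-_])", re.IGNORECASE)


def parse_version(repo_id):
    """Return the version as a tuple of ints, or () when none is found."""
    match = VERSION_RE.search(repo_id)
    if not match:
        return ()
    return tuple(int(part) for part in match.group(1).split("."))


def rank(models):
    """Sort (repo_id, score) pairs by score, then by parsed version."""
    return sorted(models, key=lambda m: (m[1], parse_version(m[0])), reverse=True)
```

Tuples of ints compare component-wise, so a hypothetical M2.10 would sort above M2.9, which a float comparison would get wrong.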
Frontend — Scan/Download:
- Engine + Quant swapped in the toolbar; Quant defaults to "All".
- Ctx (range slider) ported from origin/main: 8k/16k/32k/50k/128k/Max. Drag
re-sorts by vram ascending (smallest fitting first); back to Max → score.
- Ctx slider rail now visible — was background:transparent in a duplicate
later-cascade rule. Hardcoded grey + !important.
- Search input moved to the far right of the toolbar.
- Type/Standard default; "Context" not uppercased; Search placeholder dimmed.
- Engine "?" + Quant "?" inline help chips inside their dropdown boxes.
- Fit-column dot toggles fit-only filter; un-toggling re-sorts by VRAM desc.
- Quant column truncates to 9 chars + ellipsis ("FP4-MoE-M..."), full in
tooltip. Smart title-suffix strips the parts already in the repo name
(QuantTrio/MiniMax-M2-AWQ + quant AWQ-4bit -> just "(4bit)").
- Conditional warning for safetensors models on non-GPU rigs only.
- Dependency Install / Installed / Installed▾ / N/A all 75.85px wide.
- Rebuild llama.cpp moved into the llama_cpp dep row, styled as a tag.
- Foldable Download admin-card (h2 chevron); line under h2 only when folded.
- HF token save gets a green ✓ + "Saved" flash.
- Cached scan no longer counts stalled rows as downloaded.
- Footer: "Request it →" link with GitHub mark to the public discussion
(#1962) for model-add requests.
Frontend — Running tab:
- Strict download-finish check (DOWNLOAD_OK or /snapshots/, not bare
"Download complete"). True overall % for multi-shard downloads:
((N-1)+frac)/total instead of hf_transfer's per-shard aggregate.
- ETA in the uptime ticker: "downloading: 12m 34s · ETA 1h 23m".
- Clear button kills the tmux session too; if the output still shows a
live shard line, the pill is hidden + relabels as "reconnect" + revives
on click.
- Self-heal: on cookbook open AND every bg-monitor cycle (10s, throttled
to 8s), scan persisted done/error/crashed downloads and probe their
tmux session — if alive, flip status back to running and reattach.
- Per-launch zombie probe: clicking Download on a model whose persisted
state is done but tmux is still alive revives the existing task and
refuses to start a duplicate.
- Pre-launch GPU probe: vllm / sglang / diffusers serve check
/api/cookbook/gpus first; warns + confirms if no GPU is visible.
- Server-side state guard: rejects "done" POSTs for downloads lacking
DOWNLOAD_OK / DOWNLOAD_FAILED / /snapshots/ when the last-mentioned
shard is N<total — stale tabs can't poison persisted state any more.
- Running count includes tasks whose output looks active even if persisted
status got stuck. Dir text on the running row, font matched to uptime.
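The overall-progress formula from the list above, ((N-1)+frac)/total, as a worked sketch. The function and parameter names are illustrative, and the shard index N is assumed to be 1-based.

```python
def overall_percent(shard_index, shard_fraction, total_shards):
    """Overall download progress in percent.

    shard_index is the 1-based shard currently downloading (N),
    shard_fraction is that shard's progress in [0, 1].
    """
    return 100.0 * ((shard_index - 1) + shard_fraction) / total_shards
```

For example, halfway through shard 3 of 10 gives (2 + 0.5) / 10, i.e. 25%, even though the current shard alone reads 50%.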
Serve panel:
- Ctx text input always resets to model max on open (default 20000 when
metadata is missing).
- Max Seqs default 8 -> 4. KV Cache dtype select 32px tall.
- Lightning icon on Launch (same as Action toggle).
- Diagnosis card simplified (no fold/copy/dismiss), suggestion font
matches body; action buttons get icons on the left (Retry/Copy/Edit/
Install/Kill/Switch/etc.).
- Incomplete-download serve warning when model status is
downloading / stalled / has_incomplete.
- MTP "?" tooltip ("supported on a few model families … up to ~3× faster").
Backend (services/hwfit + routes):
- rank_models picks visible set by REQUESTED column, not always score —
sorting by Param now shows highest-param models PERIOD (incl. too_tight).
- New fit_only param. Multi-GPU rigs filter GGUF Q*/IQ quants (vLLM/SGLang
cannot serve them); default non-prequantized to BF16 on 2+ GPUs.
- AWQ / GPTQ-8bit get a -1.0 quality penalty (was 0.0, tied with FP8), so
FP8 wins when both fit.
- Version-aware tiebreaker (parse Mn.n / Vn) — MiniMax-M2.7 ranks above
M2.5 on equal composite score; >=100B integers not misread as versions.
- /api/cookbook/hf-latest no longer drops models without an "NB" pattern in
the repo id (MiniMax-M2.7, DeepSeek-V4-Pro etc. were silently filtered).
- Cached-model scan: atexit flushes models JSON even if the script is
killed mid-walk; each scan_dir wrapped in try/except; timeout 60s -> 180s.
- KB granularity for sub-MB sizes (was "0 MB" for 12 KB shells). New
"stalled" status for shells <1 MB with no .incomplete files.
- /api/cookbook/state POST guard: rejects "done" download tasks lacking
DOWNLOAD_OK / DOWNLOAD_FAILED / /snapshots/ when the last-mentioned
shard is N<total — stops stale tabs from poisoning persisted state.
- hf_models.json: add zai-org/GLM-5.1; flip zai-org/GLM-5 quantization
Q4_K_M -> BF16 (it is the native base, not a quant).
Frontend (static/js):
- Scan/Download toolbar: quant defaults to All; ctx slider (8k/16k/32k/
50k/128k/Max) ported from origin/main with sort=fit on drag, sort=score
on Max. GPU toggle commits _activeCount to maxGpu on initial render. Fit
column header tagged with active budget (RAM / GPU / N GPU).
- Foldable Download admin-card: the Download h2 is the chevron trigger;
state persists in localStorage.
- Download card surfaces destination dir (Dir: <path>). Same dir on running
task row, font/color matched to uptime (9px Fira Code muted, opacity .4).
- Serve panel ctx text input always resets to model max on open. Sub-MB
cached models show with red "download stalled" badge.
- Bulk-select Cancel + Delete reset the Select button label on exit.
- Cookbook running: false-finished bug fixed — DOWNLOAD_OK or /snapshots/
required; bare "Download complete" no longer marks the task done after
the first config file. Clear button now sends tmux kill-session too.
True overall % for multi-shard downloads: ((N-1)+frac)/total instead of
hf_transfer per-shard aggregate.
- Diagnosis card simplified: removed fold toggle, copy button, dismiss X.
Suggestion font matches message body (12px).
- HF token field flashes green check + "Saved" on save.
- Cached scan no longer counts stalled rows as downloaded in Scan/Download.
CSS:
- dep Install button width pinned to 76px to match Installed split.
- task-sub row +1px; task-status badge gets margin-right 8px.
- Ctx slider styled like gallery editor sliders (thin pill rail, red thumb).
- Bulk-select cancel button top -3px -> -5px.
Allow Gmail quote attribution parsing to handle standard US weekday/month/day/year comma patterns while preserving existing formats, with JS regression coverage.
The sidebar delete handler fired the DELETE API call without awaiting
it, then called loadSessions() which re-fetches the session list from
the server. If the server hadn't processed the deletion yet, the
session reappeared in the sidebar immediately after being removed.
Await the DELETE response before reloading so the server-side deletion
completes first.
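A minimal sketch of the ordering, assuming a hypothetical /api/sessions/<id> endpoint and injected fetch/loadSessions functions (the real handler's names and URL may differ):

```javascript
// Sketch only: the endpoint path and the deps object are assumptions.
async function deleteSessionAndReload(sessionId, deps) {
  // Await the DELETE so the server has processed it before the list is re-fetched.
  const res = await deps.fetch(`/api/sessions/${encodeURIComponent(sessionId)}`, {
    method: "DELETE",
  });
  if (!res.ok) throw new Error(`delete failed: ${res.status}`);
  // Only now reload the sidebar; the deleted session can no longer reappear.
  await deps.loadSessions();
}
```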
Fixes #1358
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When submitting a message without a model/session configured, the
error path showed a help message but never cleared the textarea,
leaving the user's text stuck in the input field. Clear the input
and trigger autoResize on both the no-default-model and catch paths.
Fixes #1475
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
After an AI-written document is closed, its session_id is nulled (the detach
behaviour from #1238). Both Open controls in the Documents library — the card's
expanded Open button and the card dropdown's Open item — gated on
`doc.session_id`: they wired `libraryOpenInSession` (which early-returns with no
session) and DISABLED the control otherwise, so the user's own document showed a
grayed-out Open button and couldn't be reopened.
The module already has `libraryOpenDocument`, which explicitly handles the
orphaned case ("just open in editor without switching session" -> _loadDocument
by id). Route the no-session path there instead of disabling.
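The routing can be sketched like this. The two handler names come from the module as described above; their call signatures and the `actions` wrapper are assumptions made for illustration.

```javascript
// Sketch: pick the Open handler instead of disabling the control.
// The argument shapes passed to the two handlers are assumptions.
function resolveOpenAction(doc, actions) {
  if (doc.session_id) {
    // Document still attached to a session: open it in that session.
    return () => actions.libraryOpenInSession(doc);
  }
  // Orphaned document (session_id nulled on close): open in the editor by id.
  return () => actions.libraryOpenDocument(doc.id);
}
```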
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The serve bootstrap builds llama-server from source only when it is missing
from PATH, so a host that first compiled CPU-only (no nvcc present at build
time) reuses that CPU-only binary on every later serve and never gets a GPU
build, even after a CUDA/ROCm toolkit is installed. There was no UI lever to
force a rebuild.
Adds a 'Rebuild llama.cpp' button to the Cookbook Dependencies tab. It clears
the cached ~/bin/llama-server symlink and ~/llama.cpp/build directory (locally
or on the selected remote server) so the next serve recompiles and picks up
CUDA/HIP if a toolchain is now present. It installs and downloads nothing.
- routes/cookbook_helpers.py: _llama_cpp_rebuild_cmd() (single source of truth)
- routes/shell_routes.py: POST /api/cookbook/rebuild-engine (admin-only, reuses
the existing SSH plumbing for remote hosts)
- static/js/cookbook.js: header button + handler honoring the deps server selector
- tests: cover the command shape and a clean run on a fresh HOME
Motivated by #831 (RTX 4070 user stuck on a CPU-only build with no way to
re-trigger the build).
Co-authored-by: ghreprimand <203024559+ghreprimand@users.noreply.github.com>
Installing a heavy dependency like vllm crashes in a "stale — restarting" loop:
it restarts mid-install, reuses the cached wheels, then stalls again.
The download/install watchdog (cookbookRunning.js) keyed its stall signal purely
off the downloaded-byte counter ("1.81G/2.49G"). A dependency install spends long
stretches with NO byte counter — pip dependency resolution and the native CUDA
build/compile — so the signal froze and after STALE_PROGRESS_MS the watchdog
declared it stale and auto-restarted it mid-build, looping forever.
Extract the signal into a pure computeProgressSignal (cookbookProgressSignal.js):
keep the byte counter for the download phase (so a genuinely stuck download is
still caught, and an animating-but-frozen ETA frame is NOT mistaken for progress),
and when there's no byte counter fall back to a fingerprint of the output tail so
resolver/compile lines count as progress. Only a truly frozen tail now reads as
stalled.
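A minimal sketch of such a signal, under stated assumptions: the byte-counter regex, the tail window size, and the returned string format are illustrative and not taken from cookbookProgressSignal.js.

```javascript
// Assumed byte-counter shape, e.g. "1.81G/2.49G" (a unit letter is required).
const BYTE_COUNTER = /(\d+(?:\.\d+)?[KMGT])i?B?\s*\/\s*(\d+(?:\.\d+)?[KMGT])i?B?/i;

// Returns a string that changes only when real progress happened.
// The watchdog would compare consecutive signals and start its stall timer
// when they are equal.
function computeProgressSignal(output, tailLines = 5) {
  const tail = output
    .split("\n")
    .map((line) => line.trimEnd())
    .filter(Boolean)
    .slice(-tailLines);
  // Download phase: key off the most recent byte counter in the tail only,
  // so an ETA that keeps animating next to a frozen counter is not progress.
  for (let i = tail.length - 1; i >= 0; i--) {
    const m = tail[i].match(BYTE_COUNTER);
    if (m) return `bytes:${m[1]}/${m[2]}`;
  }
  // No byte counter (resolver / compile phase): fingerprint the tail itself,
  // so any new output line counts as progress.
  return `tail:${tail.join("\n")}`;
}
```

Restricting the counter search to the tail is a design choice in this sketch: once the install moves past the download phase, an old counter further up the log no longer pins the signal.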
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
_getCharacterList() had two bugs that silently dropped every
user-created persona from the group participant picker:
1. The /api/presets/templates endpoint returns a JSON array directly,
but the code read `data.templates` (always undefined). The forEach
over `data.templates || []` iterated over an empty array every time,
so no user templates were ever added.
2. Even if the array had been read correctly, the `t.isCharacter` guard
would have filtered them all out — user templates are saved by
presets.js without that flag, which is only present on built-in
PROMPT_TEMPLATES entries.
Fix: accept both the direct-array and the {templates:[]} shapes, drop
the isCharacter guard (user_templates are personas by definition), and
use the correct field name (system_prompt, not prompt) so the character
prompt actually reaches the group chat.
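The shape handling can be sketched as below. The helper name and the `name` field are assumptions; `system_prompt` and the two response shapes are the ones described above.

```javascript
// Sketch: normalize the /api/presets/templates response into picker entries.
function userTemplatesToCharacters(data) {
  // Accept both the direct array and the { templates: [] } wrapper.
  const templates = Array.isArray(data)
    ? data
    : data && Array.isArray(data.templates)
      ? data.templates
      : [];
  // No isCharacter filter: user templates are personas by definition.
  return templates.map((t) => ({
    name: t.name, // field name assumed for illustration
    prompt: t.system_prompt || "", // system_prompt, not prompt
  }));
}
```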
Fixes #1656
Two related bugs in the Cookbook task lifecycle:
1. "Stop all" fired kills via .click() inside a synchronous forEach but
showed the success toast immediately after — the toast appeared before
any of the async kill requests had been sent, giving the user false
confidence the tasks were stopped.
2. The download auto-retry logic (triggered when DOWNLOAD_FAILED appears
in the task output) had no way to distinguish a network interruption
from a deliberate user stop. A download stopped via "Stop all" or the
individual Stop button could be silently restarted up to two times by
the background monitor.
Fix: persist _userStopped: true to localStorage at the moment the user
clicks Stop (individually) or Stop all. The auto-retry guard checks this
flag before relaunching the download. The flag is written BEFORE the
kill requests fire so there is no window where the monitor can race.
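A minimal sketch of the flag-first ordering, assuming a hypothetical per-task storage key and an injected killTask function; the storage object only needs getItem/setItem, as localStorage provides.

```javascript
// Hypothetical storage key layout; the real task store may differ.
const taskKey = (taskId) => `cookbook_task_${taskId}`;

function markUserStopped(storage, taskId) {
  const task = JSON.parse(storage.getItem(taskKey(taskId)) || "{}");
  task._userStopped = true;
  storage.setItem(taskKey(taskId), JSON.stringify(task));
}

// Auto-retry guard: never relaunch a download the user stopped on purpose.
function shouldAutoRetry(storage, taskId, output, retries, maxRetries = 2) {
  const task = JSON.parse(storage.getItem(taskKey(taskId)) || "{}");
  if (task._userStopped) return false;
  return output.includes("DOWNLOAD_FAILED") && retries < maxRetries;
}

// "Stop all": write every flag first, then send the kills, then report.
async function stopAll(storage, taskIds, killTask) {
  for (const id of taskIds) markUserStopped(storage, id);
  const results = await Promise.allSettled(taskIds.map(async (id) => killTask(id)));
  // The caller shows its toast only after this resolves.
  return results.filter((r) => r.status === "fulfilled").length;
}
```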
Fixes #1458
When the scheduled folder is opened with cached data, sp is null
(the loading spinner is skipped). _loadScheduled receives null and
calls sp.destroy() unconditionally, crashing with TypeError.
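One way to guard the call is sketched below. The function shape and the render callback are hypothetical stand-ins for _loadScheduled; only the null-safe destroy is the point.

```javascript
// Sketch: sp is the loading spinner, or null when cached data skipped it.
function loadScheduled(sp, render) {
  try {
    render();
  } finally {
    // Optional chaining makes this a no-op when there is no spinner.
    sp?.destroy();
  }
}
```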
Both _getModels() and getAllModels() store the sorted copy in a cache
variable but return the original unsorted array on first invocation.
Subsequent calls return the cache (sorted), causing inconsistent
model picker ordering on first render.
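The corrected pattern, sketched with a hypothetical factory and an assumed sort key (`name`); the actual getters and their comparator may differ.

```javascript
// Sketch: cache the sorted copy AND return it on the first call too.
function createModelCache(fetchModels) {
  let cache = null;
  return function getAllModels() {
    if (cache === null) {
      // Sort a copy so the source array is left untouched.
      cache = [...fetchModels()].sort((a, b) => a.name.localeCompare(b.name));
    }
    // Returning the cache (never the raw array) keeps the first and later
    // calls in the same order.
    return cache;
  };
}
```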
A CPU-only llama.cpp serve config still emitted --flash-attn on and exported
GGML_CUDA_ENABLE_UNIFIED_MEMORY=1 (independent toggles, often left on by an Auto
profile), so the command mixed "zero GPU layers" with CUDA/flash-attn and failed
to start (issue #1291). Gate both on a _cpuOnly check (ngl == 0). GPU serving is
unchanged — the gate only affects the ngl=0 path.
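A minimal sketch of the gate. The config field names (ngl, flashAttn, unifiedMemory, modelPath) and the command layout are assumptions for illustration; only the flag and the environment variable are the ones named above.

```javascript
// Sketch: build a llama-server command line, suppressing GPU-only options
// when no layers are offloaded.
function buildServeCommand(cfg) {
  const cpuOnly = Number(cfg.ngl) === 0;
  const env = [];
  const args = ["llama-server", "-m", cfg.modelPath, "-ngl", String(cfg.ngl)];
  // Both toggles are independent in the UI, so each is gated separately.
  if (cfg.unifiedMemory && !cpuOnly) env.push("GGML_CUDA_ENABLE_UNIFIED_MEMORY=1");
  if (cfg.flashAttn && !cpuOnly) args.push("--flash-attn", "on");
  return [...env, ...args].join(" ");
}
```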
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
deleteMessage() bailed at `if (!sessionId) return;`, so the "x" on an output
shown before a model/API was selected did nothing — there's no session yet
(issue #1428). The session id is only needed for the server-side delete; without
one (or with no persisted message ids) we now fall through to removing the DOM,
so the "x" always at least dismisses the bubble.
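The fall-through can be sketched as follows, with the server call injected; the parameter list is hypothetical and does not mirror the real deleteMessage() signature.

```javascript
// Sketch: the server-side delete is conditional, the DOM removal is not.
async function deleteMessage(bubble, sessionId, messageIds, deleteOnServer) {
  if (sessionId && messageIds.length > 0) {
    await deleteOnServer(sessionId, messageIds);
  }
  // Always dismiss the bubble, even when there is no session yet.
  bubble.remove();
}
```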
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
uploadPending() read `data.files` from /api/upload without checking `res.ok`, so
a non-OK response (429 rate limit, 413 too large, …) was swallowed: the pending
files vanished and the chat sent with no attachments and no feedback — part of
why the model "didn't even see them" in #1346.
Check res.ok; on failure show the server's reason via a toast and keep the
pending files so the attach strip re-renders for a retry (matching the existing
"restored on error" comment that the code never actually honored).
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
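For the uploadPending() change above, a minimal sketch of the res.ok check. The injected upload/showToast functions, the `error` field in the response body, and the return shape are assumptions made for illustration.

```javascript
// Sketch: surface upload failures and keep the pending files for a retry.
async function uploadPending(pendingFiles, deps) {
  const res = await deps.upload(pendingFiles);
  if (!res.ok) {
    let reason = `Upload failed (${res.status})`;
    try {
      const body = await res.json();
      if (body && body.error) reason = body.error; // assumed error field
    } catch {
      // Non-JSON error body: keep the generic reason.
    }
    deps.showToast(reason);
    // Hand the same files back so the attach strip can re-render them.
    return { files: [], pending: pendingFiles };
  }
  const data = await res.json();
  return { files: data.files || [], pending: [] };
}
```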
Clicking "New chat" (the brand/welcome navigation path) left the previous
session's unsent draft in the composer (issue #1343). The direct model-picker
path (createDirectChat) already cleared it, but the welcome path did not.
Clear `#message` in chatRenderer.showWelcomeScreen() — the shared entry point
for that state — resetting its autosized height and dispatching an `input` event
so the send button / autosize listeners update. Switching between existing
sessions loads them directly and does not call showWelcomeScreen, so genuine
drafts are not erased.
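A sketch of the clearing step, written against any object with the textarea's value/style/dispatchEvent surface; the helper name is hypothetical and the real code lives inside showWelcomeScreen().

```javascript
// Sketch: reset the composer and let listeners react.
function clearComposer(textarea) {
  textarea.value = "";
  textarea.style.height = ""; // drop the autosized height
  // Fire `input` so the send button and autosize listeners update.
  textarea.dispatchEvent(new Event("input", { bubbles: true }));
}
```

In the browser this would be called with document.getElementById("message").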
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The Cookbook download path showed its error toasts with the default ~1.2s
duration, so an actionable message like "tmux is required for Cookbook
background downloads/serves … install it with your OS package manager" vanished
before it could be read (issue #1355). The serve path already uses multi-second
durations.
Give the three "Download failed" toasts a 9s duration to match.
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix: Cookbook local GGUF serving inside Docker
Cookbook’s in-container GGUF serve flow had multiple Docker-specific breakages that made local llama.cpp models fail or register against the wrong endpoint.
Fixes included here:
- use the scanned model cache root when generating GGUF serve commands instead of hardcoding $HOME/.cache/huggingface/hub
- fix malformed llama.cpp preflight build lines that generated invalid bash in serve runner scripts
- preserve loopback model URLs inside Docker when the target port is already reachable from the Odysseus container, instead of rewriting them unconditionally to host.docker.internal
Before this change, Docker local serves could fail in several ways:
- Cookbook pointed llama.cpp at the wrong GGUF path
- generated serve runner scripts crashed before launch with a shell syntax error
- successfully started in-container model servers were auto-registered as host.docker.internal: instead of localhost/127.0.0.1
This makes the Docker Cookbook path work as expected for: downloaded GGUF -> local llama.cpp serve -> endpoint registration
* test: add test for docker-local endpoint rewrites
* fix: markdown table renders separator row as visible data
The alignment separator (|---|---|) at row index 1 was rendered as a
<td> row with dashes as cell content. Skip it and only open <tbody>
at that point, so tables render as header + data without the garbage
separator row in between.
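A simplified sketch of the row handling. splitTableRow is the helper name the renderer imports, but its body here is an assumption, and the sketch omits HTML escaping and alignment handling.

```javascript
// Assumed implementation: strip outer pipes, split on "|", trim cells.
function splitTableRow(line) {
  return line.trim().replace(/^\|/, "").replace(/\|$/, "").split("|").map((c) => c.trim());
}

// Sketch: row 0 is the header, row 1 is the separator, the rest are data.
// Cell text is inserted unescaped, which a real renderer must not do.
function renderTable(lines) {
  let html = "<table>";
  lines.forEach((line, i) => {
    const cells = splitTableRow(line);
    if (i === 0) {
      html += "<thead><tr>" + cells.map((c) => `<th>${c}</th>`).join("") + "</tr></thead>";
    } else if (i === 1) {
      // Separator row (|---|---|): open <tbody> and emit no cells.
      html += "<tbody>";
    } else {
      html += "<tr>" + cells.map((c) => `<td>${c}</td>`).join("") + "</tr>";
    }
  });
  if (lines.length > 1) html += "</tbody>";
  return html + "</table>";
}
```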
* test: add regression test for table separator row rendering
Verifies that the markdown table renderer skips the separator row
(|---|---|) instead of rendering it as a visible data row. Also
updates the test harness to handle the splitTableRow import.