MrSphay/odysseus - odysseus - Gitea: Git with a cup of tea

MrSphay/odysseus

Author	SHA1	Message	Date
spooky	f9e1d38cc2	fix: diagnose vllm serve runtime issues (#1198 )	2026-06-05 11:03:04 +01:00
pewdiepie-archdaemon	9112861d8e	cookbook agent debug loop: persistent log files, auto-adopt orphan tmux, Codex/Claude skill parity Three converging fixes so the chat agent + external Codex/Claude skills can actually debug a crashed serve instead of staring at a post-crash neofetch banner: * Serves now `tee` to /tmp/odysseus-tmux/SESSION.log on the host running them. Runner saves fds 3/4 before the tee and restores them right before `exec ${SHELL}`, so the post-crash interactive zsh banner does NOT pollute the log file. * `tail_serve_output` (chat agent) and `/api/codex/cookbook/output/{sid}` (Codex+Claude skills) both prefer the persistent log file over the tmux pane. Pane is fallback for sessions predating the tee runner. Default tail bumped 150 -> 400. * `list_served_models` "recent log" snippet seeks to the Traceback line instead of showing the last 6 lines (which was always the bash prompt). Cookbook auto-adoption sweep on `/api/cookbook/tasks/status`: every 20s (rate-limited) the cookbook SSHes each configured server, finds `serve-` / `cookbook-` tmux sessions running an actual model process (vllm/python/llama-server/etc., filtered via `pane_current_command`), and writes them into state.tasks. So when the agent falls back to raw ssh+tmux, the session appears in the Cookbook UI on the next poll. `serve_model` error path now reads `data["detail"]` in addition to `data["error"]` so the FastAPI HTTPException message ("Invalid characters in cmd") actually reaches the agent instead of being swallowed as a generic "Serve failed". Tool description updated to warn against `cd …`/`source …`/`&&` prefixes. Intent-without-action supervisor in agent_loop: when the model writes "Let me tail the output" / "I'll check the logs" / "Let me investigate" and ends the turn without emitting a tool call, the loop injects a sharp system nudge ("You said you would X — DO IT NOW") and continues. Capped at 2 nudges per chat so a model that genuinely cannot use the tool does not pin the loop. Codex/Claude skill parity: adds `/cookbook/cached`, `/cookbook/presets`, `/cookbook/preset/{name}`, `/cookbook/adopt` so external agents have the same surface as the chat agent. SKILL.md docs + odysseus_api.py wrapper updated for both bundles. `adopt_served_model` promoted to the always-on tool set so the agent has a documented fallback when serve_model rejects a cmd. Also various cookbook UI tweaks accumulated alongside the above (cookbook.js, cookbookRunning.js, cookbookServe.js, cookbook-diagnosis.js, settings.js, style.css).	2026-06-04 23:27:18 +09:00
pewdiepie-archdaemon	562bc4dedc	Cookbook polish: auto-reconnect, ctx slider fixes, scoring, lots of UI Backend (services/hwfit + routes): - VRAM column sort now shows global highest first (was special-cased to ascending then truncated top-N, which made "highest VRAM" mathematically unreachable). Every column path uses reverse=True for the truncation. - Hardware probe cache TTL 30min -> 24h so changing filters doesn't keep re-probing the rig during a session; Rescan button still forces fresh. - Multi-GPU rigs filter GGUF Q*/IQ quants (vLLM/SGLang can't serve them); default non-prequantized to BF16 on 2+ GPUs. - AWQ / AWQ-8bit / GPTQ-8bit get a -1.0 quality penalty so FP8 wins ties. - Version-aware tiebreaker (parse Mn.n / Vn) — MiniMax-M2.7 ranks above M2.5. - hf_models.json: zai-org/GLM-5.1 added; zai-org/GLM-5 quantization flipped Q4_K_M -> BF16. DeepSeek-V4-Flash / -Pro + their -Base variants registered with new FP4-MoE-Mixed / FP8-Mixed quant keys (calibrated BPP from the actual 156 GB / 284 GB disk footprints). - New FP4-MoE-Mixed + FP8-Mixed entries in QUANT_BPP / QUANT_SPEED_MULT / QUANT_QUALITY_PENALTY / QUANT_BYTES_PER_PARAM / PREQUANTIZED_PREFIXES. Frontend — Scan/Download: - Engine + Quant swapped in the toolbar; Quant defaults to "All". - Ctx (range slider) ported from origin/main: 8k/16k/32k/50k/128k/Max. Drag re-sorts by vram ascending (smallest fitting first); back to Max → score. - Ctx slider rail now visible — was background:transparent in a duplicate later-cascade rule. Hardcoded grey + !important. - Search input moved to the far right of the toolbar. - Type/Standard default; "Context" not uppercased; Search placeholder dimmed. - Engine "?" + Quant "?" inline help chips inside their dropdown boxes. - Fit-column dot toggles fit-only filter; un-toggling re-sorts by VRAM desc. - Quant column truncates to 9 chars + ellipsis ("FP4-MoE-M..."), full in tooltip. Smart title-suffix strips the parts already in the repo name (QuantTrio/MiniMax-M2-AWQ + quant AWQ-4bit -> just "(4bit)"). - Conditional warning for safetensors models on non-GPU rigs only. - Dependency Install / Installed / Installed▾ / N/A all 75.85px wide. - Rebuild llama.cpp moved into the llama_cpp dep row, styled as a tag. - Foldable Download admin-card (h2 chevron); line under h2 only when folded. - HF token save gets a green ✓ + "Saved" flash. - Cached scan no longer counts stalled rows as downloaded. - Footer: "Request it →" link with GitHub mark to the public discussion (#1962) for model-add requests. Frontend — Running tab: - Strict download-finish check (DOWNLOAD_OK or /snapshots/, not bare "Download complete"). True overall % for multi-shard downloads: ((N-1)+frac)/total instead of hf_transfer's per-shard aggregate. - ETA in the uptime ticker: "downloading: 12m 34s · ETA 1h 23m". - Clear button kills the tmux session too; if the output still shows a live shard line, the pill is hidden + relabels as "reconnect" + revives on click. - Self-heal: on cookbook open AND every bg-monitor cycle (10s, throttled to 8s), scan persisted done/error/crashed downloads and probe their tmux session — if alive, flip status back to running and reattach. - Per-launch zombie probe: clicking Download on a model whose persisted state is done but tmux is still alive revives the existing task and refuses to start a duplicate. - Pre-launch GPU probe: vllm / sglang / diffusers serve check /api/cookbook/gpus first; warns + confirms if no GPU is visible. - Server-side state guard: rejects "done" POSTs for downloads lacking DOWNLOAD_OK / DOWNLOAD_FAILED / /snapshots/ when the last-mentioned shard is N<total — stale tabs can't poison persisted state any more. - Running count includes tasks whose output looks active even if persisted status got stuck. Dir text on the running row, font matched to uptime. Serve panel: - Ctx text input always resets to model max on open (default 20000 when metadata is missing). - Max Seqs default 8 -> 4. KV Cache dtype select 32px tall. - Lightning icon on Launch (same as Action toggle). - Diagnosis card simplified (no fold/copy/dismiss), suggestion font matches body; action buttons get icons on the left (Retry/Copy/Edit/ Install/Kill/Switch/etc.). - Incomplete-download serve warning when model status is downloading / stalled / has_incomplete. - MTP "?" tooltip ("supported on a few model families … up to ~3× faster").	2026-06-03 20:25:25 +09:00
pewdiepie-archdaemon	eb79b76432	Cookbook: scoring fixes, UI polish, false-finished + stale-state bug fixes Backend (services/hwfit + routes): - rank_models picks visible set by REQUESTED column, not always score — sorting by Param now shows highest-param models PERIOD (incl. too_tight). - New fit_only param. Multi-GPU rigs filter GGUF Q*/IQ quants (vLLM/SGLang cannot serve them); default non-prequantized to BF16 on 2+ GPUs. - AWQ / GPTQ-8bit get a -1.0 quality penalty (was 0.0, tied with FP8), so FP8 wins when both fit. - Version-aware tiebreaker (parse Mn.n / Vn) — MiniMax-M2.7 ranks above M2.5 on equal composite score; >=100B integers not misread as versions. - /api/cookbook/hf-latest no longer drops models without an "NB" pattern in the repo id (MiniMax-M2.7, DeepSeek-V4-Pro etc. were silently filtered). - Cached-model scan: atexit flushes models JSON even if the script is killed mid-walk; each scan_dir wrapped in try/except; timeout 60s -> 180s. - KB granularity for sub-MB sizes (was "0 MB" for 12 KB shells). New "stalled" status for shells <1 MB with no .incomplete files. - /api/cookbook/state POST guard: rejects "done" download tasks lacking DOWNLOAD_OK / DOWNLOAD_FAILED / /snapshots/ when the last-mentioned shard is N<total — stops stale tabs from poisoning persisted state. - hf_models.json: add zai-org/GLM-5.1; flip zai-org/GLM-5 quantization Q4_K_M -> BF16 (it is the native base, not a quant). Frontend (static/js): - Scan/Download toolbar: quant defaults to All; ctx slider (8k/16k/32k/ 50k/128k/Max) ported from origin/main with sort=fit on drag, sort=score on Max. GPU toggle commits _activeCount to maxGpu on initial render. Fit column header tagged with active budget (RAM / GPU / N GPU). - Foldable Download admin-card: the Download h2 is the chevron trigger; state persists in localStorage. - Download card surfaces destination dir (Dir: <path>). Same dir on running task row, font/color matched to uptime (9px Fira Code muted, opacity .4). - Serve panel ctx text input always resets to model max on open. Sub-MB cached models show with red "download stalled" badge. - Bulk-select Cancel + Delete reset the Select button label on exit. - Cookbook running: false-finished bug fixed — DOWNLOAD_OK or /snapshots/ required; bare "Download complete" no longer marks the task done after the first config file. Clear button now sends tmux kill-session too. True overall % for multi-shard downloads: ((N-1)+frac)/total instead of hf_transfer per-shard aggregate. - Diagnosis card simplified: removed fold toggle, copy button, dismiss X. Suggestion font matches message body (12px). - HF token field flashes green check + "Saved" on save. - Cached scan no longer counts stalled rows as downloaded in Scan/Download. CSS: - dep Install button width pinned to 76px to match Installed split. - task-sub row +1px; task-status badge gets margin-right 8px. - Ctx slider styled like gallery editor sliders (thin pill rail, red thumb). - Bulk-select cancel button top -3px -> -5px.	2026-06-03 16:32:20 +09:00
pewdiepie-archdaemon	c97375343d	Clarify Cookbook diffusion dependencies	2026-06-01 11:45:26 +09:00
pewdiepie-archdaemon	c953c078e5	Improve Cookbook serve reliability	2026-06-01 11:43:08 +09:00
pewdiepie-archdaemon	e5c99a5eee	Odysseus v1.0	2026-05-31 23:58:26 +09:00