cookbook agent debug loop: persistent log files, auto-adopt orphan tmux, Codex/Claude skill parity

Three converging fixes so the chat agent + external Codex/Claude skills can actually debug a crashed serve instead of staring at a post-crash neofetch banner:

* Serves now `tee` to /tmp/odysseus-tmux/SESSION.log on the host running them. The runner saves the original stdout/stderr as fds 3/4 before the tee and restores them right before `exec ${SHELL}`, so the post-crash interactive zsh banner does NOT pollute the log file.
* `tail_serve_output` (chat agent) and `/api/codex/cookbook/output/{sid}` (Codex+Claude skills) both prefer the persistent log file over the tmux pane. Pane is fallback for sessions predating the tee runner. Default tail bumped 150 -> 400.
* `list_served_models` "recent log" snippet seeks to the Traceback line instead of showing the last 6 lines (which was always the bash prompt).
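
A minimal sketch of the "seek to the Traceback" snippet logic from the last bullet. The function name, the 40-line cap, and the choice to anchor on the LAST traceback in the log are assumptions for illustration only; the real `list_served_models` implementation lives in the backend and may differ.

```javascript
// Hypothetical helper, not taken from the commit.
// Returns the log from the last Python traceback header onward, or the
// final `fallbackLines` lines when the log contains no traceback.
function recentLogSnippet(logText, fallbackLines = 6, maxLines = 40) {
  const lines = String(logText || '').split('\n');
  let start = -1;
  // Assumption: the most recent traceback is the one worth showing.
  for (let i = lines.length - 1; i >= 0; i--) {
    if (lines[i].includes('Traceback (most recent call last)')) {
      start = i;
      break;
    }
  }
  if (start === -1) return lines.slice(-fallbackLines).join('\n');
  return lines.slice(start, start + maxLines).join('\n');
}
```

With this shape a trailing shell prompt after the crash no longer hides the error, because the snippet starts at the traceback header rather than at the end of the file.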

Cookbook auto-adoption sweep on `/api/cookbook/tasks/status`: at most once every 20s (rate-limited) the cookbook SSHes into each configured server, finds `serve-*` / `cookbook-*` tmux sessions running an actual model process (vllm/python/llama-server/etc., filtered via `pane_current_command`), and writes them into state.tasks. So when the agent falls back to raw ssh+tmux, the session appears in the Cookbook UI on the next poll.
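
A rough sketch of the filtering step, assuming the sweep runs something like `tmux list-panes -a -F '#{session_name} #{pane_current_command}'` on each server and parses the output. The function names and the exact command allowlist are hypothetical; the commit message only names vllm/python/llama-server and says "etc.".

```javascript
const SWEEP_INTERVAL_MS = 20000; // from the commit message: at most one sweep per 20s

// Assumption: partial allowlist. `python3` is added here for illustration.
const MODEL_COMMANDS = new Set(['vllm', 'python', 'python3', 'llama-server']);

// True when enough time has passed since the last sweep.
function shouldSweep(lastSweepMs, nowMs) {
  return nowMs - lastSweepMs >= SWEEP_INTERVAL_MS;
}

// Input: lines of "<session_name> <pane_current_command>".
// Output: unique serve-*/cookbook-* sessions whose pane runs a model process.
function adoptableSessions(listPanesOutput) {
  const found = new Set();
  for (const line of String(listPanesOutput || '').split('\n')) {
    const [session, command] = line.trim().split(/\s+/, 2);
    if (!session || !command) continue;
    if (!/^(serve|cookbook)-/.test(session)) continue;
    if (!MODEL_COMMANDS.has(command)) continue; // skips idle shells like zsh/bash
    found.add(session);
  }
  return [...found];
}
```

Filtering on the pane's current command is what keeps a crashed session that has dropped back to an interactive shell from being adopted as a live serve.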

`serve_model` error path now reads `data["detail"]` in addition to `data["error"]` so the FastAPI HTTPException message ("Invalid characters in cmd") actually reaches the agent instead of being swallowed as a generic "Serve failed". Tool description updated to warn against `cd …`/`source …`/`&&` prefixes.
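
The error-path change can be pictured as below. This is an illustrative sketch, not the tool's actual code: the function name and the lookup order (`error` first, then `detail`) are assumptions. What is certain is that a FastAPI `HTTPException` serializes its message under the `detail` key.

```javascript
// Hypothetical helper: pick the most specific message out of an error body.
function serveErrorMessage(data) {
  const body = (data && typeof data === 'object') ? data : {};
  for (const key of ['error', 'detail']) {
    const value = body[key];
    if (typeof value === 'string' && value.trim()) return value.trim();
    // FastAPI validation errors put a list of objects under `detail`.
    if (value && typeof value === 'object') return JSON.stringify(value);
  }
  return 'Serve failed'; // generic fallback when the body carries no message
}
```

For example, `serveErrorMessage({ detail: 'Invalid characters in cmd' })` surfaces the rejection reason, while an empty body still falls back to the generic message.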

Intent-without-action supervisor in agent_loop: when the model writes "Let me tail the output" / "I'll check the logs" / "Let me investigate" and ends the turn without emitting a tool call, the loop injects a sharp system nudge ("You said you would X — DO IT NOW") and continues. Capped at 2 nudges per chat so a model that genuinely cannot use the tool does not pin the loop.
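
One way to sketch the supervisor check, under stated assumptions: the phrase regex, the function name, and the nudge wording are illustrative, and the real agent_loop may detect intent differently. Only the cap of 2 nudges per chat comes from the commit message.

```javascript
const MAX_NUDGES = 2; // from the commit message: capped at 2 nudges per chat

// Assumption: a small verb list standing in for whatever the loop matches.
const INTENT_RE = /\b(?:let me|i['’]ll|i will)\s+((?:tail|check|investigate|look|inspect|read)\b[^.!?\n]*)/i;

// Returns a system nudge when the turn promised an action but emitted no
// tool call, or null when no nudge should be injected.
function intentNudge(assistantText, toolCallCount, nudgesSoFar) {
  if (toolCallCount > 0) return null;           // the model did act
  if (nudgesSoFar >= MAX_NUDGES) return null;   // do not pin the loop
  const m = INTENT_RE.exec(String(assistantText || ''));
  if (!m) return null;
  return `You said you would ${m[1].trim()}. DO IT NOW: emit the tool call.`;
}
```

The caller would append the returned string as a system message and run another turn, incrementing its nudge counter each time.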

Codex/Claude skill parity: adds `/cookbook/cached`, `/cookbook/presets`, `/cookbook/preset/{name}`, `/cookbook/adopt` so external agents have the same surface as the chat agent. SKILL.md docs + odysseus_api.py wrapper updated for both bundles.

`adopt_served_model` promoted to the always-on tool set so the agent has a documented fallback when serve_model rejects a cmd.

Also various cookbook UI tweaks accumulated alongside the above (cookbook.js, cookbookRunning.js, cookbookServe.js, cookbook-diagnosis.js, settings.js, style.css).
pewdiepie-archdaemon
2026-06-04 23:27:18 +09:00
parent 041c03bf11
commit 9112861d8e
19 changed files with 1529 additions and 151 deletions


@@ -378,16 +378,12 @@ export const ERROR_PATTERNS = [
message: 'Model architecture too new for installed vLLM/transformers.',
fixes: [
{ label: 'Try --trust-remote-code', action: (panel) => _serveAutoRetry(panel, '--trust-remote-code'), autofix: true },
{ label: 'Update vLLM on server', action: (panel) => {
const taskEl = panel.closest('.cookbook-task');
const task = taskEl ? _loadTasks().find(t => t.sessionId === taskEl.dataset.taskId) : null;
const host = task?.remoteHost || '';
const prefix = _buildEnvPrefix();
const pipCmd = prefix ? prefix + ' pip install -U vllm transformers' : 'pip install -U vllm transformers';
const cmd = host ? _sshCmd(host, pipCmd) : pipCmd;
// Run in tmux so it doesn't timeout
const name = 'update-vllm';
_launchServeTask(name, 'pip-update', cmd);
{ label: 'Update vLLM on server', action: () => {
// Use the venv's python3 by absolute path when configured (SSH non-
// interactive sessions often pick user-site Python over the venv).
const _vp = (_envState.env === 'venv' && _envState.envPath)
? `${_envState.envPath.replace(/\/+$/, '')}/bin/python3` : 'python3';
_launchServeTask('update-vllm', 'pip-update', `${_vp} -m pip install -U vllm transformers`);
}},
],
},
@@ -395,16 +391,10 @@ export const ERROR_PATTERNS = [
pattern: /Either a revision or a version must be specified|transformers\.integrations\.hub_kernels|kernels\/layer/i,
message: 'Transformers/kernels package mismatch.',
fixes: [
{ label: 'Repair kernel package', action: (panel) => {
const taskEl = panel.closest('.cookbook-task');
const task = taskEl ? _loadTasks().find(t => t.sessionId === taskEl.dataset.taskId) : null;
const host = task?.remoteHost || '';
const prefix = _buildEnvPrefix();
const pipCmd = prefix
? prefix + ' python3 -m pip install --user --break-system-packages "kernels<0.15"'
: 'python3 -m pip install --user --break-system-packages "kernels<0.15"';
const cmd = host ? _sshCmd(host, pipCmd) : pipCmd;
_launchServeTask('repair-kernels', 'pip-update', cmd);
{ label: 'Repair kernel package', action: () => {
const _vp = (_envState.env === 'venv' && _envState.envPath)
? `${_envState.envPath.replace(/\/+$/, '')}/bin/python3` : 'python3';
// Quote the requirement: an unquoted `<` would be parsed by the shell as an input redirect.
_launchServeTask('repair-kernels', 'pip-update', `${_vp} -m pip install --user --break-system-packages "kernels<0.15"`);
}},
{ label: 'Open Dependencies', action: () => _openCookbookDependencies('sglang') },
],
@@ -445,14 +435,10 @@ export const ERROR_PATTERNS = [
pattern: /Triton kernels.*Failed to import|cannot import name '\w+' from 'triton_kernels/i,
message: 'Triton kernels version mismatch. Non-fatal warning — model will still run, just without optimized MoE kernels.',
fixes: [
{ label: 'Update triton on server', action: (panel) => {
const taskEl = panel.closest('.cookbook-task');
const task = taskEl ? _loadTasks().find(t => t.sessionId === taskEl.dataset.taskId) : null;
const host = task?.remoteHost || '';
const prefix = _buildEnvPrefix();
const pipCmd = prefix ? prefix + ' pip install -U triton triton-kernels' : 'pip install -U triton triton-kernels';
const cmd = host ? _sshCmd(host, pipCmd) : pipCmd;
_launchServeTask('update-triton', 'pip-update', cmd);
{ label: 'Update triton on server', action: () => {
const _vp = (_envState.env === 'venv' && _envState.envPath)
? `${_envState.envPath.replace(/\/+$/, '')}/bin/python3` : 'python3';
_launchServeTask('update-triton', 'pip-update', `${_vp} -m pip install -U triton triton-kernels`);
}},
],
},
@@ -474,14 +460,56 @@ export const ERROR_PATTERNS = [
pattern: /attention_sink|sliding.window.*not supported|sliding_window.*incompatible/i,
message: 'Model uses attention features unsupported in this vLLM version.',
fixes: [
{ label: 'Update vLLM on server', action: (panel) => {
const taskEl = panel.closest('.cookbook-task');
const task = taskEl ? _loadTasks().find(t => t.sessionId === taskEl.dataset.taskId) : null;
const host = task?.remoteHost || '';
const prefix = _buildEnvPrefix();
const pipCmd = prefix ? prefix + ' pip install -U vllm' : 'pip install -U vllm';
const cmd = host ? _sshCmd(host, pipCmd) : pipCmd;
_launchServeTask('update-vllm', 'pip-update', cmd);
{ label: 'Update vLLM on server', action: () => {
const _vp = (_envState.env === 'venv' && _envState.envPath)
? `${_envState.envPath.replace(/\/+$/, '')}/bin/python3` : 'python3';
_launchServeTask('update-vllm', 'pip-update', `${_vp} -m pip install -U vllm`);
}},
],
},
{
// FlashInfer JIT-compiles its kernels (including the sampler) for the host
// GPU on first use. If the system /usr/bin/nvcc is older than CUDA 11.8 it
// can't target sm_89/sm_90 (Ada/Hopper), and the engine workers die before
// they can report a useful traceback. Switching the attention backend alone
// is not enough because the sampler is JIT-compiled too. Ways out: set
// VLLM_USE_FLASHINFER_SAMPLER=0, uninstall flashinfer-python, or set CUDACXX
// to a newer nvcc (vLLM installs nvidia-cuda-nvcc into the venv, so point at that).
pattern: /nvcc fatal\s+:\s+Unsupported gpu architecture 'compute_\d+'/i,
message: 'FlashInfer is JIT-compiling sampling kernels with an nvcc too old for this GPU (no sm_89 / sm_90 support — pre-CUDA 11.8). Changing the attention backend does not help — flashinfer JITs the SAMPLER too. The clean fix is to set VLLM_USE_FLASHINFER_SAMPLER=0 so vLLM uses its native sampler instead.',
suggestion: 'Suggested action: relaunch with VLLM_USE_FLASHINFER_SAMPLER=0 prepended. (Confirmed on the QuantTrio/Qwen3.5 model card as the canonical workaround.)',
fixes: [
{ label: 'Retry with VLLM_USE_FLASHINFER_SAMPLER=0', action: (panel) => _serveAutoRetryReplace(panel, '', 'VLLM_USE_FLASHINFER_SAMPLER=0 ', { prepend: true }) },
{ label: 'Uninstall flashinfer-python', action: () => {
// Hard fallback: vLLM 0.22 reaches into flashinfer for sampling kernels
// even with VLLM_USE_FLASHINFER_SAMPLER=0 in some configs. Removing
// the package forces it onto the native sampler.
const _vp = (_envState.env === 'venv' && _envState.envPath)
? `${_envState.envPath.replace(/\/+$/, '')}/bin/python3` : 'python3';
_launchServeTask('uninstall-flashinfer', 'pip-update', `${_vp} -m pip uninstall flashinfer-python -y`);
}},
{ label: 'Edit serve', action: (panel) => _openServeEditFromDiagnosis(panel) },
],
},
{
// vLLM <-> torch ABI mismatch: vLLM imports torch.library helpers
// (`infer_schema`, `register_fake`, etc.) that only exist on newer torch
// versions. When the installed torch is older, the import fails before
// any server code runs. Fix is to reinstall vllm (which pulls a matching
// torch) or upgrade torch directly.
pattern: /ImportError: cannot import name '[^']+' from 'torch(\.\w+)+'/i,
message: 'vLLM was built against a newer torch than what is installed. Reinstall vLLM so pip pulls a compatible torch (or upgrade torch directly).',
fixes: [
{ label: 'Reinstall vLLM (pulls matching torch)', action: () => {
// Absolute path to the venv's python3 — bare `python3` lands in the
// wrong site-packages over SSH when ~/.local/bin precedes the venv.
const _vp = (_envState.env === 'venv' && _envState.envPath)
? `${_envState.envPath.replace(/\/+$/, '')}/bin/python3` : 'python3';
_launchServeTask('reinstall-vllm', 'pip-reinstall', `${_vp} -m pip install --force-reinstall vllm`);
}},
{ label: 'Upgrade torch only', action: () => {
const _vp = (_envState.env === 'venv' && _envState.envPath)
? `${_envState.envPath.replace(/\/+$/, '')}/bin/python3` : 'python3';
_launchServeTask('upgrade-torch', 'pip-update', `${_vp} -m pip install -U torch`);
}},
],
},
@@ -607,59 +635,24 @@ export function _showDiagnosis(panel, diagnosis, sourceText) {
};
if (fixes.length) {
// Always render fixes as inline buttons. The old "Actions ▾" dropdown
// (for >3 fixes) was broken — the menu wouldn't open in some panels and
// hid useful actions behind a non-working affordance. Inline buttons wrap
// naturally in `.cookbook-diag-fixes` (flex-wrap) so a long list reflows
// onto multiple rows instead of getting collapsed.
const row = document.createElement('div');
row.className = 'cookbook-diag-fixes';
if (fixes.length <= 3) {
for (const fix of fixes) {
const btn = document.createElement('button');
btn.className = 'cookbook-btn cookbook-diag-btn';
btn.type = 'button';
btn.innerHTML = _diagFixIcon(fix.label) + '<span class="cookbook-diag-btn-label">' + _diagEsc(fix.label) + '</span>';
btn.addEventListener('click', (e) => {
e.stopPropagation();
runFix(fix, btn);
});
row.appendChild(btn);
}
body.appendChild(row);
return;
}
const wrap = document.createElement('div');
wrap.className = 'cookbook-diag-actions';
const trigger = document.createElement('button');
trigger.className = 'cookbook-btn cookbook-diag-action-trigger';
trigger.type = 'button';
trigger.textContent = 'Actions';
trigger.appendChild(document.createTextNode(' ▾'));
wrap.appendChild(trigger);
const menu = document.createElement('div');
menu.className = 'dropdown cookbook-diag-menu hidden';
for (const fix of fixes) {
const item = document.createElement('button');
item.type = 'button';
item.innerHTML = _diagFixIcon(fix.label) + '<span class="cookbook-diag-btn-label">' + _diagEsc(fix.label) + '</span>';
item.addEventListener('click', async (e) => {
const btn = document.createElement('button');
btn.className = 'cookbook-btn cookbook-diag-btn';
btn.type = 'button';
btn.innerHTML = _diagFixIcon(fix.label) + '<span class="cookbook-diag-btn-label">' + _diagEsc(fix.label) + '</span>';
btn.addEventListener('click', (e) => {
e.stopPropagation();
if (item.dataset.busy || trigger.dataset.busy) return;
item.dataset.busy = '1';
await runFix(fix, trigger, fix.label, () => menu.classList.add('hidden'), () => delete item.dataset.busy);
runFix(fix, btn);
});
menu.appendChild(item);
row.appendChild(btn);
}
wrap.appendChild(menu);
trigger.addEventListener('click', (e) => {
e.stopPropagation();
if (trigger.dataset.busy) return;
document.querySelectorAll('.cookbook-diag-menu').forEach(m => {
if (m !== menu) m.classList.add('hidden');
});
menu.classList.toggle('hidden');
});
row.appendChild(wrap);
body.appendChild(row);
}
}


@@ -353,6 +353,15 @@ function _buildEnvPrefixWindows() {
}
export function _buildServeCmd(f, modelName, backend) {
// When a venv is configured on the chosen server, use the venv's binaries
// by absolute path. Bare `vllm` / `python3` relies on PATH, and SSH non-
// interactive sessions often leave a user-site install (~/.local/bin/vllm)
// ahead of the venv's bin, so the WRONG vllm gets launched even with the
// venv activated. Absolute path sidesteps the whole PATH question.
const _isVenv = _envState.env === 'venv' && _envState.envPath;
const _venvBin = _isVenv ? (_envState.envPath.replace(/\/+$/, '') + '/bin/') : '';
const _vllmBin = _venvBin ? `${_venvBin}vllm` : 'vllm';
const _py3Bin = _venvBin ? `${_venvBin}python3` : 'python3';
let cmd = '';
if (backend === 'vllm') {
const gpuId = f.gpu_id?.trim() || '';
@@ -361,7 +370,15 @@ export function _buildServeCmd(f, modelName, backend) {
const _opts = _detectModelOptimizations(modelName);
if (_opts.envVars.length) cmd += _opts.envVars.join(' ') + ' ';
}
cmd += `vllm serve ${modelName} --host 0.0.0.0 --port ${f.port || '8000'}`;
// Pinned attention backend (Attention field). Empty = let vLLM pick.
const _attn = (f.vllm_attn_backend ?? '').toString().trim();
if (_attn) cmd += `VLLM_ATTENTION_BACKEND=${_attn} `;
// Free-text "Env" field — verbatim KEY=VAL pairs (space-separated).
// Collapse any pasted newlines/tabs so the backend allowlist (which
// rejects \n / \r) doesn't trip on a multi-line paste from a model card.
const _extraEnv = (f.extra_env ?? '').toString().replace(/\s+/g, ' ').trim();
if (_extraEnv) cmd += _extraEnv + ' ';
cmd += `${_vllmBin} serve ${modelName} --host 0.0.0.0 --port ${f.port || '8000'}`;
cmd += ` --tensor-parallel-size ${f.tp || '1'}`;
cmd += ` --max-model-len ${f.ctx || '8192'}`;
cmd += ` --gpu-memory-utilization ${f.gpu_mem || '0.90'}`;
@@ -389,7 +406,9 @@ export function _buildServeCmd(f, modelName, backend) {
} else if (backend === 'sglang') {
const gpuId = f.gpu_id?.trim() || '';
if (gpuId) cmd += `CUDA_VISIBLE_DEVICES=${gpuId} `;
cmd += `python3 -m sglang.launch_server --model-path ${modelName} --host 0.0.0.0 --port ${f.port || '30000'}`;
const _extraEnv = (f.extra_env ?? '').toString().replace(/\s+/g, ' ').trim();
if (_extraEnv) cmd += _extraEnv + ' ';
cmd += `${_py3Bin} -m sglang.launch_server --model-path ${modelName} --host 0.0.0.0 --port ${f.port || '30000'}`;
if (f.tp && f.tp !== '1') cmd += ` --tp ${f.tp}`;
if (f.ctx) cmd += ` --context-length ${f.ctx}`;
if (f.gpu_mem && f.gpu_mem !== '0.90') cmd += ` --mem-fraction-static ${f.gpu_mem}`;
@@ -642,13 +661,20 @@ async function _fetchDependencies() {
const winBlocked = !isLocal && _isWindows() && _winUnsupported.has(pkg.name);
const note = pkg.status_note ? `<div class="memory-item-meta" style="font-size:10px;opacity:0.65;margin-top:3px;">${esc(pkg.status_note)}</div>` : '';
const updateNote = pkg.installed && pkg.pip_update_available === false && pkg.update_note ? `<div class="memory-item-meta" style="font-size:10px;opacity:0.55;margin-top:3px;">${esc(pkg.update_note)}</div>` : '';
// Inline "Rebuild" tag for the llama_cpp row only. Styled as a
// .cookbook-dep-tag so it matches the LLM category tag's pill look,
// and lives to the LEFT of the category tag (clear affordance before
// the row "value").
const _rebuildBtn = (pkg.name === 'llama_cpp')
? `<button type="button" class="cookbook-dep-tag cookbook-dep-rebuild" id="cookbook-rebuild-engine" title="Clear the cached llama.cpp build so the next serve recompiles from source (use after installing a CUDA/ROCm toolkit to turn a CPU-only build into a GPU build).">Rebuild</button>`
: '';
// Inline rebuild/reinstall tag. Styled as a .cookbook-dep-tag so it
// matches the LLM category tag's pill look, and lives to the LEFT of the
// category tag. llama_cpp uses the /api/cookbook/rebuild-engine flow
// (clear cached binary so next serve recompiles); vllm/sglang use the
// diagnosis-style `_launchServeTask` with `pip install --force-reinstall`
// so the user can watch the pip install in the Running tab.
let _rebuildBtn = '';
if (pkg.name === 'llama_cpp') {
_rebuildBtn = `<button type="button" class="cookbook-dep-tag cookbook-dep-rebuild" id="cookbook-rebuild-engine" title="Clear the cached llama.cpp build so the next serve recompiles from source (use after installing a CUDA/ROCm toolkit to turn a CPU-only build into a GPU build).">Rebuild</button>`;
} else if (pkg.name === 'vllm' && pkg.installed) {
_rebuildBtn = `<button type="button" class="cookbook-dep-tag cookbook-dep-rebuild cookbook-dep-reinstall" data-reinstall-pkg="vllm" title="Force-reinstall vLLM only (--no-deps, so torch and other dependencies are left untouched). Runs as a tmux task in the Running tab.">Reinstall</button>`;
} else if (pkg.name === 'sglang' && pkg.installed) {
_rebuildBtn = `<button type="button" class="cookbook-dep-tag cookbook-dep-rebuild cookbook-dep-reinstall" data-reinstall-pkg="sglang" title="Force-reinstall SGLang only (--no-deps, so torch and other dependencies are left untouched). Runs as a tmux task in the Running tab.">Reinstall</button>`;
}
return `<div class="cookbook-dep-row${winBlocked ? ' cookbook-dep-blocked' : ''}" data-pkg-name="${esc(pkg.name)}" data-dep-pip="${esc(pkg.pip || '')}" data-dep-target="${isLocal ? 'local' : 'remote'}" data-dep-kind="${esc(pkg.kind || 'python')}">`
+ `<div class="cookbook-dep-info">`
+ `<div class="memory-item-title">${esc(pkg.name)}</div>`
@@ -696,7 +722,18 @@ async function _fetchDependencies() {
// for PEP-668-locked system pythons (Arch, newer Debian).
const _inEnv = _envState.env === 'venv' || _envState.env === 'conda';
const _pipFlags = (!_isWindows() && !_inEnv) ? ' --user --break-system-packages' : '';
const _py = _isWindows() ? 'python' : 'python3';
// Use the venv's python3 by absolute path when configured. Even with the
// env_prefix sourcing activate, SSH non-interactive sessions sometimes
// pick a `python3` ahead of the venv's bin on PATH, so the install
// silently lands in the wrong site-packages.
let _py;
if (_isWindows()) {
_py = 'python';
} else if (_envState.env === 'venv' && _envState.envPath) {
_py = `${_envState.envPath.replace(/\/+$/, '')}/bin/python3`;
} else {
_py = 'python3';
}
const cmd = `${_py} -m pip install${upgrade ? ' -U' : ''}${_pipFlags} "${pipName}"`;
let envPrefix = '';
if (_isWindows()) {
@@ -1072,6 +1109,32 @@ function _wireTabEvents(body) {
});
}
// "Reinstall" buttons for pip-based serving stacks (vllm, sglang). The
// deps list renders ASYNCHRONOUSLY after _fetchDependencies resolves, so
// attaching listeners directly here would miss buttons that don't exist
// yet. Use document-level delegation instead — the click always finds the
// right .cookbook-dep-reinstall button no matter when it was painted.
if (!document._cookbookReinstallWired) {
document._cookbookReinstallWired = true;
document.addEventListener('click', async (ev) => {
const btn = ev.target.closest?.('.cookbook-dep-reinstall');
if (!btn) return;
const pkg = btn.dataset.reinstallPkg || '';
if (!pkg) return;
ev.preventDefault();
ev.stopPropagation();
const sel = document.getElementById('hwfit-deps-server');
if (sel) _applyServerSelection(sel.value);
const host = _envState.remoteHost || '';
const where = host || 'this server';
if (!confirm(`Reinstall ${pkg} on ${where}?\n\nRuns "pip install --force-reinstall --no-deps ${pkg}" as a tmux task. Watch progress in the Running tab.`)) return;
const _venvPy = (_envState.env === 'venv' && _envState.envPath)
? `${_envState.envPath.replace(/\/+$/, '')}/bin/python3`
: 'python3';
_launchServeTask(`reinstall-${pkg}`, 'pip-reinstall', `${_venvPy} -m pip install --force-reinstall --no-deps ${pkg}`);
}, true);
}
// Serve sort
const serveSort = document.getElementById('serve-sort');
if (serveSort) {


@@ -124,6 +124,14 @@ async function _openDownloadForGgufTask(task) {
function _terminalServeDiagnosis(task, outputText) {
const out = String(outputText || task?.output || '');
if (!task || task.type !== 'serve' || !['stopped', 'error', 'crashed', 'failed'].includes(task.status) || !out.trim()) return null;
// Pip tasks (Reinstall vLLM, Upgrade torch, etc.) ride on the serve task
// type so they get a tmux session + show up in Running tab — but they are
// NOT serve invocations. Their output is pip's own; the generic
// "Serve stopped before the model became reachable" message + Edit-serve
// fix make no sense. Bail so the panel just shows pip's output.
const _isPipTask = ((task.payload?.repo_id || '').startsWith('pip-'))
|| /python3? -m pip\b/.test(task.payload?._cmd || '');
if (_isPipTask) return null;
if (_serveTaskLooksAwqOnLocalBackend(task, out)) {
return {
message: 'AWQ/GPTQ/FP8 cannot be served through llama.cpp/Ollama unified-memory mode.',
@@ -249,7 +257,7 @@ const SERVE_STATE_KEY = 'cookbook-serve-state';
// Polling / timeout intervals
const TASK_POLL_INTERVAL_MS = 3000; // delay between reconnect-loop iterations
const BG_MONITOR_INTERVAL_MS = 10000; // background task status poll
const BG_MONITOR_INTERVAL_MS = 5000; // background task status poll
const STALE_PROGRESS_MS = 5 * 60 * 1000; // download with no progress this long = stale
const STARTUP_STALE_PROGRESS_MS = 45 * 1000; // 0%-forever startup stall: retry much sooner
@@ -523,6 +531,26 @@ function _serveOutputLooksReady(task) {
function _normalizeTaskForDisplay(task) {
if (!task || typeof task !== 'object') return task;
// Pip tasks (Reinstall vLLM / Upgrade torch / etc.) ride on the serve task
// type so they get tmux + the Running tab. They are NOT serves — their
// "ready" markers are pip's `Successfully installed` / `Requirement already
// satisfied`, not "Application startup complete".
const _isPipTask = ((task.payload?.repo_id || '').startsWith('pip-'))
|| /python3? -m pip\b/.test(task.payload?._cmd || '');
if (_isPipTask) {
// Override stale status: any pip task whose output carries pip's own
// success markers gets displayed as `done` regardless of what's in
// localStorage. Old pre-fix runs landed in error/stopped state and
// stuck there even after we taught the rest of the flow about pip
// tasks — this is the catch-all that flips them to Finished on render.
const out = String(task.output || '');
const ranOk = /Successfully installed|Requirement already (?:satisfied|up-to-date)/i.test(out)
&& !/error:|ERROR:/.test(out.slice(-1024));
if (ranOk && task.status !== 'done' && task.status !== 'running') {
return { ...task, status: 'done' };
}
return task;
}
if (task.type === 'serve' && task.status === 'done' && !_serveOutputLooksReady(task)) {
return { ...task, status: 'error' };
}
@@ -2409,7 +2437,7 @@ async function _reconnectTask(el, task) {
if (data.exit_code !== 0) {
failCount++;
if (failCount < 5) {
await new Promise(r => setTimeout(r, 5000));
await new Promise(r => setTimeout(r, 3000));
continue;
}
try {
@@ -2430,7 +2458,15 @@ async function _reconnectTask(el, task) {
}
const lastOutput = output.textContent || '';
const diag = _diagnose(lastOutput);
// Pip tasks (Reinstall vLLM / Upgrade torch / etc.) must skip the
// generic serve `_diagnose` step. Their output is pip's own and the
// error patterns there (torch ABI traceback, "No module named torch",
// etc.) are routinely matched against the previous tmux scrollback,
// tagging a clean pip success as a crashed serve. Detection is the
// same shape as the looksSuccessful branch below.
const _isPipTaskDiag = ((task.payload?.repo_id || '').startsWith('pip-'))
|| /python3? -m pip\b/.test(task.payload?._cmd || '');
const diag = _isPipTaskDiag ? null : _diagnose(lastOutput);
if (diag) {
let diagEl = el.querySelector('.cookbook-diagnosis');
if (!diagEl) {
@@ -2447,14 +2483,40 @@ async function _reconnectTask(el, task) {
} else {
const downloadLooksSuccessful = !lastOutput.includes('DOWNLOAD_FAILED')
&& (lastOutput.includes('DONE') || lastOutput.includes('100%') || lastOutput.includes('/snapshots/') || lastOutput.includes('Download complete') || lastOutput.includes('DOWNLOAD_OK'));
// Pip install / reinstall tasks are launched via _launchServeTask (so
// they show up in the Running tab + use tmux) but they aren't real
// serves — the cmd is `python3 -m pip ...` and the success markers
// are pip's own. Without this branch, a successful reinstall ends
// with no "Uvicorn running on" line and gets mis-flagged as a crashed
// serve.
const _isPipTask = ((task.payload?.repo_id || '').startsWith('pip-'))
|| /python3? -m pip\b/.test(task.payload?._cmd || '');
const pipLooksSuccessful = _isPipTask
&& /Successfully installed|Requirement already (?:satisfied|up-to-date)/i.test(lastOutput)
&& !/error:|ERROR:/.test(lastOutput.slice(-1024));
const serveLooksReady = task.type === 'serve' && _serveOutputLooksReady({ ...task, output: lastOutput });
const looksSuccessful = task.type === 'download' ? downloadLooksSuccessful : serveLooksReady;
const looksSuccessful = task.type === 'download'
? downloadLooksSuccessful
: (_isPipTask ? pipLooksSuccessful : serveLooksReady);
if (!lastOutput.trim() || !looksSuccessful) {
_updateTask(task.sessionId, { status: 'crashed' });
el.dataset.status = 'crashed';
const badge = el.querySelector('.cookbook-task-status');
if (badge) { badge.textContent = _statusLabel('crashed', task.type); badge.className = 'cookbook-task-status cookbook-task-crashed'; }
if (task.type === 'serve') {
if (_isPipTask) {
// Pip tasks: don't run the serve diagnosis (which would yell
// "Serve stopped before the model became reachable"). Show a
// pip-tailored message; the user can read pip's own error output
// directly above.
const _ranOk = /Successfully installed|Requirement already (?:satisfied|up-to-date)/i.test(lastOutput);
if (!_ranOk) {
_showDiagnosis(el, {
message: 'Pip install did not finish with a success marker. Check the output for the underlying error.',
suggestion: 'Suggested action: copy the troubleshooting bundle. Common causes: missing build deps, network blip, mismatched torch ABI.',
fixes: [],
}, lastOutput);
}
} else if (task.type === 'serve') {
const diag = _diagnose(lastOutput) || {
message: _serveTaskLooksAwqOnLocalBackend(task, lastOutput)
? 'AWQ/GPTQ/FP8 cannot be served through llama.cpp/Ollama unified-memory mode.'
@@ -2533,6 +2595,28 @@ async function _reconnectTask(el, task) {
}
_showCookbookNotif(true);
} else {
// Strong completion markers — `DOWNLOAD_OK` is emitted by our
// downloader wrapper AFTER the model snapshot is on disk, and
// `/snapshots/` only appears once HF has resolved the cached
// tree. Either is conclusive. Finalize as done immediately, skip
// the 30s debounce — the debounce only exists to guard against
// ambiguous markers (bare "100%" / "Download complete") which can
// appear mid-stream during multi-file downloads.
const _strongDone = task.type === 'download'
&& (lastOutput.includes('DOWNLOAD_OK') || lastOutput.includes('/snapshots/'));
if (_strongDone) {
_updateTask(task.sessionId, { status: 'done', _doneConfirmAt: null, _lastStatusFlipAt: Date.now() });
el.dataset.status = 'done';
const badge = el.querySelector('.cookbook-task-status');
if (badge) { badge.textContent = _statusLabel('done', task.type); badge.className = 'cookbook-task-status cookbook-task-done'; }
const _chk = el.querySelector('.cookbook-task-check'); if (_chk) _chk.style.display = '';
const _sb = el.querySelector('.cookbook-task-serve-btn'); if (_sb) _sb.style.display = '';
_showCookbookNotif();
_refreshDepsAfterInstall(task);
_renderRunningTab();
_processQueue();
break;
}
// Debounce the done flip. Tmux capture-pane can fail transiently
// (network blip, ssh reconnect), and the verify has-session right
// above can briefly report dead even when the session is in the
@@ -2559,7 +2643,7 @@ async function _reconnectTask(el, task) {
stillAlive = pData.exit_code === 0;
} catch { /* network blip — treat as inconclusive, prefer running */ stillAlive = true; }
if (stillAlive) {
_updateTask(task.sessionId, { status: 'running', _doneConfirmAt: null });
_updateTask(task.sessionId, { status: 'running', _doneConfirmAt: null, _lastStatusFlipAt: Date.now() });
const _el = document.querySelector(`.cookbook-task[data-task-id="${task.sessionId}"]`);
if (_el) {
_el.dataset.status = 'running';
@@ -2571,7 +2655,7 @@ async function _reconnectTask(el, task) {
}
return;
}
_updateTask(task.sessionId, { status: 'done', _doneConfirmAt: null });
_updateTask(task.sessionId, { status: 'done', _doneConfirmAt: null, _lastStatusFlipAt: Date.now() });
const _el = document.querySelector(`.cookbook-task[data-task-id="${task.sessionId}"]`);
if (_el) {
_el.dataset.status = 'done';
@@ -2596,8 +2680,14 @@ async function _reconnectTask(el, task) {
const snapshot = (data.stdout || '').trim();
if (snapshot) {
// Only auto-scroll to bottom if the user was already there. When
// they've scrolled up to read earlier output, leave their position
// alone so a fresh snapshot doesn't yank them back to the tail.
// 40px tolerance covers sub-pixel rounding + the moment between
// releasing the scrollbar and the next poll arriving.
const _atBottom = (output.scrollHeight - output.scrollTop - output.clientHeight) < 40;
output.textContent = snapshot;
output.scrollTop = output.scrollHeight;
if (_atBottom) output.scrollTop = output.scrollHeight;
// Live status parsing for download tasks
if (task.type === 'download') {
@@ -3153,16 +3243,27 @@ export async function _selfHealStaleTasks(opts = {}) {
// itself fires every 5s (BG_MONITOR_INTERVAL_MS), so this almost always fires too, but the
// guard keeps a fast manual call from doubling up).
const now = Date.now();
if (now - _selfHealLastTs < 8000) return;
if (now - _selfHealLastTs < 4000) return;
_selfHealLastTs = now;
}
const tasks = _loadTasks();
const candidates = tasks.filter(t =>
t.type === 'download'
&& ['done', 'error', 'crashed', 'stopped'].includes(t.status)
&& t.sessionId
&& !String(t.sessionId).startsWith('queue-')
);
const candidates = tasks.filter(t => {
if (t.type !== 'download') return false;
if (!['done', 'error', 'crashed', 'stopped'].includes(t.status)) return false;
if (!t.sessionId || String(t.sessionId).startsWith('queue-')) return false;
// Finished downloads with strong completion markers (DOWNLOAD_OK or HF
// /snapshots/ resolution) are demonstrably done — do not flip them back
// to running just because the tmux session is still alive (e.g., a
// long-lived shell that hosted the download or a flapping SSH that
// reports the session as up). This was the main source of finished↔
// downloading oscillation on a flaky connection.
if (t.status === 'done' && /DOWNLOAD_OK|\/snapshots\//.test(t.output || '')) return false;
// Cooldown: never flip the same task more than once every 45s. A flapping
// SSH connection used to drive the badge back-and-forth on every probe
// cycle; this enforces a stable view between flaps.
if (t._lastStatusFlipAt && (Date.now() - t._lastStatusFlipAt < 45000)) return false;
return true;
});
if (!candidates.length) return;
let flipped = 0;
for (const t of candidates) {
@@ -3180,6 +3281,7 @@ export async function _selfHealStaleTasks(opts = {}) {
if (ft && ft.status !== 'running') {
ft.status = 'running';
ft._selfHealed = true;
ft._lastStatusFlipAt = Date.now();
_saveTasks(fresh);
flipped++;
const _el = document.querySelector(`.cookbook-task[data-task-id="${t.sessionId}"]`);


@@ -613,6 +613,20 @@ function _rerenderCachedModels() {
panelHtml += `<label class="hwfit-backend-vllm hwfit-backend-sglang">${_l('Max Seqs','Maximum concurrent requests. Lower = less memory. Default 4 — prosumer GPUs often OOM on vLLM default 256 during CUDA graph capture.')}<input type="text" class="hwfit-sf" data-field="max_seqs" value="${esc(sv('max_seqs', '4'))}" placeholder="4" /></label>`;
panelHtml += `<label>${_l('Dtype','Data type for weights. auto picks best for GPU')}<select class="hwfit-sf" data-field="dtype">${dtypeOpts}</select></label>`;
panelHtml += `<label class="hwfit-backend-vllm">${_l('KV Cache','vLLM --kv-cache-dtype. auto uses the model/runtime default; fp8 reduces KV memory for long context.')}<select class="hwfit-sf" data-field="vllm_kv_cache_dtype" style="height:32px;">${vllmKvCacheOpts}</select></label>`;
// Attention backend selector — pin the kernel impl. Default `auto` lets
// vLLM pick FlashInfer (which JITs on first use and breaks on older
// system nvcc) → FlashAttention → xformers. Forcing FLASH_ATTN skips
// the JIT entirely, fixing the `nvcc fatal: Unsupported gpu
// architecture 'compute_89'` failure mode on Ada / Hopper hosts.
const vllmAttnBackendOpts = ['auto', 'FLASH_ATTN', 'XFORMERS', 'FLASHINFER', 'TORCH_SDPA']
.map(b => `<option value="${b === 'auto' ? '' : b}"${(sv('vllm_attn_backend','') === (b === 'auto' ? '' : b)) ? ' selected' : ''}>${b}</option>`).join('');
panelHtml += `<label class="hwfit-backend-vllm">${_l('Attention','vLLM VLLM_ATTENTION_BACKEND. auto = vLLM picks (often FLASHINFER, which JITs and can fail on old nvcc). FLASH_ATTN skips the JIT entirely.')}<select class="hwfit-sf" data-field="vllm_attn_backend" style="height:32px;">${vllmAttnBackendOpts}</select></label>`;
// Free-text env-vars field. Anything pasted here is prepended to the
// launch command verbatim. Use for CUDACXX, PATH overrides, NCCL_*
// tuning, or any other KEY=VALUE pair that doesn't have a dedicated
// field. After the venv activate runs, $VIRTUAL_ENV / $PATH / etc. are
// already exported so they expand correctly here.
panelHtml += `<label class="hwfit-backend-vllm hwfit-backend-sglang" style="flex:1 1 100%;">${_l('Env','Extra KEY=VALUE env-var pairs prepended to the launch (space-separated). Example: CUDACXX=$VIRTUAL_ENV/lib/python3.10/site-packages/nvidia/cuda_nvcc/bin/nvcc — points flashinfer at the venv-bundled nvcc when the system one is too old for your GPU.')}<input type="text" class="hwfit-sf" data-field="extra_env" value="${esc(sv('extra_env',''))}" placeholder="CUDACXX=/path/to/nvcc NCCL_P2P_DISABLE=1" style="width:100%;" /></label>`;
panelHtml += `</div>`;
// Row 2b: Diffusers settings
const diffDtypeOpts = ['bfloat16','float16','float32'].map(d => `<option value="${d}"${sv('diff_dtype','bfloat16')===d?' selected':''}>${d}</option>`).join('');
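Review note: the attention-backend `<option>` list maps the display name `auto` to an empty value, so an empty `vllm_attn_backend` means "let vLLM choose". A minimal sketch of that mapping as a reusable function follows; `buildBackendOptions` is a hypothetical name, not part of the patch, and it assumes the backend names are trusted constants that need no HTML escaping (as in the hunk above).

```javascript
// Hypothetical helper mirroring the vllmAttnBackendOpts expression.
// 'auto' is stored as '' so that an unset field and 'auto' are the same state.
function buildBackendOptions(backends, current) {
  return backends
    .map(name => {
      const value = name === 'auto' ? '' : name;
      const selected = value === current ? ' selected' : '';
      return `<option value="${value}"${selected}>${name}</option>`;
    })
    .join('');
}

// Usage: with FLASH_ATTN saved, only that option carries `selected`.
console.log(buildBackendOptions(['auto', 'FLASH_ATTN'], 'FLASH_ATTN'));
// prints: <option value="">auto</option><option value="FLASH_ATTN" selected>FLASH_ATTN</option>
```

Computing `value` once per entry avoids repeating the `b === 'auto' ? '' : b` ternary that the inline expression evaluates twice.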
@@ -1643,6 +1657,35 @@ function _rerenderCachedModels() {
// Launch button
panel.querySelector('.hwfit-serve-launch').addEventListener('click', async (ev) => {
const _launchBtn = ev.currentTarget;
// Immediate visual feedback. The GPU probe + backend-warning prompt
// below can take ~1-2s before the task UI shows up, leaving the
// button looking dead. Drop in the same whirlpool spinner the rest of
// the cookbook uses (Probe GPUs, dependency installs, etc.) right
// away; restored on any early-return / failure path below.
const _origBtnHtml = _launchBtn.innerHTML;
const _origBtnDisabled = _launchBtn.disabled;
let _launchingWp = null;
const _restoreLaunchBtn = () => {
try { _launchingWp?.destroy?.(); } catch {}
_launchingWp = null;
_launchBtn.innerHTML = _origBtnHtml;
_launchBtn.disabled = _origBtnDisabled;
};
_launchBtn.disabled = true;
_launchBtn.innerHTML = '';
const _launchingWrap = document.createElement('span');
_launchingWrap.className = 'hwfit-serve-launching';
_launchingWrap.style.cssText = 'display:inline-flex;align-items:center;gap:6px;';
_launchingWp = spinnerModule.createWhirlpool(18);
if (_launchingWp?.element) {
_launchingWp.element.style.margin = '0';
_launchingWp.element.style.transform = 'translateY(-2px)';
_launchingWrap.appendChild(_launchingWp.element);
}
const _launchingLabel = document.createElement('span');
_launchingLabel.textContent = 'Launching…';
_launchingWrap.appendChild(_launchingLabel);
_launchBtn.appendChild(_launchingWrap);
// Final safety net: never launch with ctx beyond the model's trained
// limit (or the absolute sanity ceiling when the limit is unknown). A
// stale preset or typo (e.g. 16000000) overflows and, with a quantized
@@ -1650,7 +1693,14 @@ function _rerenderCachedModels() {
// command (then we respect their literal text).
if (!_cmdManuallyEdited) _clampCtx(true);
if (!_cmdManuallyEdited) updateCmd();
- const launchCmd = _cmdTextarea ? _cmdTextarea.value.trim() : panel._cmd;
+ // Pasted commands often carry hidden newlines / CRs / tabs from copies
+ // out of model cards or wrapped help text. The backend cmd allowlist
+ // rejects \n / \r outright (`Invalid characters in cmd`), so collapse
+ // all whitespace to single spaces before launch; same effect as the
+ // user manually re-flowing the textarea, no behavior change.
+ const _rawLaunchCmd = _cmdTextarea ? _cmdTextarea.value : panel._cmd;
+ const launchCmd = String(_rawLaunchCmd || '').replace(/\s+/g, ' ').trim();
+ if (_cmdTextarea && _cmdTextarea.value !== launchCmd) _cmdTextarea.value = launchCmd;
const serveState = {};
panel.querySelectorAll('.hwfit-sf').forEach(el => {
if (el.type === 'checkbox') serveState[el.dataset.field] = el.checked;
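Review note: the whitespace collapse applied to the launch command can be checked on its own. The sketch below isolates it; `normalizeLaunchCmd` is a hypothetical name, and the sample command is only an example input.

```javascript
// Hypothetical standalone version of the launchCmd normalization.
// Every run of whitespace (spaces, tabs, \r, \n) becomes one space,
// then leading and trailing whitespace is removed.
function normalizeLaunchCmd(raw) {
  return String(raw || '').replace(/\s+/g, ' ').trim();
}

// Usage: a command pasted with a wrapped line and a trailing CRLF.
const pasted = 'vllm serve org/model\n\t--max-model-len 8192\r\n';
console.log(normalizeLaunchCmd(pasted));
// prints: vllm serve org/model --max-model-len 8192
```

Because the pattern is applied to the whole string, runs of spaces inside a quoted argument are collapsed as well (`"a  b"` becomes `"a b"`), which may matter for commands that pass literal strings.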
@@ -1659,6 +1709,7 @@ function _rerenderCachedModels() {
serveState.backend = serveState.backend || (_detectBackend(m).backend) || 'vllm';
const backendWarning = _serveBackendWarning(m, repo, serveState.backend, serveState);
if (backendWarning) {
_restoreLaunchBtn();
await window.styledConfirm(backendWarning.body, {
title: backendWarning.title,
confirmText: 'Edit settings',
@@ -1689,7 +1740,7 @@ function _rerenderCachedModels() {
`No GPU detected on ${_probeHost ? _probeHost : 'this host'}. ${serveState.backend.toUpperCase()} needs a visible CUDA/ROCm accelerator to start — launching now will most likely crash early.\n\nLaunch anyway?`,
{ title: 'No GPU detected', confirmText: 'Launch anyway', cancelText: 'Cancel', danger: true },
);
- if (!_proceed) return;
+ if (!_proceed) { _restoreLaunchBtn(); return; }
}
} catch {
// Network / probe failure — don't block. Better to let the launch
@@ -4566,6 +4566,8 @@ async function initUnifiedIntegrations() {
{ key: 'calendar:write', label: 'Calendar write', detail: 'Create and update calendar events' },
{ key: 'memory:read', label: 'Memory', detail: 'Read memory when enabled' },
{ key: 'memory:write', label: 'Memory write', detail: 'Write memory when enabled' },
{ key: 'cookbook:read', label: 'Cookbook', detail: 'List cookbook tasks + tail their tmux output (debug a model serve from outside the UI)' },
{ key: 'cookbook:launch', label: 'Cookbook launch', detail: 'Launch and stop cookbook serve tasks. Powerful: runs SSH commands on your configured servers, bounded by the same allowlist the UI uses (vllm/python3/sglang/llama-server/...)' },
];
// Strict name-prefix match keeps Codex and Claude tokens in their own forms.
const agentTokens = (Array.isArray(tokens) ? tokens : []).filter(tok =>
@@ -4578,6 +4580,7 @@ async function initUnifiedIntegrations() {
email: '<svg width="14" height="14" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><rect x="2" y="4" width="20" height="16" rx="2"/><polyline points="2 6 12 13 22 6"/></svg>',
calendar: '<svg width="14" height="14" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><rect x="3" y="4" width="18" height="18" rx="2" ry="2"/><line x1="16" y1="2" x2="16" y2="6"/><line x1="8" y1="2" x2="8" y2="6"/><line x1="3" y1="10" x2="21" y2="10"/></svg>',
memory: '<svg width="14" height="14" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="1.7" stroke-linecap="round" stroke-linejoin="round"><path d="M9.5 2a2.5 2.5 0 0 0-2.5 2.5 2.5 2.5 0 0 0-2.5 2.5A2.5 2.5 0 0 0 2 9.5v3A2.5 2.5 0 0 0 4.5 15a2.5 2.5 0 0 0 2.5 2.5A2.5 2.5 0 0 0 9.5 20H10V2z"/><path d="M14.5 2a2.5 2.5 0 0 1 2.5 2.5 2.5 2.5 0 0 1 2.5 2.5A2.5 2.5 0 0 1 22 9.5v3A2.5 2.5 0 0 1 19.5 15a2.5 2.5 0 0 1-2.5 2.5A2.5 2.5 0 0 1 14.5 20H14V2z"/></svg>',
cookbook: '<svg width="14" height="14" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><path d="M4 19.5A2.5 2.5 0 0 1 6.5 17H20"/><path d="M6.5 2H20v20H6.5A2.5 2.5 0 0 1 4 19.5v-15A2.5 2.5 0 0 1 6.5 2z"/></svg>',
};
const _scopeNiceLabel = (label) => label.replace(/\s+(write|drafts?|send)$/i, '');
const _scopeAction = (key) => (key.split(':')[1] || '').toLowerCase();
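Review note: to see how the two new `cookbook:*` scopes pass through the label helpers shown in this hunk, the sketch below copies both expressions verbatim under un-prefixed names (`scopeNiceLabel`, `scopeAction`, renamed only for the example) and runs them on the new entries.

```javascript
// Copies of the two helpers from the hunk, renamed for a standalone example.
const scopeNiceLabel = (label) => label.replace(/\s+(write|drafts?|send)$/i, '');
const scopeAction = (key) => (key.split(':')[1] || '').toLowerCase();

// The action is whatever follows the first colon in the scope key.
console.log(scopeAction('cookbook:read'));     // prints: read
console.log(scopeAction('cookbook:launch'));   // prints: launch

// Only a trailing write / draft(s) / send word is stripped from labels.
console.log(scopeNiceLabel('Memory write'));    // prints: Memory
console.log(scopeNiceLabel('Cookbook launch')); // prints: Cookbook launch
```

The label pattern does not list `launch`, so `Cookbook launch` keeps its full text while `Memory write` shortens to `Memory`. Whether that difference is visible in the form depends on rendering code outside this hunk.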
@@ -19281,6 +19281,11 @@ body.gallery-selecting .gallery-dl-btn,
background: color-mix(in srgb, var(--fg) 7%, transparent);
font-size: 12px;
border-bottom: 1px solid color-mix(in srgb, var(--fg) 6%, transparent);
/* Pin the row so flex parents + Firefox mobile can't squeeze its height to 0,
which hides the type pill + model name and leaves only the sub-line +
output visible. */
flex-shrink: 0;
min-height: 32px;
}
.cookbook-task-type {
text-transform: uppercase;