Cookbook polish: auto-reconnect, ctx slider fixes, scoring, lots of UI
Backend (services/hwfit + routes):
- VRAM column sort now shows the global highest first. It was special-cased to
sort ascending and then truncate to the top N, so the highest-VRAM models could
never make the cut. Every column path now sorts with reverse=True before truncating.
- Hardware probe cache TTL 30min -> 24h so changing filters doesn't keep
re-probing the rig during a session; the Rescan button still forces a fresh probe.
- Multi-GPU rigs filter out GGUF Q*/IQ quants (vLLM/SGLang can't serve them);
non-prequantized models default to BF16 on 2+ GPUs.
- AWQ / AWQ-8bit / GPTQ-8bit get a -1.0 quality penalty so FP8 wins ties.
- Version-aware tiebreaker (parse Mn.n / Vn) — MiniMax-M2.7 ranks above M2.5.
- hf_models.json: zai-org/GLM-5.1 added; zai-org/GLM-5 quantization flipped
Q4_K_M -> BF16. DeepSeek-V4-Flash / -Pro + their -Base variants registered
with new FP4-MoE-Mixed / FP8-Mixed quant keys (calibrated BPP from the
actual 156 GB / 284 GB disk footprints).
- New FP4-MoE-Mixed + FP8-Mixed entries in QUANT_BPP / QUANT_SPEED_MULT /
QUANT_QUALITY_PENALTY / QUANT_BYTES_PER_PARAM / PREQUANTIZED_PREFIXES.
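A minimal JavaScript sketch of the ranking rules above, for illustration only: the real ranking lives in the Python services/hwfit backend, and the object shape (`name`, `score`, `quant`), the version regex, and the helper names `parseVersion` / `rankTopN` are assumptions. It subtracts the 1.0 penalty for the AWQ / AWQ-8bit / GPTQ-8bit keys, breaks score ties on a parsed `Mn.n` / `Vn` version, and sorts in display order before truncating, which is the property the VRAM fix restores.

```javascript
// Quality penalties named in this commit. The flat-map shape is an assumption.
const QUANT_QUALITY_PENALTY = { 'AWQ': 1.0, 'AWQ-8bit': 1.0, 'GPTQ-8bit': 1.0 };

// Pull a version tuple out of tokens like "M2.7" or "V4" that follow a
// "-", "_" or "/" separator. Returns [] when no such token exists.
function parseVersion(name) {
  const m = /(?:^|[-_\/])[MV](\d+)(?:\.(\d+))?(?=$|[-_])/i.exec(name);
  if (!m) return [];
  return [Number(m[1]), Number(m[2] ?? 0)];
}

// Lexicographic compare; a missing component sorts below any present one.
function compareVersions(a, b) {
  const len = Math.max(a.length, b.length);
  for (let i = 0; i < len; i++) {
    const d = (a[i] ?? -1) - (b[i] ?? -1);
    if (d !== 0) return d;
  }
  return 0;
}

function effectiveScore(m) {
  return m.score - (QUANT_QUALITY_PENALTY[m.quant] ?? 0);
}

// Sort best-first, then truncate. Sorting in display order *before* slicing
// keeps the global top of the column reachable; sorting ascending and then
// keeping the first N would only ever surface the smallest values.
function rankTopN(models, n) {
  return [...models]
    .sort((a, b) =>
      (effectiveScore(b) - effectiveScore(a)) ||
      compareVersions(parseVersion(b.name), parseVersion(a.name)))
    .slice(0, n);
}
```

With two FP8 MiniMax entries tied at 80, the M2.7 entry sorts ahead of M2.5; an AWQ entry scoring 80.5 drops to 79.5 after the penalty and falls out of the top two.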
Frontend — Scan/Download:
- Engine + Quant swapped in the toolbar; Quant defaults to "All".
- Ctx (range slider) ported from origin/main: 8k/16k/32k/50k/128k/Max. Dragging
re-sorts by VRAM ascending (smallest fitting first); dragging back to Max
restores score order.
- Ctx slider rail is now visible: a duplicate rule later in the cascade set it
to background:transparent, so the rail is now hardcoded grey with !important.
- Search input moved to the far right of the toolbar.
- Type filter defaults to Standard; "Context" label no longer uppercased; Search placeholder dimmed.
- Engine "?" + Quant "?" inline help chips inside their dropdown boxes.
- A dot in the Fit column header toggles a fit-only filter; turning it off re-sorts by VRAM descending.
- Quant column truncates to 9 chars + ellipsis ("FP4-MoE-M..."), full in
tooltip. Smart title-suffix strips the parts already in the repo name
(QuantTrio/MiniMax-M2-AWQ + quant AWQ-4bit -> just "(4bit)").
- The safetensors GPU-serving warning now shows only on rigs without a visible CUDA/ROCm accelerator.
- Dependency Install / Installed / Installed▾ / N/A buttons are all 75.85px wide.
- Rebuild llama.cpp moved into the llama_cpp dep row, styled as a tag.
- Foldable Download admin-card (h2 chevron); line under h2 only when folded.
- HF token save gets a green ✓ + "Saved" flash.
- Cached scan no longer counts stalled rows as downloaded.
- Footer: "Request it →" link with GitHub mark to the public discussion
(#1962) for model-add requests.
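The title-suffix and Quant-cell rules can be sketched as two pure functions. This mirrors the `_quantSuffix` / `_qShort` logic in the diff but drops the HTML escaping and markup; the names `truncateQuant` and `quantTitleSuffix` are hypothetical.

```javascript
// Shorten a quant tag for a narrow table cell: at most `max` chars, then "…".
function truncateQuant(tag, max = 9) {
  return tag.length > max ? tag.slice(0, max) + '…' : tag;
}

// Build the "(…)" suffix appended to a model title, dropping quant parts the
// repo name already contains (case-insensitive). Returns '' when nothing is new.
function quantTitleSuffix(repoId, quant) {
  const short = (repoId.split('/').pop() || repoId).toLowerCase();
  const parts = (quant || '').trim().split(/[-_]/).filter(Boolean);
  const remaining = parts.filter(p => !short.includes(p.toLowerCase()));
  if (remaining.length === 0) return '';
  return `(${truncateQuant(remaining.join('-'))})`;
}
```

Matching is a plain case-insensitive substring test, so short parts can vanish by accident: for `MiniMax-M2` with `Q4_K_M`, the one-letter `M` part matches the repo name and the suffix becomes `(Q4-K)`.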
Frontend — Running tab:
- Strict download-finish check (DOWNLOAD_OK or /snapshots/, not a bare
"Download complete"). True overall % for multi-shard downloads:
((N-1)+frac)/total, where N is the 1-based index of the shard in progress and
frac is its completed fraction, instead of hf_transfer's per-shard aggregate.
- ETA in the uptime ticker: "downloading: 12m 34s · ETA 1h 23m".
- Clear button kills the tmux session too; if the output still shows a
live shard line, the pill is hidden + relabels as "reconnect" + revives
on click.
- Self-heal: on cookbook open AND every bg-monitor cycle (10s, throttled
to 8s), scan persisted done/error/crashed downloads and probe their
tmux session — if alive, flip status back to running and reattach.
- Per-launch zombie probe: clicking Download on a model whose persisted
state is done but tmux is still alive revives the existing task and
refuses to start a duplicate.
- Pre-launch GPU probe: vllm / sglang / diffusers serves check
/api/cookbook/gpus first and warn + ask for confirmation if no GPU is visible.
- Server-side state guard: rejects "done" POSTs for downloads lacking
DOWNLOAD_OK / DOWNLOAD_FAILED / /snapshots/ when the last-mentioned
shard is N<total — stale tabs can't poison persisted state any more.
- Running count includes tasks whose output looks active even if their persisted
status got stuck. Dir text added to the running row, with its font matched to the uptime.
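The progress, ETA, and done-guard rules above fit in a few small functions. This is a sketch under stated assumptions: shard progress arrives as a 1-based shard index plus a 0..1 fraction, the ETA assumes a constant download rate, and shards are identified in the tmux output by Hugging Face style `-0000N-of-0000M` filenames. All function names are hypothetical.

```javascript
// Overall fraction for a sharded download: shards 1..shardIndex-1 are done and
// the shard in progress is `frac` complete. shardIndex is 1-based.
function overallFraction(shardIndex, totalShards, frac) {
  if (totalShards <= 0) return 0;
  const f = Math.min(Math.max(frac, 0), 1);
  return (shardIndex - 1 + f) / totalShards;
}

// "12m 34s" under an hour, "1h 23m" from one hour up.
function formatDuration(totalSec) {
  const s = Math.max(0, Math.floor(totalSec));
  const h = Math.floor(s / 3600);
  const m = Math.floor((s % 3600) / 60);
  return h > 0 ? `${h}h ${m}m` : `${m}m ${s % 60}s`;
}

// Remaining time, assuming the rate observed so far stays constant.
function etaSeconds(elapsedSec, fraction) {
  if (fraction <= 0) return Infinity;
  return (elapsedSec * (1 - fraction)) / fraction;
}

// Uptime-ticker text; the ETA is omitted until there is progress to extrapolate.
function tickerText(elapsedSec, fraction) {
  const base = `downloading: ${formatDuration(elapsedSec)}`;
  const eta = etaSeconds(elapsedSec, fraction);
  return Number.isFinite(eta) ? `${base} · ETA ${formatDuration(eta)}` : base;
}

// Strict finish check: explicit markers only, never a bare "Download complete".
function looksFinished(output) {
  return output.includes('DOWNLOAD_OK') || output.includes('/snapshots/');
}

// Guard for a "done" report: without a terminal marker, reject it when the
// last shard mentioned is N-of-M with N < M.
function acceptDoneReport(output) {
  if (/DOWNLOAD_OK|DOWNLOAD_FAILED|\/snapshots\//.test(output)) return true;
  const shards = [...output.matchAll(/(\d+)-of-(\d+)/g)];
  if (shards.length === 0) return true; // no shard info to contradict the report
  const [, n, total] = shards[shards.length - 1];
  return Number(n) >= Number(total);
}
```

For example, halfway through shard 3 of 5 the overall fraction is (2 + 0.5) / 5 = 0.5, where a per-shard figure would report 50% of shard 3 alone.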
Serve panel:
- Ctx text input always resets to model max on open (default 20000 when
metadata is missing).
- Max Seqs default 8 -> 4. KV Cache dtype select 32px tall.
- Lightning icon on Launch (same as Action toggle).
- Diagnosis card simplified (no fold/copy/dismiss), suggestion font
matches body; action buttons get icons on the left (Retry/Copy/Edit/
Install/Kill/Switch/etc.).
- Incomplete-download serve warning when model status is
downloading / stalled / has_incomplete.
- MTP "?" tooltip ("supported on a few model families … up to ~3× faster").
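A sketch of the serve-panel defaults applied on open, assuming hypothetical `max_context` and `status` fields on the model record and treating all three incomplete markers as status strings.

```javascript
// Statuses that trigger the incomplete-download warning (names from this commit).
const INCOMPLETE_STATUSES = new Set(['downloading', 'stalled', 'has_incomplete']);
const FALLBACK_CTX = 20000; // used when the model metadata has no max context

function servePanelDefaults(model) {
  const maxCtx = Number(model?.max_context); // hypothetical field name
  return {
    ctx: Number.isFinite(maxCtx) && maxCtx > 0 ? maxCtx : FALLBACK_CTX,
    maxSeqs: 4,
    warnIncomplete: INCOMPLETE_STATUSES.has(model?.status),
  };
}
```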
@@ -23,6 +23,44 @@ import {
 // browser loads it once. See cookbook-hwfit.js.
 } from './cookbook.js';
 import uiModule from './ui.js';
+
+// Tiny HTML-escape — keeps the file standalone instead of leaning on a
+// shared helper that may not be exported from this module's import surface.
+function _diagEsc(s) {
+return String(s ?? '').replace(/[&<>"']/g, c => ({'&':'&amp;','<':'&lt;','>':'&gt;','"':'&quot;',"'":'&#39;'}[c]));
+}
+
+// Pick an icon for a diagnosis-action button based on the label. The icon
+// renders on the LEFT of the button text. Keeps the strokes consistent
+// across the set so they read as one family.
+function _diagFixIcon(label) {
+const l = String(label || '').toLowerCase();
+const _svg = (path) => `<svg width="11" height="11" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2.2" stroke-linecap="round" stroke-linejoin="round" class="cookbook-diag-btn-ico" aria-hidden="true">${path}</svg>`;
+if (l.startsWith('retry') || l.includes('relaunch') || l.includes('restart')) {
+// Circular-arrow refresh
+return _svg('<polyline points="23 4 23 10 17 10"/><polyline points="1 20 1 14 7 14"/><path d="M3.51 9a9 9 0 0 1 14.85-3.36L23 10M1 14l4.64 4.36A9 9 0 0 0 20.49 15"/>');
+}
+if (l.startsWith('copy')) {
+return _svg('<rect x="9" y="9" width="13" height="13" rx="2"/><path d="M5 15H4a2 2 0 0 1-2-2V4a2 2 0 0 1 2-2h9a2 2 0 0 1 2 2v1"/>');
+}
+if (l.startsWith('edit')) {
+return _svg('<path d="M12 20h9"/><path d="M16.5 3.5a2.121 2.121 0 0 1 3 3L7 19l-4 1 1-4Z"/>');
+}
+if (l.startsWith('open') || l.includes('dependencies')) {
+return _svg('<path d="M14 3h7v7"/><path d="M21 3l-9 9"/><path d="M21 14v5a2 2 0 0 1-2 2H5a2 2 0 0 1-2-2V5a2 2 0 0 1 2-2h5"/>');
+}
+if (l.startsWith('install') || l.includes('upgrade')) {
+return _svg('<path d="M21 15v4a2 2 0 0 1-2 2H5a2 2 0 0 1-2-2v-4"/><polyline points="7 10 12 15 17 10"/><line x1="12" y1="15" x2="12" y2="3"/>');
+}
+if (l.startsWith('kill') || l.startsWith('stop')) {
+return _svg('<rect x="6" y="6" width="12" height="12" rx="1"/>');
+}
+if (l.startsWith('switch') || l.includes('use ')) {
+return _svg('<polyline points="17 1 21 5 17 9"/><path d="M3 11V9a4 4 0 0 1 4-4h14"/><polyline points="7 23 3 19 7 15"/><path d="M21 13v2a4 4 0 0 1-4 4H3"/>');
+}
+// Default: lightbulb (generic "suggestion")
+return _svg('<path d="M9 21h6"/><path d="M12 17v4"/><path d="M12 3a6 6 0 0 0-4 10.5c1 1 1.5 2 1.5 3.5h5c0-1.5.5-2.5 1.5-3.5A6 6 0 0 0 12 3Z"/>');
+}
 import spinnerModule from './spinner.js';
 
 // ── Error diagnosis ──
@@ -577,7 +615,7 @@ export function _showDiagnosis(panel, diagnosis, sourceText) {
 const btn = document.createElement('button');
 btn.className = 'cookbook-btn cookbook-diag-btn';
 btn.type = 'button';
-btn.textContent = fix.label;
+btn.innerHTML = _diagFixIcon(fix.label) + '<span class="cookbook-diag-btn-label">' + _diagEsc(fix.label) + '</span>';
 btn.addEventListener('click', (e) => {
 e.stopPropagation();
 runFix(fix, btn);
@@ -603,7 +641,7 @@ export function _showDiagnosis(panel, diagnosis, sourceText) {
 for (const fix of fixes) {
 const item = document.createElement('button');
 item.type = 'button';
-item.textContent = fix.label;
+item.innerHTML = _diagFixIcon(fix.label) + '<span class="cookbook-diag-btn-label">' + _diagEsc(fix.label) + '</span>';
 item.addEventListener('click', async (e) => {
 e.stopPropagation();
 if (item.dataset.busy || trigger.dataset.busy) return;

@@ -527,6 +527,9 @@ export async function _hwfitFetch(fresh = false) {
 if (useCase) params.set('use_case', useCase);
 if (quantPref) params.set('quant', quantPref);
 if (targetCtx) params.set('ctx', String(targetCtx));
+// Fit-only filter — set by the dot in the Fit column header.
+const _fitOnly = (() => { try { return localStorage.getItem('hwfit_fit_only_v1') === '1'; } catch { return false; } })();
+if (_fitOnly) params.set('fit_only', '1');
 }
 const endpoint = isImageMode ? `/api/hwfit/image-models?${params}` : `/api/hwfit/models?${params}`;
 const res = await fetch(endpoint);
@@ -888,9 +891,15 @@ export function _hwfitRenderList(el, models) {
 arrow = isReversed ? ' \u25B2' : ' \u25BC';
 }
 const dataAttr = col.key ? ` data-sort="${col.key}"` : '';
-const label = (col.cls === 'hwfit-fit' && _budget)
-? `${col.label} <span style="font-size:0.75em;opacity:0.6;font-weight:normal;">(${_budget})</span>`
-: col.label;
+// Fit column gets a small dot to its left that toggles "show only models
+// that fit" — replaces the old Fits On/Off button next to the toolbar.
+let label = col.label;
+if (col.cls === 'hwfit-fit') {
+const _fitOnly = (() => { try { return localStorage.getItem('hwfit_fit_only_v1') === '1'; } catch { return false; } })();
+label = `<span class="hwfit-fit-dot${_fitOnly ? ' active' : ''}" title="${_fitOnly ? 'Showing only models that fit. Click to also show too-tight rows.' : 'Click to show only models that fit your hardware.'}" data-fit-dot>●</span>${col.label}`;
+// (Budget tag removed — the GPU/RAM/N-GPU suffix next to "Fit" was noise;
+// the toggle row already shows which budget is active.)
+}
 html += `<span class="hwfit-col ${col.cls}${sortable}${active}"${dataAttr}>${label}${arrow}</span>`;
 }
 html += '</div>';
@@ -910,9 +919,31 @@ export function _hwfitRenderList(el, models) {
 const dlDot = (_cachedModelIds && (_cachedModelIds.has(m.name) || [..._cachedModelIds].some(id => id === m.name?.split('/').pop()))) ? '<span class="hwfit-dl-dot" title="Downloaded">\u25CF</span>' : '';
 html += `<div class="hwfit-row" data-model="${esc(m.name)}">`;
 html += `<span class="hwfit-col hwfit-fit" style="color:${fitColor}">${esc(fitLabel)}</span>`;
-html += `<span class="hwfit-col hwfit-name">${modelLogo(m.name)}${esc(m.name?.split('/').pop() || m.name)}${moeBadge}${imgBadge}${dlDot}</span>`;
+// Append quant to the title when it's not already in the repo name. The
+// suffix strips quant-parts the name already contains — e.g. for
+// QuantTrio/MiniMax-M2-AWQ + quant=AWQ-4bit we just show "(4bit)", not
+// "(AWQ-4bit)". DeepSeek-V4-Flash + FP4-MoE-Mixed keeps the full tag
+// (none of those parts are in the repo id).
+const _short = m.name?.split('/').pop() || m.name || '';
+const _quantTag = (m.quant || '').trim();
+const _lowerShort = _short.toLowerCase();
+let _quantSuffix = '';
+if (_quantTag) {
+const _parts = _quantTag.split(/[-_]/).filter(Boolean);
+const _remaining = _parts.filter(p => !_lowerShort.includes(p.toLowerCase()));
+if (_remaining.length && _remaining.length < _parts.length + 1) { // at least one part is new
+let _display = _remaining.join('-');
+if (_display.length > 9) _display = _display.slice(0, 9) + '…';
+_quantSuffix = ` <span class="hwfit-name-quant" title="${esc(_quantTag)} — full storage format">(${esc(_display)})</span>`;
+}
+}
+html += `<span class="hwfit-col hwfit-name">${modelLogo(m.name)}${esc(_short)}${_quantSuffix}${moeBadge}${imgBadge}${dlDot}</span>`;
 html += `<span class="hwfit-col hwfit-c-params">${esc(pcount)}</span>`;
-html += `<span class="hwfit-col hwfit-c-quant">${esc(m.quant || '?')}</span>`;
+// Truncate the Quant cell to 9 chars + ellipsis so long tags like
+// "FP4-MoE-Mixed" don't push neighboring columns. Full tag stays in title.
+const _qRaw = m.quant || '?';
+const _qShort = _qRaw.length > 9 ? _qRaw.slice(0, 9) + '…' : _qRaw;
+html += `<span class="hwfit-col hwfit-c-quant" title="${esc(_qRaw)}">${esc(_qShort)}</span>`;
 html += `<span class="hwfit-col hwfit-c-vram">${vramLabel}</span>`;
 html += `<span class="hwfit-col hwfit-c-ctx">${m.is_image_gen ? '\u2014' : ctx}</span>`;
 html += `<span class="hwfit-col hwfit-c-speed">${m.is_image_gen ? '\u2014' : tps + ' t/s'}</span>`;
@@ -934,7 +965,26 @@ export function _hwfitRenderList(el, models) {
 });
 // Clickable header columns → sort (click again to toggle direction)
 el.querySelectorAll('.hwfit-header .hwfit-sortable').forEach(col => {
-col.addEventListener('click', () => {
+col.addEventListener('click', (e) => {
+// The little dot inside the Fit header is its own toggle (fit-only
+// filter), don't let it fall through to a sort click.
+if (e.target.closest('[data-fit-dot]')) {
+const on = !e.target.classList.contains('active');
+try { localStorage.setItem('hwfit_fit_only_v1', on ? '1' : '0'); } catch {}
+// Un-toggling the fit filter (off → showing too-tight rows again) is
+// typically because the user wants to see the LARGE models they can't
+// run yet — re-sort by VRAM descending so the biggest surface first.
+if (!on) {
+const sortSel = document.getElementById('hwfit-sort');
+if (sortSel) {
+sortSel.value = 'vram';
+sortSel.dataset.reverse = '0'; // descending (biggest first)
+}
+}
+_hwfitCache = null;
+_hwfitFetch();
+return;
+}
 const sortKey = col.dataset.sort;
 if (!sortKey) return;
 const sel = document.getElementById('hwfit-sort');
@@ -1018,7 +1068,16 @@ export function _expandModelRow(row, modelData) {
 if (modelData.is_image_gen) {
 html += `<div style="font-size:10px;opacity:0.5;margin-top:4px;">${esc((modelData.capabilities || []).join(' \u00B7 ') || '')}${modelData.description ? ' \u2014 ' + esc(modelData.description) : ''}</div>`;
 } else if (_requiresAcceleratorBackend(modelData)) {
-html += `<div class="hwfit-panel-note">This is a safetensors GPU-serving format. Use vLLM/SGLang with a visible CUDA/ROCm accelerator, or pick a GGUF download for llama.cpp/Ollama.</div>`;
+// Only show the "needs CUDA/ROCm" note when the host doesn't already have
+// one. With a visible CUDA/ROCm accelerator the note is noise — the user
+// can already serve the model and reading the warning on every row makes
+// the panel feel like everything's broken.
+const _sys = _hwfitCache?.system || {};
+const _backend = (_sys.backend || '').toLowerCase();
+const _hasGpuAccel = !!_sys.has_gpu && (_backend === 'cuda' || _backend === 'rocm');
+if (!_hasGpuAccel) {
+html += `<div class="hwfit-panel-note">This is a safetensors GPU-serving format. Use vLLM/SGLang with a visible CUDA/ROCm accelerator, or pick a GGUF download for llama.cpp/Ollama.</div>`;
+}
 }
 html += `</div>`;
 
@@ -1243,14 +1302,14 @@ export function _hwfitInit() {
 const targetCtx = _ctxValue();
 try { localStorage.setItem(_CTX_KEY, String(targetCtx)); } catch {}
 // Ctx drag affects sort mode: a specific ctx target (anything < Max)
-// implies the user is hunting for "what fits at this context length",
-// so re-rank by fit (lowest first). Dragging back to Max means no
-// ctx constraint → go back to the default score-based ranking.
+// implies "what runs at this context length" — sort by VRAM ascending
+// so the cheapest-fitting models surface first. Dragging back to Max
+// releases the constraint → go back to the default score ranking.
 const sortSel = document.getElementById('hwfit-sort');
 if (sortSel) {
 if (targetCtx) {
-sortSel.value = 'fit';
-sortSel.dataset.reverse = '1';
+sortSel.value = 'vram';
+sortSel.dataset.reverse = '1'; // ascending = smallest VRAM first
 } else {
 sortSel.value = 'score';
 sortSel.dataset.reverse = '';

@@ -18,6 +18,7 @@ import {
 _launchServeTask, _serveAutoFix, _serveAutoRetry, _serveAutoRetryReplace, _serveAutoRetryRemove,
 _startBackgroundMonitor, _syncFromServer,
 _retryDownload, _nextAvailablePort, _processQueue,
+_selfHealStaleTasks,
 } from './cookbookRunning.js';
 
 import {
@@ -641,6 +642,13 @@ async function _fetchDependencies() {
 const winBlocked = !isLocal && _isWindows() && _winUnsupported.has(pkg.name);
 const note = pkg.status_note ? `<div class="memory-item-meta" style="font-size:10px;opacity:0.65;margin-top:3px;">${esc(pkg.status_note)}</div>` : '';
 const updateNote = pkg.installed && pkg.pip_update_available === false && pkg.update_note ? `<div class="memory-item-meta" style="font-size:10px;opacity:0.55;margin-top:3px;">${esc(pkg.update_note)}</div>` : '';
+// Inline "Rebuild" tag for the llama_cpp row only. Styled as a
+// .cookbook-dep-tag so it matches the LLM category tag's pill look,
+// and lives to the LEFT of the category tag (clear affordance before
+// the row "value").
+const _rebuildBtn = (pkg.name === 'llama_cpp')
+? `<button type="button" class="cookbook-dep-tag cookbook-dep-rebuild" id="cookbook-rebuild-engine" title="Clear the cached llama.cpp build so the next serve recompiles from source (use after installing a CUDA/ROCm toolkit to turn a CPU-only build into a GPU build).">Rebuild</button>`
+: '';
 return `<div class="cookbook-dep-row${winBlocked ? ' cookbook-dep-blocked' : ''}" data-pkg-name="${esc(pkg.name)}" data-dep-pip="${esc(pkg.pip || '')}" data-dep-target="${isLocal ? 'local' : 'remote'}" data-dep-kind="${esc(pkg.kind || 'python')}">`
 + `<div class="cookbook-dep-info">`
 + `<div class="memory-item-title">${esc(pkg.name)}</div>`
@@ -648,6 +656,7 @@ async function _fetchDependencies() {
 + note
 + updateNote
 + `</div>`
++ _rebuildBtn
 + `<span class="cookbook-dep-tag cookbook-dep-cat">${esc(pkg.category)}</span>`
 + _statusTag(pkg, isLocal, isSystemDep, winBlocked)
 + `</div>`;
@@ -1237,6 +1246,10 @@ function _wireTabEvents(body) {
 const folded = dlFoldBody.style.display === 'none';
 dlFoldBody.style.display = folded ? '' : 'none';
 dlFoldChevron.textContent = folded ? '▾' : '▸';
+// Toggle is-folded class on the h2 so the line under it only shows when
+// the section is collapsed (the body's content normally provides
+// separation; with no body visible, the line gives the h2 definition).
+dlFold.classList.toggle('is-folded', !folded);
 try { localStorage.setItem('cookbook_dl_tab_folded_v1', folded ? '0' : '1'); } catch {}
 });
 }
@@ -1456,7 +1469,7 @@ export function _serverEntryHtml(s, i, defaultServer, forceRemote, isNew) {
 html += `<span class="cookbook-server-title" style="display:flex;align-items:center;gap:6px;width:100%;font-size:13px;font-weight:600;margin-bottom:4px;">`;
 html += `${esc(_srvTitle)}`;
 html += _pIco ? `<span class="cookbook-srv-platform" title="${esc(s.platform || '')}" style="display:inline-flex;align-items:center;opacity:0.55;">${_pIco}</span>` : '';
-html += `<span class="cookbook-srv-test-msg" style="font-size:10px;font-weight:400;opacity:0.55;max-width:160px;white-space:nowrap;overflow:hidden;text-overflow:ellipsis;position:relative;top:2px;"></span>`;
+html += `<span class="cookbook-srv-test-msg" style="font-size:10px;font-weight:400;opacity:0.55;max-width:160px;white-space:nowrap;overflow:hidden;text-overflow:ellipsis;position:relative;top:1px;"></span>`;
 if (isNew) {
 // New server: Cancel (discard) sits top-right; the default toggle only makes
 // sense once the server is saved.
@@ -1535,7 +1548,7 @@ function _renderRecipes() {
 // State persisted to localStorage so the fold survives reloads.
 const _dlTabFolded = (() => { try { return localStorage.getItem('cookbook_dl_tab_folded_v1') === '1'; } catch { return false; } })();
 html += '<div style="display:flex;align-items:center;gap:8px;margin-bottom:2px;">';
-html += `<h2 id="cookbook-dl-tab-fold" style="margin:0;padding:0;line-height:1;cursor:pointer;display:flex;align-items:center;justify-content:space-between;user-select:none;flex:1;">Download<span id="cookbook-dl-tab-chevron" style="display:inline-block;transition:transform 0.15s;font-size:1.1em;margin-left:8px;opacity:0.85;">${_dlTabFolded ? '▸' : '▾'}</span></h2>`;
+html += `<h2 id="cookbook-dl-tab-fold" class="${_dlTabFolded ? 'is-folded' : ''}" style="margin:0;padding:0;line-height:1;cursor:pointer;display:flex;align-items:center;justify-content:space-between;user-select:none;flex:1;">Download<span id="cookbook-dl-tab-chevron" style="display:inline-block;transition:transform 0.15s;font-size:1.1em;margin-left:8px;opacity:0.85;">${_dlTabFolded ? '▸' : '▾'}</span></h2>`;
 html += '</div>';
 html += `<div id="cookbook-dl-tab-fold-body" style="${_dlTabFolded ? 'display:none;' : ''}">`;
 html += '<p class="memory-desc doclib-desc" style="margin-top:6px;">Download from <a href="https://huggingface.co/models" target="_blank" rel="noopener" style="color:var(--accent,var(--red));text-decoration:none;"><svg width="10" height="10" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" style="vertical-align:-1px;margin-right:1px;"><path d="M18 13v6a2 2 0 0 1-2 2H5a2 2 0 0 1-2-2V8a2 2 0 0 1 2-2h6"/><polyline points="15 3 21 3 21 9"/><line x1="10" y1="14" x2="21" y2="3"/></svg>HuggingFace</a> by pasting model link, or download directly in the Scan section below.</p>';
@@ -1605,36 +1618,43 @@ function _renderRecipes() {
 html += '<p class="memory-desc doclib-desc" style="margin-top:6px;">Scans your hardware for what models you can run. Hardware is cached; hit the scan button to re-probe after changing GPUs.</p>';
 html += '<div class="hwfit-toolbar" style="margin-top:9px;">';
 html += '<select class="cookbook-field-input hwfit-usecase" id="hwfit-usecase" style="height:28px;">';
-html += '<option value="">Type</option><option value="general">General</option><option value="coding">Coding</option>';
+html += '<option value="general" selected>Standard</option><option value="coding">Coding</option>';
 html += '<option value="reasoning">Reasoning</option><option value="chat">Chat</option>';
 // Image tab removed — text→image gen is gone from this build (only inpaint
 // remains, which uses its own settings panel). Vision (multimodal) stays.
 html += '<option value="multimodal">Vision</option></select>';
-html += '<input type="text" class="cookbook-field-input hwfit-search" id="hwfit-search" placeholder="Search models..." style="flex:1;" />';
-// Quant (Q4/Q8/…) lives next to the search now. Default is "All" so the
-// list shows the best-scoring quant for every model instead of silently
-// filtering to Q4 (which used to be the implicit default).
-html += '<select class="cookbook-field-input hwfit-quant" id="hwfit-quant" style="height:28px;">';
-html += '<option value="" selected>All</option>';
-html += '<option value="Q4_K_M">Q4</option><option value="Q8_0">Q8</option>';
-html += '<option value="Q6_K">Q6</option><option value="Q5_K_M">Q5</option>';
-html += '<option value="Q3_K_M">Q3</option><option value="Q2_K">Q2</option>';
-html += '<option value="AWQ-4bit">AWQ</option><option value="FP8">FP8</option><option value="FP4">FP4</option><option value="NVFP4">NVFP4</option></select>';
-// Engine filter — show only models whose serve engine matches. Composes
-// with quant / type / search filters.
+// Engine sits next to the type filter so the "what category / which serving
+// path" filters live together; Quant + Context are storage-format and budget
+// levers, grouped to the right.
+html += '<span class="hwfit-engine-wrap">';
 html += '<select class="cookbook-field-input hwfit-engine" id="hwfit-engine" style="height:28px;" title="Filter by serving engine">';
 html += '<option value="">Engine</option>';
 html += '<option value="llamacpp">llama.cpp</option>';
 html += '<option value="vllm">vLLM</option>';
 html += '<option value="sglang">SGLang</option>';
 html += '</select>';
-html += '<span class="hwfit-help-chip" title="Higher numbers usually mean better quality, but they need more memory. Lower numbers fit on more hardware.">?</span>';
+html += '<span class="hwfit-help-chip hwfit-help-chip-inline hwfit-engine-help" title="Rule of thumb: GGUF on single GPU / CPU+RAM → llama.cpp (or Ollama). Safetensors on multi-GPU NVIDIA → vLLM. SGLang is a vLLM-class alternative, sometimes faster on big-MoE / long-context.">?</span>';
+html += '</span>';
+// Quant (Q4/Q8/…). Default is "All" so the list shows the best-scoring
+// quant for every model instead of silently filtering to Q4.
+html += '<span class="hwfit-quant-wrap">';
+html += '<select class="cookbook-field-input hwfit-quant" id="hwfit-quant" style="height:28px;">';
+html += '<option value="" selected>Quant: All</option>';
+html += '<option value="Q4_K_M">Q4</option><option value="Q8_0">Q8</option>';
+html += '<option value="Q6_K">Q6</option><option value="Q5_K_M">Q5</option>';
+html += '<option value="Q3_K_M">Q3</option><option value="Q2_K">Q2</option>';
+html += '<option value="AWQ-4bit">AWQ</option><option value="FP8">FP8</option><option value="FP4">FP4</option><option value="NVFP4">NVFP4</option></select>';
+html += '<span class="hwfit-help-chip hwfit-help-chip-inline hwfit-quant-help" title="Lower quant tiers (Q2/Q3/Q4 / AWQ-4bit) are smaller, faster, and cheaper to run, at some quality loss. Higher tiers (Q8 / FP8 / FP16 / BF16) preserve more quality but need more VRAM. “All” shows the best-scoring quant per model — pick a specific one to filter.">?</span>';
+html += '</span>';
 // Ctx slider — lets you target a context length for fit estimates; the
 // hwfit ranking uses _ctxValue() to factor that into VRAM math, so
 // dragging this re-sorts the list toward models that fit your chosen ctx.
 html += '<label class="hwfit-ctx-control" title="Context length for fit estimates. Lower it to find more models that could fit your hardware.">';
-html += '<span>Ctx</span><span class="hwfit-help-chip hwfit-help-chip-inline" title="Context length. Lower it to find more models that could fit your hardware; raise it when you need longer chats or documents.">?</span><input type="range" id="hwfit-context" min="0" max="5" step="1" value="3" />';
+html += '<span>Context</span><span class="hwfit-help-chip hwfit-help-chip-inline" title="Context length. Lower it to find more models that could fit your hardware; raise it when you need longer chats or documents.">?</span><input type="range" id="hwfit-context" min="0" max="5" step="1" value="3" />';
 html += '<output id="hwfit-context-label">50k</output></label>';
+// Search lives at the far right of the toolbar so the controls (Type/Quant/
+// Engine/Context) read as a row of compact filters followed by free-text.
+html += '<input type="text" class="cookbook-field-input hwfit-search" id="hwfit-search" placeholder="Search models..." style="flex:1;" />';
 html += '</div>';
 html += '<div class="hwfit-toolbar" style="margin-top:7px;">';
 html += '<select class="cookbook-field-input hwfit-server-select" id="hwfit-server-select" style="height:28px;min-width:88px;position:relative;top:0px;">';
@@ -1643,7 +1663,7 @@ function _renderRecipes() {
 html += '<div class="hwfit-gpu-toggles" id="hwfit-gpu-toggles"></div>';
 // Scan/refresh button (icon-only) where the quant dropdown used to sit.
 html += '<button type="button" class="hwfit-gpu-btn" id="hwfit-rescan" title="Re-scan hardware" style="flex-shrink:0;position:relative;top:-3px;left:-1px;">↻ RESCAN</button>';
-html += '<button type="button" class="hwfit-gpu-btn hwfit-hw-manual-btn" id="hwfit-hw-manual-btn" title="Set hardware manually" style="flex-shrink:0;position:relative;top:-3px;left:-1px;">EDIT</button>';
+html += '<button type="button" class="hwfit-gpu-btn hwfit-hw-manual-btn" id="hwfit-hw-manual-btn" title="Set hardware manually" style="flex-shrink:0;position:relative;top:-3px;left:-1px;display:inline-flex;align-items:center;gap:3px;"><svg width="10" height="10" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2.2" stroke-linecap="round" stroke-linejoin="round" style="flex-shrink:0;"><path d="M12 20h9"/><path d="M16.5 3.5a2.121 2.121 0 0 1 3 3L7 19l-4 1 1-4Z"/></svg>EDIT</button>';
 // Sort state — the clickable column headers read/write this (pewds' original
 // sort paradigm). Newest is reachable by clicking the Model column header.
 html += '<select class="cookbook-field-input hwfit-sort" id="hwfit-sort" style="display:none">';
@@ -1663,6 +1683,16 @@ function _renderRecipes() {
 html += '</div>';
 html += '<div id="hwfit-hw-row" style="display:none;align-items:center;gap:4px;margin-top:3px;padding-top:2px;"><span style="font-size:10px;padding:2px 8px;border-radius:10px;background:color-mix(in srgb, var(--fg) 8%, transparent);color:var(--fg);opacity:0.7;white-space:nowrap;flex-shrink:0;position:relative;top:-1px;">Detected hardware</span><div class="hwfit-hw" id="hwfit-hw" style="flex:1;"></div></div>';
 html += '<div class="hwfit-list" id="hwfit-list"></div>';
+// Footer: link to the public discussion where users can request additions
+// to the curated model list. Sits below the list so it reads as a callout
+// after browsing, not a header.
+html += '<div class="hwfit-list-footer" style="margin-top:8px;padding-top:6px;border-top:1px solid color-mix(in srgb, var(--border) 50%, transparent);font-size:9.5px;opacity:0.65;text-align:right;">'
++ 'Don\'t see a model? '
++ '<a href="https://github.com/pewdiepie-archdaemon/odysseus/discussions/1962" target="_blank" rel="noopener" style="color:var(--accent,var(--red));text-decoration:none;display:inline-flex;align-items:center;gap:4px;vertical-align:middle;">'
++ 'Request it →'
++ '<svg width="11" height="11" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" style="flex-shrink:0;"><path d="M8 0C3.58 0 0 3.58 0 8a8 8 0 0 0 5.47 7.59c.4.07.55-.17.55-.38 0-.19-.01-.82-.01-1.49-2.01.37-2.53-.49-2.69-.94-.09-.23-.48-.94-.82-1.13-.28-.15-.68-.52-.01-.53.63-.01 1.08.58 1.23.82.72 1.21 1.87.87 2.33.66.07-.52.28-.87.51-1.07-1.78-.2-3.64-.89-3.64-3.95 0-.87.31-1.59.82-2.15-.08-.2-.36-1.02.08-2.12 0 0 .67-.21 2.2.82.64-.18 1.32-.27 2-.27.68 0 1.36.09 2 .27 1.53-1.04 2.2-.82 2.2-.82.44 1.1.16 1.92.08 2.12.51.56.82 1.27.82 2.15 0 3.07-1.87 3.75-3.65 3.95.29.25.54.73.54 1.48 0 1.07-.01 1.93-.01 2.2 0 .21.15.46.55.38A8.013 8.013 0 0 0 16 8c0-4.42-3.58-8-8-8z"/></svg>'
++ '</a>'
++ '</div>';
 
 html += '</div></div>';
 
@@ -1707,7 +1737,8 @@ function _renderRecipes() {
 html += '<div class="admin-card" style="flex:1;display:flex;flex-direction:column;overflow:hidden;">';
 html += '<div style="display:flex;align-items:center;gap:8px;margin-bottom:4px;">';
 html += '<h2 style="margin:0;padding:0;line-height:1;">Dependencies</h2>';
-html += '<button class="cookbook-field-input" id="cookbook-rebuild-engine" title="Clear the cached llama.cpp build so the next serve recompiles from source (use after installing a CUDA/ROCm toolkit to turn a CPU-only build into a GPU build)." style="height:24px;font-size:10px;padding:0 8px;cursor:pointer;width:auto;">Rebuild llama.cpp</button>';
+// Rebuild llama.cpp button moved into the llama_cpp dep row (see _depRow);
+// having it in the title polluted the section header.
 html += '<span style="font-size:10px;opacity:0.5;margin-left:auto;">Server</span>';
 html += '<select class="cookbook-field-input" id="hwfit-deps-server" style="height:28px;min-width:70px;">';
 html += _buildServerOpts(false);
@@ -1746,7 +1777,7 @@ function _renderRecipes() {

 // ── Servers block ───────────────────────────────────────────────────
 html += '<div class="admin-card" style="flex:0 0 auto;display:flex;flex-direction:column;">';
-html += '<div style="display:flex;align-items:baseline;gap:8px;margin-bottom:2px;margin-top:-8px;">';
+html += '<div style="display:flex;align-items:baseline;gap:8px;margin-bottom:2px;margin-top:-4px;">';
 html += '<h2 style="margin:0;padding:0;line-height:1;">Servers</h2>';
 // Reuse the calendar +New pill: spinning plus, label fades in idea uses
 // the same `.cal-add-btn-text` rules, so styling stays consistent.
@@ -1893,6 +1924,11 @@ export async function open(opts) {
 _rendered = true;
 _clearCookbookNotif();
 _renderRunningTab();
+// Self-heal: revive any download tasks whose tmux session is still alive
+// but were persisted as done/error (covers the "restarted server while a
+// big multi-shard download was in flight" case — the task survived in
+// tmux, the cookbook just lost track of it).
+try { _selfHealStaleTasks({ oneShot: true }); } catch {}
 if (_content) {
 // Put the panel in its entering state before it becomes visible. On
 // mobile, showing first and adding the class a frame later can paint the

@@ -535,6 +535,42 @@ export async function _runModelDownload(panel, model, backend, hostOverride) {
 uiModule.showToast(`${shortName} is already ${duplicate.status === 'queued' ? 'queued' : 'downloading'}`);
 return;
 }
+// Also catch zombie "done" tasks — the cookbook may have lost track of a
+// download (server restart, stale state) while its tmux session is still
+// alive on the host. Probe it; if alive, flip back to running + treat as
+// duplicate so we don't kick off a second concurrent download writing to
+// the same target dir.
+const zombieCandidate = tasks.find(t => sameDownload(t)
+&& ['done', 'error', 'crashed', 'stopped'].includes(t.status)
+&& t.sessionId && !String(t.sessionId).startsWith('queue-'));
+if (zombieCandidate) {
+try {
+const _zh = zombieCandidate.remoteHost || '';
+const _zPort = (_envState.servers || []).find(s => s.host === _zh)?.port;
+const _sshPf = _zh ? `ssh ${_zPort && _zPort !== '22' ? `-p ${_zPort} ` : ''}${_zh} '` : '';
+const _sshSf = _zh ? `'` : '';
+const _probeCmd = `${_sshPf}tmux has-session -t ${zombieCandidate.sessionId} 2>/dev/null${_sshSf}`;
+const _r = await fetch('/api/shell/exec', {
+method: 'POST', credentials: 'same-origin',
+headers: { 'Content-Type': 'application/json' },
+body: JSON.stringify({ command: _probeCmd, timeout: 5 }),
+});
+const _d = await _r.json();
+if (_d.exit_code === 0) {
+// tmux still alive → not actually done. Revive + tell the user.
+const _fresh = _loadTasks();
+const _ft = _fresh.find(t => t.sessionId === zombieCandidate.sessionId);
+if (_ft) {
+_ft.status = 'running';
+_ft._selfHealed = true;
+_saveTasks(_fresh);
+}
+_renderRunningTab();
+uiModule.showToast(`${shortName} is still downloading (was marked finished after a restart — revived)`);
+return;
+}
+} catch { /* probe failed — fall through and let the user launch */ }
+}
 const activeOnHost = tasks.find(t => t.type === 'download' && (t.status === 'running' || t.status === 'queued') && (t.remoteHost || 'local') === targetHost);

 if (activeOnHost) {

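The zombie probe above runs `tmux has-session`, which exits 0 only when the session exists. It wraps that command in `ssh` when the task has a remote host. Below is a minimal sketch of the same command construction as a pure function. The name `buildTmuxProbeCmd` and its `{ host, port, sessionId }` argument are hypothetical. Like the original, the sketch assumes host names and session IDs contain no shell metacharacters. One small deviation: it compares `String(port)` so that a numeric `22` and the string `'22'` are both treated as the default port.

```javascript
// Hypothetical helper: builds the "is this tmux session still alive?" probe.
// Assumes host and sessionId are trusted; nothing here quotes or escapes them.
function buildTmuxProbeCmd({ host = '', port, sessionId }) {
  const inner = `tmux has-session -t ${sessionId} 2>/dev/null`;
  if (!host) return inner; // local task: run tmux directly
  // Only pass -p for a non-default port; String() makes 22 and '22' equivalent.
  const portFlag = port && String(port) !== '22' ? `-p ${port} ` : '';
  // Single-quote the remote command so the redirect runs on the remote host.
  return `ssh ${portFlag}${host} '${inner}'`;
}

// buildTmuxProbeCmd({ sessionId: 'dl-42' })
//   -> "tmux has-session -t dl-42 2>/dev/null"
// buildTmuxProbeCmd({ host: 'rig', port: 2222, sessionId: 'dl-42' })
//   -> "ssh -p 2222 rig 'tmux has-session -t dl-42 2>/dev/null'"
```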
@@ -35,13 +35,34 @@ function _taskBadge(task) {
 return { text: _statusLabel(task.status, task.type), cls: 'cookbook-task-' + task.status };
 }

+// A download task whose tmux output still shows an active per-shard line
+// (e.g. "model-00012-of-00082.safetensors: 56%|") is NOT actually finished —
+// the cookbook just lost track. The clear pill becomes a "reconnect" affordance
+// in that case (click → revive the row + reattach the poll loop).
+function _downloadOutputLooksActive(task) {
+if (!task || task.type !== 'download') return false;
+const out = task.output || '';
+if (!out) return false;
+if (out.includes('DOWNLOAD_OK') || out.includes('DOWNLOAD_FAILED')) return false;
+// An active shard line: filename + a colon + a percentage that isn't 100%.
+// We catch any in-flight shard or "Downloading 'X' to ..." line (no %).
+return /model-\d+-of-\d+\.[a-z]+:\s+(?!100%)\d+%/i.test(out)
+|| /Downloading\s+'[^']+'\s+to\s+'[^']*\.incomplete'/i.test(out);
+}
+
 function _canClearTask(task) {
 if (!task || task.status === 'running') return false;
 if (task.type === 'serve' && (task.status === 'ready' || task._serveReady)) return false;
+// If the tmux output still shows an in-flight download, the task isn't
+// actually finished — hide the clear/check pill so it doesn't show on a
+// task that's still doing work. (The next render will reflect this and
+// ideally the self-heal flips status back to running.)
+if (_downloadOutputLooksActive(task)) return false;
 return ['done', 'stopped', 'error', 'crashed', 'failed'].includes(task.status);
 }

 function _clearPillLabel(task) {
+if (_downloadOutputLooksActive(task)) return 'reconnect';
 return 'clear';
 }

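The activity heuristic in `_downloadOutputLooksActive` rests on two regular expressions. The sketch below lifts them into a standalone function so their behavior can be checked against sample output. It takes the raw output string rather than a task object, and the name `outputLooksActive` is hypothetical. The sample lines in the checks are made up and only approximate what the downloader prints. Because `.test()` scans the whole captured output, an older progress line, such as a shard at 56% that later reached 100%, still counts as active unless a `DOWNLOAD_OK`/`DOWNLOAD_FAILED` sentinel is also present.

```javascript
// Standalone copies of the two patterns used by _downloadOutputLooksActive.
// Shard progress line whose percentage is not exactly 100%.
const SHARD_IN_FLIGHT = /model-\d+-of-\d+\.[a-z]+:\s+(?!100%)\d+%/i;
// "Downloading '<file>' to '<path>.incomplete'" line (no percentage).
const DOWNLOADING_TO_INCOMPLETE = /Downloading\s+'[^']+'\s+to\s+'[^']*\.incomplete'/i;

function outputLooksActive(out) {
  if (!out) return false;
  // Terminal sentinels win: the wrapper already reported a final status.
  if (out.includes('DOWNLOAD_OK') || out.includes('DOWNLOAD_FAILED')) return false;
  // Non-global regexes, so .test() keeps no lastIndex state between calls.
  return SHARD_IN_FLIGHT.test(out) || DOWNLOADING_TO_INCOMPLETE.test(out);
}
```

The negative lookahead `(?!100%)` sits right after the colon and whitespace, so `56%` and `10%` match but a finished `100%` line does not.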
@@ -1537,7 +1558,16 @@ export function _renderRunningTab() {

 const tasks = _loadTasks();
 const hasContent = tasks.length > 0;
-const activeCount = tasks.filter(t => t.status === 'running' || t.status === 'queued').length;
+// Count anything that's really active: explicit 'running'/'queued' status,
+// OR a download whose tmux output is still showing live shard progress.
+// Without the output check, a task whose status got stuck at 'done' /
+// 'crashed' (before auto-reconnect catches it) would read as "Running 0"
+// even when the model is actively downloading on the host.
+const activeCount = tasks.filter(t =>
+t.status === 'running'
+|| t.status === 'queued'
+|| _downloadOutputLooksActive(t)
+).length;
 const activeCountHtml = activeCount ? ` <span class="cookbook-tab-count">${activeCount}</span>` : '';

 let tabBar = body.querySelector('.cookbook-tabs');
@@ -1824,9 +1854,31 @@ export function _renderRunningTab() {
 const h = Math.floor(secs / 3600);
 const m = Math.floor((secs % 3600) / 60);
 const s = secs % 60;
-_uptimeEl.textContent = h > 0
+const _timer = h > 0
 ? `${_prefix}: ${h}h ${String(m).padStart(2,'0')}m`
 : `${_prefix}: ${m}m ${String(s).padStart(2,'0')}s`;
+// ETA — only for downloads, only when we have a meaningful overall %.
+// Reads the badge text (which already shows the true overall % we
+// compute in the live-polling block) and back-derives a remaining-time
+// estimate from elapsed/done. Hidden until pct >= 3% so the early-job
+// wild estimates don't show.
+let _eta = '';
+if (task.type === 'download') {
+const _badge = el.querySelector('.cookbook-task-status');
+const _m = _badge && /^(\d+)%/.exec(_badge.textContent || '');
+const _pct = _m ? parseInt(_m[1], 10) : 0;
+if (_pct >= 3 && _pct < 100 && secs > 5) {
+const _totalSec = Math.round(secs * (100 / _pct));
+const _remain = Math.max(0, _totalSec - secs);
+const _eh = Math.floor(_remain / 3600);
+const _em = Math.floor((_remain % 3600) / 60);
+const _es = _remain % 60;
+_eta = _eh > 0
+? ` · ETA ${_eh}h ${String(_em).padStart(2,'0')}m`
+: (_em > 0 ? ` · ETA ${_em}m ${String(_es).padStart(2,'0')}s` : ` · ETA ${_es}s`);
+}
+}
+_uptimeEl.textContent = _timer + _eta;
 }, 1000);
 }

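The ETA added to the uptime ticker is a linear extrapolation. If `pct` percent took `secs` seconds, the whole job is assumed to take `secs * 100 / pct`, and the remaining time is that total minus the elapsed time. Below is a minimal pure-function version of the same arithmetic and formatting. The name `formatEta` is hypothetical. The function mirrors the hunk's guards (percentage from 3 to 99, more than 5 s elapsed) and returns an empty string outside them.

```javascript
// Linear ETA from elapsed whole seconds and an integer overall percentage.
function formatEta(secs, pct) {
  // Too early (or already finished) for a meaningful estimate.
  if (!(pct >= 3 && pct < 100 && secs > 5)) return '';
  const totalSec = Math.round(secs * (100 / pct)); // projected total duration
  const remain = Math.max(0, totalSec - secs);
  const h = Math.floor(remain / 3600);
  const m = Math.floor((remain % 3600) / 60);
  const s = remain % 60;
  // Coarsest unit first: hours+minutes, then minutes+seconds, then seconds.
  if (h > 0) return ` · ETA ${h}h ${String(m).padStart(2, '0')}m`;
  if (m > 0) return ` · ETA ${m}m ${String(s).padStart(2, '0')}s`;
  return ` · ETA ${s}s`;
}
```

For example, 50% after 60 s projects a 120 s total, so `formatEta(60, 50)` returns `' · ETA 1m 00s'`.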
@@ -1874,11 +1926,32 @@ export function _renderRunningTab() {
 if (_clearChk) {
 _clearChk.addEventListener('click', (e) => {
 e.stopPropagation();
-// Belt-and-suspenders: kill the tmux session too. For a real-finished
-// task the session is already gone and kill-session errors silently,
-// but for a task that was falsely flagged done (the strict-finish
-// bug), this guarantees the still-running download actually stops
-// rather than continuing to write to disk after the row is removed.
+// If the output still shows an active shard line, the task isn't
+// actually finished — clicking is "reconnect" (flip back to running
+// + let _reconnectTask reattach to the live tmux session), not
+// "clear". The pill label already reflects this via _clearPillLabel.
+if (_downloadOutputLooksActive(task)) {
+const _fresh = _loadTasks();
+const _ft = _fresh.find(t => t.sessionId === task.sessionId);
+if (_ft) {
+_ft.status = 'running';
+_ft._selfHealed = true;
+_saveTasks(_fresh);
+}
+// Visually flip without waiting for a full re-render — same path the
+// self-heal uses on cookbook open.
+const _chk = el.querySelector('.cookbook-task-check');
+if (_chk) _chk.style.display = 'none';
+const _wave = el.querySelector('.cookbook-task-wave');
+if (_wave) _wave.style.display = '';
+const _up = el.querySelector('.cookbook-task-uptime');
+if (_up) _up.style.display = '';
+el.dataset.status = 'running';
+_renderRunningTab();
+return;
+}
+// Otherwise: real clear. Kill the tmux session as belt-and-suspenders,
+// then animate out + remove the row.
 try {
 fetch('/api/shell/exec', {
 method: 'POST', credentials: 'same-origin',
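The reconnect branch of the click handler above repeats the state flip that the zombie check in `_runModelDownload` also performs. It reloads the task list, finds the row by `sessionId`, sets `status` to `'running'` and tags the row `_selfHealed`. Below is a minimal sketch of that step as a pure, non-mutating function. The name `reviveTask` and the `{ tasks, changed }` return shape are hypothetical. Persistence (done by `_loadTasks`/`_saveTasks` in the diff) is left to the caller. Like the background self-heal loop, the sketch skips rows that are already `'running'`.

```javascript
// Flip a persisted task back to 'running' once its tmux session is known to be alive.
// Returns a new array plus a flag so the caller decides whether to save and re-render.
function reviveTask(tasks, sessionId) {
  let changed = false;
  const next = tasks.map(t => {
    if (t.sessionId !== sessionId || t.status === 'running') return t;
    changed = true;
    // Copy rather than mutate; the tag lets later renders tell a revived row apart.
    return { ...t, status: 'running', _selfHealed: true };
  });
  return { tasks: next, changed };
}
```

Returning a fresh array means the click path and the background monitor can share the helper without touching each other's objects. The caller saves and re-renders only when `changed` is true.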
@@ -2964,9 +3037,84 @@ function _refreshServerDots() {
 _syncSettingsServerDots(byKey);
 }

+// Self-heal: scan persisted download tasks marked done/error/crashed and
+// check whether their tmux session is still alive on the host. If yes —
+// the task isn't actually finished, the cookbook just lost the in-flight
+// status during restart — flip status back to 'running' so _reconnectTask
+// picks it up. The one-shot guard is enforced by callers (open path) or
+// time-throttled inside (background-monitor path).
+let _selfHealRan = false;
+let _selfHealLastTs = 0;
+export async function _selfHealStaleTasks(opts = {}) {
+// Open-path call: one-shot per page load.
+if (opts.oneShot) {
+if (_selfHealRan) return;
+_selfHealRan = true;
+} else {
+// Background-monitor call: throttle to once every 8s (the bg monitor
+// itself fires every 10s, so this almost always fires too, but the
+// guard keeps a fast manual call from doubling up).
+const now = Date.now();
+if (now - _selfHealLastTs < 8000) return;
+_selfHealLastTs = now;
+}
+const tasks = _loadTasks();
+const candidates = tasks.filter(t =>
+t.type === 'download'
+&& ['done', 'error', 'crashed', 'stopped'].includes(t.status)
+&& t.sessionId
+&& !String(t.sessionId).startsWith('queue-')
+);
+if (!candidates.length) return;
+let flipped = 0;
+for (const t of candidates) {
+try {
+const res = await fetch('/api/shell/exec', {
+method: 'POST', credentials: 'same-origin',
+headers: { 'Content-Type': 'application/json' },
+body: JSON.stringify({ command: _tmuxCmd(t, `has-session -t ${t.sessionId}`), timeout: 5 }),
+});
+const data = await res.json();
+if (data.exit_code === 0) {
+// Session still alive → the task is actually still running.
+const fresh = _loadTasks();
+const ft = fresh.find(x => x.sessionId === t.sessionId);
+if (ft && ft.status !== 'running') {
+ft.status = 'running';
+ft._selfHealed = true;
+_saveTasks(fresh);
+flipped++;
+const _el = document.querySelector(`.cookbook-task[data-task-id="${t.sessionId}"]`);
+if (_el) {
+const _chk = _el.querySelector('.cookbook-task-check');
+if (_chk) _chk.style.display = 'none';
+const _wave = _el.querySelector('.cookbook-task-wave');
+if (_wave) _wave.style.display = '';
+const _up = _el.querySelector('.cookbook-task-uptime');
+if (_up) _up.style.display = '';
+_el.dataset.status = 'running';
+}
+}
+}
+} catch { /* network blip — skip this one */ }
+}
+if (flipped) {
+console.log(`[cookbook] auto-reconnect: revived ${flipped} task(s) whose tmux session was still alive`);
+_renderRunningTab();
+}
+}
+
 export function _startBackgroundMonitor() {
 if (_bgMonitorInterval) return;
-_bgMonitorInterval = setInterval(() => { _pollBackgroundStatus(); _checkServeReachability(); }, BG_MONITOR_INTERVAL_MS);
+_bgMonitorInterval = setInterval(() => {
+_pollBackgroundStatus();
+_checkServeReachability();
+// Auto-reconnect: every cycle, look for download tasks marked finished/
+// crashed/etc. whose tmux session is actually still running, and flip
+// them back to running. Internally throttled to 8s so a manual call from
+// the open path or a fast invocation doesn't double up.
+_selfHealStaleTasks().catch(() => {});
+}, BG_MONITOR_INTERVAL_MS);
 _pollBackgroundStatus();
 _checkServeReachability();
 }

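`_selfHealStaleTasks` combines two guards: a one-shot flag for the open path and an 8 s time throttle for the background monitor. The sketch below isolates that gating logic with an injectable clock, so it can be exercised without real timers. `makeSelfHealGate` and its `now` parameter are hypothetical names.

```javascript
// Returns a predicate answering "should this self-heal call proceed?".
function makeSelfHealGate(throttleMs = 8000, now = () => Date.now()) {
  let ranOneShot = false; // open-path guard: once per page load
  let lastTs = 0;         // background-path guard: time of last allowed run
  return function shouldRun(opts = {}) {
    if (opts.oneShot) {
      if (ranOneShot) return false;
      ranOneShot = true;
      return true;
    }
    // Background call: skip if the previous background run was too recent.
    const t = now();
    if (t - lastTs < throttleMs) return false;
    lastTs = t;
    return true;
  };
}
```

As in the diff, the one-shot flag and the timestamp are tracked separately. An open-path run therefore does not postpone the next background run.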
@@ -560,6 +560,15 @@ function _rerenderCachedModels() {
 + `</div>`;

 let panelHtml = `<div class="hwfit-serve-panel">${_slotsHtml}`;
+// Warn when serving a model whose download hasn't fully completed —
+// the user CAN still hit Launch (vLLM/llama-server will start, then
+// crash trying to read missing shards), but they should know.
+if (m && (m.status === 'downloading' || m.status === 'stalled' || m.has_incomplete)) {
+const _warnText = m.status === 'stalled'
+? `This model looks like a stale download shell (${esc(m.size || '0 KB')}). The weights aren't on disk — the serve will fail to load. Re-download first, or pick another model.`
+: `This model's download isn't complete yet (${esc(m.size || 'partial')}). The serve will start but is likely to crash on a missing shard. Wait for the download to finish, or relaunch after it's done.`;
+panelHtml += `<div class="hwfit-serve-warn" style="margin:0 0 8px;padding:6px 10px;border-radius:5px;font-size:11px;background:color-mix(in srgb, var(--color-warning, #f0ad4e) 14%, transparent);border:1px solid color-mix(in srgb, var(--color-warning, #f0ad4e) 40%, transparent);color:var(--color-warning, #f0ad4e);display:flex;gap:6px;align-items:flex-start;line-height:1.4;"><span aria-hidden="true">⚠</span><span>${_warnText}</span></div>`;
+}
 // Row 1: Backend + Server + Env
 panelHtml += `<div class="hwfit-serve-row">`;
 const _backendChoices = _isWindows()
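The new serve-panel warning chooses between two messages based on the cached model's status. Below is a minimal sketch of that branch as a pure function, with HTML rendering and escaping left out. The name `serveWarningKind` and its return labels are hypothetical. The model fields (`status`, `has_incomplete`) are the ones the hunk reads.

```javascript
// Returns which warning a cached model needs, or null when none applies.
function serveWarningKind(m) {
  if (!m) return null;
  const incomplete = m.status === 'downloading' || m.status === 'stalled' || m.has_incomplete;
  if (!incomplete) return null;
  // 'stalled' means an empty download shell; anything else is a partial download.
  return m.status === 'stalled' ? 'stale-shell' : 'partial';
}
```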
@@ -597,13 +606,13 @@ function _rerenderCachedModels() {
 panelHtml += `<label class="hwfit-backend-vllm hwfit-backend-sglang">${_l('TP','Tensor Parallelism — split model across N GPUs')}<select class="hwfit-sf" data-field="tp">${tpOpts}</select></label>`;
 // ctx resets to the model's max on every panel open (the real ctx slider
 // lives in the Scan/Download toolbar — see cookbook.js .hwfit-ctx-control).
-panelHtml += `<label>${_l('Context','Max tokens per request — resets to the model max on every open. Lower = less VRAM')}<input type="text" class="hwfit-sf" data-field="ctx" value="${esc(m.context_length || m.context || '8192')}" /></label>`;
+panelHtml += `<label>${_l('Context','Max tokens per request — resets to the model max on every open. Lower = less VRAM')}<input type="text" class="hwfit-sf" data-field="ctx" value="${esc(m.context_length || m.context || '20000')}" /></label>`;
 panelHtml += `<label>${_l('GPU','Which GPU to use. Leave empty for default')}<input type="text" class="hwfit-sf" data-field="gpu_id" value="${esc(sv('gpu_id', ''))}" placeholder="auto" style="width:50px;" /></label>`;
 panelHtml += `<label class="hwfit-backend-vllm hwfit-backend-sglang">${_l('GPU Mem','Fraction of GPU memory (0.0–1.0). Lower if OOM')}<input type="text" class="hwfit-sf" data-field="gpu_mem" value="${esc(sv('gpu_mem', '0.90'))}" /></label>`;
 panelHtml += `<label class="hwfit-backend-vllm">${_l('Swap','CPU swap space in GB. Leave empty to omit (removed in newer vLLM)')}<input type="text" class="hwfit-sf" data-field="swap" value="${esc(sv('swap', ''))}" placeholder="off" /></label>`;
-panelHtml += `<label class="hwfit-backend-vllm hwfit-backend-sglang">${_l('Max Seqs','Maximum concurrent requests. Lower = less memory. Default 8 — prosumer GPUs often OOM on vLLM default 256 during CUDA graph capture.')}<input type="text" class="hwfit-sf" data-field="max_seqs" value="${esc(sv('max_seqs', '8'))}" placeholder="8" /></label>`;
+panelHtml += `<label class="hwfit-backend-vllm hwfit-backend-sglang">${_l('Max Seqs','Maximum concurrent requests. Lower = less memory. Default 4 — prosumer GPUs often OOM on vLLM default 256 during CUDA graph capture.')}<input type="text" class="hwfit-sf" data-field="max_seqs" value="${esc(sv('max_seqs', '4'))}" placeholder="4" /></label>`;
 panelHtml += `<label>${_l('Dtype','Data type for weights. auto picks best for GPU')}<select class="hwfit-sf" data-field="dtype">${dtypeOpts}</select></label>`;
-panelHtml += `<label class="hwfit-backend-vllm">${_l('KV Cache','vLLM --kv-cache-dtype. auto uses the model/runtime default; fp8 reduces KV memory for long context.')}<select class="hwfit-sf" data-field="vllm_kv_cache_dtype">${vllmKvCacheOpts}</select></label>`;
+panelHtml += `<label class="hwfit-backend-vllm">${_l('KV Cache','vLLM --kv-cache-dtype. auto uses the model/runtime default; fp8 reduces KV memory for long context.')}<select class="hwfit-sf" data-field="vllm_kv_cache_dtype" style="height:32px;">${vllmKvCacheOpts}</select></label>`;
 panelHtml += `</div>`;
 // Row 2b: Diffusers settings
 const diffDtypeOpts = ['bfloat16','float16','float32'].map(d => `<option value="${d}"${sv('diff_dtype','bfloat16')===d?' selected':''}>${d}</option>`).join('');
@@ -696,7 +705,7 @@ function _rerenderCachedModels() {
 if (!_specMethods.includes(_specMethod)) _specMethods.unshift(_specMethod);
 const _specOpts = _specMethods.map(m =>
 `<option value="${m}"${m === _specMethod ? ' selected' : ''}>${m}</option>`).join('');
-panelHtml += `<label class="hwfit-sf-cb hwfit-spec-group"><input type="checkbox" class="hwfit-sf" data-field="speculative" /> Speculative <select class="hwfit-sf hwfit-spec-method" data-field="spec_method" title="vLLM --speculative-config method">${_specOpts}</select><span class="hwfit-numstep"><button type="button" class="hwfit-numstep-btn" data-step="-1" tabindex="-1" aria-label="Decrease">‹</button><input type="number" class="hwfit-sf hwfit-spec-tokens" data-field="spec_tokens" value="${esc(_specTokens)}" min="1" max="10" title="num_speculative_tokens" /><button type="button" class="hwfit-numstep-btn" data-step="1" tabindex="-1" aria-label="Increase">›</button></span></label>`;
+panelHtml += `<label class="hwfit-sf-cb hwfit-spec-group"><input type="checkbox" class="hwfit-sf" data-field="speculative" /> Speculative <select class="hwfit-sf hwfit-spec-method" data-field="spec_method" title="vLLM --speculative-config method">${_specOpts}</select><span class="hwfit-numstep"><button type="button" class="hwfit-numstep-btn" data-step="-1" tabindex="-1" aria-label="Decrease">‹</button><input type="number" class="hwfit-sf hwfit-spec-tokens" data-field="spec_tokens" value="${esc(_specTokens)}" min="1" max="10" title="num_speculative_tokens" /><button type="button" class="hwfit-numstep-btn" data-step="1" tabindex="-1" aria-label="Increase">›</button></span><span class="hwfit-help-chip hwfit-help-chip-inline" title="MTP / speculative decoding is supported on a few model families only — turn it on when the model card explicitly recommends it. On supported models it can boost inference throughput up to ~3×; on unsupported models it will either be ignored or fail to launch." style="margin-left:6px;">?</span></label>`;
 }
 if (_opts2.envVars.length) panelHtml += `<label class="hwfit-sf-cb"><input type="checkbox" class="hwfit-sf" data-field="moe_env" /> MoE Env Vars</label>`;
 panelHtml += `</div>`;
@@ -721,7 +730,7 @@ function _rerenderCachedModels() {
 // pushes Cancel + Launch to the right.
 panelHtml += `<span class="hwfit-serve-actions-spacer"></span>`;
 panelHtml += `<button class="cookbook-btn hwfit-serve-cancel" type="button" title="Close this configuration panel">Cancel</button>`;
-panelHtml += `<button class="cookbook-btn hwfit-serve-launch">Launch</button>`;
+panelHtml += `<button class="cookbook-btn hwfit-serve-launch"><svg width="11" height="11" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" style="vertical-align:-1px;margin-right:4px;flex-shrink:0;"><polygon points="13 2 3 14 12 14 11 22 21 10 12 10 13 2"/></svg>Launch</button>`;
 panelHtml += `</div>`;
 panelHtml += `</div>`;

@@ -1657,6 +1666,37 @@ function _rerenderCachedModels() {
 });
 return;
 }
+// Pre-launch GPU probe — common failure pattern: vLLM/SGLang launched
+// on a host where no GPU is visible (driver missing, $CUDA_VISIBLE_DEVICES
+// unset, container without --gpus). Catch it BEFORE the user spends
+// minutes watching the task fail.
+const _needsGpu = ['vllm', 'sglang'].includes(serveState.backend)
+|| (serveState.backend === 'diffusers');
+if (_needsGpu) {
+try {
+const _probeHost = (_envState.remoteHost || '').trim();
+const _probeParams = new URLSearchParams();
+if (_probeHost) {
+_probeParams.set('host', _probeHost);
+const _sp = (_envState.servers || []).find(s => s.host === _probeHost)?.port;
+if (_sp) _probeParams.set('ssh_port', _sp);
+}
+const _probeRes = await fetch('/api/cookbook/gpus' + (_probeParams.toString() ? '?' + _probeParams : ''), { credentials: 'same-origin' });
+const _probeData = await _probeRes.json();
+const _probeGpus = Array.isArray(_probeData) ? _probeData : (_probeData.gpus || []);
+if (!_probeGpus.length) {
+const _proceed = await window.styledConfirm(
+`No GPU detected on ${_probeHost ? _probeHost : 'this host'}. ${serveState.backend.toUpperCase()} needs a visible CUDA/ROCm accelerator to start — launching now will most likely crash early.\n\nLaunch anyway?`,
+{ title: 'No GPU detected', confirmText: 'Launch anyway', cancelText: 'Cancel', danger: true },
+);
+if (!_proceed) return;
+}
+} catch {
+// Network / probe failure — don't block. Better to let the launch
+// proceed than to silently refuse because the probe endpoint
+// hiccuped (the user can read the real error in the task output).
+}
+}
 // Save in the { _byRepo, _lastUsed } schema — no legacy flat keys at
 // the root so per-model state doesn't leak between models.
 try {
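The pre-launch probe accepts two response shapes from `/api/cookbook/gpus`: a bare array, or an object with a `gpus` array. It only prompts for GPU-bound backends. Below is a minimal sketch of that decision with the fetch and the confirm dialog left out. `gpuListFrom` and `shouldConfirmNoGpu` are hypothetical names. Probe failures are deliberately not modeled here. In the hunk, a network error or an unparsable body lands in the `catch`, and the launch proceeds without a prompt.

```javascript
// Backends the hunk treats as needing a visible CUDA/ROCm accelerator.
const GPU_BACKENDS = ['vllm', 'sglang', 'diffusers'];

// The endpoint may return a bare array or { gpus: [...] }; accept both shapes.
function gpuListFrom(probeData) {
  return Array.isArray(probeData) ? probeData : (probeData.gpus || []);
}

// True when the user should be asked "No GPU detected, launch anyway?".
function shouldConfirmNoGpu(backend, probeData) {
  if (!GPU_BACKENDS.includes(backend)) return false; // other backends skip the probe
  return gpuListFrom(probeData).length === 0;
}
```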