Cookbook polish: auto-reconnect, ctx slider fixes, scoring, lots of UI
Backend (services/hwfit + routes):
- VRAM column sort now shows the global highest first. It was special-cased
to sort ascending and then truncate to the top N, so the largest-VRAM
models were cut before they could ever be shown. Every sort key now sorts
with reverse=True before truncating.
- Hardware probe cache TTL 30min -> 24h so changing filters doesn't keep
re-probing the rig during a session; Rescan button still forces fresh.
- Multi-GPU rigs filter GGUF Q*/IQ quants (vLLM/SGLang can't serve them);
default non-prequantized to BF16 on 2+ GPUs.
- AWQ / AWQ-8bit / GPTQ-Int8 get a -1.0 quality penalty so FP8 wins ties.
- Version-aware tiebreaker (parse Mn.n / Vn) — MiniMax-M2.7 ranks above M2.5.
- hf_models.json: zai-org/GLM-5.1 added; zai-org/GLM-5 quantization flipped
Q4_K_M -> BF16. DeepSeek-V4-Flash / -Pro + their -Base variants registered
with new FP4-MoE-Mixed / FP8-Mixed quant keys (calibrated BPP from the
actual 156 GB / 284 GB disk footprints).
- New FP4-MoE-Mixed + FP8-Mixed entries in QUANT_BPP / QUANT_SPEED_MULT /
QUANT_QUALITY_PENALTY / QUANT_BYTES_PER_PARAM / PREQUANTIZED_PREFIXES.
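
The version-aware tiebreaker above could look roughly like the sketch below. This is an illustration only: the regex, the helper name version_key, and the "score" field are assumptions, not the code in services/hwfit.

```python
import re
from typing import Tuple

# Hypothetical pattern: an "M" or "V" prefix followed by a number such as
# "2.7" or "4", delimited by "-" or the ends of the short model name.
_VERSION_RE = re.compile(r"(?:^|-)[MV](\d+(?:\.\d+)?)(?=-|$)", re.IGNORECASE)


def version_key(model_name: str) -> Tuple[int, ...]:
    """Return a comparable version tuple, or (0,) when no version is found."""
    short = model_name.split("/")[-1]
    match = _VERSION_RE.search(short)
    if not match:
        return (0,)
    return tuple(int(part) for part in match.group(1).split("."))


# Two rows with an identical (made-up) score: the version breaks the tie.
models = [
    {"name": "MiniMax-M2.5", "score": 90.0},
    {"name": "MiniMax-M2.7", "score": 90.0},
]
ranked = sorted(models, key=lambda m: (m["score"], version_key(m["name"])), reverse=True)
print([m["name"] for m in ranked])  # ['MiniMax-M2.7', 'MiniMax-M2.5']
```

Appending the version tuple after the score keeps the score as the primary criterion, so the version only matters when two rows tie.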
Frontend — Scan/Download:
- Engine + Quant swapped in the toolbar; Quant defaults to "All".
- Ctx (range slider) ported from origin/main: 8k/16k/32k/50k/128k/Max. Drag
re-sorts by vram ascending (smallest fitting first); back to Max → score.
- Ctx slider rail now visible — was background:transparent in a duplicate
later-cascade rule. Hardcoded grey + !important.
- Search input moved to the far right of the toolbar.
- Type/Standard default; "Context" not uppercased; Search placeholder dimmed.
- Engine "?" + Quant "?" inline help chips inside their dropdown boxes.
- Fit-column dot toggles fit-only filter; un-toggling re-sorts by VRAM desc.
- Quant column truncates to 9 chars + ellipsis ("FP4-MoE-M..."), full in
tooltip. Smart title-suffix strips the parts already in the repo name
(QuantTrio/MiniMax-M2-AWQ + quant AWQ-4bit -> just "(4bit)").
- Conditional warning for safetensors models on non-GPU rigs only.
- Dependency Install / Installed / Installed▾ / N/A all 75.85px wide.
- Rebuild llama.cpp moved into the llama_cpp dep row, styled as a tag.
- Foldable Download admin-card (h2 chevron); line under h2 only when folded.
- HF token save gets a green ✓ + "Saved" flash.
- Cached scan no longer counts stalled rows as downloaded.
- Footer: "Request it →" link with GitHub mark to the public discussion
(#1962) for model-add requests.
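
For reference, the title-suffix rule described above can be restated in Python. The commit implements it in JavaScript inside _hwfitRenderList; this port is only a sketch, and the function name quant_suffix is made up for the example.

```python
import re

MAX_LEN = 9  # characters shown before the ellipsis, as in the commit


def quant_suffix(repo_id: str, quant: str) -> str:
    """Return the "(...)" suffix for a model title, or "" if nothing is new."""
    short = repo_id.split("/")[-1].lower()
    # Split the quant tag on "-" and "_" and drop empty pieces.
    parts = [p for p in re.split(r"[-_]", quant.strip()) if p]
    # Keep only the pieces the repo name does not already contain.
    remaining = [p for p in parts if p.lower() not in short]
    if not remaining:
        return ""
    display = "-".join(remaining)
    if len(display) > MAX_LEN:
        display = display[:MAX_LEN] + "…"
    return f"({display})"


print(quant_suffix("QuantTrio/MiniMax-M2-AWQ", "AWQ-4bit"))  # (4bit)
print(quant_suffix("deepseek-ai/DeepSeek-V4-Flash", "FP4-MoE-Mixed"))  # (FP4-MoE-M…)
```

The substring check is deliberately loose, so a short piece such as "4" would also be dropped whenever the repo name happens to contain that character.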
Frontend — Running tab:
- Strict download-finish check (DOWNLOAD_OK or /snapshots/, not bare
"Download complete"). True overall % for multi-shard downloads:
((N-1)+frac)/total instead of hf_transfer's per-shard aggregate.
- ETA in the uptime ticker: "downloading: 12m 34s · ETA 1h 23m".
- Clear button kills the tmux session too; if the output still shows a
live shard line, the pill is hidden + relabels as "reconnect" + revives
on click.
- Self-heal: on cookbook open AND every bg-monitor cycle (10s, throttled
to 8s), scan persisted done/error/crashed downloads and probe their
tmux session — if alive, flip status back to running and reattach.
- Per-launch zombie probe: clicking Download on a model whose persisted
state is done but tmux is still alive revives the existing task and
refuses to start a duplicate.
- Pre-launch GPU probe: vllm / sglang / diffusers serve launches check
/api/cookbook/gpus first, then warn and ask for confirmation if no GPU is
visible.
- Server-side state guard: rejects "done" POSTs for downloads lacking
DOWNLOAD_OK / DOWNLOAD_FAILED / /snapshots/ when the last-mentioned
shard is N<total — stale tabs can't poison persisted state any more.
- Running count includes tasks whose output looks active even if persisted
status got stuck. Dir text on the running row, font matched to uptime.
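
The multi-shard percentage and the ETA ticker above can be sketched as follows. The ((N-1)+frac)/total formula comes from the commit message; the linear ETA extrapolation and all function names are assumptions made for the example, since the commit does not say how the ETA is derived.

```python
from typing import Optional


def overall_fraction(shard_index: int, shard_fraction: float, total_shards: int) -> float:
    """Overall progress as ((N-1)+frac)/total, with N counted from 1."""
    if total_shards <= 0:
        raise ValueError("total_shards must be positive")
    n = min(max(shard_index, 1), total_shards)
    frac = min(max(shard_fraction, 0.0), 1.0)
    return ((n - 1) + frac) / total_shards


def eta_seconds(elapsed: float, fraction: float) -> Optional[float]:
    """Assumed method: extrapolate linearly from the elapsed time."""
    if fraction <= 0:
        return None
    return elapsed * (1 - fraction) / fraction


def format_duration(seconds: float) -> str:
    """Render "12m 34s" below one hour and "1h 23m" from one hour up."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}h {minutes}m" if hours else f"{minutes}m {secs}s"


print(overall_fraction(3, 0.5, 10))  # 0.25
print(f"downloading: {format_duration(754)} · ETA {format_duration(4980)}")
```

Shard 3 of 10 at 50% therefore reports 25% overall, where a per-shard figure would have shown 50%.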
Serve panel:
- Ctx text input always resets to model max on open (default 20000 when
metadata is missing).
- Max Seqs default 8 -> 4. KV Cache dtype select 32px tall.
- Lightning icon on Launch (same as Action toggle).
- Diagnosis card simplified (no fold/copy/dismiss), suggestion font
matches body; action buttons get icons on the left (Retry/Copy/Edit/
Install/Kill/Switch/etc.).
- Incomplete-download serve warning when model status is
downloading / stalled / has_incomplete.
- MTP "?" tooltip ("supported on a few model families … up to ~3× faster").
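
The server-side state guard listed under the Running tab can be pictured with the sketch below. It is a guess at the shape of the check, not the route's code: the "shard N/total" line format, the regex, and the name accept_done are all hypothetical.

```python
import re

# Markers named in the commit message as proof that a download really ended.
_FINISH_MARKERS = ("DOWNLOAD_OK", "DOWNLOAD_FAILED", "/snapshots/")

# Hypothetical shard line such as "shard 3/10"; the real output may differ.
_SHARD_RE = re.compile(r"shard\s+(\d+)\s*/\s*(\d+)", re.IGNORECASE)


def accept_done(output: str) -> bool:
    """Return False when a "done" update looks premature."""
    if any(marker in output for marker in _FINISH_MARKERS):
        return True
    shards = _SHARD_RE.findall(output)
    if not shards:
        # No shard was mentioned, so there is nothing to contradict "done".
        return True
    current, total = (int(value) for value in shards[-1])
    return current >= total


print(accept_done("shard 3/10 45%"))            # False
print(accept_done("shard 3/10\nDOWNLOAD_OK"))   # True
```

Only the last-mentioned shard is inspected, which matches the wording of the commit message.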
@@ -5110,6 +5110,100 @@
     "release_date": "2023-10-29",
     "_discovered": true
   },
+  {
+    "name": "deepseek-ai/DeepSeek-V4-Flash",
+    "provider": "deepseek-ai",
+    "parameter_count": "284B",
+    "parameters_raw": 284000000000,
+    "active_parameters": 13000000000,
+    "is_moe": true,
+    "min_ram_gb": 200.0,
+    "recommended_ram_gb": 320.0,
+    "min_vram_gb": 156.0,
+    "quantization": "FP4-MoE-Mixed",
+    "context_length": 1000000,
+    "use_case": "General-purpose reasoning, long-context",
+    "capabilities": [
+      "long_context",
+      "reasoning",
+      "moe"
+    ],
+    "pipeline_tag": "text-generation",
+    "architecture": "deepseek_v4_moe",
+    "hf_downloads": 3542202,
+    "hf_likes": 0,
+    "release_date": "2026-05-15"
+  },
+  {
+    "name": "deepseek-ai/DeepSeek-V4-Flash-Base",
+    "provider": "deepseek-ai",
+    "parameter_count": "284B",
+    "parameters_raw": 284000000000,
+    "active_parameters": 13000000000,
+    "is_moe": true,
+    "min_ram_gb": 290.0,
+    "recommended_ram_gb": 460.0,
+    "min_vram_gb": 284.0,
+    "quantization": "FP8-Mixed",
+    "context_length": 1000000,
+    "use_case": "Base pretrained \u2014 fine-tuning starting point",
+    "capabilities": [
+      "long_context",
+      "moe"
+    ],
+    "pipeline_tag": "text-generation",
+    "architecture": "deepseek_v4_moe",
+    "hf_downloads": 0,
+    "hf_likes": 0,
+    "release_date": "2026-05-15"
+  },
+  {
+    "name": "deepseek-ai/DeepSeek-V4-Pro",
+    "provider": "deepseek-ai",
+    "parameter_count": "1.6T",
+    "parameters_raw": 1600000000000,
+    "active_parameters": 49000000000,
+    "is_moe": true,
+    "min_ram_gb": 1100.0,
+    "recommended_ram_gb": 1800.0,
+    "min_vram_gb": 880.0,
+    "quantization": "FP4-MoE-Mixed",
+    "context_length": 1000000,
+    "use_case": "Flagship reasoning, long-context",
+    "capabilities": [
+      "long_context",
+      "reasoning",
+      "moe"
+    ],
+    "pipeline_tag": "text-generation",
+    "architecture": "deepseek_v4_moe",
+    "hf_downloads": 0,
+    "hf_likes": 0,
+    "release_date": "2026-05-15"
+  },
+  {
+    "name": "deepseek-ai/DeepSeek-V4-Pro-Base",
+    "provider": "deepseek-ai",
+    "parameter_count": "1.6T",
+    "parameters_raw": 1600000000000,
+    "active_parameters": 49000000000,
+    "is_moe": true,
+    "min_ram_gb": 1700.0,
+    "recommended_ram_gb": 2600.0,
+    "min_vram_gb": 1600.0,
+    "quantization": "FP8-Mixed",
+    "context_length": 1000000,
+    "use_case": "Base pretrained \u2014 fine-tuning starting point",
+    "capabilities": [
+      "long_context",
+      "moe"
+    ],
+    "pipeline_tag": "text-generation",
+    "architecture": "deepseek_v4_moe",
+    "hf_downloads": 0,
+    "hf_likes": 0,
+    "release_date": "2026-05-15"
+  },
   {
     "name": "deepseek-ai/deepseek-coder-6.7b-base",
     "provider": "DeepSeek",
@@ -13886,53 +13980,6 @@
     "gguf_sources": [],
     "capabilities": []
   },
-  {
-    "name": "deepseek-ai/DeepSeek-V4-Flash",
-    "provider": "DeepSeek",
-    "parameter_count": "158B",
-    "parameters_raw": 158000000000,
-    "min_ram_gb": 165.0,
-    "recommended_ram_gb": 205.0,
-    "min_vram_gb": 165.0,
-    "quantization": "FP8",
-    "context_length": 1000000,
-    "use_case": "General purpose, reasoning (MoE)",
-    "is_moe": true,
-    "num_experts": null,
-    "active_experts": null,
-    "active_parameters": 13000000000,
-    "architecture": "deepseek_v4",
-    "pipeline_tag": "text-generation",
-    "release_date": "2026-04-22",
-    "gguf_sources": [
-      {
-        "repo": "unsloth/DeepSeek-V4-Flash",
-        "provider": "unsloth"
-      }
-    ],
-    "capabilities": []
-  },
-  {
-    "name": "deepseek-ai/DeepSeek-V4-Pro",
-    "provider": "DeepSeek",
-    "parameter_count": "1600B",
-    "parameters_raw": 1600000000000,
-    "min_ram_gb": 928.5,
-    "recommended_ram_gb": 1207.0,
-    "min_vram_gb": 928.5,
-    "quantization": "Q4_K_M",
-    "context_length": 1000000,
-    "use_case": "Frontier reasoning (MoE)",
-    "is_moe": true,
-    "num_experts": null,
-    "active_experts": null,
-    "active_parameters": 49000000000,
-    "architecture": "deepseek_v4",
-    "pipeline_tag": "text-generation",
-    "release_date": "2026-04-22",
-    "gguf_sources": [],
-    "capabilities": []
-  },
   {
     "name": "google/gemma-4-E2B-it",
     "provider": "Google",
@@ -564,7 +564,7 @@ def rank_models(system, use_case=None, limit=50, search=None, sort="score", quan
 })
 if use_case == "image_gen":
     sort_fn = SORT_KEYS.get(sort, SORT_KEYS["score"])
-    results.sort(key=sort_fn, reverse=(sort != "vram"))
+    results.sort(key=sort_fn, reverse=True) # see main path below
     return results[:limit]

 # If user picked a native prequantized format, filter to only those models.
@@ -661,7 +661,10 @@ def rank_models(system, use_case=None, limit=50, search=None, sort="score", quan
 # explicitly asked for a Fit-only view.
 results = [r for r in results if r.get("fit_level") != "too_tight"]
 sort_fn = SORT_KEYS.get(sort, SORT_KEYS["score"])
-# vram ascending (smallest first), everything else descending (biggest first)
-results.sort(key=sort_fn, reverse=(sort != "vram"))
+# Always sort descending then truncate top-N so each column shows the
+# global highest by that metric. Before, vram was special-cased
+# ascending → truncate kept the 50 SMALLEST models and "highest VRAM"
+# could never appear, breaking the column-click toggle.
+results.sort(key=sort_fn, reverse=True)
 results = results[:limit]
 return results
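
The effect of the two rank_models hunks above can be reproduced in isolation. The rows, field names, and the helper top_n below are invented for the illustration; only the sort-then-truncate order mirrors the diff.

```python
def top_n(rows, key, limit, descending=True):
    """Sort by `key`, then keep the first `limit` rows."""
    return sorted(rows, key=key, reverse=descending)[:limit]


# Hypothetical rows; only the "vram" field matters here.
rows = [{"name": f"model-{gb}", "vram": gb} for gb in (8, 24, 48, 80, 160)]

# Old vram path: ascending sort, then truncate. The largest rows are dropped.
old = top_n(rows, key=lambda r: r["vram"], limit=3, descending=False)

# New path: descending sort, then truncate. The global maximum survives.
new = top_n(rows, key=lambda r: r["vram"], limit=3, descending=True)

print([r["vram"] for r in old])  # [8, 24, 48]
print([r["vram"] for r in new])  # [160, 80, 48]
```

Flipping the old result on the client could only reorder 8, 24 and 48, so the 160 GB row was never reachable from the column header.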
@@ -5,7 +5,9 @@ import shutil
 import subprocess
 import time

-CACHE_TTL = 1800 # 30 min — hardware rarely changes; use the Rescan button to force a re-probe
+CACHE_TTL = 24 * 3600 # 24 h — hardware probes are user-initiated via the Rescan button; bumped
+# from 30 min so changing filters doesn't keep re-probing the rig every
+# half-hour during a long session.


 _remote_host = None # set by detect_system(host=...)
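
How a TTL like the one above typically interacts with a forced rescan is sketched below. The ProbeCache class, its parameters, and the injectable clock are assumptions for the example; the diff shows only the constant, not the module's caching code.

```python
import time

CACHE_TTL = 24 * 3600  # seconds, matching the new value in the diff


class ProbeCache:
    """Hypothetical wrapper around an expensive hardware probe."""

    def __init__(self, probe, ttl=CACHE_TTL, clock=time.monotonic):
        self._probe = probe
        self._ttl = ttl
        self._clock = clock
        self._value = None
        self._stamp = None  # time of the last successful probe

    def get(self, fresh=False):
        """Return the cached result; re-probe when forced or expired."""
        now = self._clock()
        expired = self._stamp is None or (now - self._stamp) >= self._ttl
        if fresh or expired:
            self._value = self._probe()
            self._stamp = now
        return self._value
```

With a 24 h TTL, changing filters half an hour after the first probe returns the cached value, while get(fresh=True) plays the role of the Rescan button.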
@@ -13,6 +13,13 @@ QUANT_BPP = {
     "AWQ-4bit": 0.50, "AWQ-8bit": 1.0,
     "GPTQ-Int4": 0.50, "GPTQ-Int8": 1.0,
     "mlx-4bit": 0.55, "mlx-8bit": 1.0, "mlx-6bit": 0.75,
+    # DeepSeek-V4-style mixed: MoE experts in FP4 (bulk), attention + non-
+    # expert dense in FP8, embeddings/LM head in BF16. By weight count the
+    # experts dominate so the effective BPP sits closer to FP4 than FP8.
+    # Empirical: DeepSeek-V4-Flash 156 GB / 284B params ≈ 0.55 B/param.
+    "FP4-MoE-Mixed": 0.55,
+    # FP8-Mixed = the *-Base variants (MoE experts also FP8, not FP4).
+    "FP8-Mixed": 1.0,
 }

 QUANT_SPEED_MULT = {
@@ -24,6 +31,8 @@ QUANT_SPEED_MULT = {
     "AWQ-4bit": 1.2, "AWQ-8bit": 0.85,
     "GPTQ-Int4": 1.2, "GPTQ-Int8": 0.85,
     "mlx-4bit": 1.15, "mlx-8bit": 0.85, "mlx-6bit": 1.0,
+    "FP4-MoE-Mixed": 1.10, # slightly slower than pure FP4 because of mixed-dtype dispatch
+    "FP8-Mixed": 0.85,
 }

 QUANT_QUALITY_PENALTY = {
@@ -39,6 +48,11 @@ QUANT_QUALITY_PENALTY = {
     "AWQ": -1.0, "AWQ-4bit": -4.0, "AWQ-8bit": -1.0,
     "GPTQ": -1.0, "GPTQ-Int4": -4.0, "GPTQ-Int8": -1.0,
     "mlx-4bit": -4.0, "mlx-8bit": -0.5, "mlx-6bit": -1.5,
+    # DeepSeek-V4 mixed: only MoE experts at FP4 (the rest is FP8/BF16),
+    # so the realized quality is much closer to FP8 than to pure FP4 —
+    # the activation-sensitive layers stay high-precision. Small penalty (-0.5).
+    "FP4-MoE-Mixed": -0.5,
+    "FP8-Mixed": 0.0,
 }

 QUANT_BYTES_PER_PARAM = {
@@ -50,6 +64,8 @@ QUANT_BYTES_PER_PARAM = {
     "AWQ-4bit": 0.5, "AWQ-8bit": 1.0,
     "GPTQ-Int4": 0.5, "GPTQ-Int8": 1.0,
     "mlx-4bit": 0.5, "mlx-8bit": 1.0, "mlx-6bit": 0.75,
+    "FP4-MoE-Mixed": 0.55,
+    "FP8-Mixed": 1.0,
 }

 # Pre-quantized formats that should NOT go through the GGUF quant hierarchy.
@@ -57,6 +73,7 @@ QUANT_BYTES_PER_PARAM = {
 PREQUANTIZED_PREFIXES = (
     "AWQ-", "GPTQ-", "mlx-", "FP8", "FP4", "NVFP4", "MXFP4", "NF4",
     "INT4", "INT8", "W4A16", "W8A8", "W8A16",
+    "FP4-MoE-Mixed", "FP8-Mixed",
 )


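
The calibration behind the new FP4-MoE-Mixed and FP8-Mixed values can be checked with a few lines of arithmetic. The sketch assumes 1 GB = 10^9 bytes and ignores KV cache and runtime overhead; the helper names are hypothetical.

```python
def bytes_per_param(disk_gb: float, params: float) -> float:
    """Effective bytes per parameter from an on-disk footprint (GB = 1e9 bytes)."""
    return disk_gb * 1e9 / params


def weights_gb(params: float, bpp: float) -> float:
    """Weight footprint in GB for a parameter count and bytes per parameter."""
    return params * bpp / 1e9


# DeepSeek-V4-Flash: 156 GB on disk for 284B parameters.
print(round(bytes_per_param(156, 284e9), 2))  # 0.55

# DeepSeek-V4-Flash-Base: 284 GB on disk for 284B parameters.
print(bytes_per_param(284, 284e9))  # 1.0

# DeepSeek-V4-Pro at 0.55 B/param, in line with min_vram_gb 880.0 above.
print(round(weights_gb(1.6e12, 0.55), 1))  # 880.0
```

The same 0.55 figure is used in both QUANT_BPP and QUANT_BYTES_PER_PARAM, so size estimates and ranking agree for the mixed format.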
@@ -843,7 +843,7 @@
     <path d="M3 18a1 1 0 0 1-1-1V4a1 1 0 0 1 1-1h5a4 4 0 0 1 4 4 4 4 0 0 1 4-4h5a1 1 0 0 1 1 1v13a1 1 0 0 1-1 1h-6a3 3 0 0 0-3 3 3 3 0 0 0-3-3z"/>
   </svg>
   <span class="grow">Cookbook</span>
-  <span id="cookbook-bg-status" style="display:none;font-size:9px;opacity:0.5;white-space:nowrap;overflow:hidden;text-overflow:ellipsis;margin-left:6px;flex-shrink:1;min-width:0;position:relative;top:-1px;"></span>
+  <span id="cookbook-bg-status" style="display:none;font-size:9px;opacity:0.5;white-space:nowrap;overflow:hidden;text-overflow:ellipsis;margin-right:12px;flex-shrink:1;min-width:0;position:relative;top:-1px;"></span>
   <span class="cookbook-notif-dot" id="cookbook-notif-dot" style="display:none;margin-left:6px;margin-right:4px;position:relative;top:-1px;left:0px;"></span>
 </div>
 <div class="list-item" id="tool-research-btn">
@@ -23,6 +23,44 @@ import {
   // browser loads it once. See cookbook-hwfit.js.
 } from './cookbook.js';
 import uiModule from './ui.js';
+
+// Tiny HTML-escape — keeps the file standalone instead of leaning on a
+// shared helper that may not be exported from this module's import surface.
+function _diagEsc(s) {
+  return String(s ?? '').replace(/[&<>"']/g, c => ({'&':'&amp;','<':'&lt;','>':'&gt;','"':'&quot;',"'":'&#39;'}[c]));
+}
+
+// Pick an icon for a diagnosis-action button based on the label. The icon
+// renders on the LEFT of the button text. Keeps the strokes consistent
+// across the set so they read as one family.
+function _diagFixIcon(label) {
+  const l = String(label || '').toLowerCase();
+  const _svg = (path) => `<svg width="11" height="11" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2.2" stroke-linecap="round" stroke-linejoin="round" class="cookbook-diag-btn-ico" aria-hidden="true">${path}</svg>`;
+  if (l.startsWith('retry') || l.includes('relaunch') || l.includes('restart')) {
+    // Circular-arrow refresh
+    return _svg('<polyline points="23 4 23 10 17 10"/><polyline points="1 20 1 14 7 14"/><path d="M3.51 9a9 9 0 0 1 14.85-3.36L23 10M1 14l4.64 4.36A9 9 0 0 0 20.49 15"/>');
+  }
+  if (l.startsWith('copy')) {
+    return _svg('<rect x="9" y="9" width="13" height="13" rx="2"/><path d="M5 15H4a2 2 0 0 1-2-2V4a2 2 0 0 1 2-2h9a2 2 0 0 1 2 2v1"/>');
+  }
+  if (l.startsWith('edit')) {
+    return _svg('<path d="M12 20h9"/><path d="M16.5 3.5a2.121 2.121 0 0 1 3 3L7 19l-4 1 1-4Z"/>');
+  }
+  if (l.startsWith('open') || l.includes('dependencies')) {
+    return _svg('<path d="M14 3h7v7"/><path d="M21 3l-9 9"/><path d="M21 14v5a2 2 0 0 1-2 2H5a2 2 0 0 1-2-2V5a2 2 0 0 1 2-2h5"/>');
+  }
+  if (l.startsWith('install') || l.includes('upgrade')) {
+    return _svg('<path d="M21 15v4a2 2 0 0 1-2 2H5a2 2 0 0 1-2-2v-4"/><polyline points="7 10 12 15 17 10"/><line x1="12" y1="15" x2="12" y2="3"/>');
+  }
+  if (l.startsWith('kill') || l.startsWith('stop')) {
+    return _svg('<rect x="6" y="6" width="12" height="12" rx="1"/>');
+  }
+  if (l.startsWith('switch') || l.includes('use ')) {
+    return _svg('<polyline points="17 1 21 5 17 9"/><path d="M3 11V9a4 4 0 0 1 4-4h14"/><polyline points="7 23 3 19 7 15"/><path d="M21 13v2a4 4 0 0 1-4 4H3"/>');
+  }
+  // Default: lightbulb (generic "suggestion")
+  return _svg('<path d="M9 21h6"/><path d="M12 17v4"/><path d="M12 3a6 6 0 0 0-4 10.5c1 1 1.5 2 1.5 3.5h5c0-1.5.5-2.5 1.5-3.5A6 6 0 0 0 12 3Z"/>');
+}
 import spinnerModule from './spinner.js';

 // ── Error diagnosis ──
@@ -577,7 +615,7 @@ export function _showDiagnosis(panel, diagnosis, sourceText) {
 const btn = document.createElement('button');
 btn.className = 'cookbook-btn cookbook-diag-btn';
 btn.type = 'button';
-btn.textContent = fix.label;
+btn.innerHTML = _diagFixIcon(fix.label) + '<span class="cookbook-diag-btn-label">' + _diagEsc(fix.label) + '</span>';
 btn.addEventListener('click', (e) => {
   e.stopPropagation();
   runFix(fix, btn);
@@ -603,7 +641,7 @@ export function _showDiagnosis(panel, diagnosis, sourceText) {
 for (const fix of fixes) {
   const item = document.createElement('button');
   item.type = 'button';
-  item.textContent = fix.label;
+  item.innerHTML = _diagFixIcon(fix.label) + '<span class="cookbook-diag-btn-label">' + _diagEsc(fix.label) + '</span>';
   item.addEventListener('click', async (e) => {
     e.stopPropagation();
     if (item.dataset.busy || trigger.dataset.busy) return;
@@ -527,6 +527,9 @@ export async function _hwfitFetch(fresh = false) {
   if (useCase) params.set('use_case', useCase);
   if (quantPref) params.set('quant', quantPref);
   if (targetCtx) params.set('ctx', String(targetCtx));
+  // Fit-only filter — set by the dot in the Fit column header.
+  const _fitOnly = (() => { try { return localStorage.getItem('hwfit_fit_only_v1') === '1'; } catch { return false; } })();
+  if (_fitOnly) params.set('fit_only', '1');
 }
 const endpoint = isImageMode ? `/api/hwfit/image-models?${params}` : `/api/hwfit/models?${params}`;
 const res = await fetch(endpoint);
@@ -888,9 +891,15 @@ export function _hwfitRenderList(el, models) {
     arrow = isReversed ? ' \u25B2' : ' \u25BC';
   }
   const dataAttr = col.key ? ` data-sort="${col.key}"` : '';
-  const label = (col.cls === 'hwfit-fit' && _budget)
-    ? `${col.label} <span style="font-size:0.75em;opacity:0.6;font-weight:normal;">(${_budget})</span>`
-    : col.label;
+  // Fit column gets a small dot to its left that toggles "show only models
+  // that fit" — replaces the old Fits On/Off button next to the toolbar.
+  let label = col.label;
+  if (col.cls === 'hwfit-fit') {
+    const _fitOnly = (() => { try { return localStorage.getItem('hwfit_fit_only_v1') === '1'; } catch { return false; } })();
+    label = `<span class="hwfit-fit-dot${_fitOnly ? ' active' : ''}" title="${_fitOnly ? 'Showing only models that fit. Click to also show too-tight rows.' : 'Click to show only models that fit your hardware.'}" data-fit-dot>●</span>${col.label}`;
+    // (Budget tag removed — the GPU/RAM/N-GPU suffix next to "Fit" was noise;
+    // the toggle row already shows which budget is active.)
+  }
   html += `<span class="hwfit-col ${col.cls}${sortable}${active}"${dataAttr}>${label}${arrow}</span>`;
 }
 html += '</div>';
@@ -910,9 +919,31 @@ export function _hwfitRenderList(el, models) {
 const dlDot = (_cachedModelIds && (_cachedModelIds.has(m.name) || [..._cachedModelIds].some(id => id === m.name?.split('/').pop()))) ? '<span class="hwfit-dl-dot" title="Downloaded">\u25CF</span>' : '';
 html += `<div class="hwfit-row" data-model="${esc(m.name)}">`;
 html += `<span class="hwfit-col hwfit-fit" style="color:${fitColor}">${esc(fitLabel)}</span>`;
-html += `<span class="hwfit-col hwfit-name">${modelLogo(m.name)}${esc(m.name?.split('/').pop() || m.name)}${moeBadge}${imgBadge}${dlDot}</span>`;
+// Append quant to the title when it's not already in the repo name. The
+// suffix strips quant-parts the name already contains — e.g. for
+// QuantTrio/MiniMax-M2-AWQ + quant=AWQ-4bit we just show "(4bit)", not
+// "(AWQ-4bit)". DeepSeek-V4-Flash + FP4-MoE-Mixed keeps the full tag
+// (none of those parts are in the repo id).
+const _short = m.name?.split('/').pop() || m.name || '';
+const _quantTag = (m.quant || '').trim();
+const _lowerShort = _short.toLowerCase();
+let _quantSuffix = '';
+if (_quantTag) {
+  const _parts = _quantTag.split(/[-_]/).filter(Boolean);
+  const _remaining = _parts.filter(p => !_lowerShort.includes(p.toLowerCase()));
+  if (_remaining.length && _remaining.length < _parts.length + 1) { // at least one part is new
+    let _display = _remaining.join('-');
+    if (_display.length > 9) _display = _display.slice(0, 9) + '…';
+    _quantSuffix = ` <span class="hwfit-name-quant" title="${esc(_quantTag)} — full storage format">(${esc(_display)})</span>`;
+  }
+}
+html += `<span class="hwfit-col hwfit-name">${modelLogo(m.name)}${esc(_short)}${_quantSuffix}${moeBadge}${imgBadge}${dlDot}</span>`;
 html += `<span class="hwfit-col hwfit-c-params">${esc(pcount)}</span>`;
-html += `<span class="hwfit-col hwfit-c-quant">${esc(m.quant || '?')}</span>`;
+// Truncate the Quant cell to 9 chars + ellipsis so long tags like
+// "FP4-MoE-Mixed" don't push neighboring columns. Full tag stays in title.
+const _qRaw = m.quant || '?';
+const _qShort = _qRaw.length > 9 ? _qRaw.slice(0, 9) + '…' : _qRaw;
+html += `<span class="hwfit-col hwfit-c-quant" title="${esc(_qRaw)}">${esc(_qShort)}</span>`;
 html += `<span class="hwfit-col hwfit-c-vram">${vramLabel}</span>`;
 html += `<span class="hwfit-col hwfit-c-ctx">${m.is_image_gen ? '\u2014' : ctx}</span>`;
 html += `<span class="hwfit-col hwfit-c-speed">${m.is_image_gen ? '\u2014' : tps + ' t/s'}</span>`;
@@ -934,7 +965,26 @@ export function _hwfitRenderList(el, models) {
 });
 // Clickable header columns → sort (click again to toggle direction)
 el.querySelectorAll('.hwfit-header .hwfit-sortable').forEach(col => {
-  col.addEventListener('click', () => {
+  col.addEventListener('click', (e) => {
+    // The little dot inside the Fit header is its own toggle (fit-only
+    // filter), don't let it fall through to a sort click.
+    if (e.target.closest('[data-fit-dot]')) {
+      const on = !e.target.classList.contains('active');
+      try { localStorage.setItem('hwfit_fit_only_v1', on ? '1' : '0'); } catch {}
+      // Un-toggling the fit filter (off → showing too-tight rows again) is
+      // typically because the user wants to see the LARGE models they can't
+      // run yet — re-sort by VRAM descending so the biggest surface first.
+      if (!on) {
+        const sortSel = document.getElementById('hwfit-sort');
+        if (sortSel) {
+          sortSel.value = 'vram';
+          sortSel.dataset.reverse = '0'; // descending (biggest first)
+        }
+      }
+      _hwfitCache = null;
+      _hwfitFetch();
+      return;
+    }
     const sortKey = col.dataset.sort;
     if (!sortKey) return;
     const sel = document.getElementById('hwfit-sort');
@@ -1018,7 +1068,16 @@ export function _expandModelRow(row, modelData) {
 if (modelData.is_image_gen) {
   html += `<div style="font-size:10px;opacity:0.5;margin-top:4px;">${esc((modelData.capabilities || []).join(' \u00B7 ') || '')}${modelData.description ? ' \u2014 ' + esc(modelData.description) : ''}</div>`;
 } else if (_requiresAcceleratorBackend(modelData)) {
-  html += `<div class="hwfit-panel-note">This is a safetensors GPU-serving format. Use vLLM/SGLang with a visible CUDA/ROCm accelerator, or pick a GGUF download for llama.cpp/Ollama.</div>`;
+  // Only show the "needs CUDA/ROCm" note when the host doesn't already have
+  // one. With a visible CUDA/ROCm accelerator the note is noise — the user
+  // can already serve the model and reading the warning on every row makes
+  // the panel feel like everything's broken.
+  const _sys = _hwfitCache?.system || {};
+  const _backend = (_sys.backend || '').toLowerCase();
+  const _hasGpuAccel = !!_sys.has_gpu && (_backend === 'cuda' || _backend === 'rocm');
+  if (!_hasGpuAccel) {
+    html += `<div class="hwfit-panel-note">This is a safetensors GPU-serving format. Use vLLM/SGLang with a visible CUDA/ROCm accelerator, or pick a GGUF download for llama.cpp/Ollama.</div>`;
+  }
 }
 html += `</div>`;

@@ -1243,14 +1302,14 @@ export function _hwfitInit() {
 const targetCtx = _ctxValue();
 try { localStorage.setItem(_CTX_KEY, String(targetCtx)); } catch {}
 // Ctx drag affects sort mode: a specific ctx target (anything < Max)
-// implies the user is hunting for "what fits at this context length",
-// so re-rank by fit (lowest first). Dragging back to Max means no
-// ctx constraint → go back to the default score-based ranking.
+// implies "what runs at this context length" — sort by VRAM ascending
+// so the cheapest-fitting models surface first. Dragging back to Max
+// releases the constraint → go back to the default score ranking.
 const sortSel = document.getElementById('hwfit-sort');
 if (sortSel) {
   if (targetCtx) {
-    sortSel.value = 'fit';
-    sortSel.dataset.reverse = '1';
+    sortSel.value = 'vram';
+    sortSel.dataset.reverse = '1'; // ascending = smallest VRAM first
   } else {
     sortSel.value = 'score';
     sortSel.dataset.reverse = '';
@@ -18,6 +18,7 @@ import {
   _launchServeTask, _serveAutoFix, _serveAutoRetry, _serveAutoRetryReplace, _serveAutoRetryRemove,
   _startBackgroundMonitor, _syncFromServer,
   _retryDownload, _nextAvailablePort, _processQueue,
+  _selfHealStaleTasks,
 } from './cookbookRunning.js';

 import {
@@ -641,6 +642,13 @@ async function _fetchDependencies() {
 const winBlocked = !isLocal && _isWindows() && _winUnsupported.has(pkg.name);
 const note = pkg.status_note ? `<div class="memory-item-meta" style="font-size:10px;opacity:0.65;margin-top:3px;">${esc(pkg.status_note)}</div>` : '';
 const updateNote = pkg.installed && pkg.pip_update_available === false && pkg.update_note ? `<div class="memory-item-meta" style="font-size:10px;opacity:0.55;margin-top:3px;">${esc(pkg.update_note)}</div>` : '';
+// Inline "Rebuild" tag for the llama_cpp row only. Styled as a
+// .cookbook-dep-tag so it matches the LLM category tag's pill look,
+// and lives to the LEFT of the category tag (clear affordance before
+// the row "value").
+const _rebuildBtn = (pkg.name === 'llama_cpp')
+  ? `<button type="button" class="cookbook-dep-tag cookbook-dep-rebuild" id="cookbook-rebuild-engine" title="Clear the cached llama.cpp build so the next serve recompiles from source (use after installing a CUDA/ROCm toolkit to turn a CPU-only build into a GPU build).">Rebuild</button>`
+  : '';
 return `<div class="cookbook-dep-row${winBlocked ? ' cookbook-dep-blocked' : ''}" data-pkg-name="${esc(pkg.name)}" data-dep-pip="${esc(pkg.pip || '')}" data-dep-target="${isLocal ? 'local' : 'remote'}" data-dep-kind="${esc(pkg.kind || 'python')}">`
   + `<div class="cookbook-dep-info">`
   + `<div class="memory-item-title">${esc(pkg.name)}</div>`
@@ -648,6 +656,7 @@ async function _fetchDependencies() {
   + note
   + updateNote
   + `</div>`
+  + _rebuildBtn
   + `<span class="cookbook-dep-tag cookbook-dep-cat">${esc(pkg.category)}</span>`
   + _statusTag(pkg, isLocal, isSystemDep, winBlocked)
   + `</div>`;
@@ -1237,6 +1246,10 @@ function _wireTabEvents(body) {
     const folded = dlFoldBody.style.display === 'none';
     dlFoldBody.style.display = folded ? '' : 'none';
     dlFoldChevron.textContent = folded ? '▾' : '▸';
+    // Toggle is-folded class on the h2 so the line under it only shows when
+    // the section is collapsed (the body's content normally provides
+    // separation; with no body visible, the line gives the h2 definition).
+    dlFold.classList.toggle('is-folded', !folded);
     try { localStorage.setItem('cookbook_dl_tab_folded_v1', folded ? '0' : '1'); } catch {}
   });
 }
@@ -1456,7 +1469,7 @@ export function _serverEntryHtml(s, i, defaultServer, forceRemote, isNew) {
 html += `<span class="cookbook-server-title" style="display:flex;align-items:center;gap:6px;width:100%;font-size:13px;font-weight:600;margin-bottom:4px;">`;
 html += `${esc(_srvTitle)}`;
 html += _pIco ? `<span class="cookbook-srv-platform" title="${esc(s.platform || '')}" style="display:inline-flex;align-items:center;opacity:0.55;">${_pIco}</span>` : '';
-html += `<span class="cookbook-srv-test-msg" style="font-size:10px;font-weight:400;opacity:0.55;max-width:160px;white-space:nowrap;overflow:hidden;text-overflow:ellipsis;position:relative;top:2px;"></span>`;
+html += `<span class="cookbook-srv-test-msg" style="font-size:10px;font-weight:400;opacity:0.55;max-width:160px;white-space:nowrap;overflow:hidden;text-overflow:ellipsis;position:relative;top:1px;"></span>`;
 if (isNew) {
   // New server: Cancel (discard) sits top-right; the default toggle only makes
   // sense once the server is saved.
@@ -1535,7 +1548,7 @@ function _renderRecipes() {
 // State persisted to localStorage so the fold survives reloads.
 const _dlTabFolded = (() => { try { return localStorage.getItem('cookbook_dl_tab_folded_v1') === '1'; } catch { return false; } })();
 html += '<div style="display:flex;align-items:center;gap:8px;margin-bottom:2px;">';
-html += `<h2 id="cookbook-dl-tab-fold" style="margin:0;padding:0;line-height:1;cursor:pointer;display:flex;align-items:center;justify-content:space-between;user-select:none;flex:1;">Download<span id="cookbook-dl-tab-chevron" style="display:inline-block;transition:transform 0.15s;font-size:1.1em;margin-left:8px;opacity:0.85;">${_dlTabFolded ? '▸' : '▾'}</span></h2>`;
+html += `<h2 id="cookbook-dl-tab-fold" class="${_dlTabFolded ? 'is-folded' : ''}" style="margin:0;padding:0;line-height:1;cursor:pointer;display:flex;align-items:center;justify-content:space-between;user-select:none;flex:1;">Download<span id="cookbook-dl-tab-chevron" style="display:inline-block;transition:transform 0.15s;font-size:1.1em;margin-left:8px;opacity:0.85;">${_dlTabFolded ? '▸' : '▾'}</span></h2>`;
 html += '</div>';
 html += `<div id="cookbook-dl-tab-fold-body" style="${_dlTabFolded ? 'display:none;' : ''}">`;
 html += '<p class="memory-desc doclib-desc" style="margin-top:6px;">Download from <a href="https://huggingface.co/models" target="_blank" rel="noopener" style="color:var(--accent,var(--red));text-decoration:none;"><svg width="10" height="10" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" style="vertical-align:-1px;margin-right:1px;"><path d="M18 13v6a2 2 0 0 1-2 2H5a2 2 0 0 1-2-2V8a2 2 0 0 1 2-2h6"/><polyline points="15 3 21 3 21 9"/><line x1="10" y1="14" x2="21" y2="3"/></svg>HuggingFace</a> by pasting model link, or download directly in the Scan section below.</p>';
@@ -1605,36 +1618,43 @@ function _renderRecipes() {
|
||||
html += '<p class="memory-desc doclib-desc" style="margin-top:6px;">Scans your hardware for what models you can run. Hardware is cached; hit the scan button to re-probe after changing GPUs.</p>';
html += '<div class="hwfit-toolbar" style="margin-top:9px;">';
html += '<select class="cookbook-field-input hwfit-usecase" id="hwfit-usecase" style="height:28px;">';
html += '<option value="">Type</option><option value="general">General</option><option value="coding">Coding</option>';
html += '<option value="general" selected>Standard</option><option value="coding">Coding</option>';
html += '<option value="reasoning">Reasoning</option><option value="chat">Chat</option>';
// Image tab removed — text→image gen is gone from this build (only inpaint
// remains, which uses its own settings panel). Vision (multimodal) stays.
html += '<option value="multimodal">Vision</option></select>';
html += '<input type="text" class="cookbook-field-input hwfit-search" id="hwfit-search" placeholder="Search models..." style="flex:1;" />';
// Quant (Q4/Q8/…) lives next to the search now. Default is "All" so the
// list shows the best-scoring quant for every model instead of silently
// filtering to Q4 (which used to be the implicit default).
html += '<select class="cookbook-field-input hwfit-quant" id="hwfit-quant" style="height:28px;">';
html += '<option value="" selected>All</option>';
html += '<option value="Q4_K_M">Q4</option><option value="Q8_0">Q8</option>';
html += '<option value="Q6_K">Q6</option><option value="Q5_K_M">Q5</option>';
html += '<option value="Q3_K_M">Q3</option><option value="Q2_K">Q2</option>';
html += '<option value="AWQ-4bit">AWQ</option><option value="FP8">FP8</option><option value="FP4">FP4</option><option value="NVFP4">NVFP4</option></select>';
// Engine filter — show only models whose serve engine matches. Composes
// with quant / type / search filters.
// Engine sits next to the type filter so the "what category / which serving
// path" filters live together; Quant + Context are storage-format and budget
// levers, grouped to the right.
html += '<span class="hwfit-engine-wrap">';
html += '<select class="cookbook-field-input hwfit-engine" id="hwfit-engine" style="height:28px;" title="Filter by serving engine">';
html += '<option value="">Engine</option>';
html += '<option value="llamacpp">llama.cpp</option>';
html += '<option value="vllm">vLLM</option>';
html += '<option value="sglang">SGLang</option>';
html += '</select>';
html += '<span class="hwfit-help-chip" title="Higher numbers usually mean better quality, but they need more memory. Lower numbers fit on more hardware.">?</span>';
html += '<span class="hwfit-help-chip hwfit-help-chip-inline hwfit-engine-help" title="Rule of thumb: GGUF on single GPU / CPU+RAM → llama.cpp (or Ollama). Safetensors on multi-GPU NVIDIA → vLLM. SGLang is a vLLM-class alternative, sometimes faster on big-MoE / long-context.">?</span>';
html += '</span>';
// Quant (Q4/Q8/…). Default is "All" so the list shows the best-scoring
// quant for every model instead of silently filtering to Q4.
html += '<span class="hwfit-quant-wrap">';
html += '<select class="cookbook-field-input hwfit-quant" id="hwfit-quant" style="height:28px;">';
html += '<option value="" selected>Quant: All</option>';
html += '<option value="Q4_K_M">Q4</option><option value="Q8_0">Q8</option>';
html += '<option value="Q6_K">Q6</option><option value="Q5_K_M">Q5</option>';
html += '<option value="Q3_K_M">Q3</option><option value="Q2_K">Q2</option>';
html += '<option value="AWQ-4bit">AWQ</option><option value="FP8">FP8</option><option value="FP4">FP4</option><option value="NVFP4">NVFP4</option></select>';
html += '<span class="hwfit-help-chip hwfit-help-chip-inline hwfit-quant-help" title="Lower quant tiers (Q2/Q3/Q4 / AWQ-4bit) are smaller, faster, and cheaper to run, at some quality loss. Higher tiers (Q8 / FP8 / FP16 / BF16) preserve more quality but need more VRAM. “All” shows the best-scoring quant per model — pick a specific one to filter.">?</span>';
html += '</span>';
// Ctx slider — lets you target a context length for fit estimates; the
// hwfit ranking uses _ctxValue() to factor that into VRAM math, so
// dragging this re-sorts the list toward models that fit your chosen ctx.
html += '<label class="hwfit-ctx-control" title="Context length for fit estimates. Lower it to find more models that could fit your hardware.">';
html += '<span>Ctx</span><span class="hwfit-help-chip hwfit-help-chip-inline" title="Context length. Lower it to find more models that could fit your hardware; raise it when you need longer chats or documents.">?</span><input type="range" id="hwfit-context" min="0" max="5" step="1" value="3" />';
html += '<span>Context</span><span class="hwfit-help-chip hwfit-help-chip-inline" title="Context length. Lower it to find more models that could fit your hardware; raise it when you need longer chats or documents.">?</span><input type="range" id="hwfit-context" min="0" max="5" step="1" value="3" />';
html += '<output id="hwfit-context-label">50k</output></label>';
// Search lives at the far right of the toolbar so the controls (Type/Quant/
// Engine/Context) read as a row of compact filters followed by free-text.
html += '<input type="text" class="cookbook-field-input hwfit-search" id="hwfit-search" placeholder="Search models..." style="flex:1;" />';
html += '</div>';
html += '<div class="hwfit-toolbar" style="margin-top:7px;">';
html += '<select class="cookbook-field-input hwfit-server-select" id="hwfit-server-select" style="height:28px;min-width:88px;position:relative;top:0px;">';
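The Ctx slider in the toolbar hunk above is a range input with six integer stops (0 to 5, default 3, labeled 50k). A minimal sketch of how a stop index could map to a label and a token budget follows. `CTX_STOPS` and `ctxStop` are hypothetical names, the token counts assume k = 1000, and the real `_ctxValue()` is not part of this diff, so this is an illustration rather than the actual implementation.

```javascript
// Hypothetical lookup table. Labels follow the commit notes
// (8k/16k/32k/50k/128k/Max); token counts are assumed (k = 1000).
// tokens === null stands for "Max", i.e. no cap from the slider.
const CTX_STOPS = [
  { label: '8k', tokens: 8000 },
  { label: '16k', tokens: 16000 },
  { label: '32k', tokens: 32000 },
  { label: '50k', tokens: 50000 },
  { label: '128k', tokens: 128000 },
  { label: 'Max', tokens: null },
];

// Clamp the raw slider value (a string from input.value, or a number)
// into the valid stop range and return the matching entry.
function ctxStop(index) {
  const i = Math.min(CTX_STOPS.length - 1, Math.max(0, Number(index) || 0));
  return CTX_STOPS[i];
}

console.log(ctxStop('3').label); // default stop of the slider
```

Keeping the stops in one table means the `<output>` label and the fit estimate cannot drift apart when a stop is added or renamed.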
@@ -1643,7 +1663,7 @@ function _renderRecipes() {
html += '<div class="hwfit-gpu-toggles" id="hwfit-gpu-toggles"></div>';
// Scan/refresh button (icon-only) where the quant dropdown used to sit.
html += '<button type="button" class="hwfit-gpu-btn" id="hwfit-rescan" title="Re-scan hardware" style="flex-shrink:0;position:relative;top:-3px;left:-1px;">↻ RESCAN</button>';
html += '<button type="button" class="hwfit-gpu-btn hwfit-hw-manual-btn" id="hwfit-hw-manual-btn" title="Set hardware manually" style="flex-shrink:0;position:relative;top:-3px;left:-1px;">EDIT</button>';
html += '<button type="button" class="hwfit-gpu-btn hwfit-hw-manual-btn" id="hwfit-hw-manual-btn" title="Set hardware manually" style="flex-shrink:0;position:relative;top:-3px;left:-1px;display:inline-flex;align-items:center;gap:3px;"><svg width="10" height="10" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2.2" stroke-linecap="round" stroke-linejoin="round" style="flex-shrink:0;"><path d="M12 20h9"/><path d="M16.5 3.5a2.121 2.121 0 0 1 3 3L7 19l-4 1 1-4Z"/></svg>EDIT</button>';
// Sort state — the clickable column headers read/write this (pewds' original
// sort paradigm). Newest is reachable by clicking the Model column header.
html += '<select class="cookbook-field-input hwfit-sort" id="hwfit-sort" style="display:none">';
@@ -1663,6 +1683,16 @@ function _renderRecipes() {
html += '</div>';
html += '<div id="hwfit-hw-row" style="display:none;align-items:center;gap:4px;margin-top:3px;padding-top:2px;"><span style="font-size:10px;padding:2px 8px;border-radius:10px;background:color-mix(in srgb, var(--fg) 8%, transparent);color:var(--fg);opacity:0.7;white-space:nowrap;flex-shrink:0;position:relative;top:-1px;">Detected hardware</span><div class="hwfit-hw" id="hwfit-hw" style="flex:1;"></div></div>';
html += '<div class="hwfit-list" id="hwfit-list"></div>';
// Footer: link to the public discussion where users can request additions
// to the curated model list. Sits below the list so it reads as a callout
// after browsing, not a header.
html += '<div class="hwfit-list-footer" style="margin-top:8px;padding-top:6px;border-top:1px solid color-mix(in srgb, var(--border) 50%, transparent);font-size:9.5px;opacity:0.65;text-align:right;">'
+ 'Don\'t see a model? '
+ '<a href="https://github.com/pewdiepie-archdaemon/odysseus/discussions/1962" target="_blank" rel="noopener" style="color:var(--accent,var(--red));text-decoration:none;display:inline-flex;align-items:center;gap:4px;vertical-align:middle;">'
+ 'Request it →'
+ '<svg width="11" height="11" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" style="flex-shrink:0;"><path d="M8 0C3.58 0 0 3.58 0 8a8 8 0 0 0 5.47 7.59c.4.07.55-.17.55-.38 0-.19-.01-.82-.01-1.49-2.01.37-2.53-.49-2.69-.94-.09-.23-.48-.94-.82-1.13-.28-.15-.68-.52-.01-.53.63-.01 1.08.58 1.23.82.72 1.21 1.87.87 2.33.66.07-.52.28-.87.51-1.07-1.78-.2-3.64-.89-3.64-3.95 0-.87.31-1.59.82-2.15-.08-.2-.36-1.02.08-2.12 0 0 .67-.21 2.2.82.64-.18 1.32-.27 2-.27.68 0 1.36.09 2 .27 1.53-1.04 2.2-.82 2.2-.82.44 1.1.16 1.92.08 2.12.51.56.82 1.27.82 2.15 0 3.07-1.87 3.75-3.65 3.95.29.25.54.73.54 1.48 0 1.07-.01 1.93-.01 2.2 0 .21.15.46.55.38A8.013 8.013 0 0 0 16 8c0-4.42-3.58-8-8-8z"/></svg>'
+ '</a>'
+ '</div>';
html += '</div></div>';
@@ -1707,7 +1737,8 @@ function _renderRecipes() {
html += '<div class="admin-card" style="flex:1;display:flex;flex-direction:column;overflow:hidden;">';
html += '<div style="display:flex;align-items:center;gap:8px;margin-bottom:4px;">';
html += '<h2 style="margin:0;padding:0;line-height:1;">Dependencies</h2>';
html += '<button class="cookbook-field-input" id="cookbook-rebuild-engine" title="Clear the cached llama.cpp build so the next serve recompiles from source (use after installing a CUDA/ROCm toolkit to turn a CPU-only build into a GPU build)." style="height:24px;font-size:10px;padding:0 8px;cursor:pointer;width:auto;">Rebuild llama.cpp</button>';
// Rebuild llama.cpp button moved into the llama_cpp dep row (see _depRow);
// having it in the title polluted the section header.
html += '<span style="font-size:10px;opacity:0.5;margin-left:auto;">Server</span>';
html += '<select class="cookbook-field-input" id="hwfit-deps-server" style="height:28px;min-width:70px;">';
html += _buildServerOpts(false);
@@ -1746,7 +1777,7 @@ function _renderRecipes() {
// ── Servers block ───────────────────────────────────────────────────
html += '<div class="admin-card" style="flex:0 0 auto;display:flex;flex-direction:column;">';
html += '<div style="display:flex;align-items:baseline;gap:8px;margin-bottom:2px;margin-top:-8px;">';
html += '<div style="display:flex;align-items:baseline;gap:8px;margin-bottom:2px;margin-top:-4px;">';
html += '<h2 style="margin:0;padding:0;line-height:1;">Servers</h2>';
// Reuse the calendar +New pill (spinning plus, label fades in); it uses
// the same `.cal-add-btn-text` rules, so styling stays consistent.
@@ -1893,6 +1924,11 @@ export async function open(opts) {
_rendered = true;
_clearCookbookNotif();
_renderRunningTab();
// Self-heal: revive any download tasks whose tmux session is still alive
// but were persisted as done/error (covers the "restarted server while a
// big multi-shard download was in flight" case — the task survived in
// tmux, the cookbook just lost track of it).
try { _selfHealStaleTasks({ oneShot: true }); } catch {}
if (_content) {
// Put the panel in its entering state before it becomes visible. On
// mobile, showing first and adding the class a frame later can paint the
@@ -535,6 +535,42 @@ export async function _runModelDownload(panel, model, backend, hostOverride) {
uiModule.showToast(`${shortName} is already ${duplicate.status === 'queued' ? 'queued' : 'downloading'}`);
return;
}
// Also catch zombie "done" tasks — the cookbook may have lost track of a
// download (server restart, stale state) while its tmux session is still
// alive on the host. Probe it; if alive, flip back to running + treat as
// duplicate so we don't kick off a second concurrent download writing to
// the same target dir.
const zombieCandidate = tasks.find(t => sameDownload(t)
&& ['done', 'error', 'crashed', 'stopped'].includes(t.status)
&& t.sessionId && !String(t.sessionId).startsWith('queue-'));
if (zombieCandidate) {
try {
const _zh = zombieCandidate.remoteHost || '';
const _zPort = (_envState.servers || []).find(s => s.host === _zh)?.port;
const _sshPf = _zh ? `ssh ${_zPort && _zPort !== '22' ? `-p ${_zPort} ` : ''}${_zh} '` : '';
const _sshSf = _zh ? `'` : '';
const _probeCmd = `${_sshPf}tmux has-session -t ${zombieCandidate.sessionId} 2>/dev/null${_sshSf}`;
const _r = await fetch('/api/shell/exec', {
method: 'POST', credentials: 'same-origin',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ command: _probeCmd, timeout: 5 }),
});
const _d = await _r.json();
if (_d.exit_code === 0) {
// tmux still alive → not actually done. Revive + tell the user.
const _fresh = _loadTasks();
const _ft = _fresh.find(t => t.sessionId === zombieCandidate.sessionId);
if (_ft) {
_ft.status = 'running';
_ft._selfHealed = true;
_saveTasks(_fresh);
}
_renderRunningTab();
uiModule.showToast(`${shortName} is still downloading (was marked finished after a restart — revived)`);
return;
}
} catch { /* probe failed — fall through and let the user launch */ }
}
const activeOnHost = tasks.find(t => t.type === 'download' && (t.status === 'running' || t.status === 'queued') && (t.remoteHost || 'local') === targetHost);
if (activeOnHost) {
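The zombie probe in the hunk above builds a `tmux has-session` command and, for a remote host, wraps it in `ssh`. The sketch below isolates that string-building step so it can be read and tested on its own. `buildProbeCommand` is a hypothetical helper name, and comparing the port via `String(port)` is an assumption added here so that a numeric 22 is treated like `'22'`; the hunk itself compares against the string only.

```javascript
// Build the shell command that checks whether a tmux session still exists.
// tmux has-session exits with status 0 when the session is found.
// NOTE: sessionId and host are interpolated unquoted, as in the hunk above;
// this sketch assumes they are trusted, shell-safe identifiers.
function buildProbeCommand(sessionId, host = '', port = '') {
  const probe = `tmux has-session -t ${sessionId} 2>/dev/null`;
  if (!host) return probe; // local task: run tmux directly
  // Only pass -p when a non-default SSH port is configured.
  const portFlag = port && String(port) !== '22' ? `-p ${port} ` : '';
  return `ssh ${portFlag}${host} '${probe}'`;
}

console.log(buildProbeCommand('dl-1'));
console.log(buildProbeCommand('dl-1', 'gpu-box', '2222'));
```

The background self-heal later in this diff gets the same effect through `_tmuxCmd(t, ...)`, so a shared helper of this shape would keep the two probes from diverging.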
@@ -35,13 +35,34 @@ function _taskBadge(task) {
return { text: _statusLabel(task.status, task.type), cls: 'cookbook-task-' + task.status };
}
// A download task whose tmux output still shows an active per-shard line
// (e.g. "model-00012-of-00082.safetensors: 56%|") is NOT actually finished —
// the cookbook just lost track. The clear pill becomes a "reconnect" affordance
// in that case (click → revive the row + reattach the poll loop).
function _downloadOutputLooksActive(task) {
if (!task || task.type !== 'download') return false;
const out = task.output || '';
if (!out) return false;
if (out.includes('DOWNLOAD_OK') || out.includes('DOWNLOAD_FAILED')) return false;
// An active shard line: filename + a colon + a percentage that isn't 100%.
// We catch any in-flight shard or "Downloading 'X' to ..." line (no %).
return /model-\d+-of-\d+\.[a-z]+:\s+(?!100%)\d+%/i.test(out)
|| /Downloading\s+'[^']+'\s+to\s+'[^']*\.incomplete'/i.test(out);
}
function _canClearTask(task) {
if (!task || task.status === 'running') return false;
if (task.type === 'serve' && (task.status === 'ready' || task._serveReady)) return false;
// If the tmux output still shows an in-flight download, the task isn't
// actually finished — hide the clear/check pill so it doesn't show on a
// task that's still doing work. (The next render will reflect this and
// ideally the self-heal flips status back to running.)
if (_downloadOutputLooksActive(task)) return false;
return ['done', 'stopped', 'error', 'crashed', 'failed'].includes(task.status);
}
function _clearPillLabel(task) {
if (_downloadOutputLooksActive(task)) return 'reconnect';
return 'clear';
}
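To make the behavior of the two regexes in `_downloadOutputLooksActive` concrete, here is a standalone sketch that applies the same patterns to a few sample output strings. `looksActive` is a hypothetical stand-in that takes the output string directly instead of a task object; the first shard line is the example from the comment in the hunk, the other samples are made up for illustration.

```javascript
// Same checks as the hunk above, minus the task-type test:
// finished markers win, otherwise look for an in-flight shard
// (a percentage that is not 100%) or a ".incomplete" download line.
function looksActive(output) {
  const out = output || '';
  if (!out) return false;
  if (out.includes('DOWNLOAD_OK') || out.includes('DOWNLOAD_FAILED')) return false;
  return /model-\d+-of-\d+\.[a-z]+:\s+(?!100%)\d+%/i.test(out)
    || /Downloading\s+'[^']+'\s+to\s+'[^']*\.incomplete'/i.test(out);
}

// Illustrative inputs only; real tmux output will contain more noise.
const samples = [
  'model-00012-of-00082.safetensors: 56%|',
  'model-00012-of-00082.safetensors: 100%|',
  "Downloading 'model.bin' to '/cache/blobs/abc.incomplete'",
  'model-00001-of-00002.safetensors: 40%|\nDOWNLOAD_OK',
];
for (const s of samples) console.log(JSON.stringify(s), looksActive(s));
```

The negative lookahead `(?!100%)` is what keeps a fully downloaded shard from counting as active, while any other shard still in flight in the same output does.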
@@ -1537,7 +1558,16 @@ export function _renderRunningTab() {
const tasks = _loadTasks();
const hasContent = tasks.length > 0;
const activeCount = tasks.filter(t => t.status === 'running' || t.status === 'queued').length;
// Count anything that's really active: explicit 'running'/'queued' status,
// OR a download whose tmux output is still showing live shard progress.
// Without the output check, a task whose status got stuck at 'done' /
// 'crashed' (before auto-reconnect catches it) would read as "Running 0"
// even when the model is actively downloading on the host.
const activeCount = tasks.filter(t =>
t.status === 'running'
|| t.status === 'queued'
|| _downloadOutputLooksActive(t)
).length;
const activeCountHtml = activeCount ? ` <span class="cookbook-tab-count">${activeCount}</span>` : '';
let tabBar = body.querySelector('.cookbook-tabs');
@@ -1824,9 +1854,31 @@ export function _renderRunningTab() {
const h = Math.floor(secs / 3600);
const m = Math.floor((secs % 3600) / 60);
const s = secs % 60;
_uptimeEl.textContent = h > 0
const _timer = h > 0
? `${_prefix}: ${h}h ${String(m).padStart(2,'0')}m`
: `${_prefix}: ${m}m ${String(s).padStart(2,'0')}s`;
// ETA — only for downloads, only when we have a meaningful overall %.
// Reads the badge text (which already shows the true overall % we
// compute in the live-polling block) and back-derives a remaining-time
// estimate from elapsed/done. Hidden until pct >= 3% so the early-job
// wild estimates don't show.
let _eta = '';
if (task.type === 'download') {
const _badge = el.querySelector('.cookbook-task-status');
const _m = _badge && /^(\d+)%/.exec(_badge.textContent || '');
const _pct = _m ? parseInt(_m[1], 10) : 0;
if (_pct >= 3 && _pct < 100 && secs > 5) {
const _totalSec = Math.round(secs * (100 / _pct));
const _remain = Math.max(0, _totalSec - secs);
const _eh = Math.floor(_remain / 3600);
const _em = Math.floor((_remain % 3600) / 60);
const _es = _remain % 60;
_eta = _eh > 0
? ` · ETA ${_eh}h ${String(_em).padStart(2,'0')}m`
: (_em > 0 ? ` · ETA ${_em}m ${String(_es).padStart(2,'0')}s` : ` · ETA ${_es}s`);
}
}
_uptimeEl.textContent = _timer + _eta;
}, 1000);
}
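The ETA in the hunk above is a linear extrapolation: if `pct` percent took `secs` seconds, the whole job is estimated at `secs * 100 / pct`, and the remainder is that total minus the elapsed time. A minimal sketch of the same arithmetic as a pure function follows; `etaSuffix` is a hypothetical name, and it takes the percentage as an argument instead of parsing it from the badge text.

```javascript
// Return the " · ETA ..." suffix, or '' when the estimate would be
// unreliable (under 3% done, already at 100%, or only a few seconds in).
function etaSuffix(secs, pct) {
  if (!(pct >= 3 && pct < 100 && secs > 5)) return '';
  const totalSec = Math.round(secs * (100 / pct)); // projected total duration
  const remain = Math.max(0, totalSec - secs);     // time still to go
  const h = Math.floor(remain / 3600);
  const m = Math.floor((remain % 3600) / 60);
  const s = remain % 60;
  if (h > 0) return ` · ETA ${h}h ${String(m).padStart(2, '0')}m`;
  if (m > 0) return ` · ETA ${m}m ${String(s).padStart(2, '0')}s`;
  return ` · ETA ${s}s`;
}

console.log(etaSuffix(60, 50));   // one minute elapsed at 50%
console.log(etaSuffix(3600, 10)); // one hour elapsed at 10%
```

Because the estimate assumes a constant rate since the task started, it will lag behind real changes in download speed; the 3% floor only hides the noisiest early values.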
@@ -1874,11 +1926,32 @@ export function _renderRunningTab() {
if (_clearChk) {
_clearChk.addEventListener('click', (e) => {
e.stopPropagation();
// Belt-and-suspenders: kill the tmux session too. For a real-finished
// task the session is already gone and kill-session errors silently,
// but for a task that was falsely flagged done (the strict-finish
// bug), this guarantees the still-running download actually stops
// rather than continuing to write to disk after the row is removed.
// If the output still shows an active shard line, the task isn't
// actually finished — clicking is "reconnect" (flip back to running
// + let _reconnectTask reattach to the live tmux session), not
// "clear". The pill label already reflects this via _clearPillLabel.
if (_downloadOutputLooksActive(task)) {
const _fresh = _loadTasks();
const _ft = _fresh.find(t => t.sessionId === task.sessionId);
if (_ft) {
_ft.status = 'running';
_ft._selfHealed = true;
_saveTasks(_fresh);
}
// Visually flip without waiting for a full re-render — same path the
// self-heal uses on cookbook open.
const _chk = el.querySelector('.cookbook-task-check');
if (_chk) _chk.style.display = 'none';
const _wave = el.querySelector('.cookbook-task-wave');
if (_wave) _wave.style.display = '';
const _up = el.querySelector('.cookbook-task-uptime');
if (_up) _up.style.display = '';
el.dataset.status = 'running';
_renderRunningTab();
return;
}
// Otherwise: real clear. Kill the tmux session as belt-and-suspenders,
// then animate out + remove the row.
try {
fetch('/api/shell/exec', {
method: 'POST', credentials: 'same-origin',
@@ -2964,9 +3037,84 @@ function _refreshServerDots() {
_syncSettingsServerDots(byKey);
}
// Self-heal: scan persisted download tasks marked done/error/crashed and
// check whether their tmux session is still alive on the host. If yes —
// the task isn't actually finished, the cookbook just lost the in-flight
// status during restart — flip status back to 'running' so _reconnectTask
// picks it up. The one-shot guard is enforced by callers (open path) or
// time-throttled inside (background-monitor path).
let _selfHealRan = false;
let _selfHealLastTs = 0;
export async function _selfHealStaleTasks(opts = {}) {
// Open-path call: one-shot per page load.
if (opts.oneShot) {
if (_selfHealRan) return;
_selfHealRan = true;
} else {
// Background-monitor call: throttle to once every 8s (the bg monitor
// itself fires every 10s, so this almost always fires too, but the
// guard keeps a fast manual call from doubling up).
const now = Date.now();
if (now - _selfHealLastTs < 8000) return;
_selfHealLastTs = now;
}
const tasks = _loadTasks();
const candidates = tasks.filter(t =>
t.type === 'download'
&& ['done', 'error', 'crashed', 'stopped'].includes(t.status)
&& t.sessionId
&& !String(t.sessionId).startsWith('queue-')
);
if (!candidates.length) return;
let flipped = 0;
for (const t of candidates) {
try {
const res = await fetch('/api/shell/exec', {
method: 'POST', credentials: 'same-origin',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ command: _tmuxCmd(t, `has-session -t ${t.sessionId}`), timeout: 5 }),
});
const data = await res.json();
if (data.exit_code === 0) {
// Session still alive → the task is actually still running.
const fresh = _loadTasks();
const ft = fresh.find(x => x.sessionId === t.sessionId);
if (ft && ft.status !== 'running') {
ft.status = 'running';
ft._selfHealed = true;
_saveTasks(fresh);
flipped++;
const _el = document.querySelector(`.cookbook-task[data-task-id="${t.sessionId}"]`);
if (_el) {
const _chk = _el.querySelector('.cookbook-task-check');
if (_chk) _chk.style.display = 'none';
const _wave = _el.querySelector('.cookbook-task-wave');
if (_wave) _wave.style.display = '';
const _up = _el.querySelector('.cookbook-task-uptime');
if (_up) _up.style.display = '';
_el.dataset.status = 'running';
}
}
}
} catch { /* network blip — skip this one */ }
}
if (flipped) {
console.log(`[cookbook] auto-reconnect: revived ${flipped} task(s) whose tmux session was still alive`);
_renderRunningTab();
}
}
export function _startBackgroundMonitor() {
if (_bgMonitorInterval) return;
_bgMonitorInterval = setInterval(() => { _pollBackgroundStatus(); _checkServeReachability(); }, BG_MONITOR_INTERVAL_MS);
_bgMonitorInterval = setInterval(() => {
_pollBackgroundStatus();
_checkServeReachability();
// Auto-reconnect: every cycle, look for download tasks marked finished/
// crashed/etc. whose tmux session is actually still running, and flip
// them back to running. Internally throttled to 8s so a manual call from
// the open path or a fast invocation doesn't double up.
_selfHealStaleTasks().catch(() => {});
}, BG_MONITOR_INTERVAL_MS);
_pollBackgroundStatus();
_checkServeReachability();
}
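`_selfHealStaleTasks` combines two guards: a one-shot flag for the open path and a time throttle for the background monitor. The sketch below extracts that gating logic into a small factory with an injectable clock, which makes the interaction easier to reason about. `makeSelfHealGuard` and `shouldRun` are hypothetical names; the 8000 ms default mirrors the hunk above.

```javascript
// Returns a function that decides whether a self-heal pass may run.
// - { oneShot: true }: allowed exactly once per guard instance.
// - otherwise: allowed at most once per minGapMs, based on now().
// As in the hunk, the one-shot path does not touch the throttle timestamp,
// so the first background call after open is still allowed to run.
function makeSelfHealGuard(minGapMs = 8000, now = () => Date.now()) {
  let ranOnce = false;
  let lastTs = 0;
  return function shouldRun(opts = {}) {
    if (opts.oneShot) {
      if (ranOnce) return false;
      ranOnce = true;
      return true;
    }
    const t = now();
    if (t - lastTs < minGapMs) return false;
    lastTs = t;
    return true;
  };
}

const guard = makeSelfHealGuard();
console.log(guard({ oneShot: true }), guard({ oneShot: true })); // true false
```

Passing the clock in as a parameter is a testing convenience of this sketch, not something the diff does; the original reads `Date.now()` directly.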
@@ -560,6 +560,15 @@ function _rerenderCachedModels() {
+ `</div>`;
let panelHtml = `<div class="hwfit-serve-panel">${_slotsHtml}`;
// Warn when serving a model whose download hasn't fully completed —
// the user CAN still hit Launch (vLLM/llama-server will start, then
// crash trying to read missing shards), but they should know.
if (m && (m.status === 'downloading' || m.status === 'stalled' || m.has_incomplete)) {
const _warnText = m.status === 'stalled'
? `This model looks like a stale download shell (${esc(m.size || '0 KB')}). The weights aren't on disk — the serve will fail to load. Re-download first, or pick another model.`
: `This model's download isn't complete yet (${esc(m.size || 'partial')}). The serve will start but is likely to crash on a missing shard. Wait for the download to finish, or relaunch after it's done.`;
panelHtml += `<div class="hwfit-serve-warn" style="margin:0 0 8px;padding:6px 10px;border-radius:5px;font-size:11px;background:color-mix(in srgb, var(--color-warning, #f0ad4e) 14%, transparent);border:1px solid color-mix(in srgb, var(--color-warning, #f0ad4e) 40%, transparent);color:var(--color-warning, #f0ad4e);display:flex;gap:6px;align-items:flex-start;line-height:1.4;"><span aria-hidden="true">⚠</span><span>${_warnText}</span></div>`;
}
// Row 1: Backend + Server + Env
panelHtml += `<div class="hwfit-serve-row">`;
const _backendChoices = _isWindows()
@@ -597,13 +606,13 @@ function _rerenderCachedModels() {
panelHtml += `<label class="hwfit-backend-vllm hwfit-backend-sglang">${_l('TP','Tensor Parallelism — split model across N GPUs')}<select class="hwfit-sf" data-field="tp">${tpOpts}</select></label>`;
// ctx resets to the model's max on every panel open (the real ctx slider
// lives in the Scan/Download toolbar — see cookbook.js .hwfit-ctx-control).
panelHtml += `<label>${_l('Context','Max tokens per request — resets to the model max on every open. Lower = less VRAM')}<input type="text" class="hwfit-sf" data-field="ctx" value="${esc(m.context_length || m.context || '8192')}" /></label>`;
panelHtml += `<label>${_l('Context','Max tokens per request — resets to the model max on every open. Lower = less VRAM')}<input type="text" class="hwfit-sf" data-field="ctx" value="${esc(m.context_length || m.context || '20000')}" /></label>`;
panelHtml += `<label>${_l('GPU','Which GPU to use. Leave empty for default')}<input type="text" class="hwfit-sf" data-field="gpu_id" value="${esc(sv('gpu_id', ''))}" placeholder="auto" style="width:50px;" /></label>`;
panelHtml += `<label class="hwfit-backend-vllm hwfit-backend-sglang">${_l('GPU Mem','Fraction of GPU memory (0.0–1.0). Lower if OOM')}<input type="text" class="hwfit-sf" data-field="gpu_mem" value="${esc(sv('gpu_mem', '0.90'))}" /></label>`;
panelHtml += `<label class="hwfit-backend-vllm">${_l('Swap','CPU swap space in GB. Leave empty to omit (removed in newer vLLM)')}<input type="text" class="hwfit-sf" data-field="swap" value="${esc(sv('swap', ''))}" placeholder="off" /></label>`;
panelHtml += `<label class="hwfit-backend-vllm hwfit-backend-sglang">${_l('Max Seqs','Maximum concurrent requests. Lower = less memory. Default 8 — prosumer GPUs often OOM on vLLM default 256 during CUDA graph capture.')}<input type="text" class="hwfit-sf" data-field="max_seqs" value="${esc(sv('max_seqs', '8'))}" placeholder="8" /></label>`;
panelHtml += `<label class="hwfit-backend-vllm hwfit-backend-sglang">${_l('Max Seqs','Maximum concurrent requests. Lower = less memory. Default 4 — prosumer GPUs often OOM on vLLM default 256 during CUDA graph capture.')}<input type="text" class="hwfit-sf" data-field="max_seqs" value="${esc(sv('max_seqs', '4'))}" placeholder="4" /></label>`;
panelHtml += `<label>${_l('Dtype','Data type for weights. auto picks best for GPU')}<select class="hwfit-sf" data-field="dtype">${dtypeOpts}</select></label>`;
panelHtml += `<label class="hwfit-backend-vllm">${_l('KV Cache','vLLM --kv-cache-dtype. auto uses the model/runtime default; fp8 reduces KV memory for long context.')}<select class="hwfit-sf" data-field="vllm_kv_cache_dtype">${vllmKvCacheOpts}</select></label>`;
panelHtml += `<label class="hwfit-backend-vllm">${_l('KV Cache','vLLM --kv-cache-dtype. auto uses the model/runtime default; fp8 reduces KV memory for long context.')}<select class="hwfit-sf" data-field="vllm_kv_cache_dtype" style="height:32px;">${vllmKvCacheOpts}</select></label>`;
panelHtml += `</div>`;
// Row 2b: Diffusers settings
const diffDtypeOpts = ['bfloat16','float16','float32'].map(d => `<option value="${d}"${sv('diff_dtype','bfloat16')===d?' selected':''}>${d}</option>`).join('');
@@ -696,7 +705,7 @@ function _rerenderCachedModels() {
if (!_specMethods.includes(_specMethod)) _specMethods.unshift(_specMethod);
const _specOpts = _specMethods.map(m =>
`<option value="${m}"${m === _specMethod ? ' selected' : ''}>${m}</option>`).join('');
panelHtml += `<label class="hwfit-sf-cb hwfit-spec-group"><input type="checkbox" class="hwfit-sf" data-field="speculative" /> Speculative <select class="hwfit-sf hwfit-spec-method" data-field="spec_method" title="vLLM --speculative-config method">${_specOpts}</select><span class="hwfit-numstep"><button type="button" class="hwfit-numstep-btn" data-step="-1" tabindex="-1" aria-label="Decrease">‹</button><input type="number" class="hwfit-sf hwfit-spec-tokens" data-field="spec_tokens" value="${esc(_specTokens)}" min="1" max="10" title="num_speculative_tokens" /><button type="button" class="hwfit-numstep-btn" data-step="1" tabindex="-1" aria-label="Increase">›</button></span></label>`;
panelHtml += `<label class="hwfit-sf-cb hwfit-spec-group"><input type="checkbox" class="hwfit-sf" data-field="speculative" /> Speculative <select class="hwfit-sf hwfit-spec-method" data-field="spec_method" title="vLLM --speculative-config method">${_specOpts}</select><span class="hwfit-numstep"><button type="button" class="hwfit-numstep-btn" data-step="-1" tabindex="-1" aria-label="Decrease">‹</button><input type="number" class="hwfit-sf hwfit-spec-tokens" data-field="spec_tokens" value="${esc(_specTokens)}" min="1" max="10" title="num_speculative_tokens" /><button type="button" class="hwfit-numstep-btn" data-step="1" tabindex="-1" aria-label="Increase">›</button></span><span class="hwfit-help-chip hwfit-help-chip-inline" title="MTP / speculative decoding is supported on a few model families only — turn it on when the model card explicitly recommends it. On supported models it can boost inference throughput up to ~3×; on unsupported models it will either be ignored or fail to launch." style="margin-left:6px;">?</span></label>`;
}
if (_opts2.envVars.length) panelHtml += `<label class="hwfit-sf-cb"><input type="checkbox" class="hwfit-sf" data-field="moe_env" /> MoE Env Vars</label>`;
panelHtml += `</div>`;
@@ -721,7 +730,7 @@ function _rerenderCachedModels() {
// pushes Cancel + Launch to the right.
panelHtml += `<span class="hwfit-serve-actions-spacer"></span>`;
panelHtml += `<button class="cookbook-btn hwfit-serve-cancel" type="button" title="Close this configuration panel">Cancel</button>`;
panelHtml += `<button class="cookbook-btn hwfit-serve-launch">Launch</button>`;
panelHtml += `<button class="cookbook-btn hwfit-serve-launch"><svg width="11" height="11" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" style="vertical-align:-1px;margin-right:4px;flex-shrink:0;"><polygon points="13 2 3 14 12 14 11 22 21 10 12 10 13 2"/></svg>Launch</button>`;
panelHtml += `</div>`;
panelHtml += `</div>`;
@@ -1657,6 +1666,37 @@ function _rerenderCachedModels() {
 });
 return;
 }
+// Pre-launch GPU probe — common failure pattern: vLLM/SGLang launched
+// on a host where no GPU is visible (driver missing, $CUDA_VISIBLE_DEVICES
+// unset, container without --gpus). Catch it BEFORE the user spends
+// minutes watching the task fail.
+const _needsGpu = ['vllm', 'sglang'].includes(serveState.backend)
+|| (serveState.backend === 'diffusers');
+if (_needsGpu) {
+try {
+const _probeHost = (_envState.remoteHost || '').trim();
+const _probeParams = new URLSearchParams();
+if (_probeHost) {
+_probeParams.set('host', _probeHost);
+const _sp = (_envState.servers || []).find(s => s.host === _probeHost)?.port;
+if (_sp) _probeParams.set('ssh_port', _sp);
+}
+const _probeRes = await fetch('/api/cookbook/gpus' + (_probeParams.toString() ? '?' + _probeParams : ''), { credentials: 'same-origin' });
+const _probeData = await _probeRes.json();
+const _probeGpus = Array.isArray(_probeData) ? _probeData : (_probeData.gpus || []);
+if (!_probeGpus.length) {
+const _proceed = await window.styledConfirm(
+`No GPU detected on ${_probeHost ? _probeHost : 'this host'}. ${serveState.backend.toUpperCase()} needs a visible CUDA/ROCm accelerator to start — launching now will most likely crash early.\n\nLaunch anyway?`,
+{ title: 'No GPU detected', confirmText: 'Launch anyway', cancelText: 'Cancel', danger: true },
+);
+if (!_proceed) return;
+}
+} catch {
+// Network / probe failure — don't block. Better to let the launch
+// proceed than to silently refuse because the probe endpoint
+// hiccuped (the user can read the real error in the task output).
+}
+}
 // Save in the { _byRepo, _lastUsed } schema — no legacy flat keys at
 // the root so per-model state doesn't leak between models.
 try {
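The pre-launch probe in the hunk above can be read as three steps: build the query string, normalize the response shape, and fail open when the probe itself errors. Below is a minimal sketch of that flow. The helper names (`buildGpuProbeUrl`, `normalizeGpuList`, `shouldWarnNoGpu`) and the injected `fetchJson` callback are hypothetical and not part of the commit; only the `/api/cookbook/gpus` path, the `host` / `ssh_port` parameters, and the backend list come from the diff.

```javascript
// Hypothetical helpers for illustration; names are not from the commit.

// Build the probe URL. `host` and `sshPort` are only set for remote rigs.
function buildGpuProbeUrl(host, sshPort) {
  const params = new URLSearchParams();
  const trimmed = (host || '').trim();
  if (trimmed) {
    params.set('host', trimmed);
    if (sshPort) params.set('ssh_port', String(sshPort));
  }
  const qs = params.toString();
  return '/api/cookbook/gpus' + (qs ? '?' + qs : '');
}

// Accept either a bare array or an object with a `gpus` array.
function normalizeGpuList(payload) {
  if (Array.isArray(payload)) return payload;
  return (payload && Array.isArray(payload.gpus)) ? payload.gpus : [];
}

// Resolves to true when the launch should pause for a confirmation dialog.
// `fetchJson(url)` is injected so the sketch runs without a server.
async function shouldWarnNoGpu(backend, host, sshPort, fetchJson) {
  const needsGpu = ['vllm', 'sglang', 'diffusers'].includes(backend);
  if (!needsGpu) return false;
  try {
    const data = await fetchJson(buildGpuProbeUrl(host, sshPort));
    return normalizeGpuList(data).length === 0;
  } catch {
    // Fail open: a broken probe should not block the launch.
    return false;
  }
}
```

One detail this makes visible: an error object without a `gpus` field normalizes to an empty list, so it counts as "no GPU" and raises the warning rather than being skipped. If that is not the intent, checking the HTTP status before parsing may be worth considering.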
160
static/style.css
@@ -18628,16 +18628,41 @@ body.gallery-selecting .gallery-dl-btn,
 background: color-mix(in srgb, var(--fg) 10%, transparent);
 color: color-mix(in srgb, var(--fg) 60%, transparent);
 }
+/* Rebuild tag — same look as the LLM category tag, sits to its left. */
+.cookbook-dep-rebuild {
+background: color-mix(in srgb, var(--fg) 10%, transparent);
+color: color-mix(in srgb, var(--fg) 75%, transparent);
+border: 1px solid color-mix(in srgb, var(--fg) 20%, transparent);
+cursor: pointer;
+font-family: inherit;
+appearance: none;
+-webkit-appearance: none;
+-moz-appearance: none;
+}
+.cookbook-dep-rebuild:hover {
+background: color-mix(in srgb, var(--accent, var(--red)) 18%, transparent);
+color: var(--accent, var(--red));
+border-color: color-mix(in srgb, var(--accent, var(--red)) 45%, transparent);
+}
 .cookbook-dep-installed {
 background: color-mix(in srgb, var(--green, #50fa7b) 18%, transparent);
 color: var(--green, #50fa7b);
 border: 1px solid color-mix(in srgb, var(--green, #50fa7b) 35%, transparent);
+/* Match the Install button + Installed ▾ split width so all three variants
+align in a mixed row. */
+min-width: 75.85px;
+padding: 0 10px;
+box-sizing: border-box;
 }
 .cookbook-dep-na {
 background: color-mix(in srgb, var(--fg) 8%, transparent);
 color: color-mix(in srgb, var(--fg) 60%, transparent);
 border: 1px solid color-mix(in srgb, var(--fg) 16%, transparent);
 cursor: help;
+/* Match other dep tag widths so N/A rows line up with Install / Installed. */
+min-width: 75.85px;
+padding: 0 10px;
+box-sizing: border-box;
 }
 .cookbook-dep-install {
 background: var(--accent, var(--red));
@@ -18648,12 +18673,30 @@ body.gallery-selecting .gallery-dl-btn,
 font-weight: 500;
 position: relative;
 top: -3px;
+/* Width matches the measured Installed ▾ split button (75.85px) so a row of
+mixed Install / Installed deps lines up. */
+min-width: 75.85px;
+padding: 0 10px;
 /* Strip the native button box so it's the same height as the sibling tags
 (Firefox renders <button> taller otherwise); height comes from .cookbook-dep-tag. */
 appearance: none;
 -webkit-appearance: none;
 -moz-appearance: none;
 }
+/* Conditional line under the Download h2: only when the section is folded
+(collapsed). When expanded, the body content provides separation; the
+underline reads as clutter. */
+#cookbook-dl-tab-fold { border-bottom: none !important; padding-bottom: 0 !important; }
+#cookbook-dl-tab-fold.is-folded {
+border-bottom: 1px solid color-mix(in srgb, var(--border) 40%, transparent) !important;
+padding-bottom: 6px !important;
+}
+/* Center the "?" glyph inside the help chip. Without text-align it sits 0.5px
+left of true center because of the character's natural baseline offset. */
+.hwfit-help-chip {
+text-align: center;
+padding-left: 0.5px;
+}
 .cookbook-dep-install:hover { opacity: 0.85; }
 /* Installed split button: "Installed" label + separator + ▾ caret; clicking it
 opens the actions menu (Update). Replaces the old ⋮ button. */
@@ -18709,12 +18752,13 @@ body.gallery-selecting .gallery-dl-btn,
 border: 1px solid var(--border);
 border-radius: 4px;
 background: var(--bg);
-font-size: 11px;
+font-size: 12px; /* match .cookbook-field-input so Context reads same size as Engine/Quant */
 }
 .hwfit-ctx-control span {
-text-transform: uppercase;
-letter-spacing: 0.3px;
-opacity: 0.75;
+/* Match Quant/Engine select label style: no uppercase, no letter-spacing. */
+text-transform: none;
+letter-spacing: 0;
+opacity: 0.9;
 }
 /* Editor-style slider (same look as the gallery editor sliders): thin pill
 rail that fattens on interaction, circular red thumb that grows on hover. */
@@ -18726,11 +18770,19 @@ body.gallery-selecting .gallery-dl-btn,
 border: 0;
 -webkit-appearance: none;
 appearance: none;
-background: color-mix(in srgb, var(--fg) 25%, transparent);
+/* Hard-coded grey so the rail is GUARANTEED visible regardless of theme —
+every theme-derived color we tried (--fg-muted, --border, accent-bg mix)
+kept blending into the panel background on at least one theme. */
+background: rgba(150, 150, 150, 0.65);
 border-radius: 999px;
 accent-color: var(--red);
 cursor: pointer;
-transition: height 0.15s ease;
+transition: height 0.15s ease, background 0.15s ease;
+}
+.hwfit-ctx-control input[type="range"]:hover,
+.hwfit-ctx-control input[type="range"]:focus,
+.hwfit-ctx-control input[type="range"]:active {
+background: var(--fg);
 }
 .hwfit-ctx-control input[type="range"]:hover,
 .hwfit-ctx-control input[type="range"]:focus,
@@ -19324,9 +19376,12 @@ body.gallery-selecting .gallery-dl-btn,
 position: relative;
 top: -4px;
 cursor: pointer;
-padding: 1px 6px 1px 4px;
+/* Tightened vertical padding so the hover-background isn't disproportionately
+tall vs the icon+label. */
+padding: 0 6px 0 4px;
+height: 14px;
 border: 0;
-border-radius: 9px;
+border-radius: 7px;
 background: transparent;
 color: var(--fg);
 font-family: inherit;
@@ -20028,6 +20083,17 @@ body.gallery-selecting .gallery-dl-btn,
 border-color: var(--color-error);
 background: color-mix(in srgb, var(--color-error) 12%, transparent);
 }
+/* Icons on the left of diagnosis action buttons (Retry / Copy / Edit / etc.). */
+.cookbook-diag-btn,
+.cookbook-diag-menu button {
+display: inline-flex;
+align-items: center;
+gap: 5px;
+}
+.cookbook-diag-btn-ico {
+flex-shrink: 0;
+opacity: 0.9;
+}
 
 /* ── What Fits? (hardware model fitting tab in cookbook) ── */
 .cookbook-group.hidden { display: none !important; }
@@ -20500,6 +20566,40 @@ body.gallery-selecting .gallery-dl-btn,
 .hwfit-toolbar .hwfit-usecase { min-width: 70px; flex-shrink: 0; }
 .hwfit-toolbar .hwfit-quant { min-width: 50px; flex-shrink: 0; }
 .hwfit-toolbar .hwfit-search { flex: 1; min-width: 80px; }
+/* Lower-opacity "Search models..." placeholder so it reads as a hint, not
+a label — matches the muted form-field feel of the inline filters. */
+.hwfit-search::placeholder { opacity: 0.5; }
+.hwfit-search::-webkit-input-placeholder { opacity: 0.5; }
+.hwfit-search::-moz-placeholder { opacity: 0.5; }
+
+/* Dot inside the Fit column header — click to toggle the fit-only filter
+(off = show too-tight rows; on = hide them). */
+.hwfit-fit-dot {
+display: inline-block;
+margin-right: 4px;
+font-size: 8px;
+line-height: 1;
+color: color-mix(in srgb, var(--fg) 35%, transparent);
+cursor: pointer;
+vertical-align: middle;
+position: relative;
+top: -1px; /* nudge 1px up so the small dot sits centered with the "Fit" caps */
+transition: color 0.12s ease, text-shadow 0.12s ease;
+}
+/* Quant suffix appended to model names when the storage format isn't in the
+repo id — e.g. "(FP4-MoE-Mixed)" after DeepSeek-V4-Flash. Muted to read as
+metadata, not part of the name. */
+.hwfit-name-quant {
+font-size: 0.78em;
+opacity: 0.55;
+font-weight: 400;
+margin-left: 4px;
+}
+.hwfit-fit-dot:hover { color: var(--accent, var(--red)); }
+.hwfit-fit-dot.active {
+color: var(--green, #50fa7b);
+text-shadow: 0 0 4px color-mix(in srgb, var(--green, #50fa7b) 55%, transparent);
+}
 .hwfit-help-chip {
 width: 14px;
 height: 14px;
@@ -20526,6 +20626,28 @@ body.gallery-selecting .gallery-dl-btn,
 .hwfit-help-chip-inline {
 margin-left: -2px;
 margin-right: 0;
+top: 0; /* parent rule sets top:-1px; nudge inline variant 1px lower */
+}
+/* Quant select + inline ? wrapper — the ? sits inside the dropdown's bordered
+box, anchored on the right just left of the chevron. */
+.hwfit-quant-wrap, .hwfit-engine-wrap {
+position: relative;
+display: inline-flex;
+align-items: center;
+}
+.hwfit-quant-wrap .hwfit-quant,
+.hwfit-engine-wrap .hwfit-engine {
+/* Make room for the ? on the right edge, in addition to the native chevron. */
+padding-right: 32px;
+}
+.hwfit-quant-wrap .hwfit-quant-help,
+.hwfit-engine-wrap .hwfit-engine-help {
+position: absolute;
+right: 20px; /* sits just left of the native select chevron */
+top: 50%;
+transform: translateY(-50%);
+pointer-events: auto;
+margin: 0;
 }
 .hwfit-ctx-control {
 height: 28px;
@@ -20539,21 +20661,27 @@ body.gallery-selecting .gallery-dl-btn,
 border-radius: 4px;
 color: var(--fg-muted);
 background: var(--bg);
-font-size: 10px;
+font-size: 12px; /* match .cookbook-field-input — was 10px and read smaller than siblings */
 box-sizing: border-box;
 }
 .hwfit-ctx-control span {
-text-transform: uppercase;
-letter-spacing: 0.3px;
-opacity: 0.75;
+/* Match Quant/Engine select label style: no uppercase, no letter-spacing. */
+text-transform: none;
+letter-spacing: 0;
+opacity: 0.9;
 }
 .hwfit-ctx-control input[type="range"] {
-width: 54px;
-min-width: 54px;
-height: 16px;
+width: 64px;
+min-width: 64px;
+height: 4px;
 padding: 0;
 border: 0;
-background: transparent;
+-webkit-appearance: none;
+appearance: none;
+/* Hardcoded grey rail — was background:transparent here, which was the
+LATER-in-cascade override that kept making the rail invisible. */
+background: rgba(150, 150, 150, 0.65) !important;
+border-radius: 999px;
 accent-color: var(--accent, var(--red));
 }
 .hwfit-ctx-control output {
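The `.hwfit-name-quant` rule in this diff styles a quant suffix that is shown only when the repo id does not already name the storage format. The frontend logic that builds the suffix is not part of the hunks shown here, so the following is only a sketch of one way it could be derived. The function name `quantSuffix` and the token-matching rule are assumptions for illustration.

```javascript
// Hypothetical helper; the real implementation is not shown in this diff.
// Drops the parts of the quant key that the repo id already mentions and
// wraps whatever is left in parentheses.
function quantSuffix(repoId, quant) {
  // Tokens present in the repo id, compared case-insensitively.
  const repoTokens = new Set(
    repoId.toLowerCase().split(/[\/\-_.]+/).filter(Boolean)
  );
  // Keep only the quant parts the repo name does not already contain.
  const remaining = quant
    .split('-')
    .filter((part) => !repoTokens.has(part.toLowerCase()));
  return remaining.length ? `(${remaining.join('-')})` : '';
}

console.log(quantSuffix('QuantTrio/MiniMax-M2-AWQ', 'AWQ-4bit')); // (4bit)
console.log(quantSuffix('DeepSeek-V4-Flash', 'FP4-MoE-Mixed'));   // (FP4-MoE-Mixed)
```

Under this rule a repo id that already names every part of the quant key yields an empty string, so no suffix span would need to be rendered.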