Cookbook polish: auto-reconnect, ctx slider fixes, scoring, lots of UI

Backend (services/hwfit + routes):
- VRAM column sort now shows the global highest first. It was
  special-cased to ascending and then truncated to top-N, which made
  "highest VRAM" mathematically unreachable. Every sort path now uses
  reverse=True before truncating.
- Hardware probe cache TTL 30min -> 24h so changing filters doesn't keep
  re-probing the rig during a session; the Rescan button still forces a
  fresh probe.
- Multi-GPU rigs filter out GGUF Q*/IQ quants (vLLM/SGLang can't serve
  them); non-prequantized models default to BF16 on 2+ GPUs.
- AWQ / AWQ-8bit / GPTQ-8bit get a -1.0 quality penalty so FP8 wins ties.
- Version-aware tiebreaker (parses Mn.n / Vn), so MiniMax-M2.7 ranks
  above M2.5.
- hf_models.json: zai-org/GLM-5.1 added; zai-org/GLM-5 quantization
  flipped Q4_K_M -> BF16. DeepSeek-V4-Flash / -Pro and their -Base
  variants registered with new FP4-MoE-Mixed / FP8-Mixed quant keys
  (BPP calibrated from the actual 156 GB / 284 GB disk footprints).
- New FP4-MoE-Mixed + FP8-Mixed entries in QUANT_BPP / QUANT_SPEED_MULT /
  QUANT_QUALITY_PENALTY / QUANT_BYTES_PER_PARAM / PREQUANTIZED_PREFIXES.

Frontend (Scan/Download):
- Engine + Quant swapped in the toolbar; Quant defaults to "All".
- Ctx (range slider) ported from origin/main: 8k/16k/32k/50k/128k/Max.
  Dragging re-sorts by VRAM ascending (smallest fitting first); back to
  Max → score.
- Ctx slider rail now visible; it was background:transparent in a
  duplicate later-cascade rule. Now hardcoded grey + !important.
- Search input moved to the far right of the toolbar.
- Type select defaults to "Standard"; "Context" not uppercased; Search
  placeholder dimmed.
- Engine "?" + Quant "?" inline help chips inside their dropdown boxes.
- Fit-column dot toggles the fit-only filter; un-toggling re-sorts by
  VRAM desc.
- Quant column truncates to 9 chars + ellipsis ("FP4-MoE-M..."), full
  tag in the tooltip. Smart title-suffix strips the parts already in the
  repo name (QuantTrio/MiniMax-M2-AWQ + quant AWQ-4bit -> just "(4bit)").
- Conditional warning for safetensors models on non-GPU rigs only.
- Dependency Install / Installed / Installed▾ / N/A all 75.85px wide.
- Rebuild llama.cpp moved into the llama_cpp dep row, styled as a tag.
- Foldable Download admin-card (h2 chevron); line under h2 only when
  folded.
- HF token save gets a green ✓ + "Saved" flash.
- Cached scan no longer counts stalled rows as downloaded.
- Footer: "Request it →" link with GitHub mark to the public discussion
  (#1962) for model-add requests.

Frontend (Running tab):
- Strict download-finish check (DOWNLOAD_OK or /snapshots/, not bare
  "Download complete").
- True overall % for multi-shard downloads: ((N-1)+frac)/total instead
  of hf_transfer's per-shard aggregate.
- ETA in the uptime ticker: "downloading: 12m 34s · ETA 1h 23m".
- Clear button kills the tmux session too; if the output still shows a
  live shard line, the pill is hidden + relabels as "reconnect" +
  revives on click.
- Self-heal: on cookbook open AND every bg-monitor cycle (10s, throttled
  to 8s), scan persisted done/error/crashed downloads and probe their
  tmux session. If it is alive, flip status back to running and
  reattach.
- Per-launch zombie probe: clicking Download on a model whose persisted
  state is done but whose tmux session is still alive revives the
  existing task and refuses to start a duplicate.
- Pre-launch GPU probe: vllm / sglang / diffusers serve check
  /api/cookbook/gpus first; warns + confirms if no GPU is visible.
- Server-side state guard: rejects "done" POSTs for downloads lacking
  DOWNLOAD_OK / DOWNLOAD_FAILED / /snapshots/ when the last-mentioned
  shard is N<total, so stale tabs can't poison persisted state any more.
- Running count includes tasks whose output looks active even if
  persisted status got stuck.
- Dir text on the running row, font matched to uptime.

Serve panel:
- Ctx text input always resets to model max on open (default 20000 when
  metadata is missing).
- Max Seqs default 8 -> 4. KV Cache dtype select 32px tall.
- Lightning icon on Launch (same as Action toggle).
- Diagnosis card simplified (no fold/copy/dismiss), suggestion font
  matches body; action buttons get icons on the left
  (Retry/Copy/Edit/Install/Kill/Switch/etc.).
- Incomplete-download serve warning when model status is downloading /
  stalled / has_incomplete.
- MTP "?" tooltip ("supported on a few model families … up to ~3× faster").
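
Note on the multi-shard percentage from the Running tab section: the ((N-1)+frac)/total formula can be sketched as below. This is only an illustration in Python, not the code from this commit (the real check lives in the frontend). The function name and parameters are hypothetical, and it assumes shards download one after another, with N the 1-based index of the shard in flight and frac that shard's own progress in [0, 1].

```python
def overall_fraction(shard_index: int, shard_fraction: float, total_shards: int) -> float:
    """Overall progress of a sequential multi-shard download, in [0, 1].

    Hypothetical helper: shard_index is the 1-based N from "shard N of
    total", shard_fraction is the progress of that one shard.
    """
    if total_shards <= 0:
        raise ValueError("total_shards must be positive")
    if not 1 <= shard_index <= total_shards:
        raise ValueError("shard_index must be between 1 and total_shards")
    # Clamp the per-shard value so a noisy progress line cannot push the
    # overall figure backwards or past the end of the current shard.
    frac = min(max(shard_fraction, 0.0), 1.0)
    # (N - 1) shards are fully done, plus the partial progress of shard N.
    return ((shard_index - 1) + frac) / total_shards


if __name__ == "__main__":
    # Shard 3 of 4 at 50%: two full shards plus half of the third.
    print(f"{overall_fraction(3, 0.5, 4):.1%}")
```

Reporting the per-shard figure alone would show 50% here, even though the download as a whole is further along; the formula weights completed shards in.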
2026-06-03 20:25:25 +09:00
parent 3706d756f3
commit 562bc4dedc
12 changed files with 669 additions and 115 deletions
--- a/services/hwfit/data/hf_models.json
+++ b/services/hwfit/data/hf_models.json
@@ -5110,6 +5110,100 @@
  "release_date": "2023-10-29",
  "_discovered": true
 },
+ {
+  "name": "deepseek-ai/DeepSeek-V4-Flash",
+  "provider": "deepseek-ai",
+  "parameter_count": "284B",
+  "parameters_raw": 284000000000,
+  "active_parameters": 13000000000,
+  "is_moe": true,
+  "min_ram_gb": 200.0,
+  "recommended_ram_gb": 320.0,
+  "min_vram_gb": 156.0,
+  "quantization": "FP4-MoE-Mixed",
+  "context_length": 1000000,
+  "use_case": "General-purpose reasoning, long-context",
+  "capabilities": [
+   "long_context",
+   "reasoning",
+   "moe"
+  ],
+  "pipeline_tag": "text-generation",
+  "architecture": "deepseek_v4_moe",
+  "hf_downloads": 3542202,
+  "hf_likes": 0,
+  "release_date": "2026-05-15"
+ },
+ {
+  "name": "deepseek-ai/DeepSeek-V4-Flash-Base",
+  "provider": "deepseek-ai",
+  "parameter_count": "284B",
+  "parameters_raw": 284000000000,
+  "active_parameters": 13000000000,
+  "is_moe": true,
+  "min_ram_gb": 290.0,
+  "recommended_ram_gb": 460.0,
+  "min_vram_gb": 284.0,
+  "quantization": "FP8-Mixed",
+  "context_length": 1000000,
+  "use_case": "Base pretrained \u2014 fine-tuning starting point",
+  "capabilities": [
+   "long_context",
+   "moe"
+  ],
+  "pipeline_tag": "text-generation",
+  "architecture": "deepseek_v4_moe",
+  "hf_downloads": 0,
+  "hf_likes": 0,
+  "release_date": "2026-05-15"
+ },
+ {
+  "name": "deepseek-ai/DeepSeek-V4-Pro",
+  "provider": "deepseek-ai",
+  "parameter_count": "1.6T",
+  "parameters_raw": 1600000000000,
+  "active_parameters": 49000000000,
+  "is_moe": true,
+  "min_ram_gb": 1100.0,
+  "recommended_ram_gb": 1800.0,
+  "min_vram_gb": 880.0,
+  "quantization": "FP4-MoE-Mixed",
+  "context_length": 1000000,
+  "use_case": "Flagship reasoning, long-context",
+  "capabilities": [
+   "long_context",
+   "reasoning",
+   "moe"
+  ],
+  "pipeline_tag": "text-generation",
+  "architecture": "deepseek_v4_moe",
+  "hf_downloads": 0,
+  "hf_likes": 0,
+  "release_date": "2026-05-15"
+ },
+ {
+  "name": "deepseek-ai/DeepSeek-V4-Pro-Base",
+  "provider": "deepseek-ai",
+  "parameter_count": "1.6T",
+  "parameters_raw": 1600000000000,
+  "active_parameters": 49000000000,
+  "is_moe": true,
+  "min_ram_gb": 1700.0,
+  "recommended_ram_gb": 2600.0,
+  "min_vram_gb": 1600.0,
+  "quantization": "FP8-Mixed",
+  "context_length": 1000000,
+  "use_case": "Base pretrained \u2014 fine-tuning starting point",
+  "capabilities": [
+   "long_context",
+   "moe"
+  ],
+  "pipeline_tag": "text-generation",
+  "architecture": "deepseek_v4_moe",
+  "hf_downloads": 0,
+  "hf_likes": 0,
+  "release_date": "2026-05-15"
+ },
 {
  "name": "deepseek-ai/deepseek-coder-6.7b-base",
  "provider": "DeepSeek",
@@ -13886,53 +13980,6 @@
  "gguf_sources": [],
  "capabilities": []
 },
- {
-  "name": "deepseek-ai/DeepSeek-V4-Flash",
-  "provider": "DeepSeek",
-  "parameter_count": "158B",
-  "parameters_raw": 158000000000,
-  "min_ram_gb": 165.0,
-  "recommended_ram_gb": 205.0,
-  "min_vram_gb": 165.0,
-  "quantization": "FP8",
-  "context_length": 1000000,
-  "use_case": "General purpose, reasoning (MoE)",
-  "is_moe": true,
-  "num_experts": null,
-  "active_experts": null,
-  "active_parameters": 13000000000,
-  "architecture": "deepseek_v4",
-  "pipeline_tag": "text-generation",
-  "release_date": "2026-04-22",
-  "gguf_sources": [
-   {
-    "repo": "unsloth/DeepSeek-V4-Flash",
-    "provider": "unsloth"
-   }
-  ],
-  "capabilities": []
- },
- {
-  "name": "deepseek-ai/DeepSeek-V4-Pro",
-  "provider": "DeepSeek",
-  "parameter_count": "1600B",
-  "parameters_raw": 1600000000000,
-  "min_ram_gb": 928.5,
-  "recommended_ram_gb": 1207.0,
-  "min_vram_gb": 928.5,
-  "quantization": "Q4_K_M",
-  "context_length": 1000000,
-  "use_case": "Frontier reasoning (MoE)",
-  "is_moe": true,
-  "num_experts": null,
-  "active_experts": null,
-  "active_parameters": 49000000000,
-  "architecture": "deepseek_v4",
-  "pipeline_tag": "text-generation",
-  "release_date": "2026-04-22",
-  "gguf_sources": [],
-  "capabilities": []
- },
 {
  "name": "google/gemma-4-E2B-it",
  "provider": "Google",
--- a/services/hwfit/fit.py
+++ b/services/hwfit/fit.py
@@ -564,7 +564,7 @@ def rank_models(system, use_case=None, limit=50, search=None, sort="score", quan
            })
        if use_case == "image_gen":
            sort_fn = SORT_KEYS.get(sort, SORT_KEYS["score"])
-            results.sort(key=sort_fn, reverse=(sort != "vram"))
+            results.sort(key=sort_fn, reverse=True)  # see main path below
            return results[:limit]

    # If user picked a native prequantized format, filter to only those models.
@@ -661,7 +661,10 @@ def rank_models(system, use_case=None, limit=50, search=None, sort="score", quan
        # explicitly asked for a Fit-only view.
        results = [r for r in results if r.get("fit_level") != "too_tight"]
    sort_fn = SORT_KEYS.get(sort, SORT_KEYS["score"])
-    # vram ascending (smallest first), everything else descending (biggest first)
-    results.sort(key=sort_fn, reverse=(sort != "vram"))
+    # Always sort descending then truncate top-N so each column shows the
+    # global highest by that metric. Before, vram was special-cased
+    # ascending → truncate kept the 50 SMALLEST models and "highest VRAM"
+    # could never appear, breaking the column-click toggle.
+    results.sort(key=sort_fn, reverse=True)
    results = results[:limit]
    return results
--- a/services/hwfit/hardware.py
+++ b/services/hwfit/hardware.py
@@ -5,7 +5,9 @@ import shutil
 import subprocess
 import time

-CACHE_TTL = 1800  # 30 min — hardware rarely changes; use the Rescan button to force a re-probe
+CACHE_TTL = 24 * 3600  # 24 h — hardware probes are user-initiated via the Rescan button; bumped
+                       # from 30 min so changing filters doesn't keep re-probing the rig every
+                       # half-hour during a long session.


 _remote_host = None  # set by detect_system(host=...)
--- a/services/hwfit/models.py
+++ b/services/hwfit/models.py
@@ -13,6 +13,13 @@ QUANT_BPP = {
    "AWQ-4bit": 0.50, "AWQ-8bit": 1.0,
    "GPTQ-Int4": 0.50, "GPTQ-Int8": 1.0,
    "mlx-4bit": 0.55, "mlx-8bit": 1.0, "mlx-6bit": 0.75,
+    # DeepSeek-V4-style mixed: MoE experts in FP4 (bulk), attention + non-
+    # expert dense in FP8, embeddings/LM head in BF16. By weight count the
+    # experts dominate so the effective BPP sits closer to FP4 than FP8.
+    # Empirical: DeepSeek-V4-Flash 156 GB / 284B params ≈ 0.55 B/param.
+    "FP4-MoE-Mixed": 0.55,
+    # FP8-Mixed = the *-Base variants (MoE experts also FP8, not FP4).
+    "FP8-Mixed": 1.0,
 }

 QUANT_SPEED_MULT = {
@@ -24,6 +31,8 @@ QUANT_SPEED_MULT = {
    "AWQ-4bit": 1.2, "AWQ-8bit": 0.85,
    "GPTQ-Int4": 1.2, "GPTQ-Int8": 0.85,
    "mlx-4bit": 1.15, "mlx-8bit": 0.85, "mlx-6bit": 1.0,
+    "FP4-MoE-Mixed": 1.10,  # slightly slower than pure FP4 because of mixed-dtype dispatch
+    "FP8-Mixed": 0.85,
 }

 QUANT_QUALITY_PENALTY = {
@@ -39,6 +48,11 @@ QUANT_QUALITY_PENALTY = {
    "AWQ": -1.0, "AWQ-4bit": -4.0, "AWQ-8bit": -1.0,
    "GPTQ": -1.0, "GPTQ-Int4": -4.0, "GPTQ-Int8": -1.0,
    "mlx-4bit": -4.0, "mlx-8bit": -0.5, "mlx-6bit": -1.5,
+    # DeepSeek-V4 mixed: only MoE experts at FP4 (the rest is FP8/BF16),
+    # so the realized quality is much closer to FP8 than to pure FP4 —
+    # the activation-sensitive layers stay high-precision. Small (-0.5) penalty.
+    "FP4-MoE-Mixed": -0.5,
+    "FP8-Mixed": 0.0,
 }

 QUANT_BYTES_PER_PARAM = {
@@ -50,6 +64,8 @@ QUANT_BYTES_PER_PARAM = {
    "AWQ-4bit": 0.5, "AWQ-8bit": 1.0,
    "GPTQ-Int4": 0.5, "GPTQ-Int8": 1.0,
    "mlx-4bit": 0.5, "mlx-8bit": 1.0, "mlx-6bit": 0.75,
+    "FP4-MoE-Mixed": 0.55,
+    "FP8-Mixed": 1.0,
 }

 # Pre-quantized formats that should NOT go through the GGUF quant hierarchy.
@@ -57,6 +73,7 @@ QUANT_BYTES_PER_PARAM = {
 PREQUANTIZED_PREFIXES = (
    "AWQ-", "GPTQ-", "mlx-", "FP8", "FP4", "NVFP4", "MXFP4", "NF4",
    "INT4", "INT8", "W4A16", "W8A8", "W8A16",
+    "FP4-MoE-Mixed", "FP8-Mixed",
 )


--- a/static/index.html
+++ b/static/index.html
@@ -843,7 +843,7 @@
            <path d="M3 18a1 1 0 0 1-1-1V4a1 1 0 0 1 1-1h5a4 4 0 0 1 4 4 4 4 0 0 1 4-4h5a1 1 0 0 1 1 1v13a1 1 0 0 1-1 1h-6a3 3 0 0 0-3 3 3 3 0 0 0-3-3z"/>
          </svg>
          <span class="grow">Cookbook</span>
-          <span id="cookbook-bg-status" style="display:none;font-size:9px;opacity:0.5;white-space:nowrap;overflow:hidden;text-overflow:ellipsis;margin-left:6px;flex-shrink:1;min-width:0;position:relative;top:-1px;"></span>
+          <span id="cookbook-bg-status" style="display:none;font-size:9px;opacity:0.5;white-space:nowrap;overflow:hidden;text-overflow:ellipsis;margin-right:12px;flex-shrink:1;min-width:0;position:relative;top:-1px;"></span>
          <span class="cookbook-notif-dot" id="cookbook-notif-dot" style="display:none;margin-left:6px;margin-right:4px;position:relative;top:-1px;left:0px;"></span>
        </div>
        <div class="list-item" id="tool-research-btn">
--- a/static/js/cookbook-diagnosis.js
+++ b/static/js/cookbook-diagnosis.js
@@ -23,6 +23,44 @@ import {
  // browser loads it once. See cookbook-hwfit.js.
 } from './cookbook.js';
 import uiModule from './ui.js';
+
+// Tiny HTML-escape — keeps the file standalone instead of leaning on a
+// shared helper that may not be exported from this module's import surface.
+function _diagEsc(s) {
+  return String(s ?? '').replace(/[&<>"']/g, c => ({'&':'&amp;','<':'&lt;','>':'&gt;','"':'&quot;',"'":'&#39;'}[c]));
+}
+
+// Pick an icon for a diagnosis-action button based on the label. The icon
+// renders on the LEFT of the button text. Keeps the strokes consistent
+// across the set so they read as one family.
+function _diagFixIcon(label) {
+  const l = String(label || '').toLowerCase();
+  const _svg = (path) => `<svg width="11" height="11" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2.2" stroke-linecap="round" stroke-linejoin="round" class="cookbook-diag-btn-ico" aria-hidden="true">${path}</svg>`;
+  if (l.startsWith('retry') || l.includes('relaunch') || l.includes('restart')) {
+    // Circular-arrow refresh
+    return _svg('<polyline points="23 4 23 10 17 10"/><polyline points="1 20 1 14 7 14"/><path d="M3.51 9a9 9 0 0 1 14.85-3.36L23 10M1 14l4.64 4.36A9 9 0 0 0 20.49 15"/>');
+  }
+  if (l.startsWith('copy')) {
+    return _svg('<rect x="9" y="9" width="13" height="13" rx="2"/><path d="M5 15H4a2 2 0 0 1-2-2V4a2 2 0 0 1 2-2h9a2 2 0 0 1 2 2v1"/>');
+  }
+  if (l.startsWith('edit')) {
+    return _svg('<path d="M12 20h9"/><path d="M16.5 3.5a2.121 2.121 0 0 1 3 3L7 19l-4 1 1-4Z"/>');
+  }
+  if (l.startsWith('open') || l.includes('dependencies')) {
+    return _svg('<path d="M14 3h7v7"/><path d="M21 3l-9 9"/><path d="M21 14v5a2 2 0 0 1-2 2H5a2 2 0 0 1-2-2V5a2 2 0 0 1 2-2h5"/>');
+  }
+  if (l.startsWith('install') || l.includes('upgrade')) {
+    return _svg('<path d="M21 15v4a2 2 0 0 1-2 2H5a2 2 0 0 1-2-2v-4"/><polyline points="7 10 12 15 17 10"/><line x1="12" y1="15" x2="12" y2="3"/>');
+  }
+  if (l.startsWith('kill') || l.startsWith('stop')) {
+    return _svg('<rect x="6" y="6" width="12" height="12" rx="1"/>');
+  }
+  if (l.startsWith('switch') || l.includes('use ')) {
+    return _svg('<polyline points="17 1 21 5 17 9"/><path d="M3 11V9a4 4 0 0 1 4-4h14"/><polyline points="7 23 3 19 7 15"/><path d="M21 13v2a4 4 0 0 1-4 4H3"/>');
+  }
+  // Default: lightbulb (generic "suggestion")
+  return _svg('<path d="M9 21h6"/><path d="M12 17v4"/><path d="M12 3a6 6 0 0 0-4 10.5c1 1 1.5 2 1.5 3.5h5c0-1.5.5-2.5 1.5-3.5A6 6 0 0 0 12 3Z"/>');
+}
 import spinnerModule from './spinner.js';

 // ── Error diagnosis ──
@@ -577,7 +615,7 @@ export function _showDiagnosis(panel, diagnosis, sourceText) {
        const btn = document.createElement('button');
        btn.className = 'cookbook-btn cookbook-diag-btn';
        btn.type = 'button';
-        btn.textContent = fix.label;
+        btn.innerHTML = _diagFixIcon(fix.label) + '<span class="cookbook-diag-btn-label">' + _diagEsc(fix.label) + '</span>';
        btn.addEventListener('click', (e) => {
          e.stopPropagation();
          runFix(fix, btn);
@@ -603,7 +641,7 @@ export function _showDiagnosis(panel, diagnosis, sourceText) {
    for (const fix of fixes) {
      const item = document.createElement('button');
      item.type = 'button';
-      item.textContent = fix.label;
+      item.innerHTML = _diagFixIcon(fix.label) + '<span class="cookbook-diag-btn-label">' + _diagEsc(fix.label) + '</span>';
      item.addEventListener('click', async (e) => {
        e.stopPropagation();
        if (item.dataset.busy || trigger.dataset.busy) return;
--- a/static/js/cookbook-hwfit.js
+++ b/static/js/cookbook-hwfit.js
@@ -527,6 +527,9 @@ export async function _hwfitFetch(fresh = false) {
      if (useCase) params.set('use_case', useCase);
      if (quantPref) params.set('quant', quantPref);
      if (targetCtx) params.set('ctx', String(targetCtx));
+      // Fit-only filter — set by the dot in the Fit column header.
+      const _fitOnly = (() => { try { return localStorage.getItem('hwfit_fit_only_v1') === '1'; } catch { return false; } })();
+      if (_fitOnly) params.set('fit_only', '1');
    }
    const endpoint = isImageMode ? `/api/hwfit/image-models?${params}` : `/api/hwfit/models?${params}`;
    const res = await fetch(endpoint);
@@ -888,9 +891,15 @@ export function _hwfitRenderList(el, models) {
      arrow = isReversed ? ' \u25B2' : ' \u25BC';
    }
    const dataAttr = col.key ? ` data-sort="${col.key}"` : '';
-    const label = (col.cls === 'hwfit-fit' && _budget)
-      ? `${col.label} <span style="font-size:0.75em;opacity:0.6;font-weight:normal;">(${_budget})</span>`
-      : col.label;
+    // Fit column gets a small dot to its left that toggles "show only models
+    // that fit" — replaces the old Fits On/Off button next to the toolbar.
+    let label = col.label;
+    if (col.cls === 'hwfit-fit') {
+      const _fitOnly = (() => { try { return localStorage.getItem('hwfit_fit_only_v1') === '1'; } catch { return false; } })();
+      label = `<span class="hwfit-fit-dot${_fitOnly ? ' active' : ''}" title="${_fitOnly ? 'Showing only models that fit. Click to also show too-tight rows.' : 'Click to show only models that fit your hardware.'}" data-fit-dot>●</span>${col.label}`;
+      // (Budget tag removed — the GPU/RAM/N-GPU suffix next to "Fit" was noise;
+      // the toggle row already shows which budget is active.)
+    }
    html += `<span class="hwfit-col ${col.cls}${sortable}${active}"${dataAttr}>${label}${arrow}</span>`;
  }
  html += '</div>';
@@ -910,9 +919,31 @@ export function _hwfitRenderList(el, models) {
    const dlDot = (_cachedModelIds && (_cachedModelIds.has(m.name) || [..._cachedModelIds].some(id => id === m.name?.split('/').pop()))) ? '<span class="hwfit-dl-dot" title="Downloaded">\u25CF</span>' : '';
    html += `<div class="hwfit-row" data-model="${esc(m.name)}">`;
    html += `<span class="hwfit-col hwfit-fit" style="color:${fitColor}">${esc(fitLabel)}</span>`;
-    html += `<span class="hwfit-col hwfit-name">${modelLogo(m.name)}${esc(m.name?.split('/').pop() || m.name)}${moeBadge}${imgBadge}${dlDot}</span>`;
+    // Append quant to the title when it's not already in the repo name. The
+    // suffix strips quant-parts the name already contains — e.g. for
+    // QuantTrio/MiniMax-M2-AWQ + quant=AWQ-4bit we just show "(4bit)", not
+    // "(AWQ-4bit)". DeepSeek-V4-Flash + FP4-MoE-Mixed keeps every part (none
+    // are in the repo id); the display is then cut to 9 chars + ellipsis.
+    const _short = m.name?.split('/').pop() || m.name || '';
+    const _quantTag = (m.quant || '').trim();
+    const _lowerShort = _short.toLowerCase();
+    let _quantSuffix = '';
+    if (_quantTag) {
+      const _parts = _quantTag.split(/[-_]/).filter(Boolean);
+      const _remaining = _parts.filter(p => !_lowerShort.includes(p.toLowerCase()));
+      if (_remaining.length) {  // at least one part is not already in the repo name
+        let _display = _remaining.join('-');
+        if (_display.length > 9) _display = _display.slice(0, 9) + '…';
+        _quantSuffix = ` <span class="hwfit-name-quant" title="${esc(_quantTag)} — full storage format">(${esc(_display)})</span>`;
+      }
+    }
+    html += `<span class="hwfit-col hwfit-name">${modelLogo(m.name)}${esc(_short)}${_quantSuffix}${moeBadge}${imgBadge}${dlDot}</span>`;
    html += `<span class="hwfit-col hwfit-c-params">${esc(pcount)}</span>`;
-    html += `<span class="hwfit-col hwfit-c-quant">${esc(m.quant || '?')}</span>`;
+    // Truncate the Quant cell to 9 chars + ellipsis so long tags like
+    // "FP4-MoE-Mixed" don't push neighboring columns. Full tag stays in title.
+    const _qRaw = m.quant || '?';
+    const _qShort = _qRaw.length > 9 ? _qRaw.slice(0, 9) + '…' : _qRaw;
+    html += `<span class="hwfit-col hwfit-c-quant" title="${esc(_qRaw)}">${esc(_qShort)}</span>`;
    html += `<span class="hwfit-col hwfit-c-vram">${vramLabel}</span>`;
    html += `<span class="hwfit-col hwfit-c-ctx">${m.is_image_gen ? '\u2014' : ctx}</span>`;
    html += `<span class="hwfit-col hwfit-c-speed">${m.is_image_gen ? '\u2014' : tps + ' t/s'}</span>`;
@@ -934,7 +965,26 @@ export function _hwfitRenderList(el, models) {
  });
  // Clickable header columns → sort (click again to toggle direction)
  el.querySelectorAll('.hwfit-header .hwfit-sortable').forEach(col => {
-    col.addEventListener('click', () => {
+    col.addEventListener('click', (e) => {
+      // The little dot inside the Fit header is its own toggle (fit-only
+      // filter), don't let it fall through to a sort click.
+      if (e.target.closest('[data-fit-dot]')) {
+        const on = !e.target.classList.contains('active');
+        try { localStorage.setItem('hwfit_fit_only_v1', on ? '1' : '0'); } catch {}
+        // Un-toggling the fit filter (off → showing too-tight rows again) is
+        // typically because the user wants to see the LARGE models they can't
+        // run yet — re-sort by VRAM descending so the biggest surface first.
+        if (!on) {
+          const sortSel = document.getElementById('hwfit-sort');
+          if (sortSel) {
+            sortSel.value = 'vram';
+            sortSel.dataset.reverse = '0';   // descending (biggest first)
+          }
+        }
+        _hwfitCache = null;
+        _hwfitFetch();
+        return;
+      }
      const sortKey = col.dataset.sort;
      if (!sortKey) return;
      const sel = document.getElementById('hwfit-sort');
@@ -1018,7 +1068,16 @@ export function _expandModelRow(row, modelData) {
  if (modelData.is_image_gen) {
    html += `<div style="font-size:10px;opacity:0.5;margin-top:4px;">${esc((modelData.capabilities || []).join(' \u00B7 ') || '')}${modelData.description ? ' \u2014 ' + esc(modelData.description) : ''}</div>`;
  } else if (_requiresAcceleratorBackend(modelData)) {
-    html += `<div class="hwfit-panel-note">This is a safetensors GPU-serving format. Use vLLM/SGLang with a visible CUDA/ROCm accelerator, or pick a GGUF download for llama.cpp/Ollama.</div>`;
+    // Only show the "needs CUDA/ROCm" note when the host doesn't already have
+    // one. With a visible CUDA/ROCm accelerator the note is noise — the user
+    // can already serve the model and reading the warning on every row makes
+    // the panel feel like everything's broken.
+    const _sys = _hwfitCache?.system || {};
+    const _backend = (_sys.backend || '').toLowerCase();
+    const _hasGpuAccel = !!_sys.has_gpu && (_backend === 'cuda' || _backend === 'rocm');
+    if (!_hasGpuAccel) {
+      html += `<div class="hwfit-panel-note">This is a safetensors GPU-serving format. Use vLLM/SGLang with a visible CUDA/ROCm accelerator, or pick a GGUF download for llama.cpp/Ollama.</div>`;
+    }
  }
  html += `</div>`;

@@ -1243,14 +1302,14 @@ export function _hwfitInit() {
      const targetCtx = _ctxValue();
      try { localStorage.setItem(_CTX_KEY, String(targetCtx)); } catch {}
      // Ctx drag affects sort mode: a specific ctx target (anything < Max)
-      // implies the user is hunting for "what fits at this context length",
-      // so re-rank by fit (lowest first). Dragging back to Max means no
-      // ctx constraint → go back to the default score-based ranking.
+      // implies "what runs at this context length" — sort by VRAM ascending
+      // so the cheapest-fitting models surface first. Dragging back to Max
+      // releases the constraint → go back to the default score ranking.
      const sortSel = document.getElementById('hwfit-sort');
      if (sortSel) {
        if (targetCtx) {
-          sortSel.value = 'fit';
-          sortSel.dataset.reverse = '1';
+          sortSel.value = 'vram';
+          sortSel.dataset.reverse = '1';   // ascending = smallest VRAM first
        } else {
          sortSel.value = 'score';
          sortSel.dataset.reverse = '';
--- a/static/js/cookbook.js
+++ b/static/js/cookbook.js
@@ -18,6 +18,7 @@ import {
  _launchServeTask, _serveAutoFix, _serveAutoRetry, _serveAutoRetryReplace, _serveAutoRetryRemove,
  _startBackgroundMonitor, _syncFromServer,
  _retryDownload, _nextAvailablePort, _processQueue,
+  _selfHealStaleTasks,
 } from './cookbookRunning.js';

 import {
@@ -641,6 +642,13 @@ async function _fetchDependencies() {
      const winBlocked = !isLocal && _isWindows() && _winUnsupported.has(pkg.name);
      const note = pkg.status_note ? `<div class="memory-item-meta" style="font-size:10px;opacity:0.65;margin-top:3px;">${esc(pkg.status_note)}</div>` : '';
      const updateNote = pkg.installed && pkg.pip_update_available === false && pkg.update_note ? `<div class="memory-item-meta" style="font-size:10px;opacity:0.55;margin-top:3px;">${esc(pkg.update_note)}</div>` : '';
+      // Inline "Rebuild" tag for the llama_cpp row only. Styled as a
+      // .cookbook-dep-tag so it matches the LLM category tag's pill look,
+      // and lives to the LEFT of the category tag (clear affordance before
+      // the row "value").
+      const _rebuildBtn = (pkg.name === 'llama_cpp')
+        ? `<button type="button" class="cookbook-dep-tag cookbook-dep-rebuild" id="cookbook-rebuild-engine" title="Clear the cached llama.cpp build so the next serve recompiles from source (use after installing a CUDA/ROCm toolkit to turn a CPU-only build into a GPU build).">Rebuild</button>`
+        : '';
      return `<div class="cookbook-dep-row${winBlocked ? ' cookbook-dep-blocked' : ''}" data-pkg-name="${esc(pkg.name)}" data-dep-pip="${esc(pkg.pip || '')}" data-dep-target="${isLocal ? 'local' : 'remote'}" data-dep-kind="${esc(pkg.kind || 'python')}">`
        + `<div class="cookbook-dep-info">`
        + `<div class="memory-item-title">${esc(pkg.name)}</div>`
@@ -648,6 +656,7 @@ async function _fetchDependencies() {
        + note
        + updateNote
        + `</div>`
+        + _rebuildBtn
        + `<span class="cookbook-dep-tag cookbook-dep-cat">${esc(pkg.category)}</span>`
        + _statusTag(pkg, isLocal, isSystemDep, winBlocked)
        + `</div>`;
@@ -1237,6 +1246,10 @@ function _wireTabEvents(body) {
      const folded = dlFoldBody.style.display === 'none';
      dlFoldBody.style.display = folded ? '' : 'none';
      dlFoldChevron.textContent = folded ? '▾' : '▸';
+      // Toggle is-folded class on the h2 so the line under it only shows when
+      // the section is collapsed (the body's content normally provides
+      // separation; with no body visible, the line gives the h2 definition).
+      dlFold.classList.toggle('is-folded', !folded);
      try { localStorage.setItem('cookbook_dl_tab_folded_v1', folded ? '0' : '1'); } catch {}
    });
  }
@@ -1456,7 +1469,7 @@ export function _serverEntryHtml(s, i, defaultServer, forceRemote, isNew) {
  html += `<span class="cookbook-server-title" style="display:flex;align-items:center;gap:6px;width:100%;font-size:13px;font-weight:600;margin-bottom:4px;">`;
  html += `${esc(_srvTitle)}`;
  html += _pIco ? `<span class="cookbook-srv-platform" title="${esc(s.platform || '')}" style="display:inline-flex;align-items:center;opacity:0.55;">${_pIco}</span>` : '';
-  html += `<span class="cookbook-srv-test-msg" style="font-size:10px;font-weight:400;opacity:0.55;max-width:160px;white-space:nowrap;overflow:hidden;text-overflow:ellipsis;position:relative;top:2px;"></span>`;
+  html += `<span class="cookbook-srv-test-msg" style="font-size:10px;font-weight:400;opacity:0.55;max-width:160px;white-space:nowrap;overflow:hidden;text-overflow:ellipsis;position:relative;top:1px;"></span>`;
  if (isNew) {
    // New server: Cancel (discard) sits top-right; the default toggle only makes
    // sense once the server is saved.
@@ -1535,7 +1548,7 @@ function _renderRecipes() {
  // State persisted to localStorage so the fold survives reloads.
  const _dlTabFolded = (() => { try { return localStorage.getItem('cookbook_dl_tab_folded_v1') === '1'; } catch { return false; } })();
  html += '<div style="display:flex;align-items:center;gap:8px;margin-bottom:2px;">';
-  html += `<h2 id="cookbook-dl-tab-fold" style="margin:0;padding:0;line-height:1;cursor:pointer;display:flex;align-items:center;justify-content:space-between;user-select:none;flex:1;">Download<span id="cookbook-dl-tab-chevron" style="display:inline-block;transition:transform 0.15s;font-size:1.1em;margin-left:8px;opacity:0.85;">${_dlTabFolded ? '▸' : '▾'}</span></h2>`;
+  html += `<h2 id="cookbook-dl-tab-fold" class="${_dlTabFolded ? 'is-folded' : ''}" style="margin:0;padding:0;line-height:1;cursor:pointer;display:flex;align-items:center;justify-content:space-between;user-select:none;flex:1;">Download<span id="cookbook-dl-tab-chevron" style="display:inline-block;transition:transform 0.15s;font-size:1.1em;margin-left:8px;opacity:0.85;">${_dlTabFolded ? '▸' : '▾'}</span></h2>`;
  html += '</div>';
  html += `<div id="cookbook-dl-tab-fold-body" style="${_dlTabFolded ? 'display:none;' : ''}">`;
  html += '<p class="memory-desc doclib-desc" style="margin-top:6px;">Download from <a href="https://huggingface.co/models" target="_blank" rel="noopener" style="color:var(--accent,var(--red));text-decoration:none;"><svg width="10" height="10" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" style="vertical-align:-1px;margin-right:1px;"><path d="M18 13v6a2 2 0 0 1-2 2H5a2 2 0 0 1-2-2V8a2 2 0 0 1 2-2h6"/><polyline points="15 3 21 3 21 9"/><line x1="10" y1="14" x2="21" y2="3"/></svg>HuggingFace</a> by pasting model link, or download directly in the Scan section below.</p>';
@@ -1605,36 +1618,43 @@ function _renderRecipes() {
  html += '<p class="memory-desc doclib-desc" style="margin-top:6px;">Scans your hardware for what models you can run. Hardware is cached; hit the scan button to re-probe after changing GPUs.</p>';
  html += '<div class="hwfit-toolbar" style="margin-top:9px;">';
  html += '<select class="cookbook-field-input hwfit-usecase" id="hwfit-usecase" style="height:28px;">';
-  html += '<option value="">Type</option><option value="general">General</option><option value="coding">Coding</option>';
+  html += '<option value="general" selected>Standard</option><option value="coding">Coding</option>';
  html += '<option value="reasoning">Reasoning</option><option value="chat">Chat</option>';
  // Image tab removed — text→image gen is gone from this build (only inpaint
   // remains, which uses its own settings panel). Vision (multimodal) stays.
  html += '<option value="multimodal">Vision</option></select>';
-  html += '<input type="text" class="cookbook-field-input hwfit-search" id="hwfit-search" placeholder="Search models..." style="flex:1;" />';
-  // Quant (Q4/Q8/…) lives next to the search now. Default is "All" so the
-  // list shows the best-scoring quant for every model instead of silently
-  // filtering to Q4 (which used to be the implicit default).
-  html += '<select class="cookbook-field-input hwfit-quant" id="hwfit-quant" style="height:28px;">';
-  html += '<option value="" selected>All</option>';
-  html += '<option value="Q4_K_M">Q4</option><option value="Q8_0">Q8</option>';
-  html += '<option value="Q6_K">Q6</option><option value="Q5_K_M">Q5</option>';
-  html += '<option value="Q3_K_M">Q3</option><option value="Q2_K">Q2</option>';
-  html += '<option value="AWQ-4bit">AWQ</option><option value="FP8">FP8</option><option value="FP4">FP4</option><option value="NVFP4">NVFP4</option></select>';
-  // Engine filter — show only models whose serve engine matches. Composes
-  // with quant / type / search filters.
+  // Engine sits next to the type filter so the "what category / which serving
+  // path" filters live together; Quant + Context are storage-format and budget
+  // levers, grouped to the right.
+  html += '<span class="hwfit-engine-wrap">';
  html += '<select class="cookbook-field-input hwfit-engine" id="hwfit-engine" style="height:28px;" title="Filter by serving engine">';
  html += '<option value="">Engine</option>';
  html += '<option value="llamacpp">llama.cpp</option>';
  html += '<option value="vllm">vLLM</option>';
  html += '<option value="sglang">SGLang</option>';
  html += '</select>';
-  html += '<span class="hwfit-help-chip" title="Higher numbers usually mean better quality, but they need more memory. Lower numbers fit on more hardware.">?</span>';
+  html += '<span class="hwfit-help-chip hwfit-help-chip-inline hwfit-engine-help" title="Rule of thumb: GGUF on single GPU / CPU+RAM → llama.cpp (or Ollama). Safetensors on multi-GPU NVIDIA → vLLM. SGLang is a vLLM-class alternative, sometimes faster on big-MoE / long-context.">?</span>';
+  html += '</span>';
+  // Quant (Q4/Q8/…). Default is "All" so the list shows the best-scoring
+  // quant for every model instead of silently filtering to Q4.
+  html += '<span class="hwfit-quant-wrap">';
+  html += '<select class="cookbook-field-input hwfit-quant" id="hwfit-quant" style="height:28px;">';
+  html += '<option value="" selected>Quant: All</option>';
+  html += '<option value="Q4_K_M">Q4</option><option value="Q8_0">Q8</option>';
+  html += '<option value="Q6_K">Q6</option><option value="Q5_K_M">Q5</option>';
+  html += '<option value="Q3_K_M">Q3</option><option value="Q2_K">Q2</option>';
+  html += '<option value="AWQ-4bit">AWQ</option><option value="FP8">FP8</option><option value="FP4">FP4</option><option value="NVFP4">NVFP4</option></select>';
+  html += '<span class="hwfit-help-chip hwfit-help-chip-inline hwfit-quant-help" title="Lower quant tiers (Q2/Q3/Q4 / AWQ-4bit) are smaller, faster, and cheaper to run, at some quality loss. Higher tiers (Q8 / FP8 / FP16 / BF16) preserve more quality but need more VRAM. “All” shows the best-scoring quant per model — pick a specific one to filter.">?</span>';
+  html += '</span>';
  // Ctx slider — lets you target a context length for fit estimates; the
  // hwfit ranking uses _ctxValue() to factor that into VRAM math, so
  // dragging this re-sorts the list toward models that fit your chosen ctx.
  html += '<label class="hwfit-ctx-control" title="Context length for fit estimates. Lower it to find more models that could fit your hardware.">';
-  html += '<span>Ctx</span><span class="hwfit-help-chip hwfit-help-chip-inline" title="Context length. Lower it to find more models that could fit your hardware; raise it when you need longer chats or documents.">?</span><input type="range" id="hwfit-context" min="0" max="5" step="1" value="3" />';
+  html += '<span>Context</span><span class="hwfit-help-chip hwfit-help-chip-inline" title="Context length. Lower it to find more models that could fit your hardware; raise it when you need longer chats or documents.">?</span><input type="range" id="hwfit-context" min="0" max="5" step="1" value="3" />';
  html += '<output id="hwfit-context-label">50k</output></label>';
+  // Search lives at the far right of the toolbar so the controls (Type/Engine/
+  // Quant/Context) read as a row of compact filters followed by free-text.
+  html += '<input type="text" class="cookbook-field-input hwfit-search" id="hwfit-search" placeholder="Search models..." style="flex:1;" />';
  html += '</div>';
  html += '<div class="hwfit-toolbar" style="margin-top:7px;">';
  html += '<select class="cookbook-field-input hwfit-server-select" id="hwfit-server-select" style="height:28px;min-width:88px;position:relative;top:0px;">';
@@ -1643,7 +1663,7 @@ function _renderRecipes() {
  html += '<div class="hwfit-gpu-toggles" id="hwfit-gpu-toggles"></div>';
  // Scan/refresh button (icon-only) where the quant dropdown used to sit.
  html += '<button type="button" class="hwfit-gpu-btn" id="hwfit-rescan" title="Re-scan hardware" style="flex-shrink:0;position:relative;top:-3px;left:-1px;">↻ RESCAN</button>';
-  html += '<button type="button" class="hwfit-gpu-btn hwfit-hw-manual-btn" id="hwfit-hw-manual-btn" title="Set hardware manually" style="flex-shrink:0;position:relative;top:-3px;left:-1px;">EDIT</button>';
+  html += '<button type="button" class="hwfit-gpu-btn hwfit-hw-manual-btn" id="hwfit-hw-manual-btn" title="Set hardware manually" style="flex-shrink:0;position:relative;top:-3px;left:-1px;display:inline-flex;align-items:center;gap:3px;"><svg width="10" height="10" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2.2" stroke-linecap="round" stroke-linejoin="round" style="flex-shrink:0;"><path d="M12 20h9"/><path d="M16.5 3.5a2.121 2.121 0 0 1 3 3L7 19l-4 1 1-4Z"/></svg>EDIT</button>';
  // Sort state — the clickable column headers read/write this (pewds' original
  // sort paradigm). Newest is reachable by clicking the Model column header.
  html += '<select class="cookbook-field-input hwfit-sort" id="hwfit-sort" style="display:none">';
@@ -1663,6 +1683,16 @@ function _renderRecipes() {
  html += '</div>';
  html += '<div id="hwfit-hw-row" style="display:none;align-items:center;gap:4px;margin-top:3px;padding-top:2px;"><span style="font-size:10px;padding:2px 8px;border-radius:10px;background:color-mix(in srgb, var(--fg) 8%, transparent);color:var(--fg);opacity:0.7;white-space:nowrap;flex-shrink:0;position:relative;top:-1px;">Detected hardware</span><div class="hwfit-hw" id="hwfit-hw" style="flex:1;"></div></div>';
  html += '<div class="hwfit-list" id="hwfit-list"></div>';
+  // Footer: link to the public discussion where users can request additions
+  // to the curated model list. Sits below the list so it reads as a callout
+  // after browsing, not a header.
+  html += '<div class="hwfit-list-footer" style="margin-top:8px;padding-top:6px;border-top:1px solid color-mix(in srgb, var(--border) 50%, transparent);font-size:9.5px;opacity:0.65;text-align:right;">'
+       + 'Don\'t see a model? '
+       + '<a href="https://github.com/pewdiepie-archdaemon/odysseus/discussions/1962" target="_blank" rel="noopener" style="color:var(--accent,var(--red));text-decoration:none;display:inline-flex;align-items:center;gap:4px;vertical-align:middle;">'
+       + 'Request it →'
+       + '<svg width="11" height="11" viewBox="0 0 16 16" fill="currentColor" aria-hidden="true" style="flex-shrink:0;"><path d="M8 0C3.58 0 0 3.58 0 8a8 8 0 0 0 5.47 7.59c.4.07.55-.17.55-.38 0-.19-.01-.82-.01-1.49-2.01.37-2.53-.49-2.69-.94-.09-.23-.48-.94-.82-1.13-.28-.15-.68-.52-.01-.53.63-.01 1.08.58 1.23.82.72 1.21 1.87.87 2.33.66.07-.52.28-.87.51-1.07-1.78-.2-3.64-.89-3.64-3.95 0-.87.31-1.59.82-2.15-.08-.2-.36-1.02.08-2.12 0 0 .67-.21 2.2.82.64-.18 1.32-.27 2-.27.68 0 1.36.09 2 .27 1.53-1.04 2.2-.82 2.2-.82.44 1.1.16 1.92.08 2.12.51.56.82 1.27.82 2.15 0 3.07-1.87 3.75-3.65 3.95.29.25.54.73.54 1.48 0 1.07-.01 1.93-.01 2.2 0 .21.15.46.55.38A8.013 8.013 0 0 0 16 8c0-4.42-3.58-8-8-8z"/></svg>'
+       + '</a>'
+       + '</div>';

  html += '</div></div>';

@@ -1707,7 +1737,8 @@ function _renderRecipes() {
  html += '<div class="admin-card" style="flex:1;display:flex;flex-direction:column;overflow:hidden;">';
  html += '<div style="display:flex;align-items:center;gap:8px;margin-bottom:4px;">';
  html += '<h2 style="margin:0;padding:0;line-height:1;">Dependencies</h2>';
-  html += '<button class="cookbook-field-input" id="cookbook-rebuild-engine" title="Clear the cached llama.cpp build so the next serve recompiles from source (use after installing a CUDA/ROCm toolkit to turn a CPU-only build into a GPU build)." style="height:24px;font-size:10px;padding:0 8px;cursor:pointer;width:auto;">Rebuild llama.cpp</button>';
+  // Rebuild llama.cpp button moved into the llama_cpp dep row (see _depRow);
+  // having it in the title polluted the section header.
  html += '<span style="font-size:10px;opacity:0.5;margin-left:auto;">Server</span>';
  html += '<select class="cookbook-field-input" id="hwfit-deps-server" style="height:28px;min-width:70px;">';
  html += _buildServerOpts(false);
@@ -1746,7 +1777,7 @@ function _renderRecipes() {

  // ── Servers block ───────────────────────────────────────────────────
  html += '<div class="admin-card" style="flex:0 0 auto;display:flex;flex-direction:column;">';
-  html += '<div style="display:flex;align-items:baseline;gap:8px;margin-bottom:2px;margin-top:-8px;">';
+  html += '<div style="display:flex;align-items:baseline;gap:8px;margin-bottom:2px;margin-top:-4px;">';
  html += '<h2 style="margin:0;padding:0;line-height:1;">Servers</h2>';
  // Reuse the calendar +New pill: spinning plus, label fades in idea uses
   // the same `.cal-add-btn-text` rules, so styling stays consistent.
@@ -1893,6 +1924,11 @@ export async function open(opts) {
  _rendered = true;
  _clearCookbookNotif();
  _renderRunningTab();
+  // Self-heal: revive any download tasks whose tmux session is still alive
+  // but were persisted as done/error (covers the "restarted server while a
+  // big multi-shard download was in flight" case — the task survived in
+  // tmux, the cookbook just lost track of it).
+  try { _selfHealStaleTasks({ oneShot: true }).catch(() => {}); } catch {}
  if (_content) {
    // Put the panel in its entering state before it becomes visible. On
    // mobile, showing first and adding the class a frame later can paint the
--- a/static/js/cookbookDownload.js
+++ b/static/js/cookbookDownload.js
@@ -535,6 +535,42 @@ export async function _runModelDownload(panel, model, backend, hostOverride) {
    uiModule.showToast(`${shortName} is already ${duplicate.status === 'queued' ? 'queued' : 'downloading'}`);
    return;
  }
+  // Also catch zombie "done" tasks — the cookbook may have lost track of a
+  // download (server restart, stale state) while its tmux session is still
+  // alive on the host. Probe it; if alive, flip back to running + treat as
+  // duplicate so we don't kick off a second concurrent download writing to
+  // the same target dir.
+  const zombieCandidate = tasks.find(t => sameDownload(t)
+    && ['done', 'error', 'crashed', 'stopped'].includes(t.status)
+    && t.sessionId && !String(t.sessionId).startsWith('queue-'));
+  if (zombieCandidate) {
+    try {
+      const _zh = zombieCandidate.remoteHost || '';
+      const _zPort = (_envState.servers || []).find(s => s.host === _zh)?.port;
+      const _sshPf = _zh ? `ssh ${_zPort && _zPort !== '22' ? `-p ${_zPort} ` : ''}${_zh} '` : '';
+      const _sshSf = _zh ? `'` : '';
+      const _probeCmd = `${_sshPf}tmux has-session -t ${zombieCandidate.sessionId} 2>/dev/null${_sshSf}`;
+      const _r = await fetch('/api/shell/exec', {
+        method: 'POST', credentials: 'same-origin',
+        headers: { 'Content-Type': 'application/json' },
+        body: JSON.stringify({ command: _probeCmd, timeout: 5 }),
+      });
+      const _d = await _r.json();
+      if (_d.exit_code === 0) {
+        // tmux still alive → not actually done. Revive + tell the user.
+        const _fresh = _loadTasks();
+        const _ft = _fresh.find(t => t.sessionId === zombieCandidate.sessionId);
+        if (_ft) {
+          _ft.status = 'running';
+          _ft._selfHealed = true;
+          _saveTasks(_fresh);
+        }
+        _renderRunningTab();
+        uiModule.showToast(`${shortName} is still downloading (was marked finished after a restart — revived)`);
+        return;
+      }
+    } catch { /* probe failed — fall through and let the user launch */ }
+  }
  const activeOnHost = tasks.find(t => t.type === 'download' && (t.status === 'running' || t.status === 'queued') && (t.remoteHost || 'local') === targetHost);

  if (activeOnHost) {
--- a/static/js/cookbookRunning.js
+++ b/static/js/cookbookRunning.js
@@ -35,13 +35,34 @@ function _taskBadge(task) {
  return { text: _statusLabel(task.status, task.type), cls: 'cookbook-task-' + task.status };
 }

+// A download task whose tmux output still shows an active per-shard line
+// (e.g. "model-00012-of-00082.safetensors: 56%|") is NOT actually finished —
+// the cookbook just lost track. The clear pill becomes a "reconnect" affordance
+// in that case (click → revive the row + reattach the poll loop).
+function _downloadOutputLooksActive(task) {
+  if (!task || task.type !== 'download') return false;
+  const out = task.output || '';
+  if (!out) return false;
+  if (out.includes('DOWNLOAD_OK') || out.includes('DOWNLOAD_FAILED')) return false;
+  // An active shard line: filename + a colon + a percentage that isn't 100%.
+  // Also catch a "Downloading 'X' to 'Y.incomplete'" line, which has no %.
+  return /model-\d+-of-\d+\.[a-z]+:\s+(?!100%)\d+%/i.test(out)
+      || /Downloading\s+'[^']+'\s+to\s+'[^']*\.incomplete'/i.test(out);
+}
+
 function _canClearTask(task) {
  if (!task || task.status === 'running') return false;
  if (task.type === 'serve' && (task.status === 'ready' || task._serveReady)) return false;
+  // If the tmux output still shows an in-flight download, the task isn't
+  // actually finished — hide the clear/check pill so it doesn't show on a
+  // task that's still doing work. (The next render will reflect this and
+  // ideally the self-heal flips status back to running.)
+  if (_downloadOutputLooksActive(task)) return false;
  return ['done', 'stopped', 'error', 'crashed', 'failed'].includes(task.status);
 }

 function _clearPillLabel(task) {
+  if (_downloadOutputLooksActive(task)) return 'reconnect';
  return 'clear';
 }

@@ -1537,7 +1558,16 @@ export function _renderRunningTab() {

  const tasks = _loadTasks();
  const hasContent = tasks.length > 0;
-  const activeCount = tasks.filter(t => t.status === 'running' || t.status === 'queued').length;
+  // Count anything that's really active: explicit 'running'/'queued' status,
+  // OR a download whose tmux output is still showing live shard progress.
+  // Without the output check, a task whose status got stuck at 'done' /
+  // 'crashed' (before auto-reconnect catches it) would read as "Running 0"
+  // even when the model is actively downloading on the host.
+  const activeCount = tasks.filter(t =>
+    t.status === 'running'
+    || t.status === 'queued'
+    || _downloadOutputLooksActive(t)
+  ).length;
  const activeCountHtml = activeCount ? ` <span class="cookbook-tab-count">${activeCount}</span>` : '';

  let tabBar = body.querySelector('.cookbook-tabs');
@@ -1824,9 +1854,31 @@ export function _renderRunningTab() {
        const h = Math.floor(secs / 3600);
        const m = Math.floor((secs % 3600) / 60);
        const s = secs % 60;
-        _uptimeEl.textContent = h > 0
+        const _timer = h > 0
          ? `${_prefix}: ${h}h ${String(m).padStart(2,'0')}m`
          : `${_prefix}: ${m}m ${String(s).padStart(2,'0')}s`;
+        // ETA: only for downloads, only when we have a meaningful overall %.
+        // Reads the badge text (which already shows the true overall % we
+        // compute in the live-polling block) and extrapolates remaining time
+        // linearly: total ≈ elapsed * 100 / pct. Hidden until pct >= 3% so the
+        // wild early-job estimates don't show.
+        let _eta = '';
+        if (task.type === 'download') {
+          const _badge = el.querySelector('.cookbook-task-status');
+          const _m = _badge && /^(\d+)%/.exec(_badge.textContent || '');
+          const _pct = _m ? parseInt(_m[1], 10) : 0;
+          if (_pct >= 3 && _pct < 100 && secs > 5) {
+            const _totalSec = Math.round(secs * (100 / _pct));
+            const _remain = Math.max(0, _totalSec - secs);
+            const _eh = Math.floor(_remain / 3600);
+            const _em = Math.floor((_remain % 3600) / 60);
+            const _es = _remain % 60;
+            _eta = _eh > 0
+              ? ` · ETA ${_eh}h ${String(_em).padStart(2,'0')}m`
+              : (_em > 0 ? ` · ETA ${_em}m ${String(_es).padStart(2,'0')}s` : ` · ETA ${_es}s`);
+          }
+        }
+        _uptimeEl.textContent = _timer + _eta;
      }, 1000);
    }

@@ -1874,11 +1926,32 @@ export function _renderRunningTab() {
    if (_clearChk) {
      _clearChk.addEventListener('click', (e) => {
        e.stopPropagation();
-        // Belt-and-suspenders: kill the tmux session too. For a real-finished
-        // task the session is already gone and kill-session errors silently,
-        // but for a task that was falsely flagged done (the strict-finish
-        // bug), this guarantees the still-running download actually stops
-        // rather than continuing to write to disk after the row is removed.
+        // If the output still shows an active shard line, the task isn't
+        // actually finished — clicking is "reconnect" (flip back to running
+        // + let _reconnectTask reattach to the live tmux session), not
+        // "clear". The pill label already reflects this via _clearPillLabel.
+        if (_downloadOutputLooksActive(task)) {
+          const _fresh = _loadTasks();
+          const _ft = _fresh.find(t => t.sessionId === task.sessionId);
+          if (_ft) {
+            _ft.status = 'running';
+            _ft._selfHealed = true;
+            _saveTasks(_fresh);
+          }
+          // Visually flip without waiting for a full re-render — same path the
+          // self-heal uses on cookbook open.
+          const _chk = el.querySelector('.cookbook-task-check');
+          if (_chk) _chk.style.display = 'none';
+          const _wave = el.querySelector('.cookbook-task-wave');
+          if (_wave) _wave.style.display = '';
+          const _up = el.querySelector('.cookbook-task-uptime');
+          if (_up) _up.style.display = '';
+          el.dataset.status = 'running';
+          _renderRunningTab();
+          return;
+        }
+        // Otherwise: real clear. Kill the tmux session as belt-and-suspenders,
+        // then animate out + remove the row.
        try {
          fetch('/api/shell/exec', {
            method: 'POST', credentials: 'same-origin',
@@ -2964,9 +3037,84 @@ function _refreshServerDots() {
  _syncSettingsServerDots(byKey);
 }

+// Self-heal: scan persisted download tasks marked done/error/crashed/stopped
+// and check whether their tmux session is still alive on the host. If yes,
+// the task isn't actually finished (the cookbook just lost the in-flight
+// status during a restart), so flip status back to 'running' and let
+// _reconnectTask pick it up. Open-path callers pass { oneShot: true } to run
+// once per page load; the background-monitor path is time-throttled inside.
+let _selfHealRan = false;
+let _selfHealLastTs = 0;
+export async function _selfHealStaleTasks(opts = {}) {
+  // Open-path call: one-shot per page load.
+  if (opts.oneShot) {
+    if (_selfHealRan) return;
+    _selfHealRan = true;
+  } else {
+    // Background-monitor call: throttle to once every 8s (the bg monitor
+    // itself fires every 10s, so this almost always fires too, but the
+    // guard keeps a fast manual call from doubling up).
+    const now = Date.now();
+    if (now - _selfHealLastTs < 8000) return;
+    _selfHealLastTs = now;
+  }
+  const tasks = _loadTasks();
+  const candidates = tasks.filter(t =>
+    t.type === 'download'
+    && ['done', 'error', 'crashed', 'stopped'].includes(t.status)
+    && t.sessionId
+    && !String(t.sessionId).startsWith('queue-')
+  );
+  if (!candidates.length) return;
+  let flipped = 0;
+  for (const t of candidates) {
+    try {
+      const res = await fetch('/api/shell/exec', {
+        method: 'POST', credentials: 'same-origin',
+        headers: { 'Content-Type': 'application/json' },
+        body: JSON.stringify({ command: _tmuxCmd(t, `has-session -t ${t.sessionId}`), timeout: 5 }),
+      });
+      const data = await res.json();
+      if (data.exit_code === 0) {
+        // Session still alive → the task is actually still running.
+        const fresh = _loadTasks();
+        const ft = fresh.find(x => x.sessionId === t.sessionId);
+        if (ft && ft.status !== 'running') {
+          ft.status = 'running';
+          ft._selfHealed = true;
+          _saveTasks(fresh);
+          flipped++;
+          const _el = document.querySelector(`.cookbook-task[data-task-id="${t.sessionId}"]`);
+          if (_el) {
+            const _chk = _el.querySelector('.cookbook-task-check');
+            if (_chk) _chk.style.display = 'none';
+            const _wave = _el.querySelector('.cookbook-task-wave');
+            if (_wave) _wave.style.display = '';
+            const _up = _el.querySelector('.cookbook-task-uptime');
+            if (_up) _up.style.display = '';
+            _el.dataset.status = 'running';
+          }
+        }
+      }
+    } catch { /* network blip — skip this one */ }
+  }
+  if (flipped) {
+    console.log(`[cookbook] auto-reconnect: revived ${flipped} task(s) whose tmux session was still alive`);
+    _renderRunningTab();
+  }
+}
+
 export function _startBackgroundMonitor() {
  if (_bgMonitorInterval) return;
-  _bgMonitorInterval = setInterval(() => { _pollBackgroundStatus(); _checkServeReachability(); }, BG_MONITOR_INTERVAL_MS);
+  _bgMonitorInterval = setInterval(() => {
+    _pollBackgroundStatus();
+    _checkServeReachability();
+    // Auto-reconnect: every cycle, look for download tasks marked finished/
+    // crashed/etc. whose tmux session is actually still running, and flip
+    // them back to running. Internally throttled to 8s so a manual call from
+    // the open path or a fast invocation doesn't double up.
+    _selfHealStaleTasks().catch(() => {});
+  }, BG_MONITOR_INTERVAL_MS);
  _pollBackgroundStatus();
  _checkServeReachability();
 }
--- a/static/js/cookbookServe.js
+++ b/static/js/cookbookServe.js
@@ -560,6 +560,15 @@ function _rerenderCachedModels() {
        + `</div>`;

      let panelHtml = `<div class="hwfit-serve-panel">${_slotsHtml}`;
+      // Warn when serving a model whose download hasn't fully completed —
+      // the user CAN still hit Launch (vLLM/llama-server will start, then
+      // crash trying to read missing shards), but they should know.
+      if (m && (m.status === 'downloading' || m.status === 'stalled' || m.has_incomplete)) {
+        const _warnText = m.status === 'stalled'
+          ? `This model looks like a stale download shell (${esc(m.size || '0 KB')}). The weights aren't on disk — the serve will fail to load. Re-download first, or pick another model.`
+          : `This model's download isn't complete yet (${esc(m.size || 'partial')}). The serve will start but is likely to crash on a missing shard. Wait for the download to finish, or relaunch after it's done.`;
+        panelHtml += `<div class="hwfit-serve-warn" style="margin:0 0 8px;padding:6px 10px;border-radius:5px;font-size:11px;background:color-mix(in srgb, var(--color-warning, #f0ad4e) 14%, transparent);border:1px solid color-mix(in srgb, var(--color-warning, #f0ad4e) 40%, transparent);color:var(--color-warning, #f0ad4e);display:flex;gap:6px;align-items:flex-start;line-height:1.4;"><span aria-hidden="true">⚠</span><span>${_warnText}</span></div>`;
+      }
      // Row 1: Backend + Server + Env
      panelHtml += `<div class="hwfit-serve-row">`;
      const _backendChoices = _isWindows()
@@ -597,13 +606,13 @@ function _rerenderCachedModels() {
      panelHtml += `<label class="hwfit-backend-vllm hwfit-backend-sglang">${_l('TP','Tensor Parallelism — split model across N GPUs')}<select class="hwfit-sf" data-field="tp">${tpOpts}</select></label>`;
      // ctx resets to the model's max on every panel open (the real ctx slider
      // lives in the Scan/Download toolbar — see cookbook.js .hwfit-ctx-control).
-      panelHtml += `<label>${_l('Context','Max tokens per request — resets to the model max on every open. Lower = less VRAM')}<input type="text" class="hwfit-sf" data-field="ctx" value="${esc(m.context_length || m.context || '8192')}" /></label>`;
+      panelHtml += `<label>${_l('Context','Max tokens per request — resets to the model max on every open. Lower = less VRAM')}<input type="text" class="hwfit-sf" data-field="ctx" value="${esc(m.context_length || m.context || '20000')}" /></label>`;
      panelHtml += `<label>${_l('GPU','Which GPU to use. Leave empty for default')}<input type="text" class="hwfit-sf" data-field="gpu_id" value="${esc(sv('gpu_id', ''))}" placeholder="auto" style="width:50px;" /></label>`;
      panelHtml += `<label class="hwfit-backend-vllm hwfit-backend-sglang">${_l('GPU Mem','Fraction of GPU memory (0.0–1.0). Lower if OOM')}<input type="text" class="hwfit-sf" data-field="gpu_mem" value="${esc(sv('gpu_mem', '0.90'))}" /></label>`;
      panelHtml += `<label class="hwfit-backend-vllm">${_l('Swap','CPU swap space in GB. Leave empty to omit (removed in newer vLLM)')}<input type="text" class="hwfit-sf" data-field="swap" value="${esc(sv('swap', ''))}" placeholder="off" /></label>`;
-      panelHtml += `<label class="hwfit-backend-vllm hwfit-backend-sglang">${_l('Max Seqs','Maximum concurrent requests. Lower = less memory. Default 8 — prosumer GPUs often OOM on vLLM default 256 during CUDA graph capture.')}<input type="text" class="hwfit-sf" data-field="max_seqs" value="${esc(sv('max_seqs', '8'))}" placeholder="8" /></label>`;
+      panelHtml += `<label class="hwfit-backend-vllm hwfit-backend-sglang">${_l('Max Seqs','Maximum concurrent requests. Lower = less memory. Default 4 — prosumer GPUs often OOM on vLLM default 256 during CUDA graph capture.')}<input type="text" class="hwfit-sf" data-field="max_seqs" value="${esc(sv('max_seqs', '4'))}" placeholder="4" /></label>`;
      panelHtml += `<label>${_l('Dtype','Data type for weights. auto picks best for GPU')}<select class="hwfit-sf" data-field="dtype">${dtypeOpts}</select></label>`;
-      panelHtml += `<label class="hwfit-backend-vllm">${_l('KV Cache','vLLM --kv-cache-dtype. auto uses the model/runtime default; fp8 reduces KV memory for long context.')}<select class="hwfit-sf" data-field="vllm_kv_cache_dtype">${vllmKvCacheOpts}</select></label>`;
+      panelHtml += `<label class="hwfit-backend-vllm">${_l('KV Cache','vLLM --kv-cache-dtype. auto uses the model/runtime default; fp8 reduces KV memory for long context.')}<select class="hwfit-sf" data-field="vllm_kv_cache_dtype" style="height:32px;">${vllmKvCacheOpts}</select></label>`;
      panelHtml += `</div>`;
      // Row 2b: Diffusers settings
      const diffDtypeOpts = ['bfloat16','float16','float32'].map(d => `<option value="${d}"${sv('diff_dtype','bfloat16')===d?' selected':''}>${d}</option>`).join('');
@@ -696,7 +705,7 @@ function _rerenderCachedModels() {
        if (!_specMethods.includes(_specMethod)) _specMethods.unshift(_specMethod);
        const _specOpts = _specMethods.map(m =>
          `<option value="${m}"${m === _specMethod ? ' selected' : ''}>${m}</option>`).join('');
-        panelHtml += `<label class="hwfit-sf-cb hwfit-spec-group"><input type="checkbox" class="hwfit-sf" data-field="speculative" /> Speculative <select class="hwfit-sf hwfit-spec-method" data-field="spec_method" title="vLLM --speculative-config method">${_specOpts}</select><span class="hwfit-numstep"><button type="button" class="hwfit-numstep-btn" data-step="-1" tabindex="-1" aria-label="Decrease">‹</button><input type="number" class="hwfit-sf hwfit-spec-tokens" data-field="spec_tokens" value="${esc(_specTokens)}" min="1" max="10" title="num_speculative_tokens" /><button type="button" class="hwfit-numstep-btn" data-step="1" tabindex="-1" aria-label="Increase">›</button></span></label>`;
+        panelHtml += `<label class="hwfit-sf-cb hwfit-spec-group"><input type="checkbox" class="hwfit-sf" data-field="speculative" /> Speculative <select class="hwfit-sf hwfit-spec-method" data-field="spec_method" title="vLLM --speculative-config method">${_specOpts}</select><span class="hwfit-numstep"><button type="button" class="hwfit-numstep-btn" data-step="-1" tabindex="-1" aria-label="Decrease">‹</button><input type="number" class="hwfit-sf hwfit-spec-tokens" data-field="spec_tokens" value="${esc(_specTokens)}" min="1" max="10" title="num_speculative_tokens" /><button type="button" class="hwfit-numstep-btn" data-step="1" tabindex="-1" aria-label="Increase">›</button></span><span class="hwfit-help-chip hwfit-help-chip-inline" title="MTP / speculative decoding is supported on a few model families only — turn it on when the model card explicitly recommends it. On supported models it can boost inference throughput up to ~3×; on unsupported models it will either be ignored or fail to launch." style="margin-left:6px;">?</span></label>`;
      }
      if (_opts2.envVars.length) panelHtml += `<label class="hwfit-sf-cb"><input type="checkbox" class="hwfit-sf" data-field="moe_env" /> MoE Env Vars</label>`;
      panelHtml += `</div>`;
@@ -721,7 +730,7 @@ function _rerenderCachedModels() {
      // pushes Cancel + Launch to the right.
      panelHtml += `<span class="hwfit-serve-actions-spacer"></span>`;
      panelHtml += `<button class="cookbook-btn hwfit-serve-cancel" type="button" title="Close this configuration panel">Cancel</button>`;
-      panelHtml += `<button class="cookbook-btn hwfit-serve-launch">Launch</button>`;
+      panelHtml += `<button class="cookbook-btn hwfit-serve-launch"><svg width="11" height="11" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" style="vertical-align:-1px;margin-right:4px;flex-shrink:0;"><polygon points="13 2 3 14 12 14 11 22 21 10 12 10 13 2"/></svg>Launch</button>`;
      panelHtml += `</div>`;
      panelHtml += `</div>`;

@@ -1657,6 +1666,37 @@ function _rerenderCachedModels() {
          });
          return;
        }
+        // Pre-launch GPU probe. Common failure pattern: vLLM/SGLang launched
+        // on a host where no GPU is visible (driver missing, CUDA_VISIBLE_DEVICES
+        // set to an empty or wrong value, container without --gpus). Catch it
+        // BEFORE the user spends minutes watching the task fail.
+        const _needsGpu = ['vllm', 'sglang'].includes(serveState.backend)
+          || (serveState.backend === 'diffusers');
+        if (_needsGpu) {
+          try {
+            const _probeHost = (_envState.remoteHost || '').trim();
+            const _probeParams = new URLSearchParams();
+            if (_probeHost) {
+              _probeParams.set('host', _probeHost);
+              const _sp = (_envState.servers || []).find(s => s.host === _probeHost)?.port;
+              if (_sp) _probeParams.set('ssh_port', _sp);
+            }
+            const _probeRes = await fetch('/api/cookbook/gpus' + (_probeParams.toString() ? '?' + _probeParams : ''), { credentials: 'same-origin' });
+            const _probeData = await _probeRes.json();
+            const _probeGpus = Array.isArray(_probeData) ? _probeData : (_probeData.gpus || []);
+            if (!_probeGpus.length) {
+              const _proceed = await window.styledConfirm(
+                `No GPU detected on ${_probeHost ? _probeHost : 'this host'}. ${serveState.backend.toUpperCase()} needs a visible CUDA/ROCm accelerator to start — launching now will most likely crash early.\n\nLaunch anyway?`,
+                { title: 'No GPU detected', confirmText: 'Launch anyway', cancelText: 'Cancel', danger: true },
+              );
+              if (!_proceed) return;
+            }
+          } catch {
+            // Network / probe failure — don't block. Better to let the launch
+            // proceed than to silently refuse because the probe endpoint
+            // hiccuped (the user can read the real error in the task output).
+          }
+        }
        // Save in the { _byRepo, _lastUsed } schema — no legacy flat keys at
        // the root so per-model state doesn't leak between models.
        try {
--- a/static/style.css
+++ b/static/style.css
@@ -18628,16 +18628,41 @@ body.gallery-selecting .gallery-dl-btn,
  background: color-mix(in srgb, var(--fg) 10%, transparent);
  color: color-mix(in srgb, var(--fg) 60%, transparent);
 }
+/* Rebuild tag — same look as the LLM category tag, sits to its left. */
+.cookbook-dep-rebuild {
+  background: color-mix(in srgb, var(--fg) 10%, transparent);
+  color: color-mix(in srgb, var(--fg) 75%, transparent);
+  border: 1px solid color-mix(in srgb, var(--fg) 20%, transparent);
+  cursor: pointer;
+  font-family: inherit;
+  appearance: none;
+  -webkit-appearance: none;
+  -moz-appearance: none;
+}
+.cookbook-dep-rebuild:hover {
+  background: color-mix(in srgb, var(--accent, var(--red)) 18%, transparent);
+  color: var(--accent, var(--red));
+  border-color: color-mix(in srgb, var(--accent, var(--red)) 45%, transparent);
+}
 .cookbook-dep-installed {
  background: color-mix(in srgb, var(--green, #50fa7b) 18%, transparent);
  color: var(--green, #50fa7b);
  border: 1px solid color-mix(in srgb, var(--green, #50fa7b) 35%, transparent);
+  /* Match the Install button + Installed ▾ split width so all three variants
+     align in a mixed row. */
+  min-width: 75.85px;
+  padding: 0 10px;
+  box-sizing: border-box;
 }
 .cookbook-dep-na {
  background: color-mix(in srgb, var(--fg) 8%, transparent);
  color: color-mix(in srgb, var(--fg) 60%, transparent);
  border: 1px solid color-mix(in srgb, var(--fg) 16%, transparent);
  cursor: help;
+  /* Match other dep tag widths so N/A rows line up with Install / Installed. */
+  min-width: 75.85px;
+  padding: 0 10px;
+  box-sizing: border-box;
 }
 .cookbook-dep-install {
  background: var(--accent, var(--red));
@@ -18648,12 +18673,30 @@ body.gallery-selecting .gallery-dl-btn,
  font-weight: 500;
  position: relative;
  top: -3px;
+  /* Width matches the measured Installed ▾ split button (75.85px) so a row of
+     mixed Install / Installed deps lines up. */
+  min-width: 75.85px;
+  padding: 0 10px;
  /* Strip the native button box so it's the same height as the sibling tags
     (Firefox renders <button> taller otherwise); height comes from .cookbook-dep-tag. */
  appearance: none;
  -webkit-appearance: none;
  -moz-appearance: none;
 }
+/* Conditional line under the Download h2: only when the section is folded
+   (collapsed). When expanded, the body content provides separation; the
+   underline reads as clutter. */
+#cookbook-dl-tab-fold { border-bottom: none !important; padding-bottom: 0 !important; }
+#cookbook-dl-tab-fold.is-folded {
+  border-bottom: 1px solid color-mix(in srgb, var(--border) 40%, transparent) !important;
+  padding-bottom: 6px !important;
+}
+/* Center the "?" glyph inside the help chip: text-align centers it, and the
+   0.5px left padding corrects it sitting slightly left of true center. */
+.hwfit-help-chip {
+  text-align: center;
+  padding-left: 0.5px;
+}
 .cookbook-dep-install:hover { opacity: 0.85; }
 /* Installed split button: "Installed" label + separator + ▾ caret; clicking it
   opens the actions menu (Update). Replaces the old ⋮ button. */
@@ -18709,12 +18752,13 @@ body.gallery-selecting .gallery-dl-btn,
  border: 1px solid var(--border);
  border-radius: 4px;
  background: var(--bg);
-  font-size: 11px;
+  font-size: 12px;  /* match .cookbook-field-input so Context reads same size as Engine/Quant */
 }
 .hwfit-ctx-control span {
-  text-transform: uppercase;
-  letter-spacing: 0.3px;
-  opacity: 0.75;
+  /* Match Quant/Engine select label style: no uppercase, no letter-spacing. */
+  text-transform: none;
+  letter-spacing: 0;
+  opacity: 0.9;
 }
 /* Editor-style slider (same look as the gallery editor sliders): thin pill
   rail that fattens on interaction, circular red thumb that grows on hover. */
@@ -18726,11 +18770,19 @@ body.gallery-selecting .gallery-dl-btn,
  border: 0;
  -webkit-appearance: none;
  appearance: none;
-  background: color-mix(in srgb, var(--fg) 25%, transparent);
+  /* Hard-coded grey so the rail is GUARANTEED visible regardless of theme —
+     every theme-derived color we tried (--fg-muted, --border, accent-bg mix)
+     kept blending into the panel background on at least one theme. */
+  background: rgba(150, 150, 150, 0.65);
  border-radius: 999px;
  accent-color: var(--red);
  cursor: pointer;
-  transition: height 0.15s ease;
+  transition: height 0.15s ease, background 0.15s ease;
+}
+.hwfit-ctx-control input[type="range"]:hover,
+.hwfit-ctx-control input[type="range"]:focus,
+.hwfit-ctx-control input[type="range"]:active {
+  background: var(--fg) !important;  /* must outrank the !important rail grey set later in the cascade */
 }
 .hwfit-ctx-control input[type="range"]:hover,
 .hwfit-ctx-control input[type="range"]:focus,
@@ -19324,9 +19376,12 @@ body.gallery-selecting .gallery-dl-btn,
  position: relative;
  top: -4px;
  cursor: pointer;
-  padding: 1px 6px 1px 4px;
+  /* Tightened vertical padding so the hover-background isn't disproportionately
+     tall vs the icon+label. */
+  padding: 0 6px 0 4px;
+  height: 14px;
  border: 0;
-  border-radius: 9px;
+  border-radius: 7px;
  background: transparent;
  color: var(--fg);
  font-family: inherit;
@@ -20028,6 +20083,17 @@ body.gallery-selecting .gallery-dl-btn,
  border-color: var(--color-error);
  background: color-mix(in srgb, var(--color-error) 12%, transparent);
 }
+/* Icons on the left of diagnosis action buttons (Retry / Copy / Edit / etc.). */
+.cookbook-diag-btn,
+.cookbook-diag-menu button {
+  display: inline-flex;
+  align-items: center;
+  gap: 5px;
+}
+.cookbook-diag-btn-ico {
+  flex-shrink: 0;
+  opacity: 0.9;
+}

 /* ── What Fits? (hardware model fitting tab in cookbook) ── */
 .cookbook-group.hidden { display: none !important; }
@@ -20500,6 +20566,40 @@ body.gallery-selecting .gallery-dl-btn,
 .hwfit-toolbar .hwfit-usecase { min-width: 70px; flex-shrink: 0; }
 .hwfit-toolbar .hwfit-quant { min-width: 50px; flex-shrink: 0; }
 .hwfit-toolbar .hwfit-search { flex: 1; min-width: 80px; }
+/* Lower-opacity "Search models..." placeholder so it reads as a hint, not
+   a label — matches the muted form-field feel of the inline filters. */
+.hwfit-search::placeholder { opacity: 0.5; }
+.hwfit-search::-webkit-input-placeholder { opacity: 0.5; }
+.hwfit-search::-moz-placeholder { opacity: 0.5; }
+
+/* Dot inside the Fit column header — click to toggle the fit-only filter
+   (off = show too-tight rows; on = hide them). */
+.hwfit-fit-dot {
+  display: inline-block;
+  margin-right: 4px;
+  font-size: 8px;
+  line-height: 1;
+  color: color-mix(in srgb, var(--fg) 35%, transparent);
+  cursor: pointer;
+  vertical-align: middle;
+  position: relative;
+  top: -1px;  /* nudge 1px up so the small dot sits centered with the "Fit" caps */
+  transition: color 0.12s ease, text-shadow 0.12s ease;
+}
+.hwfit-fit-dot:hover { color: var(--accent, var(--red)); }
+.hwfit-fit-dot.active {
+  color: var(--green, #50fa7b);
+  text-shadow: 0 0 4px color-mix(in srgb, var(--green, #50fa7b) 55%, transparent);
+}
+/* Quant suffix appended to model names when the storage format isn't in the
+   repo id, e.g. "(FP4-MoE-Mixed)" after DeepSeek-V4-Flash. Muted to read as
+   metadata, not part of the name. */
+.hwfit-name-quant {
+  font-size: 0.78em;
+  opacity: 0.55;
+  font-weight: 400;
+  margin-left: 4px;
+}
 .hwfit-help-chip {
  width: 14px;
  height: 14px;
@@ -20526,6 +20626,28 @@ body.gallery-selecting .gallery-dl-btn,
 .hwfit-help-chip-inline {
  margin-left: -2px;
  margin-right: 0;
+  top: 0;  /* base chip rule sets top:-1px; nudge inline variant 1px lower */
+}
+/* Quant/Engine select + inline ? wrapper: the ? sits inside the dropdown's
+   bordered box, anchored on the right just left of the chevron. */
+.hwfit-quant-wrap, .hwfit-engine-wrap {
+  position: relative;
+  display: inline-flex;
+  align-items: center;
+}
+.hwfit-quant-wrap .hwfit-quant,
+.hwfit-engine-wrap .hwfit-engine {
+  /* Make room for the ? on the right edge, in addition to the native chevron. */
+  padding-right: 32px;
+}
+.hwfit-quant-wrap .hwfit-quant-help,
+.hwfit-engine-wrap .hwfit-engine-help {
+  position: absolute;
+  right: 20px;   /* sits just left of the native select chevron */
+  top: 50%;
+  transform: translateY(-50%);
+  pointer-events: auto;
+  margin: 0;
 }
 .hwfit-ctx-control {
  height: 28px;
@@ -20539,21 +20661,27 @@ body.gallery-selecting .gallery-dl-btn,
  border-radius: 4px;
  color: var(--fg-muted);
  background: var(--bg);
-  font-size: 10px;
+  font-size: 12px;  /* match .cookbook-field-input — was 10px and read smaller than siblings */
  box-sizing: border-box;
 }
 .hwfit-ctx-control span {
-  text-transform: uppercase;
-  letter-spacing: 0.3px;
-  opacity: 0.75;
+  /* Match Quant/Engine select label style: no uppercase, no letter-spacing. */
+  text-transform: none;
+  letter-spacing: 0;
+  opacity: 0.9;
 }
 .hwfit-ctx-control input[type="range"] {
-  width: 54px;
-  min-width: 54px;
-  height: 16px;
+  width: 64px;
+  min-width: 64px;
+  height: 4px;
  padding: 0;
  border: 0;
-  background: transparent;
+  -webkit-appearance: none;
+  appearance: none;
+  /* Hardcoded grey rail — was background:transparent here, which was the
+     LATER-in-cascade override that kept making the rail invisible. */
+  background: rgba(150, 150, 150, 0.65) !important;
+  border-radius: 999px;
  accent-color: var(--accent, var(--red));
 }
 .hwfit-ctx-control output {