286 Commits

Author SHA1 Message Date
ghreprimand
77611f0491 Scope memory consolidation by owner group
Co-authored-by: ghreprimand <203024559+ghreprimand@users.noreply.github.com>
2026-06-02 12:40:28 +09:00
Mihail Filippov
3d109cbaca Add explicit open-signup state endpoint
* Refactor open registration state switching

* Rename endpoint to open-signup
2026-06-02 12:35:54 +09:00
Leo
6fca7e86b7 Cookbook serve profiles and engine filter
* Cookbook: Engine filter + intelligent hardware-computed serve profiles

Two related Cookbook serving improvements for accurate, hardware-aware model
serving (especially on consumer GPUs that can only run GGUF/llama.cpp).

Engine filter
- New "Engine" dropdown (All / llama.cpp / vLLM / SGLang) beside the quant
  picker. Pure client-side view filter over the fetched list via the same
  _detectBackend() the serve commands use, so what you filter to is exactly what
  would launch. Re-renders from cache (no refetch). Empty-state message + the
  instant-cache-paint path account for it too.

Intelligent serve profiles (Quality / Balanced / Speed)
- services/hwfit/profiles.py: compute_serve_profiles() turns detected VRAM +
  model size into concrete llama.cpp flags (n_gpu_layers, n_cpu_moe, cache-type,
  context). Encodes the by-hand tuning: a too-big MoE offloads experts to CPU
  instead of failing; a model that fits stays fully on GPU; quant tracks profile
  intent; vision models keep image-encoder headroom. Reuses models.py VRAM math
  so filtering and serving agree on what fits. Pure/deterministic (no t/s claims
  — partial-offload speed isn't reliably predictable; fit is what's computed).
- /api/hwfit/profiles endpoint returns the profiles + the model's trained
  context limit, with loose name matching (strips org/ prefix, -GGUF suffix,
  quant tag) so a local GGUF folder name resolves to its catalog entry.
- _buildServeCmd (llama.cpp) now emits --n-cpu-moe / --flash-attn /
  --cache-type-k/v when set, with llama-cpp-python fallback equivalents. It
  previously only set -ngl/-c, which is why it OOM'd or ran slow.
- Serve panel: profile chips that fill the fields on click, plus CPU-MoE / KV
  Cache / Flash Attn fields. Context is clamped to the model's trained limit
  (and an absolute 1M sanity ceiling) on type/blur/profile-load and at launch —
  fixes a crash where a stale 256k/16M preset + quantized KV cache caused an
  amdgpu ErrorDeviceLost.

Tests: tests/test_serve_profiles.py (7) — offload vs full-GPU fit, never exceed
VRAM, context cap, launchable flags, vision headroom, no-GPU empty.
Checks: py_compile + node --check pass; pytest test_serve_profiles + test_hwfit_amd
green; verified live on an RDNA4 box (gfx1200) — Balanced lands ~ncm18 q4 128k,
matching hand-tuning.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Cookbook: make column-header sorting discoverable (incl. Newest)

Sorting in Cookbook is via clickable column headers (pewds' design), but the
headers had no visual cue that they're interactive — so sorting in general, and
the Newest sort on the Model header specifically, was undiscoverable.

- Style sortable headers as interactive: pointer cursor, hover underline, and
  the active sort column bolded/highlighted. There was no CSS for
  .hwfit-sortable / .hwfit-sort-active at all; this helps every existing sort,
  not just Newest.
- The Model column header sorts by release_date (newest first), reusing the
  existing header-click sort wiring and the "newest" SORT_KEY.

No new sort control — uses the existing column-header paradigm.

Checks: node --check passes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Cookbook serve profiles: keep the on-disk file's quant fixed (don't propose Q6/Q2)

In the Serve tab the model is a specific GGUF file already on disk, so its quant
can't change — but the profiles were suggesting "Quality · Q6_K" / "Speed · Q2_K"
as if you could re-quantize it. That's meaningless when serving a fixed file.

- compute_serve_profiles gains serve_weights_gb / serve_quant. When set (SERVE
  mode), the quant is locked to the file's and profiles differ only in the real
  serving knobs — n_cpu_moe, KV-cache type, context. _weights_gb / _cpu_moe_for_budget
  use the file's actual size instead of a quant-derived estimate. DOWNLOAD mode
  (no override) still varies the quant to show download options.
- /api/hwfit/profiles accepts serve_weights_gb & serve_quant.
- The Serve panel parses the file's size (from m.size "20.6 GB") and quant (from
  the repo/file name) and passes them, so profiles match what's actually served.

Result for a 20.6 GB Q4_K_M file: all three profiles stay Q4_K_M and differ by
KV/ctx/offload (Quality q8 KV 128k ncm21, Balanced q4 128k ncm17, Speed q4 32k
ncm15) — no nonsensical quant changes.

Tests: test_serve_mode_keeps_fixed_quant. Full serve-profile suite green (9).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Cookbook serve: Vision toggle (auto-find mmproj) + live VRAM/RAM-spillover monitor

Two serve-panel additions:

1. **Vision toggle.** A "Vision" checkbox that serves the model with its
   multimodal projector so it can read images. The mmproj path is resolved at
   runtime (find mmproj-*.gguf next to the model), so dropping an mmproj file in
   the model folder makes the toggle just work; `--mmproj … --image-max-tokens
   1024` (native) / `--clip_model_path` (llama-cpp-python) only when on + found.

2. **Live GPU-memory monitor.** A readout that polls /api/cookbook/gpus every 4s
   while the panel is open and shows VRAM used/total/%, free, and — crucially on
   a discrete card — **RAM spillover** (AMD gtt_used_mb), with a plain-language
   health hint: green/healthy, amber/tight, red/"spilled to RAM — slow (raise
   CPU MoE or lower context)". Surfaces gtt_used_mb from the gpus endpoint
   (previously read for total only and discarded for 'used').

Lets you see at a glance whether a config fits VRAM (fast) or is paging to system
RAM over PCIe (slow) instead of guessing.

Checks: node --check + py_compile pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-02 12:34:42 +09:00
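
The context clamp described for the Serve panel in 6fca7e86b7 can be sketched roughly as below. This is a minimal illustration, not the panel's actual code: `clamp_context` and `ABSOLUTE_CTX_CEILING` are hypothetical names, and reading the "1M" ceiling as 1024 * 1024 tokens is an assumption.

```python
from typing import Optional

# Assumption: the "1M" sanity ceiling is read here as 1024 * 1024 tokens.
ABSOLUTE_CTX_CEILING = 1024 * 1024


def clamp_context(requested: int, trained_limit: Optional[int]) -> int:
    """Clamp a requested context length to the trained limit and the ceiling."""
    ctx = max(1, int(requested))
    if trained_limit:  # an unknown limit (None or 0) leaves only the ceiling
        ctx = min(ctx, int(trained_limit))
    return min(ctx, ABSOLUTE_CTX_CEILING)
```

With a trained limit of 131072, a stale 16M preset would come back as 131072 under these assumptions, while a 32k request passes through unchanged.
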
spooky
8b3c0d8ad4 feat: select cached gguf artifacts for serve (#891)
2026-06-02 12:32:40 +09:00
Alexandre Teixeira
8455b88643 Improve Docker GPU setup diagnostics (#705)
* Improve Docker GPU setup diagnostics

Add a Docker GPU preflight script for NVIDIA users. The script is
read-only by default, checks host NVIDIA drivers, Docker availability,
and container GPU passthrough, and prints actionable next steps.

Add explicit opt-in modes to print install commands, install NVIDIA
Container Toolkit on Ubuntu/Debian, and enable the NVIDIA Compose overlay
in .env after passthrough is verified.

Document common NVIDIA Docker failure modes, ignore generated .env
backups, and clarify that Cookbook can only detect GPUs exposed to the
Odysseus container.

* Clarify Docker GPU diagnostic limits
2026-06-02 12:30:40 +09:00
Sirsyorrz
517aa593e0 Cookbook: clearer tooltips on saved-config badge and GPU chip (#850)
Two small polish items in the Cookbook Serve panel.

Saved-config badge
The little count badge next to the Save button ("3 ▾" etc.) had a
generic "Saved launch configs" tooltip, so the number reads like a
notification dot. Make it spell out what it is and what clicking does:
"3 saved launch configs for <model> — click ▾ to load or delete"
(and "No saved launch configs for <model> yet — click Save to add
one" when empty). Tooltip stays in sync via _updateSavedToggleLabel
so save/delete updates both the count and the hint.

GPU chip on mixed-GPU boxes (#711)
The chip label was `${gpuCount}x ${gpu_name}`, where gpu_name is
just gpus[0].name — so a 4090 + 3060 reads as "2x RTX 4090". The
backend already emits gpu_groups (identical cards grouped, used by
the serve flow to pin CUDA_VISIBLE_DEVICES) and a per-card gpus[]
array, so use them:

- Label renders each homogeneous pool: "1× RTX 4090 + 1× RTX 3060".
  Homogeneous setups keep the existing "2× RTX 4090" form.
- Tooltip lists each GPU with its index + VRAM, useful for picking
  the right device when launching.

Refs #711.
2026-06-02 12:30:24 +09:00
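
The mixed-GPU label from 517aa593e0 can be illustrated with a short sketch. The real chip is rendered in the front-end from the backend's gpu_groups; the Python helper below (`gpu_chip_label`, a hypothetical name) only shows the grouping idea, assuming a plain list of per-card names.

```python
from collections import Counter
from typing import List


def gpu_chip_label(gpu_names: List[str]) -> str:
    """Render one "N× name" term per group of identical cards."""
    counts = Counter(gpu_names)  # keeps first-seen order of the names
    return " + ".join(f"{n}× {name}" for name, n in counts.items())
```

Labelling from gpus[0].name alone is what produced "2x RTX 4090" for a 4090 + 3060 box; counting per name keeps the homogeneous form unchanged.
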
Dustin
bd3204fe96 Diagnose vLLM device detection failure with actionable suggestion (#778)
Adds a diagnosis pattern for the 'Failed to infer device type' error
vLLM raises when no CUDA or ROCm GPU is found (e.g. systems with only
integrated or Intel Xe graphics). The existing pattern only caught
'No CUDA GPUs are available' which fires later in startup; this new
entry catches the earlier device-probe failure and the NVML/amdsmi
library-not-found messages that precede it.

Surfaces in the Cookbook serve card as: "vLLM could not find a supported
GPU — switch to llama.cpp or Ollama" instead of a raw Python traceback.

Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-06-02 12:30:07 +09:00
IBR-41379
385c3c3cf3 fix: use sys.executable for Cookbook model cache scan on Windows (#627)
Windows has 'App Execution Aliases' that can make shutil.which('python3')
and shutil.which('python') resolve to a Microsoft Store stub instead of
real Python -- even when Python is properly installed. The stub outputs:

  'Python was not found; run without arguments to install from the
   Microsoft Store, or disable this shortcut from Settings > Apps >
   Advanced app settings > App execution aliases.'

and exits 9009, producing empty stdout. The JSON parse of the local
model cache scan then fails with 'Expecting value: line 1 column 1
(char 0)', and the Cookbook model list shows nothing.

Fix: prefer sys.executable as the interpreter for the local scan.
Odysseus already runs inside its own venv, so sys.executable always
points to the real venv Python and bypasses PATH / Store alias lookup
entirely. which_tool() is kept as a fallback.

Cross-platform: sys.executable works identically on Linux and macOS
(returns the real interpreter path), so this change is safe everywhere.
2026-06-02 12:29:40 +09:00
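
The interpreter choice in 385c3c3cf3 can be sketched as follows. `scan_interpreter` is a hypothetical name, and `shutil.which` stands in for the project's which_tool() fallback, whose exact behavior is not shown in the commit message.

```python
import shutil
import sys
from typing import Optional


def scan_interpreter(executable: str = sys.executable) -> Optional[str]:
    """Prefer the running interpreter; fall back to a PATH lookup."""
    if executable:
        # Inside a venv this is the venv's own Python, so no PATH or
        # Windows Store alias lookup is involved.
        return executable
    return shutil.which("python3") or shutil.which("python")
```
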
Ruben G.
25dcb1b10f fix(macos): make Homebrew dep install idempotent and non-fatal (#754)
start-macos.sh now skips Homebrew formulae that are already installed, so re-runs no longer re-hit Homebrew. tmux and llama.cpp are treated as optional: a failed install warns and continues instead of aborting the launch under set -e. Python stays required (it builds the venv).
2026-06-02 12:28:37 +09:00
Rolly Calma
32efeeb3a2 chore: use running event loop in async helpers (#821)
2026-06-02 12:28:05 +09:00
lolwuttav
c99193041a fix(cookbook): default Ollama serve to loopback (#872)
2026-06-02 12:27:04 +09:00
Tatlatat
ffb77d7ff2 fix(auth): honor AUTH_ENABLED=false on owner-scoped endpoints (no /login loop) (#880)
When the operator sets AUTH_ENABLED=false, three owner-scoped endpoints still
returned 401 (api/models, api/research/*, api/email/*), so the front-end
redirected the browser to /login and the app was unusable despite auth being
turned off. require_user() in src/auth_helpers.py already documents and honors
this contract (issue #622) via 'if _auth_disabled(): return ""', but these
endpoints did their own get_current_user/is_configured check without it.

Make _require_user (research), the /api/models anti-leak guard, and
email_helpers._require_auth consult _auth_disabled() and let anonymous through
(owner='') only when the operator explicitly disabled auth. The 401 protection
is fully intact when AUTH_ENABLED=true. Verified end-to-end: with
AUTH_ENABLED=false the SPA now loads instead of bouncing to /login.
2026-06-02 12:26:26 +09:00
Mahdi Salmanzade
66cd44b66d fix(research): gate /api/research/spinoff on session ownership (#878)
The spinoff endpoint authenticated the caller (_require_user) but never
verified the research session belonged to them before reading the
persisted report and seeding it into a new chat session owned by the
caller. Any authenticated user who knew or guessed another user's
research session ID could exfiltrate that user's full report into their
own session — a cross-user data disclosure (IDOR).

Every other endpoint in this router gates on _owns_in_memory /
_assert_owns_research right after validating the session ID; spinoff was
the lone exception. Add the same _owns_in_memory check (covers both the
in-memory task and the on-disk JSON) so a non-owner gets a 404 before any
data is read or a session is created.

Add regression tests pinning the anonymous (401) and wrong-owner (404)
cases.
2026-06-02 12:26:12 +09:00
mist
fca8d68aba Match host, not substring, when resolving DuckDuckGo redirects (#886)
_resolve_ddg_redirect (the DuckDuckGo /l/?uddg= redirect resolver used on every
HTML-fallback result href) gated on `"duckduckgo.com" in parsed.hostname`. That
substring test also matches look-alike hosts like `duckduckgo.com.evil.com` and
`notduckduckgo.com`, so a result link on such a host would be silently rewritten
to its embedded `uddg` target. Same substring-vs-hostname pitfall fixed for
provider detection in 54ecfa3.

Match the host properly: exactly `duckduckgo.com` or a `.duckduckgo.com`
subdomain. Genuine redirects (`//duckduckgo.com/l/...`, and relative `/l/...`
hrefs resolved against `html.duckduckgo.com`) keep working.

The resolver was a closure inside duckduckgo_search; lifted it (plus the new
_is_duckduckgo_host helper) to module scope so it can be unit-tested directly.

Adds tests/test_ddg_redirect_resolution.py (red on the look-alike case before
this change, green after).
2026-06-02 12:25:56 +09:00
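
The host check described in fca8d68aba amounts to an exact-or-subdomain match. A minimal sketch, assuming the helper takes the hostname already parsed out of the URL (the real _is_duckduckgo_host may differ in details such as normalization):

```python
from typing import Optional
from urllib.parse import urlparse


def is_duckduckgo_host(hostname: Optional[str]) -> bool:
    """True only for duckduckgo.com itself or one of its subdomains."""
    if not hostname:
        return False
    host = hostname.lower()
    return host == "duckduckgo.com" or host.endswith(".duckduckgo.com")


def substring_check(hostname: str) -> bool:
    """The old, too-permissive test, kept here for comparison."""
    return "duckduckgo.com" in hostname
```

For `urlparse("https://duckduckgo.com.evil.com/l/?uddg=x").hostname` the substring test accepts the host while the strict helper rejects it.
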
Mahdi Salmanzade
f691537472 fix(security): stop leaking the vault master password via process argv (#879)
The /api/vault/unlock handler ran `bw` as
`_run_bw(["unlock", req.master_password, "--raw"])`. _run_bw launches it with
`asyncio.create_subprocess_exec(bw_path, *args)`, so the master password became
a process argument — readable by any local user through `ps` and
`/proc/<pid>/cmdline` for the lifetime of the unlock subprocess. The Bitwarden
master password decrypts the entire vault, so this is a serious credential
exposure on any multi-user / shared host (CWE-214).

The sibling /login handler already avoids this by feeding the password on
stdin; unlock was the outlier. Hand the password to `bw` through the
environment instead (`--passwordenv BW_PASSWORD`), mirroring how BW_SESSION is
already passed — `/proc/<pid>/environ` is readable only by the process owner,
not other local users. Add regression tests pinning that the secret reaches
the subprocess env and never appears in argv.
2026-06-02 12:25:43 +09:00
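
The argv-versus-environment difference in f691537472 can be demonstrated without the `bw` CLI. The sketch below is an illustration only: a Python one-liner stands in for `bw unlock --passwordenv BW_PASSWORD --raw`, and `run_with_secret` is a hypothetical name, not the project's _run_bw.

```python
import asyncio
import os
import sys

# Stand-in child: reports whether the secret shows up in its argv, and the
# length of the value it received through the environment.
CHILD = (
    "import os, sys; "
    "s = os.environ['BW_PASSWORD']; "
    "print(s in sys.argv, len(s))"
)


async def run_with_secret(secret: str) -> str:
    """Pass the secret through the child's environment, never its argv."""
    env = {**os.environ, "BW_PASSWORD": secret}
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", CHILD,
        env=env,
        stdout=asyncio.subprocess.PIPE,
    )
    out, _ = await proc.communicate()
    return out.decode().strip()
```

Running `asyncio.run(run_with_secret("s3cret"))` has the child confirm that it received the value while its argument list stays free of it.
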
Alexandre Teixeira
90878c380e Add resolve_endpoint fallback chain regressions (#890)
2026-06-02 12:24:50 +09:00
Alexandre Teixeira
d1d047dd11 Add Ollama port path detection regressions (#883)
2026-06-02 12:24:18 +09:00
Juan Pablo Jiménez
e58e4a185d Expose Cookbook user-install CLIs in Docker (#887)
Ensure pip --user console scripts like vLLM are visible to Docker
runtime and dependency probes by adding the user install bin directory
to PATH.
2026-06-02 12:23:29 +09:00
Tatlatat
9a1893760d fix(cookbook): skip pip --user fallback inside virtualenvs (#388) (#889)
The dependency-install fallback chain unconditionally ran
'pip install --user', which fails inside a virtualenv (and as root in
LXC/containers) with 'Can not perform a --user install. User site-packages
are not visible in this virtualenv.' — even though the function's docstring
already noted --user is invalid in venvs.

Guard the --user fallback with a venv check so it only runs outside a venv
(where --user is actually valid for PEP-668 system Pythons). Derive the venv
probe interpreter from the install command (python for 'pip', python3 for
'pip3'/'python3 -m pip') so the check runs in pip's own environment. System
PEP-668 installs keep the --user fallback; venv/LXC-root installs no longer
hit the --user error. Updated the unit test for the new chain.

Closes #388
2026-06-02 12:23:20 +09:00
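
The guard in 9a1893760d can be sketched as below. Two simplifications are assumed: the venv test runs in the current interpreter rather than in a probe derived from the install command, and the chain has only two steps. `in_virtualenv` and `install_attempts` are hypothetical names.

```python
import sys
from typing import List


def in_virtualenv() -> bool:
    """True when the running interpreter lives inside a venv/virtualenv."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def install_attempts(base_cmd: List[str], in_venv: bool) -> List[List[str]]:
    """Build the fallback chain; --user is only offered outside a venv."""
    attempts = [list(base_cmd)]
    if not in_venv:
        attempts.append(list(base_cmd) + ["--user"])
    return attempts
```
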
pewdiepie-archdaemon
966b53df77 Improve Cookbook serve diagnostics and recommendations
2026-06-02 12:15:47 +09:00
Prakhya
bdc99d746a fix: add Browser MCP connection diagnostics (#662)
2026-06-02 11:50:17 +09:00
NovaUnboundAi
3319310942 Allow longer deep research extraction timeouts (#651)
Co-authored-by: NovaUnboundAi <NovaUnboundAi@users.noreply.github.com>
2026-06-02 11:50:03 +09:00
Achilleas90
247df16e82 Fix ordered list rendering in markdown preview (#645)
2026-06-02 11:49:44 +09:00
Rasmus
1882ad68ea fix: open #document deep-links on refresh and surface load errors (#631)
Add a hashchange handler for #document-<id> so refresh / URL-bar nav opens the document, and replace the silent console.error in loadDocument with a user-facing toast.

Closes #560
2026-06-02 11:48:54 +09:00
Christopher Milian
35ba56fa0c fix: remove ollama backend filter conflict (#613)
2026-06-02 11:48:35 +09:00
nsgds
5645cce6d0 Support vLLM 0.20.2 / NIM reasoning-parser output end-to-end (surface + agent context + render) (#602)
* fix(stream): read 'reasoning' SSE field for vLLM 0.20.2 / NIM

vLLM 0.20.2 / NVIDIA NIM emit reasoning-parser output in the `reasoning` delta field; older builds use `reasoning_content`. stream_llm() read only the latter, so reasoning from models like Nemotron-3-Nano (--reasoning-parser) was silently dropped and never rendered. Accept either field.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(agent): keep reasoning_content only on the latest assistant turn

The agent loop echoed each round's reasoning back as `reasoning_content` on every assistant turn, assuming vendors ignore it. Nemotron's chat template re-injects ALL prior reasoning_content as <think> blocks, and the loop is trimmed only once (before it starts) — so reasoning accumulated unbounded across rounds, bloating context and feeding the model its own prior reasoning, which reinforced repetition/looping. Strip reasoning_content from earlier assistant turns so only the most recent round carries it (still satisfies DeepSeek's thinking-mode follow-up requirement).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(agent-ui): wrap each round's reasoning in its own <think> block

The streamed think-tag wrapper gated on whole-message substring checks (accumulated.includes('<think>')), which only ever wrapped ONE reasoning block per message. A multi-round agent response has a reasoning phase per round, so once round 1 closed its <think>...</think>, rounds 2+ reasoning was emitted unwrapped and leaked into the visible answer. Replace the substring checks with a stateful open/close flag that toggles per think/answer cycle, so each round's reasoning gets its own collapsible block. Single-turn chat is unchanged (one open, one close).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(stream): reasoning/reasoning_content delta surfaces as thinking chunk

Covers @pewdiepie-archdaemon's requested regression: a streamed {reasoning: ...} delta emits a thinking chunk while {content: ...} streams as normal content; plus the older reasoning_content field for backward compat. Mirrors the #591 scenario.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-02 11:48:17 +09:00
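
Two of the fixes in 5645cce6d0 reduce to small data transformations. The sketch below assumes deltas and messages are plain dicts; `reasoning_from_delta` and `strip_old_reasoning` are hypothetical names, not the functions in stream_llm() or the agent loop.

```python
from typing import Dict, List, Optional


def reasoning_from_delta(delta: Dict) -> Optional[str]:
    """Accept the newer `reasoning` field or the older `reasoning_content`."""
    return delta.get("reasoning") or delta.get("reasoning_content")


def strip_old_reasoning(messages: List[Dict]) -> List[Dict]:
    """Keep reasoning_content only on the most recent assistant turn."""
    last = max(
        (i for i, m in enumerate(messages) if m.get("role") == "assistant"),
        default=None,
    )
    cleaned = []
    for i, msg in enumerate(messages):
        if msg.get("role") == "assistant" and i != last:
            # Copy without the key so the caller's message is not mutated.
            msg = {k: v for k, v in msg.items() if k != "reasoning_content"}
        cleaned.append(msg)
    return cleaned
```
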
nsgds
a857d2016d fix: don't bill self-hosted models reached by a container/service hostname (#596)
* fix(cost): treat dotless container hostnames as local (free)

getModelCost() substring-matches model names against a cloud price table, so a self-hosted 'nemotron'/'llama' model was billed at cloud rates. isLocalEndpoint() only recognized IPs / localhost / .local, not bare Docker service names (nim-nano, llamaswap), so the local-is-free guard missed them. A single-label hostname (no dot) can never be a public API -> treat as local.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(cost): isLocalEndpoint classifies service names local, cloud FQDNs billable

Covers @pewdiepie-archdaemon's requested cases: llamaswap/nim-nano + localhost/private-IPs/.local => local (free); api.openai.com/openrouter.ai/etc => not local. Drives the real function via node --input-type=module (same approach as test_reply_recipients_js.py), skips when node is absent.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-02 11:47:58 +09:00
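
The classification rule from a857d2016d, transliterated to Python for illustration. The real isLocalEndpoint() is JavaScript; `is_local_endpoint` is a hypothetical name, and treating only private or loopback IPs as local is an assumption about how the original handles IP literals.

```python
import ipaddress
from urllib.parse import urlparse


def is_local_endpoint(url: str) -> bool:
    """Classify an endpoint URL as local (free) or not."""
    host = urlparse(url).hostname
    if not host:
        return False
    if host == "localhost" or host.endswith(".local"):
        return True
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal: a single-label name (no dot) cannot be a
        # public API host, so treat it as a container/service name.
        return "." not in host
    return ip.is_private or ip.is_loopback
```
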
william-napitupulu
649cacfa05 Importing files bug (#582)
* Update Styles.css

Small update to the styles that bothered me. I noticed that in the calendar window/modal, when editing a day, the time icons had a mask that overlapped the icon. I simply added the 'background-image: none' property to it.

* Importing files bug

I found a bug that prevented file uploads in the Documents tab of the library window. When a user selected a file, the code grabbed a reference to fileInput.files and immediately cleared the input value (fileInput.value = '') to allow re-uploading the same file later. However, because fileInput.files is a live FileList tied directly to the DOM element, clearing the input also emptied the saved variable, so the file data was lost.

Note: this error may be browser-specific; it worked fine on Zen/Firefox but failed on Edge and Chrome.

Fix: use Array.from, which copies the files into a new array instead of keeping a reference to the live FileList.
2026-06-02 11:47:25 +09:00
Sirsyorrz
cb3d86608c Cookbook: pick the correct vLLM tool-call-parser for Qwen2.5 (#580)
The model-name detector treated every Qwen model as a Qwen3, falling
into the qwen3_xml parser:

    if (n.includes('qwen3') && n.includes('coder')) return 'qwen3_coder';
    if (n.includes('qwen')) return 'qwen3_xml';   // catches qwen2.5 too

qwen3_xml is the parser for Qwen3 reasoning/instruct models. Qwen2.5
(and Qwen2, Qwen1.5) ship with hermes-style tool calling, so the
qwen3_xml parser never recognises their tool calls — they leak through
as plain text in the assistant reply and the agent silently fails to
execute anything.

Reproduces with:
  vllm serve Qwen/Qwen2.5-Coder-14B-Instruct-AWQ ... \
    --enable-auto-tool-choice --tool-call-parser qwen3_xml
  → ask the agent to call any tool → JSON shows up in chat, no call runs.

Fix the ordering:
  qwen3 + coder → qwen3_coder
  qwen3         → qwen3_xml
  qwen          → hermes   (Qwen2.5 / Qwen2 / Qwen1.5)

Verified against the model matrix:

  Qwen2.5-Coder-14B-Instruct-AWQ → hermes
  Qwen2.5-7B-Instruct            → hermes
  Qwen3-8B                       → qwen3_xml
  Qwen3-32B                      → qwen3_xml
  Qwen3-Coder-30B-A3B            → qwen3_coder
  Qwen2-72B-Instruct             → hermes
  Qwen1.5-7B-Chat                → hermes
2026-06-02 11:47:15 +09:00
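
The corrected ordering from cb3d86608c, transliterated to Python as a sketch. The real detector is JavaScript and also handles other model families; `tool_call_parser` is a hypothetical name covering only the Qwen branches listed above.

```python
from typing import Optional


def tool_call_parser(model_name: str) -> Optional[str]:
    """Pick a vLLM tool-call parser; most specific match first."""
    n = model_name.lower()
    if "qwen3" in n and "coder" in n:
        return "qwen3_coder"
    if "qwen3" in n:
        return "qwen3_xml"
    if "qwen" in n:  # Qwen2.5 / Qwen2 / Qwen1.5
        return "hermes"
    return None  # other families are outside this sketch
```
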
Rasmus
e73f3edc06 fix: scope chat active-document lookup to the session owner (#569)
2026-06-02 11:46:40 +09:00
mist
f13d897093 Fix AttributeError on bullet lines in extract_memory_from_chat (#873)
The fallback memory extractor (used by routes/memory_routes.py when the LLM
extractor fails) matched list items with `r'^[-*•]|\d+\.\s*(.*)'`. Operator
precedence makes that `(^[-*•]) | (\d+\.\s*(.*))`, so the capture group only
exists on the numbered-list branch.

A bullet line ("- foo") matches the first branch, so `group(1)` is None and
`text_match.group(1).strip()` raises AttributeError — crashing extraction for
any assistant message that contains a bullet list (i.e. most of them). Numbered
lists happened to work.

Group both markers — `r'^(?:[-*•]|\d+\.)\s*(.*)'` — so the capture applies to
bullets and numbers alike.

Adds tests/test_memory_bullet_extraction.py (red before, green after).
2026-06-02 11:46:06 +09:00
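
The precedence problem in f13d897093 is easy to reproduce with the two patterns quoted in the message. `extract_item` is a hypothetical wrapper for illustration, and the use of `match` rather than `search` is an assumption about the original call.

```python
import re
from typing import Optional

# Alternation binds loosest: this parses as (^[-*•]) | (\d+\.\s*(.*)),
# so the capture group exists only on the numbered-list branch.
BUGGY = re.compile(r'^[-*•]|\d+\.\s*(.*)')

# Grouping both markers lets the capture apply to bullets and numbers alike.
FIXED = re.compile(r'^(?:[-*•]|\d+\.)\s*(.*)')


def extract_item(line: str) -> Optional[str]:
    """Return the text of a bullet or numbered list item, else None."""
    match = FIXED.match(line)
    return match.group(1).strip() if match else None
```

With the buggy pattern, `BUGGY.match("- foo").group(1)` is None, which is what made the subsequent `.strip()` raise AttributeError.
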
Kenny Van de Maele
2b39412355 Expand ~ in read_file and write_file paths (#781)
read_file/write_file passed the raw path to open(), so a tilde path like
~/notes.txt failed ("not found") — the shell's ~ expansion never happened
because there's no shell. Agents then fell back to bash to reach home-dir
files. Expand ~ (and ~user) with os.path.expanduser before opening.

Checks: python -m py_compile src/tool_execution.py.
2026-06-02 11:45:21 +09:00
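
A minimal sketch of the expansion in 2b39412355. `resolve_path` is a hypothetical helper; the actual change lives inside read_file/write_file in src/tool_execution.py.

```python
import os


def resolve_path(path: str) -> str:
    """Expand ~ and ~user the way a shell would, before calling open()."""
    return os.path.expanduser(path)
```

Paths without a leading tilde pass through unchanged, so existing absolute and relative paths are unaffected.
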
Ernest Hysa
7669696bb0 fix(scheduler): push next_run forward on startup to stop restart double-fire (#708)
TaskScheduler.start() aborts stale TaskRun rows but never advanced
ScheduledTask.next_run. Across a restart the in-process _executing set
is empty, so the first post-restart _check_due_tasks() call dispatches
every task whose next_run is still in the past — and so does every
subsequent poll, until the task's regular _execute_task path finally
runs compute_next_run and pushes it forward.

start() now queries active tasks with next_run < now and pushes each
one to now + 60s. The first poll after restart sees them as not-yet-due,
the task runs once normally, and compute_next_run puts the schedule
back on its real cadence. Paused and not-yet-due tasks are left alone.

The validator test was rewritten as a regression test asserting the
opposite of the bug it originally demonstrated, plus two narrower cases
to lock down the filter (only active+overdue is touched).
2026-06-02 11:43:30 +09:00
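
The startup adjustment in 7669696bb0 can be sketched against an in-memory list. The `ScheduledTask` dataclass here is a hypothetical stand-in for the ORM model, and `push_overdue_forward` is an illustrative name; only the active+overdue filter and the now + 60s target come from the message.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class ScheduledTask:  # hypothetical stand-in for the real model
    name: str
    active: bool
    next_run: datetime


def push_overdue_forward(
    tasks: List[ScheduledTask],
    now: datetime,
    grace: timedelta = timedelta(seconds=60),
) -> List[str]:
    """Move active, overdue tasks to now + grace; leave all others alone."""
    touched = []
    for task in tasks:
        if task.active and task.next_run < now:
            task.next_run = now + grace
            touched.append(task.name)
    return touched
```
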
ooovenenoso
15c7cb58e7 fix(cookbook): retry 0% HF download stalls sooner (#691)
Co-authored-by: Kevin <120500656+oooindefatigable@users.noreply.github.com>
2026-06-02 11:42:59 +09:00
Afonso Coutinho
634c16a019 fix: reply-all Cc's the user's own other addresses (multi-account) (#672)
* feat: publish all configured email addresses for reply-all exclusion

* fix: exclude all of the user's own addresses from reply-all, not just the active one

* test: reply-all excludes all of the user's configured addresses
2026-06-02 11:42:20 +09:00
Afonso Coutinho
48d3b7abab fix: topic analysis false-matches keywords as substrings (e.g. 'ai' in 'email') (#687)
* fix: match topic keywords on word boundaries, not substrings

* fix: apply word-boundary matching to topic example snippets too

* test: topic keywords match whole words, not substrings
2026-06-02 11:42:04 +09:00
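
The substring false match fixed in 48d3b7abab ('ai' inside 'email') and the word-boundary alternative can be shown in a few lines. `keyword_in_text` is a hypothetical helper, not the analyzer's actual function.

```python
import re


def keyword_in_text(keyword: str, text: str) -> bool:
    """Match the keyword as a whole word, case-insensitively."""
    pattern = rf"\b{re.escape(keyword)}\b"
    return re.search(pattern, text, re.IGNORECASE) is not None
```

A plain `"ai" in "check my email"` is true, whereas the boundary-anchored search only fires when the keyword stands alone.
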
Afonso Coutinho
9d8eebfa63 fix: source thumbnails dropped for http-only og:image URLs (#667)
* fix: accept http (not just https) og:image URLs for source thumbnails

* test: og:image extraction accepts http and skips relative/svg
2026-06-02 11:41:33 +09:00
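
A rough sketch of the acceptance rule implied by 9d8eebfa63: absolute http or https URLs pass, relative paths and SVGs are skipped. `usable_og_image` is a hypothetical name, and detecting SVGs by file extension is an assumption.

```python
from urllib.parse import urlparse


def usable_og_image(url: str) -> bool:
    """Accept absolute http(s) image URLs; skip relative paths and SVGs."""
    parsed = urlparse(url.strip())
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return False
    return not parsed.path.lower().endswith(".svg")
```
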
elijaheck
c303a29670 Fix native macOS tailnet launch and Metal GPU probe (#756)
* macOS/Apple Silicon: detect Metal backend, surface MLX models, brew tmux hint

- hardware.py: add _detect_macos() via sysctl/system_profiler; report
  backend=metal + unified_memory on Apple Silicon instead of cpu_arm
- fit.py: add Apple Silicon (M1-M5) unified-memory bandwidths + metal
  FALLBACK_K so throughput estimates use the real bandwidth formula
- setup.py: Mac-specific 'brew install tmux' hint

Verified on M5 Pro 48GB: backend=metal, 273GB/s matched, 6 MLX models now
visible (were hidden), cuda still hides MLX, no new test failures.

* Fix native macOS tailnet launch and Metal GPU probe

---------

Co-authored-by: Elijah (Hermes) <hermes@local>
2026-06-02 11:41:04 +09:00
James Arslan
a327df6936 Fix native tool-calling follow-up round on Gemini and Ollama (#867)
The agent's multi-round (tool-result) follow-up request was rejected with
HTTP 400 on two providers, so tools ran but the agent never produced an answer:

- OpenAI-compatible streaming (Gemini 3) dropped the per-call thought_signature
  and collided parallel tool calls, which arrive with index=None: they all
  landed in slot 0, overwriting the first call's name and corrupting its
  arguments by concatenation, so the follow-up request 400'd. Capture and replay
  each call's extra_content (thought_signature), and give every parallel call
  its own accumulator slot (allocated above the max key, so sparse or mixed
  indices can't collide).
- Native Ollama /api/chat expects object tool-call arguments, but Odysseus
  carries them as a JSON string, which Ollama rejected ("Value looks like
  object, but can't find closing '}' symbol"). Convert them to objects in the
  Ollama payload builder.

Both compose with the no-prose null-content sanitize fix from #862.

Tested: python -m pytest tests/test_llm_core_streaming.py
tests/test_llm_core_ollama.py tests/test_agent_loop.py (53 pass), and
python -m py_compile src/llm_core.py src/agent_loop.py.
2026-06-02 11:39:40 +09:00
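
Both halves of a327df6936 can be sketched on plain dicts. The delta shape (`index`, `name`, `arguments`) is simplified, thought_signature replay is left out, and `accumulate_tool_calls` / `to_ollama_arguments` are hypothetical names.

```python
import json
from typing import Any, Dict, List


def accumulate_tool_calls(deltas: List[Dict[str, Any]]) -> List[Dict[str, str]]:
    """Merge streamed tool-call deltas without letting parallel calls collide."""
    slots: Dict[int, Dict[str, str]] = {}
    for delta in deltas:
        idx = delta.get("index")
        if idx is None:
            # No index: allocate a fresh slot above the current max key so
            # sparse or mixed indices cannot be overwritten.
            idx = max(slots, default=-1) + 1
        slot = slots.setdefault(idx, {"name": "", "arguments": ""})
        if delta.get("name"):
            slot["name"] = delta["name"]
        slot["arguments"] += delta.get("arguments") or ""
    return [slots[key] for key in sorted(slots)]


def to_ollama_arguments(arguments: Any) -> Any:
    """Native Ollama /api/chat expects an object, not a JSON string."""
    return json.loads(arguments) if isinstance(arguments, str) else arguments
```
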
Mahdi Salmanzade
54ac4a74fb Attribute API-token sessions to the token owner (effective_user) (#871)
Split 2/4 of the companion bridge (#863 was 1/4). A paired bearer-token caller
runs as the sandboxed 'api' pseudo-user, so its sessions were stranded in a
separate 'api'-owned silo, invisible to the owner's desktop UI.

Add effective_user(): for a bearer token it resolves to the token's real owner
(request.state.api_token_owner); for cookie sessions it is identical to
get_current_user, so the swap is a no-op for browser users. Route session
ownership/attribution in routes/session_routes.py through it.

Tests (tests/test_session_owner_attribution.py):
- cookie/browser users are unchanged
- a bearer token attributes to its owner; with no owner it does NOT escalate
- _verify_session_owner: a bearer token for owner A cannot verify owner B's
  session (404); owner verifies their own; missing -> 404; unauth -> 403
2026-06-02 11:39:01 +09:00
Mahdi Salmanzade
bc00a9fc7f fix(security): fail closed on null-owner session in sync-chat endpoint (#870)
POST /api/v1/chat (the n8n/Make/Activepieces sync-chat endpoint) verified
session ownership with `_tok_user and _sess_owner and _sess_owner != _tok_user`.
The `_sess_owner and` clause skipped the check entirely whenever the session's
owner was null — so any chat-scoped API token (e.g. a token minted for a paired
mobile device) could pass a legacy/migrated null-owner session id, inject a
message into that session, and read back its conversation history plus reuse
the owner's endpoint credentials.

This is the same `if owner and owner != user` null-owner-bypass pattern that
was already hardened in the gallery, calendar, and notes routes (see
test_null_owner_gates.py) and in session_routes._verify_session_owner. Make
this gate strict and fail closed too: require a resolvable caller and an exact
owner match, mirroring _verify_session_owner. Extract the decision into
_caller_owns_session() and pin it with regression tests.
2026-06-02 11:38:05 +09:00
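
The null-owner bypass in bc00a9fc7f and its fail-closed replacement, reduced to two predicates. This is a sketch: the real _caller_owns_session() may take different arguments, and `old_gate_rejects` only mirrors the condition quoted in the message.

```python
from typing import Optional


def old_gate_rejects(tok_user: Optional[str], sess_owner: Optional[str]) -> bool:
    """The previous check: silently skipped when the session owner is null."""
    return bool(tok_user and sess_owner and sess_owner != tok_user)


def caller_owns_session(tok_user: Optional[str], sess_owner: Optional[str]) -> bool:
    """Fail closed: require a resolvable caller and an exact owner match."""
    return bool(tok_user) and sess_owner == tok_user
```

For a token user "alice" and a session with no owner, the old gate does not reject, while the strict predicate denies ownership.
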
James Arslan
6776c7d691 Surface silent model fallback instead of masking it (#868)
When the selected model fails before producing output, stream_llm_with_fallback
quietly switches to the next candidate and the reply is shown under the
originally selected model's name, so a misconfigured provider looks like it
works. (Concretely: a Bedrock gateway that 400s every Anthropic/Claude request
appears fine because another model silently answers under the Claude label.)

Emit a `fallback` SSE event ({selected_model, answered_by, reason}) the first
time a non-primary candidate produces output, forward it through the agent loop
and both chat-route paths, stamp the response metrics with the model that
actually answered, and show a notice + relabel the reply in the UI.

Tested: python -m pytest tests/test_llm_core_fallback.py (3 pass);
python -m py_compile src/llm_core.py src/agent_loop.py routes/chat_routes.py;
node --check static/js/chat.js.
2026-06-02 11:37:25 +09:00
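
The fallback notification in 6776c7d691 can be sketched as a generator. Everything here is illustrative: `stream_with_fallback` and `call_model` are hypothetical, events are plain dicts rather than SSE frames, and only the three payload keys (selected_model, answered_by, reason) come from the message.

```python
from typing import Callable, Dict, Iterable, Iterator, List


def stream_with_fallback(
    candidates: List[str],
    call_model: Callable[[str], Iterable[str]],
) -> Iterator[Dict[str, str]]:
    """Yield content events, announcing once when a fallback model answers."""
    selected = candidates[0]
    last_error = None
    for model in candidates:
        produced = False
        try:
            for chunk in call_model(model):
                if not produced and model != selected:
                    # First output from a non-primary candidate: say so.
                    yield {
                        "event": "fallback",
                        "selected_model": selected,
                        "answered_by": model,
                        "reason": str(last_error),
                    }
                produced = True
                yield {"event": "content", "text": chunk}
            return
        except Exception as exc:
            if produced:
                raise  # failed mid-answer: do not switch models silently
            last_error = exc
    raise RuntimeError(f"all candidates failed: {last_error}")
```
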
Tatlatat
2d6b777799 fix(cookbook): diagnose 'no GGUF file' serve failures clearly (#811) (#866)
When serving with the llama.cpp backend and no .gguf file exists on the host,
the GGUF launcher prelude exits with 'ERROR: No GGUF found on this host', but
_diagnose_serve_output had no matching pattern, so the UI showed a generic
crash instead of explaining the cause. Add a diagnosis pattern for the
no-GGUF case so users are told a .gguf is required and pointed at downloading
a GGUF build, instead of an opaque crash.

Closes #811
2026-06-02 11:36:53 +09:00
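
A diagnosis pattern of the kind added in 2d6b777799 might look like the sketch below. The table layout, the `diagnose_serve_output` signature, and the hint wording are assumptions; only the matched error text is taken from the message.

```python
import re
from typing import Optional

# Hypothetical pattern table; the real _diagnose_serve_output may differ.
DIAGNOSES = [
    (
        re.compile(r"No GGUF found on this host"),
        "No .gguf file was found for this model. The llama.cpp backend "
        "needs a GGUF build; download one and try again.",
    ),
]


def diagnose_serve_output(output: str) -> Optional[str]:
    """Return a user-facing hint for the first known failure pattern."""
    for pattern, hint in DIAGNOSES:
        if pattern.search(output):
            return hint
    return None
```
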
Ernest Hysa
360bc83a66 fix(history): scope topic analysis to authenticated owner only (#744)
Two changes close the cross-tenant topic leak in /api/conversations/topics.

The route at routes/history_routes.py:478 used get_current_user, which
returns None when no auth middleware has set request.state.current_user
(loopback-bypass, AUTH_ENABLED=false, or any path that short-circuits the
middleware). It then forwarded owner=None to analyze_topics.

The helper at src/topic_analyzer.py:21 used an 'if owner:' short-circuit
in its owner filter, so the None owner took the no-filter path and the
helper silently aggregated topic frequencies and per-snippet session_id,
session_name, role, and snippet text across every user's sessions.

analyze_topics now returns an empty result when owner is falsy. The
inner short-circuit is removed because the filter is now strict by
construction. The route is switched to require_user, which raises 401
when auth_manager.is_configured is True and the caller is anonymous,
matching the pattern used by calendar_routes, skills_routes, and other
authenticated routes.
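
The strict-by-construction filter can be sketched as follows. The session dictionaries and the function name are illustrative assumptions, not the real analyze_topics data model:

```python
from collections import Counter
from typing import Dict, List, Optional


def analyze_topics_sketch(sessions: List[dict], owner: Optional[str]) -> Dict[str, int]:
    """Count topics for one owner; a falsy owner yields an empty result."""
    if not owner:
        # Previously a None owner skipped the filter and aggregated every user's data.
        return {}
    counts: Counter = Counter()
    for session in sessions:
        if session.get("owner") != owner:
            continue  # no short-circuit: the owner comparison always runs
        counts.update(session.get("topics", []))
    return dict(counts)
```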

The test test_history_topics_owner_scope.py was rewritten to drive the
real route through FastAPI's TestClient with a stub AuthMiddleware that
mirrors the loopback-bypass branch, and now asserts a strict 401 from
the route and an empty result from the helper. The previous version of
the test accepted either a 200-with-empty-topics or a 401; the strict
assertion means a future regression that drops the require_user wrapper
or re-adds the inner short-circuit is caught immediately.
2026-06-02 11:36:01 +09:00
tanmayraut45
1cc2e90ac0 Apply SafeSearch by default across search providers (#763)
#718 reported Deep Research drifting into adult / spam URLs several
rounds into a benign session ("research about https://bhagathgoud.com/
and what he doing currently"). The reporter's log showed Japanese
adult sites being crawled even though the model was emitting normal
queries like "Bhagath Goud LinkedIn" and "site:bhagathgoud.com".

The model wasn't generating those URLs. Every provider call site
constructed its params dict without a SafeSearch parameter, so the
underlying HTTP backend (the duckduckgo-search library / DDG's HTML
endpoint in this case) was free to surface "related search" /
trending / spam recommendations that have nothing to do with the
user's query. Per provider:

- SearXNG: instance-dependent; many self-hosted instances default
  to safesearch=0.
- Brave API: defaults to "off" for new API keys.
- duckduckgo-search lib: defaults to "moderate", which still lets
  related-search recommendations and HTTP-backend fallback URLs
  surface trending non-English spam topics.
- DDG HTML fallback (html.duckduckgo.com): no `kp` param, treated
  as off.
- Google PSE: omitted `safe` is equivalent to off.
- Serper: omitted `safe` proxies to Google with safe off.

Since the bad URLs entered through the provider layer, not the
model, the provider params are the right place to gate this.

Changes:

- src/settings.py: new `search_safesearch` setting with default
  "strict". Documented values ("strict" | "moderate" | "off") plus
  a few aliases ("on", "high", "0/1/2", "disabled", ...) so a
  hand-edited config doesn't silently fall through to off.
- src/search/providers.py:
  - Add `_get_safesearch_level()` (canonical, normalizing) and
    `_safesearch_for(provider)` (per-provider param translation).
  - Thread the per-provider value into every params dict:
    SearXNG JSON, SearXNG language/engines fallbacks, SearXNG HTML,
    Brave, DDG library, DDG HTML fallback, Google PSE, Serper.
  - Tavily is left untouched — its API has no SafeSearch knob and
    its index already filters explicit content at ingest time.
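
The normalize-then-translate split might look like the sketch below. The function names, provider keys, alias table, and the choice to treat unknown values as "strict" are assumptions for illustration; the parameter names and values follow each provider's documented SafeSearch options as best understood, and Serper is left out because its parameter is not spelled out here:

```python
from typing import Tuple, Union

# Assumed alias table: numeric aliases follow the SearXNG 0/1/2 convention.
_ALIASES = {
    "strict": "strict", "on": "strict", "high": "strict", "2": "strict",
    "moderate": "moderate", "1": "moderate",
    "off": "off", "disabled": "off", "0": "off",
}

# Provider key -> (query parameter name, canonical level -> provider value).
_PROVIDER_PARAMS = {
    "searxng": ("safesearch", {"off": 0, "moderate": 1, "strict": 2}),
    "brave": ("safesearch", {"off": "off", "moderate": "moderate", "strict": "strict"}),
    "ddg_lib": ("safesearch", {"off": "off", "moderate": "moderate", "strict": "on"}),
    "ddg_html": ("kp", {"off": "-2", "moderate": "-1", "strict": "1"}),
    # Google exposes no moderate level here, so moderate maps to "active" (assumption).
    "google_pse": ("safe", {"off": "off", "moderate": "active", "strict": "active"}),
}


def get_safesearch_level_sketch(raw: object) -> str:
    """Normalize a configured value; unknown values fall back to strict, never off."""
    return _ALIASES.get(str(raw).strip().lower(), "strict")


def safesearch_for_sketch(provider: str, raw: object = "strict") -> Tuple[str, Union[int, str]]:
    """Return the (param_name, value) pair to merge into a provider's params dict."""
    name, table = _PROVIDER_PARAMS[provider]
    return name, table[get_safesearch_level_sketch(raw)]
```

Keeping normalization separate from translation means a typo in a hand-edited config resolves to the safe default once, rather than being reinterpreted by each provider.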

Behavior change for existing installs: default is now "strict", so
explicit results get filtered across every supported provider
without any user action. Users who deliberately want unfiltered
results can set `search_safesearch` to "off" in Settings. No new
dependencies, no schema migrations.

Closes #718.
2026-06-02 11:34:32 +09:00
tanmayraut45
eff762cdd9 Expose manage_notes via native function calling (#759)
The agent's RAG tool selector retrieves manage_notes as relevant for
note / todo / reminder requests, but two gaps stopped it from actually
firing on local llama.cpp / vLLM endpoints:

1. FUNCTION_TOOL_SCHEMAS had no entry for manage_notes. Even when the
   tool was marked relevant, no JSON schema was sent on the function
   tools list, so native-function-calling models had nothing to call.
   In practice the model would describe creating the note in prose
   while the actual note stayed blank — the symptom reported in #713
   ("checklist hallucinated as blank").

2. _API_HOSTS only listed hosted providers (OpenAI, Anthropic, etc.).
   For local endpoints like http://localhost:8080 or
   http://host.docker.internal:8000, _is_api_model fell back to
   keyword-sniffing the model name, so any model whose slug didn't
   happen to match the keyword list silently lost native tool
   schemas entirely.

Fixes:

- src/tool_schemas.py: add a manage_notes function schema covering
  list/add/update/delete/toggle_item with the full Keep-style field
  set. note_type is exposed as an enum ("note" | "checklist") so the
  model picks the mode explicitly instead of inferring it from
  content shape. Items are named checklist_items in the schema —
  consistent with the description's wording and avoiding the
  Python-built-in name clash that #713 calls out.

- src/tool_implementations.py: do_manage_notes accepts both
  checklist_items (new, schema-exposed) and items (legacy /
  internal). Direct API callers and existing code paths keep
  working unchanged; native function calls following the new
  schema route through the same path.

- src/agent_loop.py: add localhost, 127.0.0.1, and
  host.docker.internal to _API_HOSTS so the function-tool path is
  not gated behind model-name guessing for local servers.
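
For orientation, a function schema along these lines, plus the dual-key handling, could be sketched as below. The `title` field, the descriptions, and the helper name are assumptions; the real schema covers a fuller field set than shown:

```python
from typing import List

# Illustrative schema in the common OpenAI-style function-tool shape.
MANAGE_NOTES_SCHEMA_SKETCH = {
    "type": "function",
    "function": {
        "name": "manage_notes",
        "description": "Create, list, update, or delete notes and checklists.",
        "parameters": {
            "type": "object",
            "properties": {
                "action": {
                    "type": "string",
                    "enum": ["list", "add", "update", "delete", "toggle_item"],
                },
                "note_type": {"type": "string", "enum": ["note", "checklist"]},
                "title": {"type": "string"},  # hypothetical field
                "checklist_items": {"type": "array", "items": {"type": "string"}},
            },
            "required": ["action"],
        },
    },
}


def resolve_checklist_items_sketch(args: dict) -> List[str]:
    """Prefer the schema-exposed key, but keep accepting the legacy 'items' key."""
    items = args.get("checklist_items")
    if items is None:
        items = args.get("items")
    return list(items or [])
```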

Closes #174.
Closes #713.
2026-06-02 11:33:32 +09:00
hawktuahs
a2f6183c4a Fix cookbook pip installs in venvs (#723)
2026-06-02 11:31:59 +09:00
Mahdi Salmanzade
e152a339d1 Deep research: don't treat a bare 'yes' as the research topic (#858)
Deep research asks 2-3 clarifying questions first. When the user answered
with a bare affirmation ('yes', 'ok', 'go ahead'), that short message
became latest_message and the query-synthesis fallback returned it
verbatim, so research ran on the literal word 'yes'.

In ResearchHandler.synthesize_query, when synthesis can't run (history
too short) or fails, fall back to the earliest substantive user message
(the original ask) only when the latest message is an explicit
affirmation/continuation phrase or is empty/punctuation-only. There is
deliberately no length heuristic: a short answer like 'UK', 'C++', or
'Rust' in a clarification flow is a real topic and is left untouched.
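
The fallback rule can be sketched as follows. The phrase list and function names are illustrative assumptions, not the set used by ResearchHandler.synthesize_query:

```python
import re
from typing import List

# Hypothetical affirmation/continuation phrases (compared after normalization).
_AFFIRMATIONS = {"yes", "yeah", "yep", "ok", "okay", "sure", "go ahead", "continue", "proceed"}


def _normalize(text: str) -> str:
    """Lowercase and collapse punctuation so 'Yes!' and 'go ahead.' match the phrase list."""
    return re.sub(r"[\W_]+", " ", text).strip().lower()


def _is_substantive(text: str) -> bool:
    norm = _normalize(text)
    # Empty/punctuation-only and explicit affirmations are not topics.
    # Length is deliberately not considered, so 'UK' or 'C++' count as topics.
    return bool(norm) and norm not in _AFFIRMATIONS


def pick_fallback_query_sketch(user_messages: List[str]) -> str:
    """Choose the query to research when synthesis cannot run or fails."""
    latest = user_messages[-1] if user_messages else ""
    if _is_substantive(latest):
        return latest
    for message in user_messages:  # earliest substantive message is the original ask
        if _is_substantive(message):
            return message
    return latest
```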

Tests cover query/topic selection: bare 'yes' -> original ask, short
answers (UK, C++) kept, a short message that is the only substantive one
kept, and a multi-word follow-up still flows through synthesis.
2026-06-02 11:30:53 +09:00
BarsatZulkarnine
00f16d66a3 Fix test suite: ESM module loading and stub isolation (#844)
* Fix test suite: ESM loading and stub isolation (refs #605)

Three targeted fixes to reduce suite failures from 9 → 1:

1. package.json: add "type": "module" so Node loads static/js/**
   as ES modules. Fixes 7 tests in test_compare_js.py and
   test_reply_recipients_js.py that fail with
   "SyntaxError: Unexpected token 'export'".

2. test_null_owner_gates.py: add Base and ChatMessage to the
   core.database stub. Without Base the scheduler test cannot
   import at collection time; without ChatMessage core/__init__.py
   fails mid-load when session_manager.py tries to import it,
   leaving core partially initialised in sys.modules and poisoning
   the auth manager migration test that runs later in the same file.

3. test_task_scheduler_session_delivery.py: skip gracefully when
   core.database is stubbed (Base is a MagicMock) rather than
   crashing. The test passes correctly when run in isolation.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Scope ESM declaration to static/js/ and document isolation workaround

Per review feedback on #844:

1. Move "type": "module" from root package.json to static/js/package.json.
   The root package.json had no type field (defaulted to CJS) and should
   stay that way — vendored UMD bundles in static/lib/ use require() internally
   and would break if Node ever tried to load them as ES modules. Node resolves
   the nearest package.json, so adding it in static/js/ scopes the ESM
   declaration to just the files the JS unit tests actually load
   (compare/state.js, emailLibrary/replyRecipients.js).

2. Expand the module-level skip comment in test_task_scheduler_session_delivery
   to document that it is a temporary isolation workaround, explain root cause
   (test_null_owner_gates installs a module-level sys.modules stub with no
   cleanup), record before/after suite numbers, and note the clean path
   (refactor to fixture-scoped stub).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-02 11:29:29 +09:00
Marius Oppedal Ringsby
f58fbc8b85 Add optional markitdown extraction for Office/EPUB documents (#766)
Office documents were dropped server-side: .docx fell through to
"[Attached document file]", .xlsx/.pptx weren't recognized at all, and
the personal-docs RAG index only covered txt/md/json/pdf.

Wire the optional markitdown dependency (MIT, Microsoft) into both the
chat-attachment path (build_user_content) and the RAG indexer
(personal_docs), converting .docx/.xlsx/.pptx/.xls/.epub to Markdown.
It is lazy-imported with graceful fallback (mirrors src/pdf_runtime.py):
without it those formats show an "install to extract" banner and the
MIT core is unaffected. pypdf stays the default PDF path.

- src/markitdown_runtime.py: optional-dep loader + convert_to_markdown
- upload_handler: recognize Office/EPUB extensions + MIME types
- document_processor: extract Office docs in the chat else-branch
- personal_docs: index Office docs (DEFAULT_EXTENSIONS + dispatch)
- requirements-optional.txt + ACKNOWLEDGMENTS.md: pinned markitdown 0.1.5
- tests: markitdown_runtime + office index coverage
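
The optional-dependency pattern can be sketched as below. The function name and the `module_name` parameter are assumptions added so the missing-dependency path can be exercised; `MarkItDown().convert(...).text_content` is the markitdown package's documented entry point as best understood:

```python
import importlib
from typing import Optional


def convert_to_markdown_sketch(path: str, module_name: str = "markitdown") -> Optional[str]:
    """Convert a document to Markdown, or return None when the optional dep is absent."""
    try:
        # Lazy import: the core app never requires markitdown at startup.
        module = importlib.import_module(module_name)
    except ImportError:
        return None  # caller shows an "install to extract" banner instead
    converter = module.MarkItDown()
    return converter.convert(path).text_content
```

Returning None rather than raising lets both the chat-attachment path and the indexer decide how to present the missing extractor.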

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-02 11:28:52 +09:00