`get_tools_for_query` force-includes whole tool families when the query
mentions an intent keyword, but matched with a raw substring test
(`kw in ql`). Short hints therefore fired inside unrelated words, bloating
the tool set with irrelevant tools:
- "fix" matched "prefix" -> document tools
- "line" matched "deadline"/"online" -> document tools
- "serve" matched "observe"/"reserve" -> cookbook serve tools
- "reply" matched "replying" -> all email tools
- "unread" matched "unreadable" -> all email tools
Match each keyword on word boundaries instead
(`re.search(rf"\b{re.escape(kw)}\b", ql)`), the same fix already applied to
the keyword matcher in topic_analyzer.py. Genuine intent keywords
("reply to this email", "edit the document", "serve the model") still match.
This only removes substring-inside-a-word matches; it does not change whole
-word matches (so e.g. an unrelated whole word like "tell" is a separate
keyword-choice question, left untouched here).
Checks: python -m pytest tests/test_tool_index_keyword_boundaries.py (4 passed;
3 of them fail on the pre-fix substring code), python -m py_compile
src/tool_index.py, git diff --check.
PresetManager.load already heals a forward-incompatible presets.json: the
block just above repairs the legacy `custom` shape and re-saves the file.
But if the file exists and is missing a whole built-in preset (e.g. an older
install written before `reason` existed), load returned it as-is, so that
built-in stayed permanently absent — silently missing from the picker that
GET /api/presets feeds, with no way for the user to get it back.
Extend the same self-heal: after the legacy migration, fill in any built-in
presets the loaded file is missing, defaults-first so user edits win, and
persist the result. This never clobbers an intentional removal — there is no
delete path for the built-in keys (only user_templates entries can be
deleted), and presets are hidden via an `enabled: False` flag, not removal.
Checks: python -m pytest tests/test_preset_fill_missing_defaults.py (3 passed;
2 fail on the pre-fix code), the existing preset cases in
tests/test_review_regressions.py still pass, python -m py_compile
src/preset_manager.py, git diff --check.
The sync-chat endpoint's Case 3 fallback selected a ModelEndpoint with an
unscoped `query(ModelEndpoint).filter(is_enabled == True).first()` and then
used that row's decrypted `api_key` for the LLM call. ModelEndpoint is a
per-user resource (owner non-null = private to that user), so a chat-scoped
API token for user A that sent no session and no api_key could fall back onto
user B's PRIVATE endpoint — spending B's API key/quota and reaching whatever
internal base_url B configured. This is the same multi-tenant owner-scoping
class already fixed for the session gate on this very endpoint
(_caller_owns_session) and for companion/models.
Scope the fallback to the token owner's own rows plus legacy null-owner
(shared) rows via the existing owner_filter helper, matching
routes/model_routes.py and companion/routes.py. A null/empty owner stays a
no-op, preserving single-user/legacy behaviour.
Add regression tests pinning the scoped fallback (cross-owner, shared-only,
no-visible-row, disabled-owned, and the legacy null-owner no-op).
The DELETE /api/skills/{skill_id} handler resolves the caller, loads the
skill with skills_manager.load(owner=user), and verifies ownership with
_verify_owner(match, user) — but then calls
skills_manager.delete_skill(match.get("name")) without the owner.
SkillsManager.delete_skill filters candidates with
`(sk.owner or "") != (owner or "")`, so when owner is None an owner-scoped
skill is skipped and the method returns False. The route then raises a
spurious 404 "Skill not found" — meaning a logged-in user can never delete
their own skills through the API.
Pass the resolved owner through to delete_skill so the skill is matched and
removed.
tests/test_skills_delete_owner.py drops a real owner-scoped SKILL.md on disk
and (1) checks the manager directly: delete_skill without owner returns
False (regression lock) while delete_skill(owner="alice") returns True and
removes the dir; (2) drives the real DELETE route handler and asserts it
returns {"ok": True} and deletes the file. The route test fails before this
change (404). Real SkillsManager + real filesystem, no mocking.
APIKeyManager.load() decrypts every stored key with a dict comprehension
and no error handling. If the .key file no longer matches the ciphertext in
api_keys.json — key rotated, a partial/!mismatched data restore, or a
corrupted .key — Fernet.decrypt raises cryptography.fernet.InvalidToken.
app_initializer.py calls api_key_manager.load() during startup, so a single
undecryptable entry takes down the whole app at boot, and the user can't
reach the UI to fix it.
Decrypt each key in a loop and, on InvalidToken/ValueError, log a warning
and skip that one entry while still returning every key that decrypts
cleanly. One bad/stale key no longer blocks startup.
tests/test_api_key_manager_resilience.py saves a valid key, then injects an
entry encrypted under a different Fernet key (InvalidToken) and a malformed
token (ValueError), and asserts load() returns the good key and skips the
bad ones without raising. Fails before this change.
The webhook URL guard's _ip_is_private() only checks a hardcoded
_PRIVATE_NETWORKS list, which misses several addresses that route
internally. validate_webhook_url() therefore ALLOWED:
- http://[::]/ (IPv6 unspecified, reaches localhost)
- http://[::ffff:127.0.0.1]/ (IPv4-mapped IPv6 loopback = 127.0.0.1)
- http://[::ffff:169.254.169.254]/ (IPv4-mapped cloud metadata endpoint)
The last one is the dangerous case: a webhook pointed at the mapped
169.254.169.254 can pull cloud instance credentials (SSRF -> credential
theft).
Harden _ip_is_private(): first unwrap IPv4-mapped IPv6 to its embedded IPv4
(addr.ipv4_mapped), then reject via the stdlib address properties
(is_private, is_loopback, is_link_local, is_reserved, is_multicast,
is_unspecified) in addition to the existing network list. Public addresses
still pass.
tests/test_webhook_ssrf_resilience.py asserts validate_webhook_url raises
for the three IPv6 bypasses plus 127.0.0.1 and 0.0.0.0, and still accepts a
public IP literal. The IPv6 cases fail before this change.
_build_ollama_payload sends options.temperature and options.num_predict
to /api/chat, but never options.num_ctx. Ollama defaults num_ctx to 2048
when the option is omitted, so prompts going to any Ollama backend are
silently truncated there regardless of the model's actual capability.
Thread the discovered context length through the three call sites
(llm_call, llm_call_async, stream_llm) and emit options.num_ctx when it
is known and positive. The builder filters out the DEFAULT_CONTEXT
fallback (128000) so we don't lie to Ollama about models whose window
we couldn't actually discover. The issue's literal 'when > 2048'
heuristic is dropped: a model with a real context smaller than 2048
would OOM if Ollama used its default, so we pass the real value
regardless of size. Matches how src/context_compactor.py uses the
same helper.
Sister fix to PR #753 — that PR teaches the compactor the right budget,
this one tells Ollama to actually use that budget on the way in.
tool_execution.py returns web search results as {"output": ..., "exit_code": 0}.
The sources-extraction block in stream_agent_loop only checked result.get("results")
and result.get("stdout"), so _src_text was always "" for every tool-call-mode web
search. Two consequences:
1. The SOURCES marker was never parsed and the web_sources SSE event was never
emitted -- the sources panel never appeared after agent-mode searches.
2. The marker (a large JSON blob) was left in result["output"] and forwarded
verbatim to the LLM in round 2 via format_tool_result, confusing some local
models into producing no tokens.
Fix: prepend result.get("output") to the lookup chain, and update the cleanup
assignment so result["output"] is overwritten with the stripped text.
Adds six regression tests in tests/test_agent_loop.py documenting the before/after
behaviour and verifying backward compat with the legacy results/stdout paths.
Co-authored-by: MohammadYusif <MohammadYusif@users.noreply.github.com>
* Fix Cookbook dependency install completion state
Mark Cookbook dependency installs as complete when the background runner
exits successfully, even when HuggingFace-specific download markers are
absent.
* Add focused regression coverage for cookbook dependency completion.
Keep the fix narrowly scoped while carrying env_path through dependency tasks and locking the completion reconciliation behavior with targeted tests.
Models (notably Gemini) emit a native 'google_search' function call, but the
agent loop had no mapping for it, so the call failed to convert, the round
produced 0 chars and 0 tool blocks, and generation died silently — the web
client hung on 'waiting for first token' with no error (also #443).
- Map google_search / google_search_retrieval / google_search_grounding to the
web_search tool, and read Gemini's 'queries' array (falling back to 'query').
- In stream_agent_loop, when a round yields no response text and no tool
events, emit a visible fallback message instead of leaving the user hanging.
- Give the unknown-tool execution branch an explicit exit_code=1 so the failure
is logged as an error rather than 'n/a'.
Unknown/unconvertible tool names still return None (unchanged) so they are
dropped safely rather than executed. Added tests covering the google_search
mapping, the queries array, and unknown/invalid-JSON returning None.
The existing test_endpoint_resolver.py copies the pure functions to avoid
import side effects, so its assertions can silently drift from the shipped
src/endpoint_resolver.py (the copies already lag: no OpenRouter headers, no
anthropic.com host matching). This adds a sibling module that imports the
REAL resolver and locks in behavior for every provider named in ROADMAP.md's
"Provider setup/probing audit" — Anthropic, Gemini, Groq, xAI, OpenRouter,
OpenAI, DeepSeek — plus Ollama (local + cloud) and the Tailscale self-host
fallback in resolve_url.
Covers build_chat_url, build_models_url, build_headers, normalize_base,
_first_chat_model, _anthropic_api_root, _ollama_api_root, and resolve_url.
conftest.py already stubs the heavy deps, so the import is side-effect free.
Test-only; no behavior change. 55 new tests, all passing.
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Split 3/4 of the companion bridge (#863, #871 landed 1/4 and 2/4). Adds admin-only
device pairing to the companion router.
- GET /api/companion/pair -- renders a form; never mints (a GET must not mint a
credential: SameSite=Lax session cookies ride top-level GET navigations, so
GET-minting would be CSRF-triggerable via a link/<img>)
- POST /api/companion/pair -- mints a one-time chat-scoped token. Admin-cookie
only; CSRF-safe because a SameSite=Lax cookie is not sent on a cross-site POST,
the same protection POST /api/tokens relies on. ?format=json returns the
pairing payload for an in-app screen.
Minting invalidates the auth middleware's token cache so the code works on the
next request with no restart. companion/pairing.py holds the mint/LAN/QR helpers;
the token is shown once and stored only as a bcrypt hash + prefix
(mirrors routes/api_token_routes.py).
Tests (tests/test_companion_pairing.py):
- a bearer/'api' caller and a non-admin user are rejected by require_admin (403);
an admin passes
- the token is returned once and persisted only as a hash
- minting invalidates the cache (works without restart)
- minting is exposed on POST, never GET (CSRF)
Screen readers got no signal that a dialog opened — not one modal carried
role="dialog" — and several close buttons had no accessible name.
- The 6 static tool windows (Brain, Theme, Prompt, Rename session, Cookbook,
Settings) now carry role="dialog" + an accessible name. They are dockable,
tiling windows, so they are non-modal dialogs (intentionally no aria-modal).
- The four unlabelled close buttons (theme, prompt, cookbook, settings) get an
aria-label so they no longer read as just "heavy multiplication x".
- styledConfirm / styledPrompt ARE blocking modals: they get role="dialog" +
aria-modal="true" + aria-labelledby/aria-describedby, and now manage focus —
restore focus to the triggering element on close and trap Tab within the
dialog (they already moved focus in on open).
tests/test_dialog_aria.py pins the roles, labels, and focus management.
* Cookbook: Engine filter + intelligent hardware-computed serve profiles
Two related Cookbook serving improvements for accurate, hardware-aware model
serving (especially on consumer GPUs that can only run GGUF/llama.cpp).
Engine filter
- New "Engine" dropdown (All / llama.cpp / vLLM / SGLang) beside the quant
picker. Pure client-side view filter over the fetched list via the same
_detectBackend() the serve commands use, so what you filter to is exactly what
would launch. Re-renders from cache (no refetch). Empty-state message + the
instant-cache-paint path account for it too.
Intelligent serve profiles (Quality / Balanced / Speed)
- services/hwfit/profiles.py: compute_serve_profiles() turns detected VRAM +
model size into concrete llama.cpp flags (n_gpu_layers, n_cpu_moe, cache-type,
context). Encodes the by-hand tuning: a too-big MoE offloads experts to CPU
instead of failing; a model that fits stays fully on GPU; quant tracks profile
intent; vision models keep image-encoder headroom. Reuses models.py VRAM math
so filtering and serving agree on what fits. Pure/deterministic (no t/s claims
— partial-offload speed isn't reliably predictable; fit is what's computed).
- /api/hwfit/profiles endpoint returns the profiles + the model's trained
context limit, with loose name matching (strips org/ prefix, -GGUF suffix,
quant tag) so a local GGUF folder name resolves to its catalog entry.
- _buildServeCmd (llama.cpp) now emits --n-cpu-moe / --flash-attn /
--cache-type-k/v when set, with llama-cpp-python fallback equivalents. It
previously only set -ngl/-c, which is why it OOM'd or ran slow.
- Serve panel: profile chips that fill the fields on click, plus CPU-MoE / KV
Cache / Flash Attn fields. Context is clamped to the model's trained limit
(and an absolute 1M sanity ceiling) on type/blur/profile-load and at launch —
fixes a crash where a stale 256k/16M preset + quantized KV cache caused an
amdgpu ErrorDeviceLost.
Tests: tests/test_serve_profiles.py (7) — offload vs full-GPU fit, never exceed
VRAM, context cap, launchable flags, vision headroom, no-GPU empty.
Checks: py_compile + node --check pass; pytest test_serve_profiles + test_hwfit_amd
green; verified live on an RDNA4 box (gfx1200) — Balanced lands ~ncm18 q4 128k,
matching hand-tuning.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* Cookbook: make column-header sorting discoverable (incl. Newest)
Sorting in Cookbook is via clickable column headers (pewds' design), but the
headers had no visual cue that they're interactive — so sorting in general, and
the Newest sort on the Model header specifically, was undiscoverable.
- Style sortable headers as interactive: pointer cursor, hover underline, and
the active sort column bolded/highlighted. There was no CSS for
.hwfit-sortable / .hwfit-sort-active at all; this helps every existing sort,
not just Newest.
- The Model column header sorts by release_date (newest first), reusing the
existing header-click sort wiring and the "newest" SORT_KEY.
No new sort control — uses the existing column-header paradigm.
Checks: node --check passes.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* Cookbook serve profiles: keep the on-disk file's quant fixed (don't propose Q6/Q2)
In the Serve tab the model is a specific GGUF file already on disk, so its quant
can't change — but the profiles were suggesting "Quality · Q6_K" / "Speed · Q2_K"
as if you could re-quantize it. That's meaningless when serving a fixed file.
- compute_serve_profiles gains serve_weights_gb / serve_quant. When set (SERVE
mode), the quant is locked to the file's and profiles differ only in the real
serving knobs — n_cpu_moe, KV-cache type, context. _weights_gb / _cpu_moe_for_budget
use the file's actual size instead of a quant-derived estimate. DOWNLOAD mode
(no override) still varies the quant to show download options.
- /api/hwfit/profiles accepts serve_weights_gb & serve_quant.
- The Serve panel parses the file's size (from m.size "20.6 GB") and quant (from
the repo/file name) and passes them, so profiles match what's actually served.
Result for a 20.6 GB Q4_K_M file: all three profiles stay Q4_K_M and differ by
KV/ctx/offload (Quality q8 KV 128k ncm21, Balanced q4 128k ncm17, Speed q4 32k
ncm15) — no nonsensical quant changes.
Tests: test_serve_mode_keeps_fixed_quant. Full serve-profile suite green (9).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* Cookbook serve: Vision toggle (auto-find mmproj) + live VRAM/RAM-spillover monitor
Two serve-panel additions:
1. **Vision toggle.** A "Vision" checkbox that serves the model with its
multimodal projector so it can read images. The mmproj path is resolved at
runtime (find mmproj-*.gguf next to the model), so dropping an mmproj file in
the model folder makes the toggle just work; `--mmproj … --image-max-tokens
1024` (native) / `--clip_model_path` (llama-cpp-python) only when on + found.
2. **Live GPU-memory monitor.** A readout that polls /api/cookbook/gpus every 4s
while the panel is open and shows VRAM used/total/%, free, and — crucially on
a discrete card — **RAM spillover** (AMD gtt_used_mb), with a plain-language
health hint: green/healthy, amber/tight, red/"spilled to RAM — slow (raise
CPU MoE or lower context)". Surfaces gtt_used_mb from the gpus endpoint
(previously read for total only and discarded for 'used').
Lets you see at a glance whether a config fits VRAM (fast) or is paging to system
RAM over PCIe (slow) instead of guessing.
Checks: node --check + py_compile pass.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The spinoff endpoint authenticated the caller (_require_user) but never
verified the research session belonged to them before reading the
persisted report and seeding it into a new chat session owned by the
caller. Any authenticated user who knew or guessed another user's
research session ID could exfiltrate that user's full report into their
own session — a cross-user data disclosure (IDOR).
Every other endpoint in this router gates on _owns_in_memory /
_assert_owns_research right after validating the session ID; spinoff was
the lone exception. Add the same _owns_in_memory check (covers both the
in-memory task and the on-disk JSON) so a non-owner gets a 404 before any
data is read or a session is created.
Add regression tests pinning the anonymous (401) and wrong-owner (404)
cases.
_resolve_ddg_redirect (the DuckDuckGo /l/?uddg= redirect resolver used on every
HTML-fallback result href) gated on `"duckduckgo.com" in parsed.hostname`. That
substring test also matches look-alike hosts like `duckduckgo.com.evil.com` and
`notduckduckgo.com`, so a result link on such a host would be silently rewritten
to its embedded `uddg` target. Same substring-vs-hostname pitfall fixed for
provider detection in 54ecfa3.
Match the host properly: exactly `duckduckgo.com` or a `.duckduckgo.com`
subdomain. Genuine redirects (`//duckduckgo.com/l/...`, and relative `/l/...`
hrefs resolved against `html.duckduckgo.com`) keep working.
The resolver was a closure inside duckduckgo_search; lifted it (plus the new
_is_duckduckgo_host helper) to module scope so it can be unit-tested directly.
Adds tests/test_ddg_redirect_resolution.py (red on the look-alike case before
this change, green after).
The /api/vault/unlock handler ran `bw` as
`_run_bw(["unlock", req.master_password, "--raw"])`. _run_bw launches it with
`asyncio.create_subprocess_exec(bw_path, *args)`, so the master password became
a process argument — readable by any local user through `ps` and
`/proc/<pid>/cmdline` for the lifetime of the unlock subprocess. The Bitwarden
master password decrypts the entire vault, so this is a serious credential
exposure on any multi-user / shared host (CWE-214).
The sibling /login handler already avoids this by feeding the password on
stdin; unlock was the outlier. Hand the password to `bw` through the
environment instead (`--passwordenv BW_PASSWORD`), mirroring how BW_SESSION is
already passed — `/proc/<pid>/environ` is readable only by the process owner,
not other local users. Add regression tests pinning that the secret reaches
the subprocess env and never appears in argv.
The dependency-install fallback chain unconditionally ran
'pip install --user', which fails inside a virtualenv (and as root in
LXC/containers) with 'Can not perform a --user install. User site-packages
are not visible in this virtualenv.' — even though the function's docstring
already noted --user is invalid in venvs.
Guard the --user fallback with a venv check so it only runs outside a venv
(where --user is actually valid for PEP-668 system Pythons). Derive the venv
probe interpreter from the install command (python for 'pip', python3 for
'pip3'/'python3 -m pip') so the check runs in pip's own environment. System
PEP-668 installs keep the --user fallback; venv/LXC-root installs no longer
hit the --user error. Updated the unit test for the new chain.
Closes#388
Add a hashchange handler for #document-<id> so refresh / URL-bar nav opens the document, and replace the silent console.error in loadDocument with a user-facing toast.
Closes#560
* fix(stream): read 'reasoning' SSE field for vLLM 0.20.2 / NIM
vLLM 0.20.2 / NVIDIA NIM emit reasoning-parser output in the `reasoning` delta field; older builds use `reasoning_content`. stream_llm() read only the latter, so reasoning from models like Nemotron-3-Nano (--reasoning-parser) was silently dropped and never rendered. Accept either field.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix(agent): keep reasoning_content only on the latest assistant turn
The agent loop echoed each round's reasoning back as `reasoning_content` on every assistant turn, assuming vendors ignore it. Nemotron's chat template re-injects ALL prior reasoning_content as <think> blocks, and the loop is trimmed only once (before it starts) — so reasoning accumulated unbounded across rounds, bloating context and feeding the model its own prior reasoning, which reinforced repetition/looping. Strip reasoning_content from earlier assistant turns so only the most recent round carries it (still satisfies DeepSeek's thinking-mode follow-up requirement).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix(agent-ui): wrap each round's reasoning in its own <think> block
The streamed think-tag wrapper gated on whole-message substring checks (accumulated.includes('<think>')), which only ever wrapped ONE reasoning block per message. A multi-round agent response has a reasoning phase per round, so once round 1 closed its <think>...</think>, rounds 2+ reasoning was emitted unwrapped and leaked into the visible answer. Replace the substring checks with a stateful open/close flag that toggles per think/answer cycle, so each round's reasoning gets its own collapsible block. Single-turn chat is unchanged (one open, one close).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* test(stream): reasoning/reasoning_content delta surfaces as thinking chunk
Covers @pewdiepie-archdaemon's requested regression: a streamed {reasoning: ...} delta emits a thinking chunk while {content: ...} streams as normal content; plus the older reasoning_content field for backward compat. Mirrors the #591 scenario.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix(cost): treat dotless container hostnames as local (free)
getModelCost() substring-matches model names against a cloud price table, so a self-hosted 'nemotron'/'llama' model was billed at cloud rates. isLocalEndpoint() only recognized IPs / localhost / .local, not bare Docker service names (nim-nano, llamaswap), so the local-is-free guard missed them. A single-label hostname (no dot) can never be a public API -> treat as local.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* test(cost): isLocalEndpoint classifies service names local, cloud FQDNs billable
Covers @pewdiepie-archdaemon's requested cases: llamaswap/nim-nano + localhost/private-IPs/.local => local (free); api.openai.com/openrouter.ai/etc => not local. Drives the real function via node --input-type=module (same approach as test_reply_recipients_js.py), skips when node is absent.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The fallback memory extractor (used by routes/memory_routes.py when the LLM
extractor fails) matched list items with `r'^[-*•]|\d+\.\s*(.*)'`. Operator
precedence makes that `(^[-*•]) | (\d+\.\s*(.*))`, so the capture group only
exists on the numbered-list branch.
A bullet line ("- foo") matches the first branch, so `group(1)` is None and
`text_match.group(1).strip()` raises AttributeError — crashing extraction for
any assistant message that contains a bullet list (i.e. most of them). Numbered
lists happened to work.
Group both markers — `r'^(?:[-*•]|\d+\.)\s*(.*)'` — so the capture applies to
bullets and numbers alike.
Adds tests/test_memory_bullet_extraction.py (red before, green after).
TaskScheduler.start() aborts stale TaskRun rows but never advanced
ScheduledTask.next_run. Across a restart the in-process _executing set
is empty, so the first post-restart _check_due_tasks() call dispatches
every task whose next_run is still in the past — and so does every
subsequent poll, until the task's regular _execute_task path finally
runs compute_next_run and pushes it forward.
start() now queries active tasks with next_run < now and pushes each
one to now + 60s. The first poll after restart sees them as not-yet-due,
the task runs once normally, and compute_next_run puts the schedule
back on its real cadence. Paused and not-yet-due tasks are left alone.
The validator test was rewritten as a regression test asserting the
opposite of the bug it originally demonstrated, plus two narrower cases
to lock down the filter (only active+overdue is touched).
* feat: publish all configured email addresses for reply-all exclusion
* fix: exclude all of the user's own addresses from reply-all, not just the active one
* test: reply-all excludes all of the user's configured addresses
* fix: match topic keywords on word boundaries, not substrings
* fix: apply word-boundary matching to topic example snippets too
* test: topic keywords match whole words, not substrings
The agent's multi-round (tool-result) follow-up request was rejected with
HTTP 400 on two providers, so tools ran but the agent never produced an answer:
- OpenAI-compatible streaming (Gemini 3) dropped the per-call thought_signature
and collided parallel tool calls, which arrive with index=None: they all
landed in slot 0, overwriting the first call's name and corrupting its
arguments by concatenation, so the follow-up request 400'd. Capture and replay
each call's extra_content (thought_signature), and give every parallel call
its own accumulator slot (allocated above the max key, so sparse or mixed
indices can't collide).
- Native Ollama /api/chat expects object tool-call arguments, but Odysseus
carries them as a JSON string, which Ollama rejected ("Value looks like
object, but can't find closing '}' symbol"). Convert them to objects in the
Ollama payload builder.
Both compose with the no-prose null-content sanitize fix from #862.
Tested: python -m pytest tests/test_llm_core_streaming.py
tests/test_llm_core_ollama.py tests/test_agent_loop.py (53 pass), and
python -m py_compile src/llm_core.py src/agent_loop.py.
Split 2/4 of the companion bridge (#863 was 1/4). A paired bearer-token caller
runs as the sandboxed 'api' pseudo-user, so its sessions were stranded in a
separate 'api'-owned silo, invisible to the owner's desktop UI.
Add effective_user(): for a bearer token it resolves to the token's real owner
(request.state.api_token_owner); for cookie sessions it is identical to
get_current_user, so the swap is a no-op for browser users. Route session
ownership/attribution in routes/session_routes.py through it.
Tests (tests/test_session_owner_attribution.py):
- cookie/browser users are unchanged
- a bearer token attributes to its owner; with no owner it does NOT escalate
- _verify_session_owner: a bearer token for owner A cannot verify owner B's
session (404); owner verifies their own; missing -> 404; unauth -> 403
POST /api/v1/chat (the n8n/Make/Activepieces sync-chat endpoint) verified
session ownership with `_tok_user and _sess_owner and _sess_owner != _tok_user`.
The `_sess_owner and` clause skipped the check entirely whenever the session's
owner was null — so any chat-scoped API token (e.g. a token minted for a paired
mobile device) could pass a legacy/migrated null-owner session id, inject a
message into that session, and read back its conversation history plus reuse
the owner's endpoint credentials.
This is the same `if owner and owner != user` null-owner-bypass pattern that
was already hardened in the gallery, calendar, and notes routes (see
test_null_owner_gates.py) and in session_routes._verify_session_owner. Make
this gate strict and fail closed too: require a resolvable caller and an exact
owner match, mirroring _verify_session_owner. Extract the decision into
_caller_owns_session() and pin it with regression tests.
When the selected model fails before producing output, stream_llm_with_fallback
quietly switches to the next candidate and the reply is shown under the
originally selected model's name, so a misconfigured provider looks like it
works. (Concretely: a Bedrock gateway that 400s every Anthropic/Claude request
appears fine because another model silently answers under the Claude label.)
Emit a `fallback` SSE event ({selected_model, answered_by, reason}) the first
time a non-primary candidate produces output, forward it through the agent loop
and both chat-route paths, stamp the response metrics with the model that
actually answered, and show a notice + relabel the reply in the UI.
Tested: python -m pytest tests/test_llm_core_fallback.py (3 pass);
python -m py_compile src/llm_core.py src/agent_loop.py routes/chat_routes.py;
node --check static/js/chat.js.