Gemma 4 and Phi-4 multimodal are natively vision-capable but their Ollama
tags ("gemma4:12b", "phi-4", "phi4") did not match any keyword in
_VISION_MODEL_KEYWORDS. The image was silently routed to the VL fallback
path instead of being passed directly to the model — users saw the model
respond to a placeholder like "[VL model unavailable - image not analyzed]"
rather than the actual image.
Adds "gemma-4"/"gemma4" and "phi-4"/"phi4" to the keyword list, following
the existing err-toward-True policy (#124): a text-only variant being
treated as vision is the safer failure than dropping a real image.
Fixes#1274 (partial — covers the Gemma 4 + Phi-4 case; the OpenRouter/free
vision fallback path is a separate issue).
The fallback helpers (llm_call_with_fallback, llm_call_async_with_fallback,
stream_llm_with_fallback) build their candidate list as the primary target
followed by the configured fallbacks. Callers prepend the session's live
(url, model) to default_model_fallbacks, so if the user also lists their current
model among the fallbacks — a common misconfiguration — the chain re-attempts
the very route that just failed: a wasted round-trip (and, for the streaming
path, a spurious 'fallback' notice for a switch that didn't actually happen).
Add a small _dedupe_candidates() helper that filters malformed entries and drops
a later repeat of an already-seen (url, model), preserving order (first wins,
keeping its headers). Apply it in all three fallback chains.
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The agent tool-RAG force-includes a keyword hint's tools whenever any of its
keywords appears in the query (word-boundary match). The email-intent hint listed
"tell", which matches a huge fraction of requests — e.g. "visit <url> and tell
me the title" — so the whole email toolset was force-included and crowded out the
relevant tools. The model then saw a prompt dominated by email tools and reported
it had no web search / could not visit the URL.
Remove "tell" from the email keyword set. Genuine email intent still fires on
email/mail/gmail/inbox/unread/message/send/reply.
Test drives get_tools_for_query directly with retrieval stubbed (the keyword
hints are deterministic, no embeddings needed): a "...tell me..." web query no
longer pulls in email tools, a real email request still does.
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
VectorRAG.search() filters with ChromaDB where={"owner": owner}, returning only
documents whose owner equals the requesting user. The keyword fallback
(_keyword_search_fallback, used when the primary query raises) guarded with
`if doc_owner and doc_owner != owner: continue`, so a document with a
missing/empty owner fell through and was returned to whichever user issued the
query — a cross-user information leak on the fallback path.
Match the primary path's strict filter: skip any doc whose owner != the
requested owner, including owner-less docs.
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
lstrip("\n[PDF content]:") treats the argument as a character set,
not a prefix, so it chews into the following [Page N text]: marker —
e.g. turning [Page 1 text]: into "age 1 text]:". The correct helper
strip_pdf_content_marker (which uses removeprefix) already exists in
the same file and is used by other call sites.
Fixes#1663
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Removing one RAG directory destroyed the whole shared ChromaDB collection
(all owners + base index) instead of just that directory's chunks. Shared
root cause: PersonalDocsManager.remove_directory called rebuild_index()
(delete_collection + recreate) then re-indexed only the remaining tracked
dirs (ownerless, never personal_dir). The targeted VectorRAG.remove_directory
that should have been used was itself broken (where={"source":{"$contains":dir}}
selects nothing on scalar metadata and would over-delete siblings), and the
dead do_manage_rag path fired a second unconditional rebuild.
- VectorRAG.remove_directory: select chunks in Python by a path-boundary match
on the stored absolute `source` (dir or dir+os.sep), abspath-normalized.
Keys on `source` (always written), never `owner` -- no migration.
- PersonalDocsManager.remove_directory: call the targeted remove instead of
rebuild_index() + partial reindex.
- do_manage_rag (dead code): drop the second rebuild_index() (hygiene).
- rag_server.py add path: abspath so indexed `source` matches the remove.
No schema change. Prevents future wipes (does not recover already-wiped
vectors). Adds hermetic regression tests at three layers.
Fixes#1660
Co-authored-by: Ethan <23321960+0xLeathery@users.noreply.github.com>
Anthropic's Messages API rejects temperature > 1.0 with HTTP 400, but
_build_anthropic_payload forwarded it verbatim. The shipped "Nietzsche" preset
uses temperature 1.2 and the UI slider allows up to 2.0, so every Claude request
under such a preset hard-broke. Clamp into [0.0, 1.0] in the Anthropic builder
only (OpenAI keeps its wider 0.0-2.0 range). Covers all three Anthropic call
paths, which build through this one function. None is passed through unchanged.
Fixes#1615
Co-authored-by: Ethan <23321960+0xLeathery@users.noreply.github.com>
context_compactor.maybe_compact built its summary text with
msg.get('content', '')[:2000], which raised
TypeError: 'NoneType' object is not subscriptable on assistant turns
whose content is None (turns that carried only native tool_calls).
Once a conversation crossed the 85% compaction threshold — reached
after only a few turns on small-context local models plus the large
agent prompt — every subsequent message failed ("send more than three
messages and it stops working").
Flatten message content to text first via a _content_as_text helper
(str passthrough, multimodal list blocks joined, None -> "") and
tolerate a missing role. Adds tests/test_context_compactor.py covering
the helper and a >=4-message conversation that forces compaction with
a None-content tool-call turn (fails before this change, passes after).
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
send_to_session was the only agent tool that didn't check session
ownership — an agent acting for user A could read from and write
into user B's session on a multi-user instance.
Add owner parameter and reject access when the target session
belongs to a different user, matching the pattern used by
create_session, list_sessions, and manage_session.
Fixes#1616
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Models like gemma4, qwen3.5, and ministral served via Ollama's native
/api/chat respond to OpenAI-style tool schemas by emitting a single
native tool_call chunk and then stopping. The agent loop receives
1 token of round_response and no recognised ToolBlock, so the round
ends immediately — the user sees a one-token response.
Root cause: _is_api_model was True for any endpoint whose host appears
in _API_HOSTS (which includes "host.docker.internal" and "localhost")
OR whose model name matches a keyword like "gemma". Native Ollama
endpoints were never excluded from this path.
Fix: import _is_ollama_native_url from llm_core and treat native Ollama
endpoints (/api/chat, port 11434) as text-only by default — falling back
to the fenced-block tool path the local models are tuned for. The
per-endpoint supports_tools=True toggle (Settings → Endpoints) still
overrides this for users who have explicitly opted in.
Fixes#1567
_generate_doc_id hashed only text. add_document / add_documents_batch
early-return when the id exists, so the second owner indexing a
byte-identical chunk hit the first owner's id, was silently dropped,
and never stored under their owner — their owner-filtered search then
quietly omitted it. Hash owner + text; empty owner reproduces the
legacy id, so the unowned/base index keeps existing ids and isn't
re-churned. Same-owner identical chunks still dedupe.
Caught by #1738 and #1760 (independent reports of the same bug).
writeback_event read cfg["password"] (the encrypted blob) and passed it
straight to DAVClient, so every local create/edit/delete authenticated
with the literal ciphertext, the remote rejected it, and the change
never reached the server — the exact silent-write-loss this module was
built to prevent. The pull path src/caldav_sync.py already decrypts;
mirror that. decrypt() is a no-op on legacy plaintext.
Caught by #1731.
The blocklist prefixes had trailing slashes, so path.startswith() only
matched /api/tokens/{id} but not /api/tokens itself — the bare GET (list)
and POST (mint) endpoints were reachable via app_api. Same gap on
/api/users (list/create/delete). Drop trailing slashes so both bare and
sub-resource forms are blocked. /api/auth and /api/admin had no bare
endpoints today but get the same treatment to prevent future drift.
Caught by #1462.
Two problems made deep research report "No information could be gathered" even
after it had extracted findings, on slow local models (reporter served a 20B
via LM Studio):
- _synthesize hard-capped its LLM call at timeout=60, while extraction uses the
user's extraction_timeout (300s here) and the final report uses 180s. The slow
model needed >60s to synthesize the round's findings, so synthesis timed out
after 3 attempts. Raised it to 180s to match the final-report call.
- When synthesis produced no report (it returns the unchanged, still-empty
report on failure during round 1), the run hit
`if not report: return "No information could be gathered…"` and discarded the
findings it had already gathered. Now it falls back to a compiled report built
from those findings (_fallback_report) so the user keeps the gathered material.
Tests stub the LLM (no live model/DB), pin the synthesis timeout >= 180, that the
fallback surfaces the findings rather than the give-up message, and that a failed
synthesis preserves the previous report.
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
is_vision_model() classified several genuinely multimodal families as text-only
because their names contain neither "vision" nor "vl": Gemma 3 (4b+), Llama 4,
Mistral Small 3.1/3.2, and *-multimodal models (e.g. phi-4-multimodal). For those
the attached image was stripped before the request, so the model never saw it —
a "can't read the image" report (issue #1274), common with Ollama tags like
gemma3:4b.
Add those keywords (plus a generic "multimodal"). Per the file's err-toward-True
policy (#124), a rare text-only tag treated as vision is the safer failure than
dropping a real image. Guard tests confirm the text-only siblings (gemma2, plain
gemma, mistral-small, phi-3) are not over-matched.
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* Stop multi-file uploads from tripping the per-IP concurrency guard
The /api/upload concurrency check summed its condition over `files`, but the
condition didn't reference the loop variable — so it collapsed to len(files)
whenever the IP had any recent upload. A single multi-file batch sent right
after another upload therefore counted itself as N concurrent uploads and hit
max_concurrent_uploads (3), returning 429. The browser swallows the 429 (no
`files` in the body) and sends the chat with no attachments, so the model
"doesn't even see" them (issue #1346).
Count genuine recent upload events instead, via a pure count_recent_uploads()
helper, independent of the current batch's file count. save_upload still
enforces the per-minute sliding-window rate limit per file, so throttling is
preserved.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* Also reconcile the per-minute upload rate limit with the batch cap
Follow-up within #1346: even after the concurrency-guard fix, a 6+ file batch
still failed because save_upload() counts each file against upload_rate_limit
(was 5/min) while the composer allows MAX_FILES=10 per batch — the reporter saw
"5 attachments work, 6 fail". Raise the per-minute file cap to 60 so a single
full batch (and a few of them) isn't self-rejected; burst abuse stays bounded by
max_concurrent_uploads. Add a real 6-file regression + a config guard that the
cap exceeds the frontend MAX_FILES.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>