The fallback helpers (llm_call_with_fallback, llm_call_async_with_fallback,
stream_llm_with_fallback) build their candidate list as the primary target
followed by the configured fallbacks. Callers prepend the session's live
(url, model) to default_model_fallbacks, so if the user also lists their current
model among the fallbacks — a common misconfiguration — the chain re-attempts
the very route that just failed: a wasted round-trip (and, for the streaming
path, a spurious 'fallback' notice for a switch that didn't actually happen).
Add a small _dedupe_candidates() helper that filters malformed entries and drops
a later repeat of an already-seen (url, model), preserving order (first wins,
keeping its headers). Apply it in all three fallback chains.
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The agent tool-RAG force-includes a keyword hint's tools whenever any of its
keywords appears in the query (word-boundary match). The email-intent hint listed
"tell", which matches a huge fraction of requests — e.g. "visit <url> and tell
me the title" — so the whole email toolset was force-included and crowded out the
relevant tools. The model then saw a prompt dominated by email tools and reported
it had no web search / could not visit the URL.
Remove "tell" from the email keyword set. Genuine email intent still fires on
email/mail/gmail/inbox/unread/message/send/reply.
Test drives get_tools_for_query directly with retrieval stubbed (the keyword
hints are deterministic, no embeddings needed): a "...tell me..." web query no
longer pulls in email tools, a real email request still does.
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
VectorRAG.search() filters with ChromaDB where={"owner": owner}, returning only
documents whose owner equals the requesting user. The keyword fallback
(_keyword_search_fallback, used when the primary query raises) guarded with
`if doc_owner and doc_owner != owner: continue`, so a document with a
missing/empty owner fell through and was returned to whichever user issued the
query — a cross-user information leak on the fallback path.
Match the primary path's strict filter: skip any doc whose owner != the
requested owner, including owner-less docs.
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
lstrip("\n[PDF content]:") treats the argument as a character set,
not a prefix, so it chews into the following [Page N text]: marker —
e.g. turning [Page 1 text]: into "age 1 text]:". The correct helper
strip_pdf_content_marker (which uses removeprefix) already exists in
the same file and is used by other call sites.
Fixes#1663
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Removing one RAG directory destroyed the whole shared ChromaDB collection
(all owners + base index) instead of just that directory's chunks. Shared
root cause: PersonalDocsManager.remove_directory called rebuild_index()
(delete_collection + recreate) then re-indexed only the remaining tracked
dirs (ownerless, never personal_dir). The targeted VectorRAG.remove_directory
that should have been used was itself broken (where={"source":{"$contains":dir}}
selects nothing on scalar metadata and would over-delete siblings), and the
dead do_manage_rag path fired a second unconditional rebuild.
- VectorRAG.remove_directory: select chunks in Python by a path-boundary match
on the stored absolute `source` (dir or dir+os.sep), abspath-normalized.
Keys on `source` (always written), never `owner` -- no migration.
- PersonalDocsManager.remove_directory: call the targeted remove instead of
rebuild_index() + partial reindex.
- do_manage_rag (dead code): drop the second rebuild_index() (hygiene).
- rag_server.py add path: abspath so indexed `source` matches the remove.
No schema change. Prevents future wipes (does not recover already-wiped
vectors). Adds hermetic regression tests at three layers.
Fixes#1660
Co-authored-by: Ethan <23321960+0xLeathery@users.noreply.github.com>
Anthropic's Messages API rejects temperature > 1.0 with HTTP 400, but
_build_anthropic_payload forwarded it verbatim. The shipped "Nietzsche" preset
uses temperature 1.2 and the UI slider allows up to 2.0, so every Claude request
under such a preset hard-broke. Clamp into [0.0, 1.0] in the Anthropic builder
only (OpenAI keeps its wider 0.0-2.0 range). Covers all three Anthropic call
paths, which build through this one function. None is passed through unchanged.
Fixes#1615
Co-authored-by: Ethan <23321960+0xLeathery@users.noreply.github.com>
context_compactor.maybe_compact built its summary text with
msg.get('content', '')[:2000], which raised
TypeError: 'NoneType' object is not subscriptable on assistant turns
whose content is None (turns that carried only native tool_calls).
Once a conversation crossed the 85% compaction threshold — reached
after only a few turns on small-context local models plus the large
agent prompt — every subsequent message failed ("send more than three
messages and it stops working").
Flatten message content to text first via a _content_as_text helper
(str passthrough, multimodal list blocks joined, None -> "") and
tolerate a missing role. Adds tests/test_context_compactor.py covering
the helper and a >=4-message conversation that forces compaction with
a None-content tool-call turn (fails before this change, passes after).
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
send_to_session was the only agent tool that didn't check session
ownership — an agent acting for user A could read from and write
into user B's session on a multi-user instance.
Add owner parameter and reject access when the target session
belongs to a different user, matching the pattern used by
create_session, list_sessions, and manage_session.
Fixes#1616
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Models like gemma4, qwen3.5, and ministral served via Ollama's native
/api/chat respond to OpenAI-style tool schemas by emitting a single
native tool_call chunk and then stopping. The agent loop receives
1 token of round_response and no recognised ToolBlock, so the round
ends immediately — the user sees a one-token response.
Root cause: _is_api_model was True for any endpoint whose host appears
in _API_HOSTS (which includes "host.docker.internal" and "localhost")
OR whose model name matches a keyword like "gemma". Native Ollama
endpoints were never excluded from this path.
Fix: import _is_ollama_native_url from llm_core and treat native Ollama
endpoints (/api/chat, port 11434) as text-only by default — falling back
to the fenced-block tool path the local models are tuned for. The
per-endpoint supports_tools=True toggle (Settings → Endpoints) still
overrides this for users who have explicitly opted in.
Fixes#1567
_generate_doc_id hashed only text. add_document / add_documents_batch
early-return when the id exists, so the second owner indexing a
byte-identical chunk hit the first owner's id, was silently dropped,
and never stored under their owner — their owner-filtered search then
quietly omitted it. Hash owner + text; empty owner reproduces the
legacy id, so the unowned/base index keeps existing ids and isn't
re-churned. Same-owner identical chunks still dedupe.
Caught by #1738 and #1760 (independent reports of the same bug).
writeback_event read cfg["password"] (the encrypted blob) and passed it
straight to DAVClient, so every local create/edit/delete authenticated
with the literal ciphertext, the remote rejected it, and the change
never reached the server — the exact silent-write-loss this module was
built to prevent. The pull path src/caldav_sync.py already decrypts;
mirror that. decrypt() is a no-op on legacy plaintext.
Caught by #1731.
The blocklist prefixes had trailing slashes, so path.startswith() only
matched /api/tokens/{id} but not /api/tokens itself — the bare GET (list)
and POST (mint) endpoints were reachable via app_api. Same gap on
/api/users (list/create/delete). Drop trailing slashes so both bare and
sub-resource forms are blocked. /api/auth and /api/admin had no bare
endpoints today but get the same treatment to prevent future drift.
Caught by #1462.
Two problems made deep research report "No information could be gathered" even
after it had extracted findings, on slow local models (reporter served a 20B
via LM Studio):
- _synthesize hard-capped its LLM call at timeout=60, while extraction uses the
user's extraction_timeout (300s here) and the final report uses 180s. The slow
model needed >60s to synthesize the round's findings, so synthesis timed out
after 3 attempts. Raised it to 180s to match the final-report call.
- When synthesis produced no report (it returns the unchanged, still-empty
report on failure during round 1), the run hit
`if not report: return "No information could be gathered…"` and discarded the
findings it had already gathered. Now it falls back to a compiled report built
from those findings (_fallback_report) so the user keeps the gathered material.
Tests stub the LLM (no live model/DB), pin the synthesis timeout >= 180, that the
fallback surfaces the findings rather than the give-up message, and that a failed
synthesis preserves the previous report.
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
is_vision_model() classified several genuinely multimodal families as text-only
because their names contain neither "vision" nor "vl": Gemma 3 (4b+), Llama 4,
Mistral Small 3.1/3.2, and *-multimodal models (e.g. phi-4-multimodal). For those
the attached image was stripped before the request, so the model never saw it —
a "can't read the image" report (issue #1274), common with Ollama tags like
gemma3:4b.
Add those keywords (plus a generic "multimodal"). Per the file's err-toward-True
policy (#124), a rare text-only tag treated as vision is the safer failure than
dropping a real image. Guard tests confirm the text-only siblings (gemma2, plain
gemma, mistral-small, phi-3) are not over-matched.
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* Stop multi-file uploads from tripping the per-IP concurrency guard
The /api/upload concurrency check summed its condition over `files`, but the
condition didn't reference the loop variable — so it collapsed to len(files)
whenever the IP had any recent upload. A single multi-file batch sent right
after another upload therefore counted itself as N concurrent uploads and hit
max_concurrent_uploads (3), returning 429. The browser swallows the 429 (no
`files` in the body) and sends the chat with no attachments, so the model
"doesn't even see" them (issue #1346).
Count genuine recent upload events instead, via a pure count_recent_uploads()
helper, independent of the current batch's file count. save_upload still
enforces the per-minute sliding-window rate limit per file, so throttling is
preserved.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* Also reconcile the per-minute upload rate limit with the batch cap
Follow-up within #1346: even after the concurrency-guard fix, a 6+ file batch
still failed because save_upload() counts each file against upload_rate_limit
(was 5/min) while the composer allows MAX_FILES=10 per batch — the reporter saw
"5 attachments work, 6 fail". Raise the per-minute file cap to 60 so a single
full batch (and a few of them) isn't self-rejected; burst abuse stays bounded by
max_concurrent_uploads. Add a real 6-file regression + a config guard that the
cap exceeds the frontend MAX_FILES.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix: MCP reconnect via tool passes only server_id to connect_server
connect_server requires name, transport, command, args, env, and url
but the reconnect path in do_manage_mcp only passed the server_id,
causing a TypeError on every reconnect attempt. Mirror the pattern
used in mcp_routes.py reconnect_server.
* test: verify MCP reconnect passes full server config to connect_server
Mocks the MCP manager and DB to assert that do_manage_mcp reconnect
passes name, transport, command, args, env, and url — not just the
server_id.
After a deep-research job completes, a follow-up like "check it out" / "read
that report" had the agent web_fetch the /api/research/report/{id} HTML render
(and then drift into unrelated searches) instead of reading the saved report
(issue #1363). The report text is already available via the manage_research
tool (action read), and action list returns ids most-recent-first, so the
agent can resolve "the recent report" itself.
Strengthen the manage_research instructions: read a finished report via
action list -> action read; do NOT web_fetch/app_api the report URL (it renders
HTML, not clean text) and do NOT start a fresh web_search just to read an
existing report. Annotate the app_api endpoint list to say the same.
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
When a timezone is configured, `now` is tz-aware local time.
The comparison stripped tzinfo with `.replace(tzinfo=None)`,
producing naive local time, but `scheduled_date` is stored as
naive UTC. For users east of UTC this causes tasks to appear
expired prematurely; for users west they linger past due time.
Use `_to_utc_naive(now)` to convert to the same reference frame.
Deep research generated search queries from the LLM's training-cutoff
knowledge, so it emitted stale-year queries like "best Python tutorials
2025" when the actual year is later (issue #1341). The chat/agent path
already grounds the model with "Today is ..." (src/agent_loop.py); the
deep research planning and query-generation prompts had no equivalent.
Add a small current_date_context() helper and prepend it at the plan and
query-generation prompt sites (and the research_handler plan preview path
that reuses RESEARCH_PLAN_PROMPT). System-TZ local, portable strftime.
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* feat: document rrule in the manage_calendar tool schema (#1320)
The create_event handler already persists `rrule` (a single event carrying an
iCalendar RRULE), but the manage_calendar tool schema didn't list it, so the
agent had no documented way to make a recurring event and took a roundabout
path. Add `rrule?` to the create_event field list with examples
(FREQ=WEEKLY;BYDAY=MO etc.) and an explicit note to create ONE event with the
rule rather than looping.
Covered by tests/test_calendar_rrule.py: do_manage_calendar create_event with an
rrule stores one event with that recurrence; without it, the event is single.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* test: restore SessionLocal via monkeypatch in #1320 rrule test (review)
Per review: the test patched core.database.SessionLocal at module import and
never restored it, which could leak the temp DB into later tests in the same
process. Move the patch into an autouse monkeypatch fixture so it is restored
after each test.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>