Commit Graph

200 Commits

Author SHA1 Message Date
Afonso Coutinho
33ae982968 fix: context_compactor token helpers crash on non-string message text (#1634)
* fix: context_compactor token helpers crash on non-string message text

* fix: _truncate_text_to_token_budget returns an empty string for non-string text, not the raw value
2026-06-03 14:12:14 +09:00
Shaw
63aa15d155 fix(scheduler): fail closed on malformed scheduled_time instead of 500 (#1410)
compute_next_run parsed scheduled_time as "HH:MM" with int(parts[0]),
int(parts[1]) and no validation, so "9", "9am", "25:00", "9:" or ":30" raised
IndexError/ValueError. The POST /tasks create route passes the user/LLM-supplied
scheduled_time before its try block (and only validates the cron field), so a
bad value surfaced as an unhandled 500 rather than the clean 400 used for other
invalid fields — and the same crash could fire inside the scheduler loop when
recomputing next_run for an already-stored bad row.

Guard the parse and fail closed (warn + return None), matching the existing
invalid-cron handling in the same function.

Adds tests/test_scheduler_scheduled_time_validation.py — malformed values return
None (fail before with IndexError/ValueError), valid HH:MM still computes.
2026-06-03 14:12:07 +09:00
red person
db8c0b3dac Ignore non-string background stream deltas (#1549) 2026-06-03 14:11:45 +09:00
red person
38bfa85ad0 Reject invalid Tailscale discovery JSON (#1556)
* Reject invalid Tailscale discovery JSON

* Guard nested Tailscale IP shapes
2026-06-03 14:11:31 +09:00
Afonso Coutinho
1453458519 fix: is_public_blocked_tool crashes on a truthy non-string tool name (#1620)
* fix: is_public_blocked_tool crashes on a truthy non-string tool name

* fix: is_public_blocked_tool fails closed (blocks) on a malformed non-string tool name
2026-06-03 14:11:14 +09:00
red person
d1309f3bd6 Ignore non-object settings scrub inputs (#1645) 2026-06-03 14:11:05 +09:00
red person
b409b20940 Handle non-string src search queries (#1646) 2026-06-03 14:11:02 +09:00
red person
558d6ddf24 Ignore invalid background job store rows (#1261) 2026-06-03 14:07:14 +09:00
red person
34efabdec8 Ignore invalid integration rows (#1404) 2026-06-03 14:07:11 +09:00
Afonso Coutinho
1571d8bba0 fix: agent_tools._truncate crashes on non-string input (#1624)
* fix: agent_tools._truncate crashes on non-string input

* fix: agent_tools._truncate returns a string for non-string input, not the raw value
2026-06-03 14:06:39 +09:00
Afonso Coutinho
3a741edbf1 fix: visual_report markdown helpers crash on a non-string input (#1633) 2026-06-03 14:06:35 +09:00
red person
8af1f85665 Ignore non-string email thread bodies (#1654) 2026-06-03 14:06:31 +09:00
Afonso Coutinho
28dbd5346c Treat non-string research summaries as low quality
Filter malformed non-string research summaries instead of letting the broad exception path classify them as usable, with regression coverage.
2026-06-03 13:42:24 +09:00
Afonso Coutinho
a880b17624 Skip malformed personal keyword index rows
Make personal keyword retrieval tolerate corrupted non-dict index entries and missing chunk lists, with regression coverage.
2026-06-03 13:42:05 +09:00
Afonso Coutinho
35b9509da3 fix: memory entry validation crashes on a non-dict row from memory.json (#1691) 2026-06-03 13:38:02 +09:00
Afonso Coutinho
f0b172020e fix: require_privilege 500s on a non-dict privileges blob from auth.json (#1693) 2026-06-03 13:37:54 +09:00
Afonso Coutinho
02ff2e3cb0 fix: updating a calendar event ignores user timezone and shifts the time (#1695) 2026-06-03 13:37:39 +09:00
Afonso Coutinho
19e62208d2 fix: streaming drops providers that emit SSE data lines with no space (#1701) 2026-06-03 13:37:14 +09:00
Afonso Coutinho
3da4edb442 fix: token usage dropped when it rides on a non-empty finish delta (#1703) 2026-06-03 13:36:57 +09:00
Lucas Daniel
578f56ab92 fix(vision): recognize Gemma 4 and Phi-4 as vision-capable models (#1704)
Gemma 4 and Phi-4 multimodal are natively vision-capable but their Ollama
tags ("gemma4:12b", "phi-4", "phi4") did not match any keyword in
_VISION_MODEL_KEYWORDS. The image was silently routed to the VL fallback
path instead of being passed directly to the model — users saw the model
respond to a placeholder like "[VL model unavailable - image not analyzed]"
rather than the actual image.

Adds "gemma-4"/"gemma4" and "phi-4"/"phi4" to the keyword list, following
the existing err-toward-True policy (#124): a text-only variant being
treated as vision is the safer failure than dropping a real image.

Fixes #1274 (partial — covers the Gemma 4 + Phi-4 case; the OpenRouter/free
vision fallback path is a separate issue).
2026-06-03 13:36:50 +09:00
Afonso Coutinho
f6f86c4b34 fix: research source extraction crashes on a non-dict finding (#1714) 2026-06-03 13:34:40 +09:00
lekt8
126e91e8b9 Don't attempt the same (url, model) route twice in the fallback chains (#1733)
The fallback helpers (llm_call_with_fallback, llm_call_async_with_fallback,
stream_llm_with_fallback) build their candidate list as the primary target
followed by the configured fallbacks. Callers prepend the session's live
(url, model) to default_model_fallbacks, so if the user also lists their current
model among the fallbacks — a common misconfiguration — the chain re-attempts
the very route that just failed: a wasted round-trip (and, for the streaming
path, a spurious 'fallback' notice for a switch that didn't actually happen).

Add a small _dedupe_candidates() helper that filters malformed entries and drops
a later repeat of an already-seen (url, model), preserving order (first wins,
keeping its headers). Apply it in all three fallback chains.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-03 13:33:50 +09:00
lekt8
77614e9feb Don't force-include the email toolset on every "tell me" query (#1707) (#1735)
The agent tool-RAG force-includes a keyword hint's tools whenever any of its
keywords appears in the query (word-boundary match). The email-intent hint listed
"tell", which matches a huge fraction of requests — e.g. "visit <url> and tell
me the title" — so the whole email toolset was force-included and crowded out the
relevant tools. The model then saw a prompt dominated by email tools and reported
it had no web search / could not visit the URL.

Remove "tell" from the email keyword set. Genuine email intent still fires on
email/mail/gmail/inbox/unread/message/send/reply.

Test drives get_tools_for_query directly with retrieval stubbed (the keyword
hints are deterministic, no embeddings needed): a "...tell me..." web query no
longer pulls in email tools, a real email request still does.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-03 13:33:43 +09:00
Mubashir R
a8a5d6f56e fix: RAG keyword fallback leaked owner-less documents across users (#1722)
VectorRAG.search() filters with ChromaDB where={"owner": owner}, returning only
documents whose owner equals the requesting user. The keyword fallback
(_keyword_search_fallback, used when the primary query raises) guarded with
`if doc_owner and doc_owner != owner: continue`, so a document with a
missing/empty owner fell through and was returned to whichever user issued the
query — a cross-user information leak on the fallback path.

Match the primary path's strict filter: skip any doc whose owner != the
requested owner, including owner-less docs.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-03 13:31:33 +09:00
Afonso Coutinho
ada30aa039 fix: evaluate_turn_regex crashes on a non-string agent_reply (#1723) 2026-06-03 13:31:26 +09:00
Wes Huber
49885ff9e7 fix(documents): use strip_pdf_content_marker instead of lstrip for PDF auto-open (#1727)
lstrip("\n[PDF content]:") treats the argument as a character set,
not a prefix, so it chews into the following [Page N text]: marker —
e.g. turning [Page 1 text]: into "age 1 text]:". The correct helper
strip_pdf_content_marker (which uses removeprefix) already exists in
the same file and is used by other call sites.

Fixes #1663

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-06-03 13:30:04 +09:00
Ethan
0e538ecd29 Fix RAG remove_directory wiping the entire shared collection (#1660) (#1734)
Removing one RAG directory destroyed the whole shared ChromaDB collection
(all owners + base index) instead of just that directory's chunks. Shared
root cause: PersonalDocsManager.remove_directory called rebuild_index()
(delete_collection + recreate) then re-indexed only the remaining tracked
dirs (ownerless, never personal_dir). The targeted VectorRAG.remove_directory
that should have been used was itself broken (where={"source":{"$contains":dir}}
selects nothing on scalar metadata and would over-delete siblings), and the
dead do_manage_rag path fired a second unconditional rebuild.

- VectorRAG.remove_directory: select chunks in Python by a path-boundary match
  on the stored absolute `source` (dir or dir+os.sep), abspath-normalized.
  Keys on `source` (always written), never `owner` -- no migration.
- PersonalDocsManager.remove_directory: call the targeted remove instead of
  rebuild_index() + partial reindex.
- do_manage_rag (dead code): drop the second rebuild_index() (hygiene).
- rag_server.py add path: abspath so indexed `source` matches the remove.

No schema change. Prevents future wipes (does not recover already-wiped
vectors). Adds hermetic regression tests at three layers.

Fixes #1660

Co-authored-by: Ethan <23321960+0xLeathery@users.noreply.github.com>
2026-06-03 13:29:51 +09:00
Ethan
b9c382006e Clamp Anthropic temperature to [0.0, 1.0] in _build_anthropic_payload (#1737)
Anthropic's Messages API rejects temperature > 1.0 with HTTP 400, but
_build_anthropic_payload forwarded it verbatim. The shipped "Nietzsche" preset
uses temperature 1.2 and the UI slider allows up to 2.0, so every Claude request
under such a preset hard-broke. Clamp into [0.0, 1.0] in the Anthropic builder
only (OpenAI keeps its wider 0.0-2.0 range). Covers all three Anthropic call
paths, which build through this one function. None is passed through unchanged.

Fixes #1615

Co-authored-by: Ethan <23321960+0xLeathery@users.noreply.github.com>
2026-06-03 13:29:36 +09:00
Afonso Coutinho
96a874c604 fix: a non-dict finding silently drops all raw research findings (#1739) 2026-06-03 13:29:29 +09:00
Afonso Coutinho
063e7114e3 fix: youtube transcript formatter crashes on a non-dict segment (#1745) 2026-06-03 13:29:08 +09:00
Afonso Coutinho
133948cc78 fix: uploads with _ or - in the extension become permanently unreadable (#1756) 2026-06-03 13:28:45 +09:00
Afonso Coutinho
51857c9008 fix: chat memory extraction crashes on a non-dict message (#1749) 2026-06-03 13:25:48 +09:00
clockworksquirrel
2625e97f11 Stop conversations crashing during compaction on tool-call turns (#1777)
context_compactor.maybe_compact built its summary text with
msg.get('content', '')[:2000], which raised
TypeError: 'NoneType' object is not subscriptable on assistant turns
whose content is None (turns that carried only native tool_calls).
Once a conversation crossed the 85% compaction threshold — reached
after only a few turns on small-context local models plus the large
agent prompt — every subsequent message failed ("send more than three
messages and it stops working").

Flatten message content to text first via a _content_as_text helper
(str passthrough, multimodal list blocks joined, None -> "") and
tolerate a missing role. Adds tests/test_context_compactor.py covering
the helper and a >=4-message conversation that forces compaction with
a None-content tool-call turn (fails before this change, passes after).

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-03 13:25:33 +09:00
Afonso Coutinho
2fa4d50115 fix: is_youtube_url crashes on a non-string url (#1752) 2026-06-03 13:24:33 +09:00
Wes Huber
3abb735200 fix(security): scope send_to_session agent tool by owner (#1757)
send_to_session was the only agent tool that didn't check session
ownership — an agent acting for user A could read from and write
into user B's session on a multi-user instance.

Add owner parameter and reject access when the target session
belongs to a different user, matching the pattern used by
create_session, list_sessions, and manage_session.

Fixes #1616

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-06-03 13:24:08 +09:00
Afonso Coutinho
b3da01efd5 fix: ui_control rejects the advertised rag toggle (#1763) 2026-06-03 13:24:00 +09:00
Lucas Daniel
68da800dcb fix(agent): stop sending tool schemas to native Ollama endpoints (#1765)
Models like gemma4, qwen3.5, and ministral served via Ollama's native
/api/chat respond to OpenAI-style tool schemas by emitting a single
native tool_call chunk and then stopping. The agent loop receives
1 token of round_response and no recognised ToolBlock, so the round
ends immediately — the user sees a one-token response.

Root cause: _is_api_model was True for any endpoint whose host appears
in _API_HOSTS (which includes "host.docker.internal" and "localhost")
OR whose model name matches a keyword like "gemma". Native Ollama
endpoints were never excluded from this path.

Fix: import _is_ollama_native_url from llm_core and treat native Ollama
endpoints (/api/chat, port 11434) as text-only by default — falling back
to the fenced-block tool path the local models are tuned for. The
per-endpoint supports_tools=True toggle (Settings → Endpoints) still
overrides this for users who have explicitly opted in.

Fixes #1567
2026-06-03 13:23:42 +09:00
Afonso Coutinho
d25a860f71 fix: document tidy crashes on a duplicate with NULL timestamps (#1772) 2026-06-03 13:23:01 +09:00
Afonso Coutinho
db1596f3b4 fix: signature learning never skips support@/info@/admin@ senders (#1773) 2026-06-03 13:22:52 +09:00
pewdiepie-archdaemon
ed7956cbd3 Owner-scope RAG doc ids so identical chunks across users don't collide (#1738, #1760)
_generate_doc_id hashed only text. add_document / add_documents_batch
early-return when the id exists, so the second owner indexing a
byte-identical chunk hit the first owner's id, was silently dropped,
and never stored under their owner — their owner-filtered search then
quietly omitted it. Hash owner + text; empty owner reproduces the
legacy id, so the unowned/base index keeps existing ids and isn't
re-churned. Same-owner identical chunks still dedupe.

Caught by #1738 and #1760 (independent reports of the same bug).
2026-06-03 11:36:31 +09:00
pewdiepie-archdaemon
9960d55a41 Decrypt CalDAV password before write-back (#1731)
writeback_event read cfg["password"] (the encrypted blob) and passed it
straight to DAVClient, so every local create/edit/delete authenticated
with the literal ciphertext, the remote rejected it, and the change
never reached the server — the exact silent-write-loss this module was
built to prevent. The pull path src/caldav_sync.py already decrypts;
mirror that. decrypt() is a no-op on legacy plaintext.

Caught by #1731.
2026-06-03 11:36:12 +09:00
pewdiepie-archdaemon
6153c5ed68 Close app_api blocklist gap for bare /api/tokens and /api/users
The blocklist prefixes had trailing slashes, so path.startswith() only
matched /api/tokens/{id} but not /api/tokens itself — the bare GET (list)
and POST (mint) endpoints were reachable via app_api. Same gap on
/api/users (list/create/delete). Drop trailing slashes so both bare and
sub-resource forms are blocked. /api/auth and /api/admin had no bare
endpoints today but get the same treatment to prevent future drift.

Caught by #1462.
2026-06-03 11:20:39 +09:00
Afonso Coutinho
aa5e3f6884 fix: is_markitdown_format crashes on a non-string path (#1618) 2026-06-03 09:00:10 +09:00
Afonso Coutinho
fc220f760f fix: inside_base_dir raises TypeError on a non-string path instead of failing closed (#1619) 2026-06-03 09:00:04 +09:00
Afonso Coutinho
2d94e38d23 fix: document_actions title/content helpers crash on non-string input (#1621) 2026-06-03 08:59:55 +09:00
Afonso Coutinho
03ddc5d2c4 fix: check_outbound_url crashes on a truthy non-string URL (#1623) 2026-06-03 08:59:49 +09:00
Afonso Coutinho
3175d7ca21 fix: tool-block parsing crashes on a non-string input (#1628) 2026-06-03 08:59:42 +09:00
Afonso Coutinho
d818117d4c fix: _extract_skill_json crashes on a truthy non-string teacher response (#1630) 2026-06-03 08:59:36 +09:00
Afonso Coutinho
8783f12c4c fix: builtin_actions heuristics crash on a truthy non-string input (#1639) 2026-06-03 08:59:16 +09:00
Afonso Coutinho
82c09dd768 fix: split_chunks emits a duplicate trailing chunk for text over size-overlap (#1573) 2026-06-03 08:57:54 +09:00