Commit Graph

350 Commits

Author SHA1 Message Date
Afonso Coutinho
1571d8bba0 fix: agent_tools._truncate crashes on non-string input (#1624)
* fix: agent_tools._truncate crashes on non-string input

* fix: agent_tools._truncate returns a string for non-string input, not the raw value
2026-06-03 14:06:39 +09:00
Afonso Coutinho
3a741edbf1 fix: visual_report markdown helpers crash on a non-string input (#1633) 2026-06-03 14:06:35 +09:00
red person
8af1f85665 Ignore non-string email thread bodies (#1654) 2026-06-03 14:06:31 +09:00
Afonso Coutinho
a54d34149a Parse standard Gmail quote attribution dates
Allow Gmail quote attribution parsing to handle standard US weekday/month/day/year comma patterns while preserving existing formats, with JS regression coverage.
2026-06-03 13:45:56 +09:00
Afonso Coutinho
46999debdb Decode email headers without injected spaces
Use email.header.make_header for MIME header decoding so adjacent encoded/plain header parts preserve RFC spacing, with regression coverage.
2026-06-03 13:45:33 +09:00
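A minimal sketch of the header-decoding approach named in 46999debdb, assuming the helper simply wraps the standard library. The function name `decode_mime_header` is hypothetical; only `email.header.decode_header` and `email.header.make_header` come from the commit message and the standard library.

```python
from email.header import decode_header, make_header


def decode_mime_header(raw: str) -> str:
    """Decode an RFC 2047 header value into a plain string.

    make_header reassembles the decoded parts and applies the spacing
    rules between encoded and plain parts, instead of joining every
    part with a space by hand.
    """
    return str(make_header(decode_header(raw)))


if __name__ == "__main__":
    print(decode_mime_header("=?utf-8?q?Caf=C3=A9?= menu"))
```

Whitespace between two adjacent encoded words is dropped, while the space that belongs to a following plain part is kept.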
Afonso Coutinho
f29c827e6e Merge search analytics defaults in services copy
Make services.search.analytics tolerate missing counters in older or partial analytics files by merging loaded data over defaults, with regression coverage.
2026-06-03 13:45:07 +09:00
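The merge-over-defaults idea in f29c827e6e could look roughly like the sketch below. The counter names in `DEFAULT_ANALYTICS` and the function name `merge_analytics` are illustrative assumptions, not the project's actual schema.

```python
# Hypothetical default counters; the real analytics schema is not shown in the log.
DEFAULT_ANALYTICS = {"total_searches": 0, "cache_hits": 0, "failed_searches": 0}


def merge_analytics(loaded):
    """Return the defaults overlaid with whatever the file actually contained."""
    merged = dict(DEFAULT_ANALYTICS)
    if isinstance(loaded, dict):
        # Loaded values win; counters missing from older files keep their default.
        merged.update(loaded)
    return merged
```

A partial file such as `{"total_searches": 7}` then still yields every counter, so later increments do not hit a missing key.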
Afonso Coutinho
10e797a1aa Normalize scheduled email offsets before storage
Normalize scheduled email send_at values with timezone offsets or Z suffixes to naive UTC before storing, matching the poller's lexicographic comparison format and preventing early/late sends.
2026-06-03 13:44:18 +09:00
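The normalization described in 10e797a1aa can be sketched as follows, assuming the stored format is a naive ISO 8601 string with second precision. The function name `normalize_send_at` and the exact storage format are assumptions.

```python
from datetime import datetime, timezone


def normalize_send_at(value: str) -> str:
    """Convert an ISO 8601 timestamp to a naive UTC string.

    Strings in one fixed UTC format compare correctly as plain text,
    which is what a lexicographic "is it due yet" check relies on.
    """
    text = value.strip()
    if text.endswith(("Z", "z")):
        # Older Python versions do not accept a trailing Z in fromisoformat.
        text = text[:-1] + "+00:00"
    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is not None:
        parsed = parsed.astimezone(timezone.utc).replace(tzinfo=None)
    return parsed.isoformat(timespec="seconds")
```

Without this step, `2026-06-03T14:00:00+09:00` would sort after `2026-06-03T05:30:00` even though it denotes an earlier instant (05:00 UTC).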
Afonso Coutinho
28dbd5346c Treat non-string research summaries as low quality
Filter malformed non-string research summaries instead of letting the broad exception path classify them as usable, with regression coverage.
2026-06-03 13:42:24 +09:00
Afonso Coutinho
a880b17624 Skip malformed personal keyword index rows
Make personal keyword retrieval tolerate corrupted non-dict index entries and missing chunk lists, with regression coverage.
2026-06-03 13:42:05 +09:00
Mubashir R
61d62a3cb8 Fix memory bullet extraction in service copy
Fix services.memory bullet-list extraction by grouping the bullet/number regex before the capture, and cover both memory manager copies in the regression test.
2026-06-03 13:41:46 +09:00
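To illustrate the regex grouping issue from 61d62a3cb8: without a group around the alternation, `|` splits the whole pattern, so the capture only belongs to the numbered branch. Both patterns below are illustrative assumptions, not the project's actual expressions.

```python
import re

# Ungrouped: reads as "(bullet at line start) OR (number followed by captured text)".
UNGROUPED = re.compile(r"^\s*[-*]|\d+\.\s+(.+)")

# Grouped: "(bullet OR number) followed by captured text".
GROUPED = re.compile(r"^\s*(?:[-*]|\d+\.)\s+(.+)")


def extract_bullets(text: str) -> list[str]:
    """Collect the text of every bulleted or numbered line."""
    items = []
    for line in text.splitlines():
        match = GROUPED.match(line)
        if match:
            items.append(match.group(1).strip())
    return items
```

With the ungrouped pattern, a line like `- buy milk` matches but its capture group is `None`, so the bullet text is lost.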
Marius Popa
4ec53a296a Fix document editor scrollbar and line-number sync
Fixes #1501
Fixes #1496
2026-06-03 13:40:19 +09:00
Afonso Coutinho
13f0171ce8 fix: extract_youtube_id crashes on a non-string url instead of returning None (#1689) 2026-06-03 13:38:11 +09:00
Afonso Coutinho
35b9509da3 fix: memory entry validation crashes on a non-dict row from memory.json (#1691) 2026-06-03 13:38:02 +09:00
Afonso Coutinho
f0b172020e fix: require_privilege 500s on a non-dict privileges blob from auth.json (#1693) 2026-06-03 13:37:54 +09:00
Rolly Calma
933c461f38 fix: use running loop for shell stream deadlines (#1694) 2026-06-03 13:37:46 +09:00
Afonso Coutinho
02ff2e3cb0 fix: updating a calendar event ignores user timezone and shifts the time (#1695) 2026-06-03 13:37:39 +09:00
Afonso Coutinho
667b739af4 fix: reply-all Cc builder crashes on a non-string To or Cc field (#1700) 2026-06-03 13:37:22 +09:00
Afonso Coutinho
19e62208d2 fix: streaming drops providers that emit SSE data lines with no space (#1701) 2026-06-03 13:37:14 +09:00
Afonso Coutinho
3da4edb442 fix: token usage dropped when it rides on a non-empty finish delta (#1703) 2026-06-03 13:36:57 +09:00
Afonso Coutinho
9dd9bb8a3f fix: memory recall crashes on a non-dict row from the vector store (#1705) 2026-06-03 13:35:09 +09:00
Afonso Coutinho
86d3af743a fix: docs RAG query crashes on a non-dict row from the index (#1706) 2026-06-03 13:35:01 +09:00
Afonso Coutinho
076607c9b9 fix: archive browser model filter is suffix-only and drops matching models (#1709) 2026-06-03 13:34:54 +09:00
Afonso Coutinho
56123e052b fix: compacting a chat with image attachments destroys the attachment (#1710) 2026-06-03 13:34:47 +09:00
Afonso Coutinho
f6f86c4b34 fix: research source extraction crashes on a non-dict finding (#1714) 2026-06-03 13:34:40 +09:00
Afonso Coutinho
29e19f326a fix: _resolve_user_upload_path crashes on a non-dict resolve_upload result (#1715) 2026-06-03 13:34:33 +09:00
Afonso Coutinho
55c7a4a546 fix: computeSnap throws when ctx.otherLayers is not an array (#1716) 2026-06-03 13:34:25 +09:00
Mubashir R
319ba50a44 fix: validate client-supplied image _endpoint to prevent SSRF (gallery proxies) (#1718)
POST /api/image/harmonize and POST /api/image/inpaint read an `_endpoint` from
the request body and issue server-side httpx POSTs to it with no validation. A
caller can set `_endpoint` to http://169.254.169.254/ (cloud instance metadata)
or any internal/loopback address the server can reach, turning these routes into
an SSRF primitive.

routes/embedding_routes.py already runs its user-supplied endpoint through
src.url_safety.check_outbound_url; these two routes were missing the same guard.
Validate `_endpoint` the same way before any outbound request: non-HTTP(S)
schemes and the link-local metadata range are always rejected, and
IMAGE_BLOCK_PRIVATE_IPS=true blocks private/loopback for full lockdown (the
local-first default still allows LAN diffusion servers).

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-03 13:34:17 +09:00
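The policy described in 319ba50a44 can be sketched with the standard library alone. This is not `src.url_safety.check_outbound_url`, whose implementation is not shown here; `is_endpoint_allowed` and its `block_private` parameter are hypothetical stand-ins. The sketch only inspects literal IP addresses, whereas a real guard would also need to resolve hostnames and check every resolved address.

```python
import ipaddress
from urllib.parse import urlsplit


def is_endpoint_allowed(url: str, block_private: bool = False) -> bool:
    """Apply a scheme and address policy before any outbound request."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    try:
        addr = ipaddress.ip_address(parts.hostname)
    except ValueError:
        # A DNS name rather than a literal IP. Assumption: accepted here;
        # a production guard should resolve it and re-check each address.
        return True
    if addr.is_link_local:
        # Covers 169.254.0.0/16, including the cloud metadata address.
        return False
    if block_private and (addr.is_private or addr.is_loopback):
        return False
    return True
```

The default keeps LAN servers reachable, and `block_private=True` mirrors the lockdown behaviour the commit attributes to `IMAGE_BLOCK_PRIVATE_IPS=true`.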
Mubashir R
535d05c142 fix: SearchService.search() calls comprehensive_web_search incorrectly (broken public API) (#1720)
SearchService.search() did:

    raw_results = await comprehensive_web_search(
        query, max_results=10 * depth, fetch_content=fetch_content)

comprehensive_web_search is a synchronous function whose count knob is
`max_pages` (not `max_results`) and which has no `fetch_content` parameter, so
the call raised TypeError on argument binding; `await` on its non-coroutine
return would also fail. It returns a context string, or a (context, sources)
tuple with return_sources=True — not the list of dicts the wrapper iterates.

The method is exported in services/search/__init__.py and services/__init__.py
with a usage example in its docstring, so any caller of the documented public
API hit an immediate crash. Call it correctly via asyncio.to_thread with
max_pages + return_sources=True and use the returned source list as the rows.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-03 13:33:56 +09:00
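The calling pattern from 535d05c142, running a synchronous search function from async code via `asyncio.to_thread`, might look like this. `web_search_stub` is a hypothetical stand-in for `comprehensive_web_search`; only the `max_pages` and `return_sources` parameter names come from the commit message, and the `10 * depth` mapping is an assumption carried over from the old call.

```python
import asyncio


def web_search_stub(query, max_pages=5, return_sources=False):
    """Synchronous stand-in returning a context string or (context, sources)."""
    sources = [
        {"title": f"Result {i} for {query}", "url": f"https://example.com/{i}"}
        for i in range(1, max_pages + 1)
    ]
    context = "\n".join(s["title"] for s in sources)
    return (context, sources) if return_sources else context


async def search_rows(query: str, depth: int = 1) -> list[dict]:
    """Run the blocking search in a worker thread and return its source rows."""
    _context, sources = await asyncio.to_thread(
        web_search_stub, query, max_pages=10 * depth, return_sources=True
    )
    return [row for row in sources if isinstance(row, dict)]


if __name__ == "__main__":
    print(len(asyncio.run(search_rows("python"))))
```

Awaiting the synchronous function directly would fail because its return value is not awaitable; `to_thread` also keeps the event loop free while the search blocks.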
lekt8
126e91e8b9 Don't attempt the same (url, model) route twice in the fallback chains (#1733)
The fallback helpers (llm_call_with_fallback, llm_call_async_with_fallback,
stream_llm_with_fallback) build their candidate list as the primary target
followed by the configured fallbacks. Callers prepend the session's live
(url, model) to default_model_fallbacks, so if the user also lists their current
model among the fallbacks — a common misconfiguration — the chain re-attempts
the very route that just failed: a wasted round-trip (and, for the streaming
path, a spurious 'fallback' notice for a switch that didn't actually happen).

Add a small _dedupe_candidates() helper that filters malformed entries and drops
a later repeat of an already-seen (url, model), preserving order (first wins,
keeping its headers). Apply it in all three fallback chains.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-03 13:33:50 +09:00
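An order-preserving dedupe like the one 126e91e8b9 describes could be sketched as below. The candidate shape, a `(url, model, headers)` tuple, is an assumption; the real `_dedupe_candidates()` may use a different structure.

```python
def dedupe_candidates(candidates):
    """Drop malformed entries and later repeats of the same (url, model)."""
    seen = set()
    unique = []
    for entry in candidates:
        if not isinstance(entry, (tuple, list)) or len(entry) < 2:
            continue  # malformed: not a (url, model, ...) sequence
        url, model = entry[0], entry[1]
        if not isinstance(url, str) or not isinstance(model, str):
            continue  # malformed: route parts must be strings
        key = (url, model)
        if key in seen:
            continue  # first occurrence wins and keeps its headers
        seen.add(key)
        unique.append(entry)
    return unique
```

If the primary route also appears among the fallbacks, only the first copy survives, so a failed route is not retried.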
lekt8
77614e9feb Don't force-include the email toolset on every "tell me" query (#1707) (#1735)
The agent tool-RAG force-includes a keyword hint's tools whenever any of its
keywords appears in the query (word-boundary match). The email-intent hint listed
"tell", which matches a huge fraction of requests — e.g. "visit <url> and tell
me the title" — so the whole email toolset was force-included and crowded out the
relevant tools. The model then saw a prompt dominated by email tools and reported
it had no web search / could not visit the URL.

Remove "tell" from the email keyword set. Genuine email intent still fires on
email/mail/gmail/inbox/unread/message/send/reply.

Test drives get_tools_for_query directly with retrieval stubbed (the keyword
hints are deterministic, no embeddings needed): a "...tell me..." web query no
longer pulls in email tools, a real email request still does.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-03 13:33:43 +09:00
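The word-boundary keyword check behind 77614e9feb can be approximated as follows. The keyword set is taken from the commit message; the function name `has_email_intent` and the regex construction are assumptions about how such a hint might be matched.

```python
import re

# Keyword set listed in the commit message, with "tell" removed.
EMAIL_KEYWORDS = {"email", "mail", "gmail", "inbox", "unread", "message", "send", "reply"}

_EMAIL_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in sorted(EMAIL_KEYWORDS)) + r")\b",
    re.IGNORECASE,
)


def has_email_intent(query: str) -> bool:
    """True when any email keyword appears as a whole word in the query."""
    return _EMAIL_PATTERN.search(query) is not None
```

A query such as "visit https://example.com and tell me the title" no longer triggers the hint, while "reply to the unread mail" still does.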
Mubashir R
a8a5d6f56e fix: RAG keyword fallback leaked owner-less documents across users (#1722)
VectorRAG.search() filters with ChromaDB where={"owner": owner}, returning only
documents whose owner equals the requesting user. The keyword fallback
(_keyword_search_fallback, used when the primary query raises) guarded with
`if doc_owner and doc_owner != owner: continue`, so a document with a
missing/empty owner fell through and was returned to whichever user issued the
query — a cross-user information leak on the fallback path.

Match the primary path's strict filter: skip any doc whose owner != the
requested owner, including owner-less docs.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-03 13:31:33 +09:00
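The difference between the lenient and strict owner checks in a8a5d6f56e is small enough to show directly. The document shape (a dict with `text` and `metadata`) and the function name are assumptions for illustration.

```python
def keyword_fallback(docs, query, owner):
    """Substring search restricted to documents owned by `owner`."""
    needle = query.lower()
    hits = []
    for doc in docs:
        meta = doc.get("metadata") or {}
        # Strict check: a missing or empty owner never equals the requester.
        # The old guard, `if doc_owner and doc_owner != owner`, let those through.
        if meta.get("owner") != owner:
            continue
        if needle in doc.get("text", "").lower():
            hits.append(doc)
    return hits
```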
Afonso Coutinho
ada30aa039 fix: evaluate_turn_regex crashes on a non-string agent_reply (#1723) 2026-06-03 13:31:26 +09:00
Afonso Coutinho
290d398900 fix: rewriting a message is lost on reload due to a non-existent DB column (#1729) 2026-06-03 13:31:19 +09:00
Afonso Coutinho
d9e6071528 fix: odysseus-mail read crashes on an empty IMAP fetch payload (#1730) 2026-06-03 13:31:10 +09:00
Afonso Coutinho
c5bc39de88 fix: _extract_entities crashes on a non-string query (#1724) 2026-06-03 13:30:28 +09:00
Afonso Coutinho
0c37943267 fix: search service crashes on a non-dict result row (#1725) 2026-06-03 13:30:19 +09:00
Mubashir R
fefac05ab1 fix: history DB fallback returned hidden (compaction) messages to the client (#1726)
GET /api/history/{session_id} skips messages whose metadata has `hidden` (e.g.
compaction summaries kept for AI context, not shown to the user) on the
in-memory path. The DB fallback — used when the in-memory history is empty,
e.g. after a restart — built the response from every stored row with no such
filter, so hidden messages leaked to the client on DB-served sessions.

Filter `hidden` out of the response on the DB path too. The rebuilt in-memory
session.history still includes them, so AI context (the compaction summaries)
is preserved.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-03 13:30:11 +09:00
Mubashir R
4907b16d9b fix: personal-docs path confinement used abspath, allowing symlink escape (#1728)
_resolve_allowed_personal_dir confined a user-supplied path to PERSONAL_DIR with
os.path.abspath + os.path.commonpath. abspath normalises `..` but does NOT
resolve symlinks, so a symlink placed inside PERSONAL_DIR pointing outside it
passes the commonpath check and lets index_personal_documents read files outside
the root. Use os.path.realpath for both the base and the candidate so symlinks
are resolved before the confinement check.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-03 13:29:57 +09:00
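The confinement check from 4907b16d9b can be sketched like this, assuming the candidate is a path relative to the base directory. `resolve_inside` is a hypothetical name; `_resolve_allowed_personal_dir` itself is not reproduced here.

```python
import os


def resolve_inside(base_dir: str, candidate: str) -> str:
    """Resolve `candidate` under `base_dir`, refusing anything that escapes it.

    realpath follows symlinks, so a link inside the base that points
    elsewhere is resolved before the containment check runs.
    """
    base = os.path.realpath(base_dir)
    target = os.path.realpath(os.path.join(base, candidate))
    if os.path.commonpath([base, target]) != base:
        raise ValueError(f"path escapes {base_dir!r}: {candidate!r}")
    return target
```

With `os.path.abspath` in place of `realpath`, a path through a symlink would still appear to sit under the base and pass the same `commonpath` comparison.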
Ethan
0e538ecd29 Fix RAG remove_directory wiping the entire shared collection (#1660) (#1734)
Removing one RAG directory destroyed the whole shared ChromaDB collection
(all owners + base index) instead of just that directory's chunks. Shared
root cause: PersonalDocsManager.remove_directory called rebuild_index()
(delete_collection + recreate) then re-indexed only the remaining tracked
dirs (ownerless, never personal_dir). The targeted VectorRAG.remove_directory
that should have been used was itself broken (where={"source":{"$contains":dir}}
selects nothing on scalar metadata and would over-delete siblings), and the
dead do_manage_rag path fired a second unconditional rebuild.

- VectorRAG.remove_directory: select chunks in Python by a path-boundary match
  on the stored absolute `source` (dir or dir+os.sep), abspath-normalized.
  Keys on `source` (always written), never `owner` -- no migration.
- PersonalDocsManager.remove_directory: call the targeted remove instead of
  rebuild_index() + partial reindex.
- do_manage_rag (dead code): drop the second rebuild_index() (hygiene).
- rag_server.py add path: abspath so indexed `source` matches the remove.

No schema change. Prevents future wipes (does not recover already-wiped
vectors). Adds hermetic regression tests at three layers.

Fixes #1660

Co-authored-by: Ethan <23321960+0xLeathery@users.noreply.github.com>
2026-06-03 13:29:51 +09:00
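The path-boundary match described in 0e538ecd29 (directory itself, or directory plus separator as a prefix) might be written as below. `source_in_directory` is a hypothetical helper name.

```python
import os


def source_in_directory(source: str, directory: str) -> bool:
    """True when `source` is `directory` itself or lies beneath it."""
    root = os.path.abspath(directory)
    path = os.path.abspath(source)
    # Appending the separator stops "/data/docs" from matching "/data/docs-old".
    prefix = root if root.endswith(os.sep) else root + os.sep
    return path == root or path.startswith(prefix)
```

Selecting chunk IDs with a predicate like this and deleting only those IDs leaves sibling directories and other owners' chunks untouched.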
Ethan
b9c382006e Clamp Anthropic temperature to [0.0, 1.0] in _build_anthropic_payload (#1737)
Anthropic's Messages API rejects temperature > 1.0 with HTTP 400, but
_build_anthropic_payload forwarded it verbatim. The shipped "Nietzsche" preset
uses temperature 1.2 and the UI slider allows up to 2.0, so every Claude request
under such a preset hard-broke. Clamp into [0.0, 1.0] in the Anthropic builder
only (OpenAI keeps its wider 0.0-2.0 range). Covers all three Anthropic call
paths, which build through this one function. None is passed through unchanged.

Fixes #1615

Co-authored-by: Ethan <23321960+0xLeathery@users.noreply.github.com>
2026-06-03 13:29:36 +09:00
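The clamp from b9c382006e is a one-liner. The sketch below assumes a standalone helper named `clamp_anthropic_temperature`, whereas the commit applies the clamp inside `_build_anthropic_payload`.

```python
def clamp_anthropic_temperature(temperature):
    """Clamp a temperature into [0.0, 1.0]; None passes through unchanged."""
    if temperature is None:
        return None
    return max(0.0, min(1.0, float(temperature)))
```

A preset value of 1.2 becomes 1.0 for Anthropic requests only; other providers would keep receiving the unclamped value.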
Afonso Coutinho
96a874c604 fix: a non-dict finding silently drops all raw research findings (#1739) 2026-06-03 13:29:29 +09:00
Afonso Coutinho
7f94c43a45 fix: langIcon throws on an explicit null opts argument (#1740) 2026-06-03 13:29:21 +09:00
Afonso Coutinho
fc8efca49d fix: backup import drops a user's memory when its text matches another user's (#1743) 2026-06-03 13:29:14 +09:00
Afonso Coutinho
063e7114e3 fix: youtube transcript formatter crashes on a non-dict segment (#1745) 2026-06-03 13:29:08 +09:00
Afonso Coutinho
6e38d3f2ef fix: youtube (services) comment formatter crashes on a non-dict comment (#1746) 2026-06-03 13:29:01 +09:00
lekt8
9aa2445ec7 Reconnect after a failed SEARCH ALL so the email poller doesn't desync IMAP (#1613) (#1748)
On a large Gmail mailbox the email-summary poller's SINCE scan often finds
nothing (INTERNALDATE/date-header quirks), so it falls back to SEARCH ALL. That
returns one enormous UID line; the socket read can time out mid-response, and the
exception was swallowed — leaving the unread '* SEARCH 325188 …' bytes on the
socket. The next command (the downstream re-select) then read those leftover
bytes and failed with "EXAMINE => unexpected response: b'325188 …'".

Extract the fallback into _latest_inbox_fallback_uids(conn, reconnect): on a
failed SEARCH ALL it logs out the poisoned connection and reconnects, returning
the fresh connection for downstream use. Reconnecting is correct by construction
— a new connection cannot carry the old one's leftover bytes — so the re-select
always runs on a clean socket.

The same SEARCH ALL + reuse pattern also exists in mcp_servers/email_server.py
and routes/email_routes.py; left for a separate change to keep this surgical.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-03 13:28:53 +09:00
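A rough sketch of the reconnect-on-failure fallback from 9aa2445ec7, assuming an `imaplib`-style connection. The return shape `(connection, uids)`, the `limit` parameter, and the exact exceptions caught are assumptions; only the name `_latest_inbox_fallback_uids(conn, reconnect)` and its behaviour on a failed SEARCH ALL come from the commit message.

```python
import imaplib


def latest_inbox_fallback_uids(conn, reconnect, limit=50):
    """Run UID SEARCH ALL; on failure, replace the connection.

    Returns (connection, uids). After a failed read the old socket may
    still hold unread response bytes, so it is discarded, not reused.
    """
    try:
        status, data = conn.uid("SEARCH", None, "ALL")
    except (imaplib.IMAP4.error, OSError):
        try:
            conn.logout()
        except (imaplib.IMAP4.error, OSError):
            pass  # the connection is already unusable
        return reconnect(), []
    if status != "OK" or not data or not data[0]:
        return conn, []
    return conn, data[0].split()[-limit:]
```

Callers continue with the returned connection, so a later re-select never runs on the socket that timed out.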
Afonso Coutinho
133948cc78 fix: uploads with _ or - in the extension become permanently unreadable (#1756) 2026-06-03 13:28:45 +09:00
Afonso Coutinho
992866e167 fix: document library language facet undercounts text documents (#1758) 2026-06-03 13:28:38 +09:00
lekt8
a096e872f5 Let orphaned documents be reopened from the library (#1602) (#1761)
After an AI-written document is closed, its session_id is nulled (the detach
behaviour from #1238). Both Open controls in the Documents library — the card's
expanded Open button and the card dropdown's Open item — gated on
`doc.session_id`: they wired `libraryOpenInSession` (which early-returns with no
session) and DISABLED the control otherwise, so the user's own document showed a
grayed-out Open button and couldn't be reopened.

The module already has `libraryOpenDocument`, which explicitly handles the
orphaned case ("just open in editor without switching session" -> _loadDocument
by id). Route the no-session path there instead of disabling.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-03 13:28:31 +09:00
ghreprimand
6f001af2a3 Add a 'Rebuild llama.cpp' Cookbook action to force a fresh GPU build (#1787)
The serve bootstrap builds llama-server from source only when it is missing
from PATH, so a host that first compiled CPU-only (no nvcc present at build
time) reuses that CPU-only binary on every later serve and never gets a GPU
build, even after a CUDA/ROCm toolkit is installed. There was no UI lever to
force a rebuild.

Adds a 'Rebuild llama.cpp' button to the Cookbook Dependencies tab. It clears
the cached ~/bin/llama-server symlink and ~/llama.cpp/build directory (locally
or on the selected remote server) so the next serve recompiles and picks up
CUDA/HIP if a toolchain is now present. It installs and downloads nothing.

- routes/cookbook_helpers.py: _llama_cpp_rebuild_cmd() (single source of truth)
- routes/shell_routes.py: POST /api/cookbook/rebuild-engine (admin-only, reuses
  the existing SSH plumbing for remote hosts)
- static/js/cookbook.js: header button + handler honoring the deps server selector
- tests: cover the command shape and a clean run on a fresh HOME

Motivated by #831 (RTX 4070 user stuck on a CPU-only build with no way to
re-trigger the build).

Co-authored-by: ghreprimand <203024559+ghreprimand@users.noreply.github.com>
2026-06-03 13:28:19 +09:00