POST /api/embeddings/endpoint takes a user-supplied URL and immediately
makes an outbound httpx request to it with no validation. The admin gate
added earlier (PR #80) closed the unauthenticated-access part of #132; this
addresses the remaining request: validate the URL before fetching it.
Odysseus is local-first, so pointing the embedding endpoint at a loopback or
LAN server (local vLLM / llama.cpp / Ollama) is a normal setup — a blanket
private-IP block would break the primary use case. So the guard:
- always rejects non-HTTP(S) schemes (file://, gopher://, ftp:// …),
- always rejects the link-local range (169.254.0.0/16, incl. the cloud
instance-metadata 169.254.169.254 exfil vector) plus multicast /
reserved / unspecified, and IPv4-mapped-IPv6 forms of the above,
- keeps loopback/LAN allowed by default, and
- adds EMBEDDING_BLOCK_PRIVATE_IPS=true for full SSRF lockdown on exposed
multi-tenant deployments.
Logic lives in src/url_safety.py (stdlib only, resolver injectable) so it is
unit-testable without real DNS; the route calls it before the health-check
request. Covered by tests/test_url_safety.py (8 cases).
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
/api/health is a liveness ping. This adds /api/ready as a readiness /
integrity self-check that returns 503 unless every critical subsystem is
whole, so an orchestrator (Docker/Compose/k8s) can gate traffic on real
readiness rather than mere process liveness:
- database: opens a connection and runs SELECT 1
- data_dir: confirms the data directory exists and is writable
- local_first: reports whether storage stays on the host (informational;
a remote database is a valid deployment, so it never fails readiness)
The check logic lives in src/readiness.py so it is unit-testable in
isolation; the route is a thin wrapper. Covered by tests/test_readiness.py.
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix: exclude deepseek from local tool-calling keyword list
deepseek-r1 on Ollama returns HTTP 400 when tool schemas are sent.
The cloud API (api.deepseek.com) is already caught by the _API_HOSTS
check, so the generic 'deepseek' keyword match was only causing false
positives for local Ollama-served models.
* fix: add model no-tools blocklist and regression tests for deepseek-r1
The previous fix removed 'deepseek' from the keyword allow-list, but
_is_api_model is still True for localhost endpoints because 'localhost'
appears in _API_HOSTS — so the keyword change had no effect for Ollama.
Proper fix: add an explicit _model_no_tools blocklist ('deepseek-r1')
that overrides the endpoint URL check. The endpoint's supports_tools DB
flag still takes priority either way (True forces tools on, False forces
them off), so users can override per-endpoint when needed.
Also refined the deepseek allow-list: 'deepseek-v' and 'deepseek-chat'
cover the cloud models (v2, v3, chat) that do support tools, without
matching deepseek-r1 variants.
13 regression tests cover:
- deepseek-r1 on localhost/docker: no tools (was HTTP 400)
- deepseek-v3/chat on api.deepseek.com: tools enabled (no regression)
- endpoint_supports=True/False overrides both lists
- qwen/llama on localhost: unaffected
Rework read_file / write_file confinement after review feedback:
- Remove $HOME from default allow roots. Only project data/ and system
temp dirs are allowed out of the box.
- Add a sensitive-subpath deny list (.ssh, .gnupg, shell rc files,
.env, .netrc, SSH key filenames). Checked BEFORE allowlist so it
blocks even when a broader root is configured.
- Add "tool_path_extra_roots" setting for opt-in broader access.
- Sensitive subpaths remain blocked regardless of configured roots.
Tests: 24 cases covering /etc/shadow, ~/.ssh/authorized_keys,
symlink into .ssh, traversal, shell rc files, key filenames,
extra roots, and dispatch-level end-to-end.
Scan port 1234 and any custom port from LM_STUDIO_URL, add the LM_STUDIO_URL host to the discovery sweep alongside the Ollama env vars, and tag each discovered endpoint with its provider by fingerprinting the native /api/v1/models response (entries carrying key + architecture). Documents LM_STUDIO_URL in .env.example.
Follow-up to the Venice provider PR. Wire api.venice.ai into the three
host allowlists so Venice behaves like the other paid OpenAI-compatible
clouds:
- agent_loop: add api.venice.ai to _API_HOSTS so the agent sends native
OpenAI tool-call schemas (Venice supports function calling) instead of
degrading to fenced-block parsing.
- teacher_escalation: add api.venice.ai to _SOTA_HOSTS so the escalation
loop stays OFF for Venice (it's a paid top-tier API; no need to add
teacher-model latency).
- webhook_routes: add venice to KNOWN_PROVIDERS so the sync chat webhook
can auto-resolve base_url from provider=venice.
Tests: tests/test_venice_hosts.py pins tool-host matching + SOTA
classification for Venice; py_compile on touched modules.
Co-authored-by: Cursor <cursoragent@cursor.com>
Read a model's capabilities.vision flag from LM Studio's native /api/v1/models so vision finetunes whose names lack a vision keyword still receive images, falling back to the name heuristic when the endpoint doesn't report it. The probe is short-TTL cached and restricted to local/LAN hosts, so remote/cloud endpoints are never contacted.
add_document() and add_documents_batch() derive the persistent ChromaDB
document id from Python's built-in hash():
doc_id = f"doc_{hash(text) % 10**16}"
str hashing is randomized per process (PYTHONHASHSEED is on by default), so
the same document text gets a different doc_id on every restart. The dedup
check right after — self._collection.get(ids=[doc_id]) — therefore misses
on restart, and identical documents are re-embedded and re-added as
duplicates each time the app restarts, bloating the vector store and
skewing retrieval.
Derive the id from a stable hashlib.sha256 of the text via a shared
_generate_doc_id() helper, used by both add paths so they agree.
tests/test_rag_vector_id_stability.py runs _generate_doc_id in subprocesses
under PYTHONHASHSEED=0/1/random and asserts the id is identical across all
of them (and differs for different text). Fails before this change.
* fix: only extract quotes whose closing quote matches the opening one
* fix: same mismatched-quote bug in the services search copy
* test: extract_quotes requires matching open/close quotes
* fix: omit temperature for OpenAI reasoning models (o1/o3/o4/gpt-5)
These models only accept the default temperature; sending any explicit
value (even 0.0) returns HTTP 400 "Only the default (1) value is
supported". This broke two paths:
- Endpoint probing in _probe_single_model hardcodes temperature: 0.0, so
a perfectly valid o3/gpt-5 endpoint is reported as failing in the
Model Endpoints health check.
- Chat/stream payloads send temperature unconditionally, so a non-default
temperature preset 400s on these models.
The code already special-cases the same model family for
max_completion_tokens, so this adds a sibling _restricts_temperature()
helper and omits the field for those models, letting the API use its
required default. gpt-4.5 is intentionally excluded (not a reasoning
model; accepts temperature normally).
Adds tests/test_llm_core_temperature.py covering the predicate and the
synchronous payload builder.
* fix: also omit temperature for reasoning models on the direct-POST paths
The first commit only covered llm_call/llm_call_async/stream_llm and the
endpoint probe. Email auto-summary, urgency-less spam classification, the
email reply-summary endpoint, and gallery vision tagging build their
OpenAI payloads inline and POST them directly (requests/httpx), bypassing
llm_core — so a reasoning model configured there would still 400 on the
temperature field. These sites already branch on _uses_max_completion_tokens,
so they're the same class; added the matching _restricts_temperature guard.
gallery_routes also gains the max_completion_tokens branch it was missing,
so gpt-5 vision tagging works end to end.
Note: email_pollers urgency scoring goes through llm_call_async and was
already covered.
Surfaces the research_run_timeout_seconds setting (added in #783) in
Settings → Research as a "Max Time" field, and lets 0 disable the
wall-clock cap entirely for long deep-research runs.
- settings.py: document that 0 disables the cap; default stays 1800s.
- research_handler.py: resolve 0 (or negative) to no timeout
(asyncio.wait_for timeout=None); other values stay bounded to
[60, 86400] as before.
- index.html / settings.js: "Max Time" input bound to
research_run_timeout_seconds, validated to {0} ∪ [60, 86400], with
copy making explicit that 0 = no limit (unbounded model/API cost).
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* Chat metrics: show backend's true generation t/s, not tokens÷wall-clock
The per-message tokens/sec read low and felt wrong because it was computed as
output_tokens / total_duration, where total_duration is wall-clock including
prefill, tool calls, and network — not pure decode time. llama.cpp already
reports the correct gen speed in its stream (timings.predicted_per_second), but
it was being dropped.
- llm_core.py: when parsing the OpenAI-compatible usage chunk, also read the
sibling `timings` block llama.cpp includes — pass predicted_per_second through
as gen_tps and prompt_per_second as prefill_tps on the usage event.
- agent_loop.py: capture backend_gen_tps/backend_prefill_tps from usage events;
in _compute_final_metrics prefer backend_gen_tps over the wall-clock division
when present (fall back to computed for cloud APIs that omit timings). Tag the
result with tps_source ("backend" vs "computed") and surface prefill_tps.
Result: the displayed t/s now matches the model's real decode speed and is
stable regardless of prompt length (a long prefill no longer deflates it).
Checks: py_compile passes; verified extraction against a real llama.cpp final
chunk (gen 79 t/s surfaced vs the deflated wall-clock figure shown before).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* Chat metrics: surface true t/s on the direct-chat path too
Follow-up to the gen-tps work: the non-agent direct-chat stream path in
chat_routes turned the raw `usage` event straight into a metrics event but only
copied token counts — it never set tokens_per_second or response_time. So simple
(non-tool) replies showed "Speed: n/a" / "Time: undefineds" and the chip fell
back to a bare token count ("27 tok") instead of t/s.
Map the usage event's gen_tps (llama.cpp timings.predicted_per_second, added in
the prior commit) into tokens_per_second here too, tag tps_source=backend, and
set response_time from wall-clock for the stats popup.
Checks: py_compile passes; verified llama.cpp emits usage+timings on the final
stream chunk (gen ~90 t/s) that this path consumes.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* Tests: backend gen/prefill t/s passthrough and preference
Cover the two pieces of the true-t/s metric so it can be reviewed on its own:
- stream_llm surfaces llama.cpp's timings.predicted_per_second /
prompt_per_second as gen_tps / prefill_tps on the usage event (captured
llama.cpp final-chunk fixture), and omits them when the backend reports no
timings.
- _compute_final_metrics prefers backend_gen_tps over output/wall-clock,
tags tps_source ("backend" vs "computed"), and surfaces prefill_tps.
Reuses the fake-client stream harness from test_llm_core_streaming.py.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix(upload): atomic-rename writes for uploads.json + .bak recovery
UploadHandler.save_upload does a read-modify-write of uploads.json via
two open(..., 'w') + json.dump blocks, with no lock, no temp+rename, and
no recovery. N concurrent inserts lost N-1 entries (last writer wins
after the read snapshot is taken); a SIGKILL/SIGTERM mid-json.dump
truncated the file and the bare 'except Exception: logger.warning(...)'
recovery path returned {}, silently dropping every prior upload.
The handler now serialises the RMW under a per-instance threading.Lock
and writes through _atomic_write_json, which writes to a tempfile in
the same directory, fsyncs, snapshots the previous live to .bak, and
renames the temp onto the target via os.replace. os.replace is atomic
on POSIX, so a reader sees either the old or the new state, never a
half-written file. _load_upload_index tries the live file first, then
falls back to the .bak sibling if the live is corrupt.
Cross-process safety is still on the deployer: gunicorn workers on
the same uploads dir will race the lock, and the atomic-rename is the
kernel-level guarantee that prevents torn reads. If multi-worker
writes are expected, fcntl.flock around the rename is a follow-up;
single-worker and async deployments are correct as-is.
* fix(upload): reload uploads.json inside _index_lock on dedupe path
The duplicate-detection branch in save_upload() was reading uploads.json
*before* taking _index_lock, then writing that stale snapshot under the
lock. A duplicate upload racing with a new-entry insert could clobber
the new entry because the duplicate's snapshot predated the insert.
The new-entry branch already reloaded inside the lock; the duplicate
branch now does the same. It also re-resolves the storage key inside
the lock, because a concurrent insert can have changed the dict's keys.
If the entry has been cleaned up between the outer read and the inner
write, the function falls through to the fresh-insert path instead of
silently writing a stale row.
Boundary note: the _index_lock serialises writers within a single
Python process. Cross-process / multi-worker deployments still need
flock or a database; the inline comment is updated to make this
explicit. The atomic-rename write keeps the on-disk state consistent
but does not serialise writers across processes.
Tests:
- Existing concurrent-insert and partial-write-recovery tests still pass.
- New test_atomic_write_primitives_present_in_production_code asserts
the production module has at least two 'with self._index_lock:' blocks
(regression net for this fix).
- New smoke tests: normal upload, duplicate detection, info lookup
after a backup-recovery scenario.
The 'add' action runs due_date through parse_due_for_user (natural
language like 'tomorrow at 9am', plus user-tz anchoring for naive ISO),
but 'update' stored the raw value verbatim. A reminder edited with
natural language was saved as an unparseable literal the frontend's
new Date() can't read, so it never fired. Route update's due_date
through the same parser as add.
analyze_topics() iterates session_manager.sessions and reads
session_data.get("history", []) directly. But SessionManager.load_sessions
seeds sessions metadata-only with empty history — messages are loaded
lazily, only when get_session(session_id) is called. So analyze_topics saw
empty history for every session that hadn't been individually opened this
process lifetime and reported total_topics: 0, even when the database held
plenty of matching messages.
Hydrate each candidate session via session_manager.get_session(session_id)
(the existing lazy-load path) before reading its history, after the
owner/archived filters so skipped sessions aren't loaded. Falls back to the
raw cached history when the manager has no get_session (test stubs).
tests/test_topic_analyzer.py: new test_topic_analyzer_hydrates_sessions
seeds a real SQLite DB with a session + message, runs the real
SessionManager (asserting cached history starts empty), then asserts
analyze_topics finds the topic. Fails before this change. The existing
keyword tests now pass an explicit owner to satisfy the owner-required
early return.
asyncio.wait_for wrapped create_subprocess_exec, which returns as soon
as the child is spawned, so the timeout never bounded the actual work.
yt-dlp could hang indefinitely on proc.communicate() and the
except asyncio.TimeoutError branch was unreachable. Bind the wait to
communicate() and kill/reap the child if it overruns.
After a non-native tool round, the agent appends tool results as a {role:
'user'} message next to the user's original 'user' prompt, producing two
consecutive 'user' messages. Strict provider APIs (Anthropic/Claude) reject
consecutive same-role messages, so the follow-up generation request fails
silently — search returns sources, then nothing is generated.
_sanitize_llm_messages now merges consecutive 'user' messages (joining their
content). Only user/user is merged; normal chat and agent/tool turns already
alternate and are untouched.
Scoped down per maintainer review: the agent_loop 'output' source-extraction
change is already on main (#898/#901) and the broad-mocking web-sources test
was dropped. Added a focused test that runs consecutive-user messages through
the real _build_anthropic_payload and asserts the payload alternates correctly.
`strip_think` removes a dangling (unclosed) `<think>` block via
`_THINK_OPEN_RE`, but that pattern was anchored to the start of the string
(`^\s*<think>`). An unclosed `<think>` (or `<thinking>`) opener that
appears *after* any leading output was therefore only half-handled: the
stray tag itself was removed by `_THINK_TAG_RE`, but the reasoning content
following it leaked straight to the user.
strip_think("Hello! <think> I am thinking.") # -> "Hello! I am thinking." (leak)
strip_think("Sure.\n<think>\nLet me reconsider...") # -> leaks the reasoning
`strip_think` feeds user-facing output across research, email replies,
notes, and scheduled tasks, so this leaks chain-of-thought to end users.
Un-anchor `_THINK_OPEN_RE` so a dangling opener anywhere strips from the
opener to end of string, consistent with the existing start-of-string
behavior. Content before the opener, closed `<think>...</think>` blocks,
and tag-free text are all preserved.
tests/test_strip_think.py covers the mid-text leak (fails before this
change), start-anchored unclosed, closed blocks, no-tag passthrough,
content-before-opener, and mixed closed+unclosed. Full existing think
suite still passes.
add_directory cleared exclusions with a raw path.startswith(directory)
test, which also matched sibling directories sharing a name prefix —
adding /docs would silently un-exclude files under /docs2. Match the
directory itself or paths under it (directory + os.sep) instead.
_process_pdf prepends "\n\n[PDF content]:" to extracted text, and two
call sites in document_routes.py stripped it with .lstrip("\n[PDF content]:").
str.lstrip(chars) treats its argument as a *set of characters*, so it keeps
eating into the page text that follows the marker — e.g. a body starting
with "to the board" loses its leading "to" because 't'/'o' are in the
marker's character set. Replace both sites with a shared
strip_pdf_content_marker() helper that uses str.removeprefix.
Deep Research surfaced 'Error: unknown error' whenever every search
provider returned an empty result set without raising (e.g. SearXNG is
reachable but all its engines fail internally). _last_search_error was
only set on exceptions, so the empty-but-no-exception path left it unset
and the caller fell back to 'unknown error'.
Record an actionable reason on that path naming the providers that were
tried, so users can tell it's a search-backend problem rather than a
model problem. The provider-raised path is unchanged.
Re: #344.
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
KNOWN_CONTEXT_WINDOWS lists 'o1' (200k) before 'o1-mini' (128k), and
_lookup_known returned on the first substring hit — so "o1-mini" matched
'o1' and reported 200000 instead of 128000. Track the longest matching
key instead, so the most specific entry wins regardless of table order.
`_ORIG_RE` (and its JS mirror `_TALON_ORIG_RE`) already recognised the
Japanese forward marker `転送` alongside the "Original Message" delimiters,
but not the English "Forwarded message" one. So Gmail-style forwards —
including the ones Odysseus itself emits (`---------- Forwarded message
----------`, static/js/emailInbox.js) — were not treated as a quote
boundary:
- with a following Outlook From:/Date: header block, the divider line
leaked into the level-0 reply bubble as noise;
- with only the divider marking the forward (no header block), the body
was not split into turns at all.
Add `Forwarded\s+message` to the same `[-_=]{3,}`-delimited alternation in
both the server-side parser and the JS mirror, so forward dividers are
consumed as an attribution boundary like "----- Original Message -----".
Locale variants of "Forwarded message" can follow the existing pattern.
Tests cover both manifestations plus a negative control (the bare words
"forwarded message" without `[-_=]{3,}` delimiters must not split).
Checks: python -m pytest tests/test_forwarded_message_divider.py (3 passed),
python -m py_compile src/email_thread_parser.py, node --check
static/js/emailLibrary/utils.js, git diff --check.
`get_tools_for_query` force-includes whole tool families when the query
mentions an intent keyword, but matched with a raw substring test
(`kw in ql`). Short hints therefore fired inside unrelated words, bloating
the tool set with irrelevant tools:
- "fix" matched "prefix" -> document tools
- "line" matched "deadline"/"online" -> document tools
- "serve" matched "observe"/"reserve" -> cookbook serve tools
- "reply" matched "replying" -> all email tools
- "unread" matched "unreadable" -> all email tools
Match each keyword on word boundaries instead
(`re.search(rf"\b{re.escape(kw)}\b", ql)`), the same fix already applied to
the keyword matcher in topic_analyzer.py. Genuine intent keywords
("reply to this email", "edit the document", "serve the model") still match.
This only removes substring-inside-a-word matches; it does not change whole
-word matches (so e.g. an unrelated whole word like "tell" is a separate
keyword-choice question, left untouched here).
Checks: python -m pytest tests/test_tool_index_keyword_boundaries.py (4 passed;
3 of them fail on the pre-fix substring code), python -m py_compile
src/tool_index.py, git diff --check.
PresetManager.load already heals a forward-incompatible presets.json: the
block just above repairs the legacy `custom` shape and re-saves the file.
But if the file exists and is missing a whole built-in preset (e.g. an older
install written before `reason` existed), load returned it as-is, so that
built-in stayed permanently absent — silently missing from the picker that
GET /api/presets feeds, with no way for the user to get it back.
Extend the same self-heal: after the legacy migration, fill in any built-in
presets the loaded file is missing, defaults-first so user edits win, and
persist the result. This never clobbers an intentional removal — there is no
delete path for the built-in keys (only user_templates entries can be
deleted), and presets are hidden via an `enabled: False` flag, not removal.
Checks: python -m pytest tests/test_preset_fill_missing_defaults.py (3 passed;
2 fail on the pre-fix code), the existing preset cases in
tests/test_review_regressions.py still pass, python -m py_compile
src/preset_manager.py, git diff --check.
APIKeyManager.load() decrypts every stored key with a dict comprehension
and no error handling. If the .key file no longer matches the ciphertext in
api_keys.json — key rotated, a partial/!mismatched data restore, or a
corrupted .key — Fernet.decrypt raises cryptography.fernet.InvalidToken.
app_initializer.py calls api_key_manager.load() during startup, so a single
undecryptable entry takes down the whole app at boot, and the user can't
reach the UI to fix it.
Decrypt each key in a loop and, on InvalidToken/ValueError, log a warning
and skip that one entry while still returning every key that decrypts
cleanly. One bad/stale key no longer blocks startup.
tests/test_api_key_manager_resilience.py saves a valid key, then injects an
entry encrypted under a different Fernet key (InvalidToken) and a malformed
token (ValueError), and asserts load() returns the good key and skips the
bad ones without raising. Fails before this change.
The webhook URL guard's _ip_is_private() only checks a hardcoded
_PRIVATE_NETWORKS list, which misses several addresses that route
internally. validate_webhook_url() therefore ALLOWED:
- http://[::]/ (IPv6 unspecified, reaches localhost)
- http://[::ffff:127.0.0.1]/ (IPv4-mapped IPv6 loopback = 127.0.0.1)
- http://[::ffff:169.254.169.254]/ (IPv4-mapped cloud metadata endpoint)
The last one is the dangerous case: a webhook pointed at the mapped
169.254.169.254 can pull cloud instance credentials (SSRF -> credential
theft).
Harden _ip_is_private(): first unwrap IPv4-mapped IPv6 to its embedded IPv4
(addr.ipv4_mapped), then reject via the stdlib address properties
(is_private, is_loopback, is_link_local, is_reserved, is_multicast,
is_unspecified) in addition to the existing network list. Public addresses
still pass.
tests/test_webhook_ssrf_resilience.py asserts validate_webhook_url raises
for the three IPv6 bypasses plus 127.0.0.1 and 0.0.0.0, and still accepts a
public IP literal. The IPv6 cases fail before this change.
_build_ollama_payload sends options.temperature and options.num_predict
to /api/chat, but never options.num_ctx. Ollama defaults num_ctx to 2048
when the option is omitted, so prompts going to any Ollama backend are
silently truncated there regardless of the model's actual capability.
Thread the discovered context length through the three call sites
(llm_call, llm_call_async, stream_llm) and emit options.num_ctx when it
is known and positive. The builder filters out the DEFAULT_CONTEXT
fallback (128000) so we don't lie to Ollama about models whose window
we couldn't actually discover. The issue's literal 'when > 2048'
heuristic is dropped: a model with a real context smaller than 2048
would OOM if Ollama used its default, so we pass the real value
regardless of size. Matches how src/context_compactor.py uses the
same helper.
Sister fix to PR #753 — that PR teaches the compactor the right budget,
this one tells Ollama to actually use that budget on the way in.
tool_execution.py returns web search results as {"output": ..., "exit_code": 0}.
The sources-extraction block in stream_agent_loop only checked result.get("results")
and result.get("stdout"), so _src_text was always "" for every tool-call-mode web
search. Two consequences:
1. The SOURCES marker was never parsed and the web_sources SSE event was never
emitted -- the sources panel never appeared after agent-mode searches.
2. The marker (a large JSON blob) was left in result["output"] and forwarded
verbatim to the LLM in round 2 via format_tool_result, confusing some local
models into producing no tokens.
Fix: prepend result.get("output") to the lookup chain, and update the cleanup
assignment so result["output"] is overwritten with the stripped text.
Adds six regression tests in tests/test_agent_loop.py documenting the before/after
behaviour and verifying backward compat with the legacy results/stdout paths.
Co-authored-by: MohammadYusif <MohammadYusif@users.noreply.github.com>
Models (notably Gemini) emit a native 'google_search' function call, but the
agent loop had no mapping for it, so the call failed to convert, the round
produced 0 chars and 0 tool blocks, and generation died silently — the web
client hung on 'waiting for first token' with no error (also #443).
- Map google_search / google_search_retrieval / google_search_grounding to the
web_search tool, and read Gemini's 'queries' array (falling back to 'query').
- In stream_agent_loop, when a round yields no response text and no tool
events, emit a visible fallback message instead of leaving the user hanging.
- Give the unknown-tool execution branch an explicit exit_code=1 so the failure
is logged as an error rather than 'n/a'.
Unknown/unconvertible tool names still return None (unchanged) so they are
dropped safely rather than executed. Added tests covering the google_search
mapping, the queries array, and unknown/invalid-JSON returning None.
_resolve_ddg_redirect (the DuckDuckGo /l/?uddg= redirect resolver used on every
HTML-fallback result href) gated on `"duckduckgo.com" in parsed.hostname`. That
substring test also matches look-alike hosts like `duckduckgo.com.evil.com` and
`notduckduckgo.com`, so a result link on such a host would be silently rewritten
to its embedded `uddg` target. Same substring-vs-hostname pitfall fixed for
provider detection in 54ecfa3.
Match the host properly: exactly `duckduckgo.com` or a `.duckduckgo.com`
subdomain. Genuine redirects (`//duckduckgo.com/l/...`, and relative `/l/...`
hrefs resolved against `html.duckduckgo.com`) keep working.
The resolver was a closure inside duckduckgo_search; lifted it (plus the new
_is_duckduckgo_host helper) to module scope so it can be unit-tested directly.
Adds tests/test_ddg_redirect_resolution.py (red on the look-alike case before
this change, green after).
* fix(stream): read 'reasoning' SSE field for vLLM 0.20.2 / NIM
vLLM 0.20.2 / NVIDIA NIM emit reasoning-parser output in the `reasoning` delta field; older builds use `reasoning_content`. stream_llm() read only the latter, so reasoning from models like Nemotron-3-Nano (--reasoning-parser) was silently dropped and never rendered. Accept either field.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix(agent): keep reasoning_content only on the latest assistant turn
The agent loop echoed each round's reasoning back as `reasoning_content` on every assistant turn, assuming vendors ignore it. Nemotron's chat template re-injects ALL prior reasoning_content as <think> blocks, and the loop is trimmed only once (before it starts) — so reasoning accumulated unbounded across rounds, bloating context and feeding the model its own prior reasoning, which reinforced repetition/looping. Strip reasoning_content from earlier assistant turns so only the most recent round carries it (still satisfies DeepSeek's thinking-mode follow-up requirement).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* fix(agent-ui): wrap each round's reasoning in its own <think> block
The streamed think-tag wrapper gated on whole-message substring checks (accumulated.includes('<think>')), which only ever wrapped ONE reasoning block per message. A multi-round agent response has a reasoning phase per round, so once round 1 closed its <think>...</think>, rounds 2+ reasoning was emitted unwrapped and leaked into the visible answer. Replace the substring checks with a stateful open/close flag that toggles per think/answer cycle, so each round's reasoning gets its own collapsible block. Single-turn chat is unchanged (one open, one close).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* test(stream): reasoning/reasoning_content delta surfaces as thinking chunk
Covers @pewdiepie-archdaemon's requested regression: a streamed {reasoning: ...} delta emits a thinking chunk while {content: ...} streams as normal content; plus the older reasoning_content field for backward compat. Mirrors the #591 scenario.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The fallback memory extractor (used by routes/memory_routes.py when the LLM
extractor fails) matched list items with `r'^[-*•]|\d+\.\s*(.*)'`. Operator
precedence makes that `(^[-*•]) | (\d+\.\s*(.*))`, so the capture group only
exists on the numbered-list branch.
A bullet line ("- foo") matches the first branch, so `group(1)` is None and
`text_match.group(1).strip()` raises AttributeError — crashing extraction for
any assistant message that contains a bullet list (i.e. most of them). Numbered
lists happened to work.
Group both markers — `r'^(?:[-*•]|\d+\.)\s*(.*)'` — so the capture applies to
bullets and numbers alike.
Adds tests/test_memory_bullet_extraction.py (red before, green after).
read_file/write_file passed the raw path to open(), so a tilde path like
~/notes.txt failed ("not found") — the shell's ~ expansion never happened
because there's no shell. Agents then fell back to bash to reach home-dir
files. Expand ~ (and ~user) with os.path.expanduser before opening.
Checks: python -m py_compile src/tool_execution.py.
TaskScheduler.start() aborts stale TaskRun rows but never advanced
ScheduledTask.next_run. Across a restart the in-process _executing set
is empty, so the first post-restart _check_due_tasks() call dispatches
every task whose next_run is still in the past — and so does every
subsequent poll, until the task's regular _execute_task path finally
runs compute_next_run and pushes it forward.
start() now queries active tasks with next_run < now and pushes each
one to now + 60s. The first poll after restart sees them as not-yet-due,
the task runs once normally, and compute_next_run puts the schedule
back on its real cadence. Paused and not-yet-due tasks are left alone.
The validator test was rewritten as a regression test asserting the
opposite of the bug it originally demonstrated, plus two narrower cases
to lock down the filter (only active+overdue is touched).
* fix: match topic keywords on word boundaries, not substrings
* fix: apply word-boundary matching to topic example snippets too
* test: topic keywords match whole words, not substrings