* fix: renaming a user leaves their API tokens resolving to the old owner
* Drive rename token-cache test through the real auth resolver instead of patching a closure
* fix(llm): auto-detect <think> in content stream for unregistered thinking models
_THINKING_MODEL_PATTERNS only covers known model families by name. Qwen3-derived
models with non-standard names (e.g. Qwopus, custom QwQ forks) are not matched,
so their <think>...</think> content streams through as visible chat text instead
of being routed to the thinking display.
When the first content delta opens with <think> and the model was not already
identified as a thinking model, dynamically flag the stream as a thinking model
for the remainder of the response. This enables the existing </think> repair path
(line below) and ensures the frontend receives the full <think>...</think> wrapper
it needs to split thinking from the final answer.
The check is restricted to the very first content delta (_first_content_sent is
False) to avoid misidentifying models that happen to write "<think>" mid-answer.
Fixes#2225
Related: #2420 (covered by separate PR from @AmmarS-Analyst), #2224 (@RaresKeY)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(llm): replace inert _thinking_model flag with _in_think_tag state machine
The original auto-detect set _thinking_model=True on the first <think> chunk
but still emitted it as a regular delta and set _first_content_sent=True
immediately, so no subsequent chunk could enter the repair path.
Replace with _in_think_tag bool: enter thinking mode when first content starts
with <think>, route all chunks to the thinking channel until </think> is found,
then the tail becomes the first regular delta. Adds three regression tests.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(llm): replace _first_content_sent guard with _think_open_stripped
Opening-tag stripping used `not _first_content_sent` as the guard, but
_first_content_sent stays False throughout the entire think block (it only
flips when regular content is emitted). So `find(">")` ran on every
reasoning chunk — not just the first — and silently truncated everything
before the first ">" in any reasoning text containing comparisons, arrows,
or code.
Fix: add `_think_open_stripped = False` alongside `_in_think_tag`. Use it
as the strip guard in both the "still inside <think>" path and the
"</think> found in same chunk" split path. Set it True once the opening
tag is consumed so all subsequent chunks reach the thinking channel
unmolested.
Add regression test: 3-chunk stream where the middle chunk contains
"c > d" — confirms "more c " is not dropped.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Removes module-level core.database stubbing from the compare endpoint owner-scope regression test and patches ModelEndpoint per test with monkeypatch. Restores one focused part of the Python CI baseline tracked in #2580.
Agent subprocesses (bash, python) previously inherited the container's default
working directory (/app), so files created with relative paths landed in the
ephemeral container layer and were silently destroyed on any docker compose up
--build or container recreation.
Set cwd=_AGENT_WORKDIR (resolved to <repo_root>/data at import time) and
HOME=_AGENT_WORKDIR on both subprocess launchers so that:
- pwd inside a bash tool returns the persistent data directory
- relative paths and ~ resolve to a location that survives rebuilds
- the agent can still cd to any absolute path it needs
The resolution uses pathlib.Path(__file__).parent.parent / "data", which
works for both Docker (/app/src → /app/data) and manual installs
(<repo>/src → <repo>/data) without requiring a new env var or compose change.
Fixes#2512
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Odysseus only supports llama.cpp on Windows (vLLM/SGLang are
explicitly blocked). llama.cpp requires GGUF, so AWQ/GPTQ/FP8
safetensors models without a GGUF alternate should not be
recommended in the Cookbook on Windows hosts.
Changes:
- hardware.py: add 'platform': 'windows' to _detect_windows()
so downstream logic can identify Windows hosts.
- fit.py: include is_windows in the existing GGUF-only filter
alongside apple_silicon and consumer_amd.
- tests: add test_hwfit_windows.py with regression tests.
Fixes#122, #614 (root cause: unservable models recommended).
Python's bool('false') returns True because the string is non-empty.
A JS client serialising a boolean as the string 'false' would have
supports_tools or is_enabled silently flipped to True — so 'disable
tool support' would actually enable it.
Use an explicit lookup dict for supports_tools and a case-insensitive
string check for is_enabled so both string and native bool inputs are
handled correctly.
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Updates the stale gallery owner-filter null-user test to match current single-user/auth-disabled behavior. Restores one focused part of the Python CI baseline tracked in #2580.
A system message that arrives without a 'content' key — possible via
malformed tool results — raised a KeyError in the hot path of llm_call,
llm_call_async, and stream_llm. Replace m["content"] with
m.get("content") or "" in all three functions so a missing key degrades
to an empty string instead of crashing.
Also removes a redundant .rstrip() after .strip() in _model_activity_key.
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
The retry loop raised immediately for any non-success HTTP response
regardless of attempt count. For transient upstream errors (rate limit,
bad gateway, gateway timeout) the function should back off and retry
within the existing attempt budget.
Also lets ConnectError / ConnectTimeout retry when the host has not been
cooled and attempts remain, instead of always raising on the first
connect failure.
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Updates the PDF marker regression test to check corrupted markers at line level instead of using a broad substring assertion. Restores one focused part of the Python CI baseline tracked in #2580.
Two bugs prevented pinned models from appearing in the chat model picker:
1. _fetch_models() only used _cached_model_ids(), ignoring pinned_models.
Since Fireworks AI doesn't list kimi-k2p6-turbo in /v1/models, the
cached list was empty, so the endpoint showed as offline with no models.
2. _curate_models() filtered unknown pinned IDs into models_extra, but the
chat UI only reads models (primary list). Pinned models stayed invisible.
Fix: use _visible_models() to merge cached + pinned, then promote pinned
IDs from models_extra to models so they appear in the dropdown.
Closes#1521 follow-up
Updates the split_chunks containment regression test to use deterministic non-repeating records instead of a repeating fixture that could produce accidental substring matches. Restores one focused part of the Python CI baseline tracked in #2580.
Updates endpoint/model-route test HTTP mocks to accept the verify keyword argument passed by endpoint probing code. Restores one focused part of the Python CI baseline tracked in #2580.
Gives the agent first-class code navigation instead of shelling out via bash
(token-heavy, unreliable on weaker models, unstructured). Mirrors the
Grep/Glob/Read primitives that Claude Code / opencode expose.
- grep: regex search over file contents across a tree. Uses ripgrep when
available (with explicit excludes so junk dirs are skipped even without a
.gitignore); falls back to a pure-Python walk+regex when rg is absent.
Returns file:line:match, capped.
- glob: find files by glob pattern (recursive), newest first.
- ls: list a directory (folders first, then files with sizes).
- read_file: optional offset/limit for line-range reads of large files
(plain-path calls stay back-compatible).
All confined by the same path policy as read_file (_resolve_tool_path:
data/tmp allowlist + sensitive-file deny). Junk dirs (.git, node_modules,
venv, __pycache__, dist/build, …) skipped. Output capped (200 hits,
400 chars/line). Admin-gated like the other filesystem tools.
Wiring: schemas + native arg->content serializer (src/tool_schemas.py), tool
tags (src/agent_tools.py), always-available + descriptions (src/tool_index.py),
admin gate (src/tool_security.py), dispatch + impls (src/tool_execution.py).
Tests: tests/test_code_nav_tools.py — match/skip-junk/ignore-case/glob-filter,
allowlist rejection, glob/ls, read-range, and the no-ripgrep Python fallback.
* Add edit_file tool + file-change diffs
edit_file is an exact old_string -> new_string replacement on a file on disk
(fails if old_string is missing or non-unique unless replace_all); write_file
also returns a unified diff. Diffs render collapsed in the tool bubble
(filename + +adds/-dels, theme colors); the raw JSON command box is hidden.
Security: edit_file is a sensitive filesystem-write tool, treated everywhere
write_file is —
- added to NON_ADMIN_BLOCKED_TOOLS (is_public_blocked_tool / blocked_tools_for_owner),
so on auth-enabled deployments a non-admin cannot run it; execute_tool_block
refuses it for non-admin owners.
- confined by the same path policy as read_file/write_file (allowlist +
sensitive-file deny) via _resolve_tool_path.
Disambiguation in tool descriptions + bash prompt: edit_file/write_file are the
only way to write files (they show a diff) — never edit_document (editor panel)
or a bash heredoc/redirect.
Tests (tests/test_edit_file.py): non-admin block (policy + execution gate),
successful edit, not-found old_string, non-unique old_string (+ replace_all),
and path outside the allowed roots.
Files: src/tool_execution.py, src/agent_loop.py, src/tool_schemas.py,
src/agent_tools.py, src/tool_index.py, static/js/chat.js, static/style.css,
tests/test_edit_file.py.
* Drop redundant import os in write_file closure
os is already imported at module top.
* Show the serving provider in the model-info card
The model-info popup (click the model name on a message) shows the model
and pricing, with a logo inferred from the model NAME. But the same model
can be served by different endpoints — e.g. claude-haiku via OpenRouter
vs GitHub Copilot vs Anthropic direct — which the name-based logo can't
distinguish.
Add a 'Provider' line derived from the session's endpoint URL:
- new providerLabel(endpointUrl) in static/js/providers.js maps the host
to a friendly name (GitHub Copilot, OpenRouter, Anthropic, OpenAI,
Google, AWS Bedrock, DeepSeek, Mistral, Groq, Together, Fireworks,
Perplexity, xAI), 'Local' for loopback/LAN, else the bare host.
- static/js/chatRenderer.js renders it under Model in the card, from
window.sessionModule.getCurrentEndpointUrl().
* Anchor provider-label patterns to the hostname
providerLabel matched its patterns against the full endpoint URL with
unanchored substrings, so a host like max.airlines.com matched /x\.ai/ and was
mislabeled "xAI". Anchor each pattern to the end of the hostname ((^|.)domain$)
and test against the parsed host instead of the raw URL.
* chore: dedupe src/search/cache.py into a re-export shim
src/search/cache.py was a byte-identical copy of services/search/cache.py.
Convert it to a sys.modules alias of the canonical services module (matching
src/search/core.py, providers.py, ranking.py) so the two cannot drift, and add
an identity assertion to test_search_module_consolidation.py.
content.py and query.py are intentionally left as-is: the copies have drifted
and services lacks fixes that src has, so they need services reconciled first
before they can be shimmed safely.
* chore: dedupe src/search content.py and query.py into shims
Convert src/search/content.py and query.py to sys.modules aliases of the
canonical services/search/* (matching cache.py, core.py, providers.py,
ranking.py) so the duplicate copies cannot drift.
Repoint the two tests that were coupled to the src-copy internals onto the
canonical services surface (behaviour is equivalent):
- test_src_search_query_nonstring.py: import services.search.query instead of
loading the src file by path.
- test_security_regressions.py::test_web_fetch_guard_blocks_redirect_into_private:
mock httpx.get (services uses the module-level get, not httpx.Client) and
assert on the canonical 'Blocked' message.
Drop the now-redundant [src_content, service_content] parametrization in
test_search_content_extraction_parity.py and test_search_content_url_guards.py
(after the shim both params are the same object); add content/query identity
assertions to test_search_module_consolidation.py.
* fix: live-resume chat stream on session re-entry (#2539)
When a session was re-entered after a page refresh or in a new tab while
its agent run was still streaming, the UI showed a frozen "Generating
response..." spinner, polled stream_status until the run finished, and
then did a full reload. The live tokens were never shown.
Add resumeStream() in chat.js: it consumes GET /api/chat/resume/{id}
(which replays the run's buffer then streams live), renders reply tokens
as they arrive, and reloads the session on completion for the canonical
final render. sessions.js _checkServerStream now calls it on re-entry and
falls back to the previous spinner+poll path if it is unavailable.
* Finalize plain-text resume in place instead of reloading
On stream completion, resumeStream() called selectSession(), forcing a full
history re-fetch and a visible flicker right as the stream finished.
For plain text replies (no tool calls, sources, doc streaming, or multi-round
output) the live tokens are already rendered, so finalize in place: replace the
live bubble with a canonical single message via chatRenderer.addMessage (markdown
+ footer actions + metrics, the same renderer history uses), captured from the
streamed metrics event. No history refetch, no extra round-trip, no flicker.
Rich responses still reload, since their canonical render (tool bubbles, sources,
multi-bubble) is rebuilt from the saved DB record.
* Use a dedicated set for the resume re-attach lock; fix stale docblock
resumeStream() marked its re-attach lock in _backgroundStreams, which
checkBackgroundStream() also reads. On a second re-entry of the same session
while a resume was still live, checkBackgroundStream() mistook that entry for a
same-tab POST stream and spawned its own spinner+poll bubble. Move the lock to a
dedicated _resumingStreams set (also covered by hasActiveStream) so the two paths
no longer collide. Also update the resumeStream docblock to describe the
in-place finalize vs reload split.
The multi-GPU GGUF filter at fit.py:380 returned None unconditionally
for Q*/IQ quants on 2+ GPU systems. When the caller explicitly passes
target_quant, they are asking 'what happens if I try this?' and expect
a structured no_fit response, not a silent None.
Fix: skip the filter when target_quant is explicitly provided so the
call falls through to the existing no_fit path.
Fixes #
The compatibility re-export shim at src/search/ranking.py forgot
_SPORTS_HINT_RE, so tests importing src.search.ranking raised
AttributeError on the [src] parametrize variant.
Fixes#1995
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
comprehensive_web_search now called with (query, max_pages, return_sources)
and returns a tuple (_context, results). The test mock still used the old
async signature with max_results/fetch_content and returned a plain list,
causing TypeError on every run.
Fixes#2331
* fix: SSE parser crashes with NoneType on MiniMax-M3 (and any provider sending null choice/usage/tc)
Three guards added in stream_llm:
1. choices[0] null check — MiniMax (and some other providers) send a
choices entry as None. `_choices[0].get("delta")` raised
AttributeError. Now checks `_choices[0] is not None` before calling
.get().
2. usage null guard — j["usage"] can arrive as None (not a dict) on
some providers. Added `or {}` so subsequent .get() calls don't crash.
3. tool_calls null entry skip — individual entries in the tool_calls
array can be None. Added `if tc is None: continue` before
tc.get("function").
All three match the `or {}` / null-guard pattern used elsewhere in the
same block. Safe for all OpenAI-compatible providers.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: guard null choice in elif-choices SSE branch
The usage-chunk path already guarded _choices[0] is not None, but the
elif "choices" branch that processes content/tool-call deltas did not.
A chunk like {"choices": [null]} or {"choices": [null], "usage": null}
reaches j["choices"][0].get("delta") and crashes with:
'NoneType' object has no attribute 'get'
Fix: extract choices[0] into _c0 and continue to the next chunk when
it is None, matching the guard already applied in the usage path.
Adds three focused regressions covering the paths the maintainer flagged:
- {"choices": [null]}
- {"choices": [null], "usage": null}
- tool_calls array containing a null entry alongside a valid call
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
The visual research report is assembled from LLM output over crawled web
pages (untrusted content) and served under a relaxed `script-src
'unsafe-inline'` CSP. Two values reached that HTML without sanitization:
- `_md_to_html` rendered the report markdown via python-markdown, which
passes raw HTML through verbatim, so `<script>` / `<img onerror>` /
`<svg onload>` / `javascript:` links carried in crawled content ran in
the app origin.
- `category` (from the /api/research/start request body, no enum check) was
interpolated raw into `<body class="category-{category}">`.
Allowlist-sanitize the rendered markdown with nh3, keeping the formatting
the report emits (tables, code, details/summary, toc anchors, codehilite
classes, external-link target/rel) while dropping active content, and
html.escape the category. Adds regression tests.
Adding GigaChat (Sber) or an on-premise enterprise LLM gateway as a
model endpoint fails on first probe with
CERTIFICATE_VERIFY_FAILED: self-signed certificate in certificate
chain (_ssl.c:1000)
because their TLS chain is signed by a private root CA (Russian Trusted
Root CA for GigaChat; corporate CA for on-prem) that isn't part of the
default system / certifi trust store. The endpoint shows offline in
the picker even though the URL and API key are correct (issue #722).
The right fix is to extend the trust store, not to weaken verification.
This change:
- src/tls_overrides.py: new module that resolves an opt-in env var
LLM_CA_BUNDLE at import time, builds a shared SSLContext via
ssl.create_default_context() (so the system / certifi bundle is
loaded first) and layers the operator's PEM on top with
load_verify_locations(). Exposes llm_verify() returning a value
suitable for httpx `verify=`. Defaults to True (httpx built-in
trust) when the env var is unset, when the file is missing, or
when the PEM fails to load — verification is never silently
disabled, the warning is logged and we fall back to the safe path.
- src/llm_core.py: thread llm_verify() into the shared AsyncClient
used by stream_llm / streaming completions.
- routes/model_routes.py: thread llm_verify() into the five httpx.get
call sites in _probe_endpoint / _ping_endpoint so adding a
private-CA endpoint goes green on the very first probe and the
picker stops showing it offline.
- .env.example: document LLM_CA_BUNDLE with the GigaChat case as the
concrete example.
Deliberately NOT included: a verify=False knob (global or per-host).
Disabling verification exposes the affected endpoint to MITM, and the
operator-supplied bundle is the correct fix for legitimate private-CA
providers — so the only switch in this PR is the safe one.
Closes#722.
Pip dependency installs are tracked as download tasks but finish with the
runner's "=== Process exited with code 0 ===" sentinel and pip's
"Successfully installed" line — never the HuggingFace download markers
(DONE / 100% / /snapshots/ / DOWNLOAD_OK) the download heuristics look for.
Once the tmux pane is gone, the backend's only completion check is the HF
cache lookup, which a pip package (e.g. llama-cpp-python[server], no "/")
never matches, so it reports "stopped" — and the frontend maps a stopped
download to "crashed". The reconnect loop's session-gone heuristic had the
same gap. Result: a clean install (exit 0) showed "crashed" in the Running
tab while the Dependencies tab correctly showed it installed.
Add a shared _depInstallSucceeded() helper that keys off the exit-0
sentinel (falling back to pip's success line, rejecting ERROR/Traceback)
and wire it into both the session-gone heuristic and the background status
reconciler, gated on payload._dep so real model downloads are unaffected.
Also fixes the pre-existing test_background_status_poll_reconciles_into_local_tasks
assertion that no longer matched the evolved reconciler, and adds regression
coverage for both paths.
txt/html/md export joined and string-munged message.content directly, so a
multimodal turn (content is a list of blocks) crashed export with a TypeError
on join (txt) / AttributeError on .replace (html), and None content (tool-only
assistant turns) rendered as the literal 'None'. Add a _content_to_text helper
that flattens string/list/None to plain text and apply it at the three export
sites. JSON export is unchanged (it serializes structured content correctly).
Plain-string content is returned unchanged, so existing exports are identical.
Co-authored-by: ghreprimand <203024559+ghreprimand@users.noreply.github.com>
Switching to a two-branch workflow: contributors open PRs against `dev`,
and `main` is fast-forwarded to a tested `dev` commit at each release.
This separates "things land in staging" (can move fast) from "things
ship to users" (slow, tested in a browser by the maintainer first).
CONTRIBUTING: add a Branch model section explaining the split + how to
retarget a PR.
PR template: add an explicit "this PR targets dev" checkbox at the top
so it's the first thing a contributor confirms.
End-users cloning the repo will now land on `dev` by default; they can
`git checkout main` if they want the curated branch.
get_builtin_overrides() was swallowing all exceptions with a bare
`except Exception: pass`, so misconfigured tool-description overrides
would silently produce wrong agent behaviour with no log trace.
The background endpoint refresh loop had the same pattern: any probe
failure was silently ignored, giving operators no signal that the
refresh was broken.
Also removes a circular self-import (`from src.agent_loop import
_build_base_prompt`) inside _build_system_prompt; the function is
already in scope and the import created a latent circular reference risk.
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
routes/document_routes.py imports UPLOAD_DIR from src.constants in 8
separate function bodies but never uses it (pyflakes: 'imported but
unused' ×8). Drop the dead imports — no behaviour change.
.github/workflows/ci.yml runs on push to main + PRs:
- python-syntax: compileall over app.py + core/routes/src/services/scripts/tests
- node-syntax: node --check on our JS (static/app.js + static/js)
- python-tests: pip install + pytest (continue-on-error for now)
Hardening: least-privilege `permissions: contents: read`, a `concurrency`
group that cancels superseded runs, and actions pinned to commit SHAs
(version in a comment) instead of mutable tags.
routes/calendar_routes.py imports several names it never uses (pyflakes):
typing.Tuple, dateutil.rrule.{rruleset,DAILY,WEEKLY,MONTHLY,YEARLY}, and
auth_helpers.get_current_user. Drop them (the whole DAILY/WEEKLY/MONTHLY/
YEARLY line goes; rrulestr and require_user are kept). No behaviour change.
ModelEndpoint is defined in core.database, not src.database. The wrong
import silently prevented the module from loading in deployment
configurations that do not have a src/database.py shim, resulting in an
ImportError at startup.
Also adds a warning log when resolve_endpoint finds no usable model
(all models hidden or the list is empty), making the otherwise-silent
failure visible in operator logs.
The test_auth_regressions stub for src.endpoint_resolver was missing the
build_models_url attribute, which caused test collection errors.
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>