Adding GigaChat (Sber) or an on-premise enterprise LLM gateway as a
model endpoint fails on first probe with
CERTIFICATE_VERIFY_FAILED: self-signed certificate in certificate
chain (_ssl.c:1000)
because their TLS chain is signed by a private root CA (Russian Trusted
Root CA for GigaChat; corporate CA for on-prem) that isn't part of the
default system / certifi trust store. The endpoint shows offline in
the picker even though the URL and API key are correct (issue #722).
The right fix is to extend the trust store, not to weaken verification.
This change:
- src/tls_overrides.py: new module that resolves an opt-in env var
LLM_CA_BUNDLE at import time, builds a shared SSLContext via
ssl.create_default_context() (so the system / certifi bundle is
loaded first) and layers the operator's PEM on top with
load_verify_locations(). Exposes llm_verify() returning a value
suitable for httpx `verify=`. Defaults to True (httpx built-in
trust) when the env var is unset, when the file is missing, or
when the PEM fails to load — verification is never silently
disabled, the warning is logged and we fall back to the safe path.
- src/llm_core.py: thread llm_verify() into the shared AsyncClient
used by stream_llm / streaming completions.
- routes/model_routes.py: thread llm_verify() into the five httpx.get
call sites in _probe_endpoint / _ping_endpoint so adding a
private-CA endpoint goes green on the very first probe and the
picker stops showing it offline.
- .env.example: document LLM_CA_BUNDLE with the GigaChat case as the
concrete example.
Deliberately NOT included: a verify=False knob (global or per-host).
Disabling verification exposes the affected endpoint to MITM, and the
operator-supplied bundle is the correct fix for legitimate private-CA
providers — so the only switch in this PR is the safe one.
Closes#722.
get_builtin_overrides() was swallowing all exceptions with a bare
`except Exception: pass`, so misconfigured tool-description overrides
would silently produce wrong agent behaviour with no log trace.
The background endpoint refresh loop had the same pattern: any probe
failure was silently ignored, giving operators no signal that the
refresh was broken.
Also removes a circular self-import (`from src.agent_loop import
_build_base_prompt`) inside _build_system_prompt; the function is
already in scope and the import created a latent circular reference risk.
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: support large proxy model endpoint refresh
Large OpenAI-compatible proxy endpoints can expose hundreds of models and make /v1/models slow. Treating those endpoints like local model servers caused model picker opens and background probes to repeatedly hit /models, producing timeouts and making otherwise usable endpoints appear offline.
Make model endpoint discovery cached-first for normal UI usage, add explicit proxy/API classification and refresh policy fields, exclude proxy/API endpoints from aggressive local probing, and preserve cached models when refresh fails.
Manual Test/Add/Refresh actions still fetch the full model list with longer timeouts so users can intentionally import large proxy model lists without blocking normal model picker usage.
* fix: preserve endpoint ping status semantics
* fix: Cookbook local GGUF serving inside Docker
Cookbook’s in-container GGUF serve flow had multiple Docker-specific breakages that made local llama.cpp models fail or register against the wrong endpoint.
Fixes included here:
use the scanned model cache root when generating GGUF serve commands instead of hardcoding $HOME/.cache/huggingface/hub
fix malformed llama.cpp preflight build lines that generated invalid bash in serve runner scripts
preserve loopback model URLs inside Docker when the target port is already reachable from the Odysseus container, instead of rewriting them unconditionally to host.docker.internal
Before this change, Docker local serves could fail in several ways:
Cookbook pointed llama.cpp at the wrong GGUF path
generated serve runner scripts crashed before launch with a shell syntax error
successfully started in-container model servers were auto-registered as host.docker.internal: instead of localhost/127.0.0.1
This makes the Docker Cookbook path work as expected for: downloaded GGUF -> local llama.cpp serve -> endpoint registration
* test: add test for docker-local endpoint rewrites
* fix: omit temperature for OpenAI reasoning models (o1/o3/o4/gpt-5)
These models only accept the default temperature; sending any explicit
value (even 0.0) returns HTTP 400 "Only the default (1) value is
supported". This broke two paths:
- Endpoint probing in _probe_single_model hardcodes temperature: 0.0, so
a perfectly valid o3/gpt-5 endpoint is reported as failing in the
Model Endpoints health check.
- Chat/stream payloads send temperature unconditionally, so a non-default
temperature preset 400s on these models.
The code already special-cases the same model family for
max_completion_tokens, so this adds a sibling _restricts_temperature()
helper and omits the field for those models, letting the API use its
required default. gpt-4.5 is intentionally excluded (not a reasoning
model; accepts temperature normally).
Adds tests/test_llm_core_temperature.py covering the predicate and the
synchronous payload builder.
* fix: also omit temperature for reasoning models on the direct-POST paths
The first commit only covered llm_call/llm_call_async/stream_llm and the
endpoint probe. Email auto-summary, urgency-less spam classification, the
email reply-summary endpoint, and gallery vision tagging build their
OpenAI payloads inline and POST them directly (requests/httpx), bypassing
llm_core — so a reasoning model configured there would still 400 on the
temperature field. These sites already branch on _uses_max_completion_tokens,
so they're the same class; added the matching _restricts_temperature guard.
gallery_routes also gains the max_completion_tokens branch it was missing,
so gpt-5 vision tagging works end to end.
Note: email_pollers urgency scoring goes through llm_call_async and was
already covered.
Both get_default_chat and _recover_empty_session_model picked the
first model from cached_models[0] without checking hidden_models.
If the first cached model was hidden (e.g. minimax-m3), it was
returned as the default or used to repair empty session models,
even though the model list endpoints already filter hidden_models.
- Add _visible_models() helper that filters cached_models by
hidden_models (mirrors the filtering in list_model_endpoints)
- Use _visible_models() in get_default_chat fallback (when no
explicit default_model is saved)
- Use _visible_models() in _recover_empty_session_model (when
repairing a session whose model field is empty before chat send)
- Add regression tests for hidden-model filtering in default chat
resolution, and unit tests for _visible_models helper
In Docker, a model-endpoint URL pointing at loopback (e.g. the LM Studio
default http://localhost:1234/v1) targets the Odysseus container itself, not
the host running the server, so the probe gets a connection error and the
endpoint is rejected with a misleading 'No models found for that provider/key'.
Rewrite loopback to host.docker.internal (which compose already maps to
host-gateway) for the probe and the saved URL, mirroring the existing Ollama
handling. Gated on actually being in a container with the gateway reachable, so
native installs and gateway-less deploys are untouched.
Fixes#25
Co-authored-by: Claude <noreply@anthropic.com>
_ping_endpoint() is the reachability fallback the model-endpoint POST
handler invokes when _probe_endpoint() returns no model ids. It GETs
base + "/models" and, on any sub-500 response, returns immediately with
`reachable = (status < 400)`. That early return runs before the
Ollama-native /api/version / /api/tags fallback below it.
For an Ollama URL without /v1 (the quickstart accepts both
http://localhost:11434 and http://127.0.0.1:11434, and the reporter
on #1025 explicitly tried both), the OpenAI-style probe target is
http://127.0.0.1:11434/models. Ollama returns 404 there because /models
only lives under /v1. _ping_endpoint then returned reachable=False and
the picker showed "Added (offline — will retry on next load)" on an
install that was running fine. /api/version was never tried.
Same shape for http://127.0.0.1:11434/api (the native Ollama root):
/api/models is also 404, same premature offline verdict.
_probe_endpoint() does fall through to /api/tags on a 4xx (the response
raises via raise_for_status), so the endpoint quietly recovers once
cached_models becomes non-empty on the next background refresh —
matching the second commenter's "had to disconnect manually then
reconnect for it to be detected" note. The bug is most visible while
no models are pulled yet (cached_models stays empty, _ping_endpoint
keeps voting offline).
Fix:
- Hoist the Ollama-shaped-URL test (port == 11434 or "ollama" in
hostname — the same condition _probe_endpoint already uses) to the
top of the function so both code paths share it.
- Stop short-circuiting on 4xx when the URL looks like Ollama: fall
through to the existing /api/version + /api/tags reachability loop
so an alive Ollama gets recognised even when its OpenAI surface has
the wrong prefix for the user's input.
- Fix the `root` computation in that loop to strip a trailing /api as
well as /v1, so http://127.0.0.1:11434/api no longer gets probed at
/api/api/version.
- 4xx on non-Ollama hosts keeps the current semantics: a 401 from
api.openai.com/v1/models is still a definitive offline verdict, not
a reason to GET /api/version on OpenAI.
Closes#1025.
When the operator sets AUTH_ENABLED=false, three owner-scoped endpoints still
returned 401 (api/models, api/research/*, api/email/*), so the front-end
redirected the browser to /login and the app was unusable despite auth being
turned off. require_user() in src/auth_helpers.py already documents and honors
this contract (issue #622) via 'if _auth_disabled(): return ""', but these
endpoints did their own get_current_user/is_configured check without it.
Make _require_user (research), the /api/models anti-leak guard, and
email_helpers._require_auth consult _auth_disabled() and let anonymous through
(owner='') only when the operator explicitly disabled auth. The 401 protection
is fully intact when AUTH_ENABLED=true. Verified end-to-end: with
AUTH_ENABLED=false the SPA now loads instead of bouncing to /login.
The "don't wipe endpoint_url/model on endpoint delete" half of #587 landed
in 6a78b02 (Fix endpoint model preservation for tasks). The three remaining
follow-up pieces from the original PR — flagged in the review on #786 —
are:
- routes/model_routes.py: toggle_model_endpoint (PATCH) now accepts
api_key and base_url, so the admin UI can rotate a key or fix a typo'd
URL without going through delete+recreate. base_url is normalized the
same way the POST handler does (strip /models, /chat/completions,
/completions, /v1/messages, then _normalize_base). Cache invalidation
matches the POST/DELETE paths and the response includes base_url so the
frontend can confirm what was saved.
- routes/chat_routes.py: new _recover_empty_session_model picks
cached_models[0] from the endpoint that matches sess.endpoint_url and
persists it onto the Session row before the LLM call goes out. Wired
into both /api/chat and /api/chat_stream after the existing
_clear_orphaned_session_endpoint guard, so the order is: drop
truly-orphaned sessions first, then heal the "picker showed it, session
never knew" case.
- routes/chat_routes.py: when recovery fails (no endpoint, no cached
models) raise HTTP 400 with a clear message instead of letting
model="" reach the upstream as 401/503.
Closes#587.
* Dedupe URL routing helpers and tighten adjacent hostname checks
* Match providers by hostname, not substring, in _detect_provider
_detect_provider used `"anthropic.com" in url`-style substring checks, so a URL
that merely contained a provider's domain in its path or query — or a look-alike
host like `anthropic.com.example` — was misclassified and picked the wrong
auth-header/payload shape. Switch it to the existing `_host_match` helper
(hostname exact/subdomain match), the same way the human-readable labels and
curated model lists already work, finishing that migration. Also harden
`_host_match` against trailing-dot FQDNs.
Not a credential-leak fix: _detect_provider only classifies a URL the admin
already configured next to its key, and the URL — not this function — decides
where the request goes. This is a correctness/consistency cleanup.
Adds tests that import the real helpers (test_endpoint_resolver.py tests local
copies, so it can't catch this) covering the substring false-positives.
Refs #768.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* Import build_headers under its real name in model_routes
It was imported as `build_headers as _provider_headers`, which collides with
the unrelated llm_core._provider_headers(provider, headers) — same name,
different signature. Use the real name to remove the confusion.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* Use hostname matching in URL builders, not raw suffix checks
PR review flagged that _detect_provider() was hardened to match on
hostname, but several helpers still used raw host.endswith("anthropic.com")
/ host.endswith("ollama.com"), which match adjacent hosts like
notanthropic.com / notollama.com.
Route the remaining checks through _host_match(): _is_ollama_native_url
and _ollama_api_root in llm_core, and _anthropic_api_root / _ollama_api_root
in endpoint_resolver. With _detect_provider already hostname-correct, the
trailing "or host.endswith(...)" clauses in build_chat_url / build_models_url
are redundant, so drop them rather than fix the substring match in place.
Add builder-level tests asserting look-alike and domain-in-path hosts route
to the OpenAI-compatible default. They import the real builders and fail on
the pre-fix code.
Co-Authored-By: Claude <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Background tasks (e.g. the Email Tags / check_email_urgency action)
resolve their model through resolve_endpoint("utility") → Default Chat.
When the configured model is one the user has since disabled on the
endpoint, the resolver still dispatched to it — on Groq that surfaces as
every email failing with "HTTP 400: model ... requires terms acceptance".
Two paths fed this:
- The auto-pick fallback selected from cached_models without excluding
the endpoint's hidden_models, so a disabled model listed first won.
- A stale default_model left pointing at a now-disabled model (seeded at
endpoint registration from raw model_ids[0]) was used verbatim.
Fix resolve_endpoint / resolve_endpoint_by_id to drop a configured model
that's in hidden_models and to pick the first ENABLED chat model. Also
seed default_model on registration via _first_chat_model so we never pin
the global default to an embedding/tts entry a provider lists first.
Checks: python -m pytest tests/test_endpoint_resolver.py
tests/test_model_routes.py tests/test_model_context.py (all pass);
python -m py_compile app.py routes/model_routes.py
src/endpoint_resolver.py.
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Require admin access before serving provider discovery data from
GET /api/providers. This prevents normal authenticated users from
triggering provider discovery or receiving cached provider host data.
Keep GET /api/models available to normal users and leave the existing
admin-only GET /api/discover behavior unchanged.
Add a focused regression test to ensure unauthorized callers cannot
trigger discovery and cannot receive cached provider data.
POST /api/model-endpoints always inserted a new row, so Settings -> Add
Models -> Scan for Servers re-added any endpoint a user had already
registered manually — once under its model name (from the earlier
manual add) and again under its host:port (auto-generated when scan
posts without a name). The success toast then misreported the result
as "added N new".
Look up an existing endpoint with the same base_url accessible to the
caller (shared or owned by them) before inserting. If found, return it
with `existing: true` so the client can tell the difference between
an actual add and a dedupe hit. Toast now reads, e.g.,
"Found 1 server with 1 model — 1 already added".
Tested: POSTing the same base_url three times (incl. trailing-slash
variation) returns the same id each time; only one row exists.