POST /api/research/start (require_privilege "can_use_research" — a normal
user, not admin) resolves an endpoint two ways and feeds the row's *decrypted*
api_key + base_url into research_handler.start_research(llm_endpoint=,
llm_headers=):
1. body.endpoint_id -> query(ModelEndpoint).filter(id == endpoint_id,
is_enabled == True).first()
2. no endpoint + nothing configured -> query(ModelEndpoint).filter(
is_enabled == True).first()
Neither was owner-scoped. ModelEndpoint is a per-user resource (core/database.py:
non-null owner = private, "the model picker only shows the endpoint to that
user"). So a research-privileged user (or a chat-scoped token) could pass another
user's PRIVATE endpoint_id — or fall through to their first-enabled row — and run
research against that owner's endpoint: spending their API key / quota and
reaching whatever internal base_url they configured (SSRF).
This is the same multi-tenant owner-scoping class already fixed for
companion/models, the /api/v1/chat session gate (#870), and the /api/v1/chat
first-enabled fallback (#1045, _first_enabled_endpoint). These two sinks on the
research path were missed.
Extract `_owned_enabled_endpoint(db, owner, endpoint_id=None)` which scopes via
the shared owner_filter helper (own rows + legacy null-owner shared rows),
matching webhook_routes._first_enabled_endpoint and session_routes._owned_endpoint.
Used for both sinks. A scoped miss on the explicit-id path returns the existing
404 ("Endpoint not found or disabled"), so endpoint existence isn't revealed. A
null/empty owner stays a no-op (single-user / legacy mode).
Add regression tests pinning both lookups (cross-owner rejected, own-row
allowed, legacy shared-row allowed, disabled-skipped, fallback never borrows,
null-owner no-op).
When the operator sets AUTH_ENABLED=false, three owner-scoped endpoints still
returned 401 (api/models, api/research/*, api/email/*), so the front-end
redirected the browser to /login and the app was unusable despite auth being
turned off. require_user() in src/auth_helpers.py already documents and honors
this contract (issue #622) via 'if _auth_disabled(): return ""', but these
endpoints did their own get_current_user/is_configured check without it.
Make _require_user (research), the /api/models anti-leak guard, and
email_helpers._require_auth consult _auth_disabled() and let anonymous through
(owner='') only when the operator explicitly disabled auth. The 401 protection
is fully intact when AUTH_ENABLED=true. Verified end-to-end: with
AUTH_ENABLED=false the SPA now loads instead of bouncing to /login.
The spinoff endpoint authenticated the caller (_require_user) but never
verified the research session belonged to them before reading the
persisted report and seeding it into a new chat session owned by the
caller. Any authenticated user who knew or guessed another user's
research session ID could exfiltrate that user's full report into their
own session — a cross-user data disclosure (IDOR).
Every other endpoint in this router gates on _owns_in_memory /
_assert_owns_research right after validating the session ID; spinoff was
the lone exception. Add the same _owns_in_memory check (covers both the
in-memory task and the on-disk JSON) so a non-owner gets a 404 before any
data is read or a session is created.
Add regression tests pinning the anonymous (401) and wrong-owner (404)
cases.
Every research endpoint interpolates session_id into filesystem paths
(Path('data/deep_research') / f'{session_id}.json') without checking
for traversal sequences. A crafted ID like '../../data/auth' reaches
arbitrary JSON files — readable via research_detail (which also leaks
file paths in error messages), writable via research_archive, and
deletable via research_delete.
Add _validate_session_id() which rejects anything outside
[a-zA-Z0-9-]{1,128}. Called before filesystem access in all 12
endpoints that accept a session_id path parameter.