fix(history): scope topic analysis to authenticated owner only (#744)
Two changes close the cross-tenant topic leak in /api/conversations/topics.

The route at routes/history_routes.py:478 used get_current_user, which returns None when no auth middleware has set request.state.current_user (loopback-bypass, AUTH_ENABLED=false, or any path that short-circuits the middleware). It then forwarded owner=None to analyze_topics. The helper at src/topic_analyzer.py:21 used an 'if owner:' short-circuit in its owner filter, so the None owner took the no-filter path and the helper silently aggregated topic frequencies and per-snippet session_id, session_name, role, and snippet text across every user's sessions.

analyze_topics now returns an empty result when owner is falsy. The inner short-circuit is removed because the filter is now strict by construction. The route is switched to require_user, which raises 401 when auth_manager.is_configured is True and the caller is anonymous, matching the pattern used by calendar_routes, skills_routes, and other authenticated routes.

The test test_history_topics_owner_scope.py was rewritten to drive the real route through FastAPI's TestClient with a stub AuthMiddleware that mirrors the loopback-bypass branch, and now asserts a strict 401 from the route and an empty result from the helper. The previous version of the test accepted either a 200-with-empty-topics or a 401; the strict assertion means a future regression that drops the require_user wrapper or re-adds the inner short-circuit is caught immediately.
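The route-side half of the fix can be modelled without a web framework. The sketch below is illustrative only: `RequestState`, `Unauthorized`, and the `auth_configured` parameter are hypothetical stand-ins for request.state, an HTTP 401 response, and auth_manager.is_configured, and the function bodies are assumptions based on the behaviour described in this message, not the project's actual code.

```python
from dataclasses import dataclass
from typing import Optional


class Unauthorized(Exception):
    """Hypothetical stand-in for an HTTP 401 response."""

    status_code = 401


@dataclass
class RequestState:
    # Stand-in for request.state. None models a request on which no auth
    # middleware set current_user (for example the loopback bypass).
    current_user: Optional[str] = None


def get_current_user(state: RequestState) -> Optional[str]:
    # Permissive lookup: an anonymous caller comes back as None, and the
    # caller of this function has to remember to handle that case.
    return state.current_user


def require_user(state: RequestState, auth_configured: bool = True) -> Optional[str]:
    # Strict lookup: reject anonymous callers whenever auth is configured.
    user = state.current_user
    if user is None and auth_configured:
        raise Unauthorized("authentication required")
    return user


anonymous = RequestState()
assert get_current_user(anonymous) is None  # old path: None flows onward

try:
    require_user(anonymous)
except Unauthorized as exc:
    print(exc.status_code)  # prints 401
```

In this model an anonymous caller can still get past `require_user` when auth is not configured, which is why the helper-side early return for a falsy owner is kept as a second line of defence.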
@@ -23,20 +23,31 @@ def analyze_topics(session_manager, owner: str = None) -> Dict[str, Any]:
     Scan non-archived sessions and return topic frequency data.
     If owner is set, only include sessions belonging to that user.
 
+    When `owner` is None or empty the helper returns an empty result. The
+    unauthenticated-loopback path in `app.py` produces a None owner, and
+    silently aggregating topic frequencies in that case is a cross-tenant
+    data leak. Callers that want a system-wide aggregate must pass an
+    explicit `owner` string (e.g. a documented "admin" pseudo-owner) or
+    the route must reject the request with 401.
+
     Returns dict with "topics" list and "total_topics" count.
     """
+    if not owner:
+        return {"topics": [], "total_topics": 0}
+
     topic_counts: Dict[str, int] = {t: 0 for t in TOPIC_KEYWORDS}
     topic_matches: Dict[str, list] = {t: [] for t in TOPIC_KEYWORDS}
 
     for session_id, session_data in session_manager.sessions.items():
         if session_data.get("archived", False):
             continue
-        # SECURITY: strict ownership — the previous predicate let any
-        # null-owner session feed into another user's topic analysis.
-        if owner:
-            sess_owner = session_data.get("owner") or getattr(session_data, "owner", None)
-            if sess_owner != owner:
-                continue
+        # Strict ownership: any session whose owner does not match the
+        # caller is excluded. Ownerless sessions are never included
+        # unless the caller is itself ownerless (which the early return
+        # above already prevents).
+        sess_owner = session_data.get("owner") or getattr(session_data, "owner", None)
+        if sess_owner != owner:
+            continue
 
         for msg in session_data.get("history", []):
             content_raw = msg.get("content") if isinstance(msg, dict) else getattr(msg, "content", None)
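To make the effect of the hunk concrete, here is a minimal sketch comparing the old and new ownership predicates on a toy session store. The `SESSIONS` data and the `visible_sessions_old` / `visible_sessions_new` helpers are hypothetical and exist only for illustration; they reproduce the filtering logic shown in the diff, not the topic counting around it.

```python
from typing import Dict, List, Optional

# Hypothetical sample data; the real session store is not part of this diff.
SESSIONS: Dict[str, dict] = {
    "s1": {"owner": "alice", "archived": False},
    "s2": {"owner": "bob", "archived": False},
    "s3": {"owner": None, "archived": False},
    "s4": {"owner": "alice", "archived": True},
}


def visible_sessions_old(sessions: Dict[str, dict], owner: Optional[str]) -> List[str]:
    # Pre-fix predicate: a falsy owner skips the ownership check entirely,
    # so every non-archived session is included.
    visible = []
    for session_id, data in sessions.items():
        if data.get("archived", False):
            continue
        if owner:
            if data.get("owner") != owner:
                continue
        visible.append(session_id)
    return visible


def visible_sessions_new(sessions: Dict[str, dict], owner: Optional[str]) -> List[str]:
    # Post-fix behaviour: a falsy owner sees nothing, and the ownership
    # comparison runs unconditionally for everyone else.
    if not owner:
        return []
    visible = []
    for session_id, data in sessions.items():
        if data.get("archived", False):
            continue
        if data.get("owner") != owner:
            continue
        visible.append(session_id)
    return visible


print(visible_sessions_old(SESSIONS, None))     # ['s1', 's2', 's3'] (the leak)
print(visible_sessions_new(SESSIONS, None))     # []
print(visible_sessions_new(SESSIONS, "alice"))  # ['s1']
```

For a real owner the two predicates agree; they differ only for a falsy owner, which is exactly the case the unauthenticated paths produce.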