Commit Graph

4 Commits

Author SHA1 Message Date
Tatlatat
7f97ab3032 Topics: hydrate session history before analysis
analyze_topics() iterates session_manager.sessions and reads
session_data.get("history", []) directly. But SessionManager.load_sessions
seeds sessions metadata-only with empty history — messages are loaded
lazily, only when get_session(session_id) is called. So analyze_topics saw
empty history for every session that hadn't been individually opened this
process lifetime and reported total_topics: 0, even when the database held
plenty of matching messages.

Hydrate each candidate session via session_manager.get_session(session_id)
(the existing lazy-load path) before reading its history, after the
owner/archived filters so skipped sessions aren't loaded. Falls back to the
raw cached history when the manager has no get_session (test stubs).

tests/test_topic_analyzer.py: new test_topic_analyzer_hydrates_sessions
seeds a real SQLite DB with a session + message, runs the real
SessionManager (asserting cached history starts empty), then asserts
analyze_topics finds the topic. Fails before this change. The existing
keyword tests now pass an explicit owner to satisfy the owner-required
early return.
2026-06-02 20:44:27 +09:00
Afonso Coutinho
48d3b7abab fix: topic analysis false-matches keywords as substrings (e.g. 'ai' in 'email') (#687)
* fix: match topic keywords on word boundaries, not substrings

* fix: apply word-boundary matching to topic example snippets too

* test: topic keywords match whole words, not substrings
2026-06-02 11:42:04 +09:00
Ernest Hysa
360bc83a66 fix(history): scope topic analysis to authenticated owner only (#744)
Two changes close the cross-tenant topic leak in /api/conversations/topics.

The route at routes/history_routes.py:478 used get_current_user, which
returns None when no auth middleware has set request.state.current_user
(loopback-bypass, AUTH_ENABLED=false, or any path that short-circuits the
middleware). It then forwarded owner=None to analyze_topics.

The helper at src/topic_analyzer.py:21 used an 'if owner:' short-circuit
in its owner filter, so the None owner took the no-filter path and the
helper silently aggregated topic frequencies and per-snippet session_id,
session_name, role, and snippet text across every user's sessions.

analyze_topics now returns an empty result when owner is falsy. The
inner short-circuit is removed because the filter is now strict by
construction. The route is switched to require_user, which raises 401
when auth_manager.is_configured is True and the caller is anonymous,
matching the pattern used by calendar_routes, skills_routes, and other
authenticated routes.

The test test_history_topics_owner_scope.py was rewritten to drive the
real route through FastAPI's TestClient with a stub AuthMiddleware that
mirrors the loopback-bypass branch, and now asserts a strict 401 from
the route and an empty result from the helper. The previous version of
the test accepted either a 200-with-empty-topics or a 401; the strict
assertion means a future regression that drops the require_user wrapper
or re-adds the inner short-circuit is caught immediately.
2026-06-02 11:36:01 +09:00
pewdiepie-archdaemon
e5c99a5eee Odysseus v1.0 2026-05-31 23:58:26 +09:00