extract_and_store dedups each extracted fact against the vector store
before the (owner-scoped) text fallback. The vector store is a single
shared ChromaDB collection storing only {"source": "memory"} — no
owner — and find_similar queries it with no owner filter, so it can
return a memory_id belonging to a different tenant. The old code
continue'd (skipped storing) on any vector hit without checking
ownership, so when ChromaDB is healthy (the common path) a user's
freshly-extracted fact was silently dropped because it was merely
semantically similar to another user's memory — the text fallback that
IS owner-scoped never ran. Gate the skip on the matched memory being
this user's own (or legacy unowned), mirroring the text dedup predicate;
cross-tenant or stale matches fall through. Same bug class as #1743.
audit_memories saves final_entries merged with other owners' entries
(correct), but then rebuilt the shared vector collection from
final_entries alone — wiping every other owner from semantic search
until they happened to run their own audit. Keyword fallback masked
it, so it degraded silently. Capture saved_entries once and rebuild
from that.
Caught by #1747.
Two independent data-integrity bugs:
- services/research/service.py: ResearchService.research() (the public deep-research
API, re-exported from services/__init__) treated the handler return value as a
dict (result.get("sources"/"summary"/...)), but call_research_service() returns a
formatted markdown STRING -> AttributeError: str has no attribute get on EVERY
successful call, making the API unusable for any non-error result. Now uses the
string report as the summary and parses sources from the "### Sources" markdown
section (section-bounded, URL-deduped), with a defensive dict branch for back-compat.
- services/memory/memory_extractor.py: extract_and_store guarded the vector-store
find_similar/add calls only with the .healthy flag set ONCE at init. If the
embedding/ChromaDB backend degraded LATER (OOM, evicted model, remote endpoint
down), those calls raised, the exception escaped the dedup loop, skipped
memory_manager.save(), and was swallowed by the outer try/except -> EVERY
validated fact from the session was silently lost (the function docstring
promises "never raised"). Now falls back to the existing text/fuzzy dedup so
facts are still saved when the vector index is unavailable at runtime.
Tests: test_research_service.py, test_memory_extractor_vector_degraded.py.