odysseus

MrSphay/odysseus

Fork 0

Commit Graph

Author	SHA1	Message	Date
red person	0ad5cd783b	Skip invalid research service sources (#1583 )	2026-06-03 08:57:09 +09:00
David Anderson	610968f91e	fix: data integrity — deep-research result parsing + memory-extraction durability (#808 ) Two independent data-integrity bugs: - services/research/service.py: ResearchService.research() (the public deep-research API, re-exported from services/__init__) treated the handler return value as a dict (result.get("sources"/"summary"/...)), but call_research_service() returns a formatted markdown STRING -> AttributeError: str has no attribute get on EVERY successful call, making the API unusable for any non-error result. Now uses the string report as the summary and parses sources from the "### Sources" markdown section (section-bounded, URL-deduped), with a defensive dict branch for back-compat. - services/memory/memory_extractor.py: extract_and_store guarded the vector-store find_similar/add calls only with the .healthy flag set ONCE at init. If the embedding/ChromaDB backend degraded LATER (OOM, evicted model, remote endpoint down), those calls raised, the exception escaped the dedup loop, skipped memory_manager.save(), and was swallowed by the outer try/except -> EVERY validated fact from the session was silently lost (the function docstring promises "never raised"). Now falls back to the existing text/fuzzy dedup so facts are still saved when the vector index is unavailable at runtime. Tests: test_research_service.py, test_memory_extractor_vector_degraded.py.	2026-06-02 11:27:31 +09:00

Author

SHA1

Message

Date

red person

0ad5cd783b

Skip invalid research service sources (#1583 )

2026-06-03 08:57:09 +09:00

David Anderson

610968f91e

fix: data integrity — deep-research result parsing + memory-extraction durability (#808 )

Two independent data-integrity bugs:

- services/research/service.py: ResearchService.research() (the public deep-research
  API, re-exported from services/__init__) treated the handler return value as a
  dict (result.get("sources"/"summary"/...)), but call_research_service() returns a
  formatted markdown STRING -> AttributeError: str has no attribute get on EVERY
  successful call, making the API unusable for any non-error result. Now uses the
  string report as the summary and parses sources from the "### Sources" markdown
  section (section-bounded, URL-deduped), with a defensive dict branch for back-compat.

- services/memory/memory_extractor.py: extract_and_store guarded the vector-store
  find_similar/add calls only with the .healthy flag set ONCE at init. If the
  embedding/ChromaDB backend degraded LATER (OOM, evicted model, remote endpoint
  down), those calls raised, the exception escaped the dedup loop, skipped
  memory_manager.save(), and was swallowed by the outer try/except -> EVERY
  validated fact from the session was silently lost (the function docstring
  promises "never raised"). Now falls back to the existing text/fuzzy dedup so
  facts are still saved when the vector index is unavailable at runtime.

Tests: test_research_service.py, test_memory_extractor_vector_degraded.py.

2026-06-02 11:27:31 +09:00

2 Commits