odysseus/src at dc8a882f1f88e4936b81d182a9b5e8a48559947d

MrSphay/odysseus

Tatlatat dc8a882f1f fix(rag): use a stable hash for document IDs so dedup survives restarts (#1098)

add_document() and add_documents_batch() derive the persistent ChromaDB
document id from Python's built-in hash():

    doc_id = f"doc_{hash(text) % 10**16}"

str hashing is randomized per process (hash randomization is enabled by
default unless PYTHONHASHSEED is set to a fixed value), so the same document
text gets a different doc_id on every restart. The dedup check that follows,
self._collection.get(ids=[doc_id]), therefore misses after a restart, and
identical documents are re-embedded and re-added as duplicates each time the
app restarts, bloating the vector store and skewing retrieval.
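The restart problem can be reproduced without ChromaDB. This is a minimal sketch, not the project's code: it runs the pre-fix expression doc_{hash(text) % 10**16} in fresh child interpreters, each with its own PYTHONHASHSEED standing in for one app restart. The names OLD_STYLE and old_style_id are hypothetical, chosen for illustration.

```python
import os
import subprocess
import sys

# Pre-fix id derivation, run in a child interpreter; the document text arrives as argv[1].
OLD_STYLE = "import sys; print(f'doc_{hash(sys.argv[1]) % 10**16}')"


def old_style_id(doc_text: str, seed: str) -> str:
    """Return the pre-fix doc_id for doc_text in a fresh process with the given PYTHONHASHSEED."""
    env = {**os.environ, "PYTHONHASHSEED": seed}
    result = subprocess.run(
        [sys.executable, "-c", OLD_STYLE, doc_text],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    text = "same document text"
    # Each seed simulates a separate restart with its own hash secret.
    for seed in ("0", "1", "2"):
        print(seed, old_style_id(text, seed))
```

The same text yields the same id only when the seed is fixed. With the default randomized seed, a lookup by that id after a restart misses.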

Derive the id from a stable hashlib.sha256 of the text via a shared
_generate_doc_id() helper, used by both add paths so they agree.

tests/test_rag_vector_id_stability.py runs _generate_doc_id in subprocesses
under PYTHONHASHSEED=0/1/random and asserts the id is identical across all
of them (and differs for different text). Fails before this change.
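A rough sketch of that kind of stability check is shown below. It assumes the helper hashes the UTF-8 text with SHA-256 and reduces the digest modulo 10**16. To stay self-contained, the child program inlines a copy of that derivation rather than importing _generate_doc_id from rag_vector.py, as the real test presumably does. id_under_seed and check_id_stability are hypothetical names.

```python
import os
import subprocess
import sys

# Child program: an inlined copy of the assumed SHA-256 id derivation, applied to argv[1].
CHILD = (
    "import hashlib, sys\n"
    "digest = hashlib.sha256(sys.argv[1].encode('utf-8')).hexdigest()\n"
    "print(f'doc_{int(digest, 16) % 10**16}')\n"
)


def id_under_seed(doc_text: str, seed: str) -> str:
    """Compute the id for doc_text in a fresh interpreter with the given PYTHONHASHSEED."""
    env = {**os.environ, "PYTHONHASHSEED": seed}
    result = subprocess.run(
        [sys.executable, "-c", CHILD, doc_text],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def check_id_stability(doc_text: str = "same document text") -> None:
    # The id must be identical under fixed seeds 0 and 1 and a random seed...
    ids = {id_under_seed(doc_text, seed) for seed in ("0", "1", "random")}
    assert len(ids) == 1, f"id changed across hash seeds: {ids}"
    # ...and must differ for different text.
    assert id_under_seed("other text", "0") not in ids
```

Against the pre-fix hash() derivation, the first assertion would fail, because each seed produces a different id.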

2026-06-02 22:42:23 +09:00

..

fix: extract_statistics drops large numbers and trailing % signs (#1153)

2026-06-02 22:35:30 +09:00

action_intents.py

Route calendar action requests to tools

2026-06-01 14:32:41 +09:00

agent_loop.py

Polish email and cookbook flows

2026-06-02 22:42:07 +09:00

agent_runs.py

Handle incomplete detached agent streams

2026-06-01 16:54:11 +09:00

agent_tools.py

Add SSRF-guarded web fetch agent tool

2026-06-01 16:57:28 +09:00

ai_interaction.py

feat(ai): add OpenRouter and Ollama Cloud providers (#231)

2026-06-01 14:26:10 +09:00

api_key_manager.py

API keys: skip undecryptable entries on load

2026-06-02 20:28:26 +09:00

app_helpers.py

Odysseus v1.0

2026-05-31 23:58:26 +09:00

app_initializer.py

Odysseus v1.0

2026-05-31 23:58:26 +09:00

assistant_log.py

Odysseus v1.0

2026-05-31 23:58:26 +09:00

auth_helpers.py

Attribute API-token sessions to the token owner (effective_user) (#871)

2026-06-02 11:39:01 +09:00

bg_jobs.py

chore: use explicit utf-8 for shell job files (#820)

2026-06-02 11:12:13 +09:00

bg_monitor.py

Odysseus v1.0

2026-05-31 23:58:26 +09:00

builtin_actions.py

Scope memory consolidation by owner group

2026-06-02 12:40:28 +09:00

builtin_mcp.py

Add native Windows compatibility layer

2026-06-01 15:09:47 +09:00

caldav_sync.py

Fix duplicate CalDAV sync UIDs

2026-06-01 02:17:43 +00:00

chat_handler.py

Enforce owner checks for upload attachments

2026-06-01 16:47:48 +09:00

chat_helpers.py

Models: add Z.AI coding endpoint and GLM vision detection

2026-06-02 20:59:17 +09:00

chat_processor.py

Odysseus v1.0

2026-05-31 23:58:26 +09:00

chroma_client.py

fix: ChromaDB unreachable blocks app startup for 30-60s (#326) (#476)

2026-06-01 22:22:41 +09:00

cleanup_service.py

Odysseus v1.0

2026-05-31 23:58:26 +09:00

config.py

Add native Windows compatibility layer

2026-06-01 15:09:47 +09:00

constants.py

Align SearXNG fallback URL

2026-06-01 10:50:07 +09:00

context_compactor.py

Preserve system messages during context compaction

2026-06-01 23:10:58 +09:00

database.py

Odysseus v1.0

2026-05-31 23:58:26 +09:00

deep_research.py

Research: report empty search provider results clearly

2026-06-02 20:34:25 +09:00

document_actions.py

Odysseus v1.0

2026-05-31 23:58:26 +09:00

document_processor.py

Documents: strip PDF marker without corrupting text

2026-06-02 20:35:27 +09:00

email_thread_parser.py

Email: recognize forwarded message dividers

2026-06-02 20:32:56 +09:00

embeddings.py

Add native Windows compatibility layer

2026-06-01 15:09:47 +09:00

endpoint_resolver.py

Background tasks: respect active session model fallback

2026-06-02 20:57:42 +09:00

event_bus.py

Odysseus v1.0

2026-05-31 23:58:26 +09:00

exceptions.py

Odysseus v1.0

2026-05-31 23:58:26 +09:00

goal_based_extractor.py

Odysseus v1.0

2026-05-31 23:58:26 +09:00

integrations.py

Secure by default uplift (#511)

2026-06-01 22:30:07 +09:00

llm_core.py

Polish email and cookbook flows

2026-06-02 22:42:07 +09:00

markitdown_runtime.py

Add optional markitdown extraction for Office/EPUB documents (#766)

2026-06-02 11:28:52 +09:00

mcp_manager.py

fix: add Browser MCP connection diagnostics (#662)

2026-06-02 11:50:17 +09:00

memory_vector.py

Odysseus v1.0

2026-05-31 23:58:26 +09:00

memory.py

Fix AttributeError on bullet lines in extract_memory_from_chat (#873)

2026-06-02 11:46:06 +09:00

model_context.py

Models: prefer longest known context match

2026-06-02 20:33:09 +09:00

model_discovery.py

Improve Ollama setup and model endpoint handling

2026-06-01 10:00:15 +09:00

pdf_form_doc.py

Harden PDF document markers against cross-owner upload access (#445)

2026-06-01 22:38:14 +09:00

pdf_forms.py

Odysseus v1.0

2026-05-31 23:58:26 +09:00

pdf_runtime.py

Show a clear message when PyMuPDF is missing

2026-06-01 18:27:17 +09:00

personal_docs.py

Docs: respect path boundary when clearing exclusions

2026-06-02 20:35:44 +09:00

preset_manager.py

Presets: fill missing built-in defaults on load

2026-06-02 20:32:08 +09:00

prompt_security.py

Odysseus v1.0

2026-05-31 23:58:26 +09:00

rag_manager.py

Odysseus v1.0

2026-05-31 23:58:26 +09:00

rag_singleton.py

Re-enable VectorRAG init with lazy retry

2026-06-01 14:32:13 +09:00

rag_vector.py

fix(rag): use a stable hash for document IDs so dedup survives restarts (#1098)

2026-06-02 22:42:23 +09:00

rate_limiter.py

Odysseus v1.0

2026-05-31 23:58:26 +09:00

request_models.py

Odysseus v1.0

2026-05-31 23:58:26 +09:00

research_handler.py

Research: add configurable run timeout

2026-06-02 20:57:57 +09:00

research_utils.py

fix: deep research discards valid sources mentioning cookies/copyright (#481)

2026-06-01 22:26:37 +09:00

secret_storage.py

Add native Windows compatibility layer

2026-06-01 15:09:47 +09:00

session_actions.py

Odysseus v1.0

2026-05-31 23:58:26 +09:00

settings_scrub.py

Deep-scrub secrets from public settings

2026-06-01 23:11:50 +09:00

settings.py

Research: add configurable run timeout

2026-06-02 20:57:57 +09:00

task_endpoint.py

Odysseus v1.0

2026-05-31 23:58:26 +09:00

task_scheduler.py

Tasks: ship email boundary task paused by default

2026-06-02 20:53:02 +09:00

teacher_escalation.py

harden(teacher): treat escalation trace as untrusted data (#275)

2026-06-01 14:31:39 +09:00

text_helpers.py

Text: strip dangling think blocks after visible text

2026-06-02 20:36:37 +09:00

tool_execution.py

Tools: restrict app_api and serve_preset to admins

2026-06-02 20:29:47 +09:00

tool_implementations.py

Notes: parse natural-language due dates on update

2026-06-02 20:51:16 +09:00

tool_index.py

Polish email and cookbook flows

2026-06-02 22:42:07 +09:00

tool_parsing.py

fix(agent): map native google_search and surface empty rounds

2026-06-02 12:57:45 +09:00

tool_schemas.py

Polish email and cookbook flows

2026-06-02 22:42:07 +09:00

tool_security.py

Tools: restrict app_api and serve_preset to admins

2026-06-02 20:29:47 +09:00

topic_analyzer.py

Topics: hydrate session history before analysis

2026-06-02 20:44:27 +09:00

upload_handler.py

Uploads: write uploads index atomically

2026-06-02 20:51:39 +09:00

visual_report.py

Fix visual report chapter navigation (#505)

2026-06-01 22:26:13 +09:00

webhook_manager.py

Webhook: block IPv6 SSRF bypasses

2026-06-02 20:28:12 +09:00

youtube_handler.py

YouTube: enforce comment fetch timeout while waiting

2026-06-02 20:44:24 +09:00