Fix RAG remove_directory wiping the entire shared collection (#1660) (#1734)

Removing one RAG directory destroyed the whole shared ChromaDB collection
(every owner's chunks plus the base index) instead of just that directory's
chunks. Root cause: PersonalDocsManager.remove_directory called
rebuild_index() (delete_collection + recreate) and then re-indexed only the
remaining tracked dirs, without owner metadata and never including
personal_dir. The targeted VectorRAG.remove_directory that should have been
used was itself broken: where={"source":{"$contains":dir}} selects nothing
on scalar metadata, and as a substring match it would also over-delete
sibling paths (e.g. /data/docs-old when removing /data/docs). Finally, the
dead do_manage_rag path fired a second, unconditional rebuild.
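A toy model of the pre-fix remove flow, using a plain dict as a hypothetical stand-in for the shared collection (rebuild_then_reindex and reindex are illustrative names, not project code). Removing one tracked dir wipes the base index and the personal_dir chunks along with it, because only the remaining tracked dirs are re-indexed:

```python
# Hypothetical names throughout; a toy model of the pre-fix flow, not project code.


def rebuild_then_reindex(chunks, tracked_dirs, removed_dir, reindex):
    """Pre-fix remove: wipe the whole shared collection, then re-index what is left."""
    remaining = [d for d in tracked_dirs if d != removed_dir]
    chunks.clear()  # stands in for rebuild_index(): delete_collection + recreate
    for d in remaining:
        chunks.update(reindex(d))  # re-added without `owner`; personal_dir never listed
    return chunks


def reindex(directory):
    # Toy indexer: one chunk per directory, keyed by its path, no owner field.
    return {"reindexed:" + directory: {"source": directory + "/a.md"}}


shared = {
    "base-1": {"source": "/srv/base/intro.md"},                         # base index
    "alice-1": {"source": "/home/alice/notes/x.md", "owner": "alice"},  # personal_dir
    "docs-1": {"source": "/data/docs/a.md"},
    "old-1": {"source": "/data/old/b.md"},
}
rebuild_then_reindex(shared, ["/data/docs", "/data/old"], "/data/old", reindex)
print(sorted(shared))  # ['reindexed:/data/docs']
```

Only "old-1" should have gone; instead every chunk outside the remaining tracked dirs is lost.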

- VectorRAG.remove_directory: select chunks in Python with a path-boundary
  match on the stored absolute `source`: a chunk matches if `source` equals
  the abspath-normalized dir or starts with dir + os.sep. Keys on `source`
  (always written), never `owner`, so no migration is needed.
- PersonalDocsManager.remove_directory: call the targeted remove instead of
  rebuild_index() + partial reindex.
- do_manage_rag (dead code): drop the second rebuild_index() (hygiene).
- rag_server.py add path: abspath so indexed `source` matches the remove.
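A minimal sketch of the path-boundary selection, assuming POSIX paths and a duck-typed collection whose get(include=["metadatas"]) and delete(ids=...) calls are modeled on ChromaDB's Collection API. The helpers source_under_dir and remove_directory and the FakeCollection stand-in are hypothetical, not the project's code. It shows a boundary check keeping /data/docs-old intact where a substring test ("/data/docs" in source) would match it:

```python
import os


def source_under_dir(source, directory):
    """Path-boundary match: `source` is `directory` itself or lies inside it."""
    root = os.path.abspath(directory)  # normalize the dir being removed
    # rstrip handles the filesystem root, whose abspath already ends in os.sep
    return source == root or source.startswith(root.rstrip(os.sep) + os.sep)


def remove_directory(collection, directory):
    """Delete only chunks whose `source` metadata lies under `directory`.

    Hypothetical helper: `collection` needs ChromaDB-style
    get(include=["metadatas"]) and delete(ids=[...]). Returns the count deleted.
    """
    result = collection.get(include=["metadatas"])
    doomed = [
        chunk_id
        for chunk_id, meta in zip(result["ids"], result["metadatas"])
        if isinstance((meta or {}).get("source"), str)
        and source_under_dir(meta["source"], directory)
    ]
    if doomed:  # skip the delete call entirely when nothing matches
        collection.delete(ids=doomed)
    return len(doomed)


class FakeCollection:
    """Hypothetical in-memory stand-in for a ChromaDB collection (id -> metadata)."""

    def __init__(self, rows):
        self.rows = dict(rows)

    def get(self, include=None):
        ids = list(self.rows)
        return {"ids": ids, "metadatas": [self.rows[i] for i in ids]}

    def delete(self, ids):
        for chunk_id in ids:
            self.rows.pop(chunk_id, None)


col = FakeCollection({
    "a": {"source": "/data/docs/a.md"},
    "b": {"source": "/data/docs/sub/b.md"},
    "c": {"source": "/data/docs-old/c.md"},  # sibling: "/data/docs" is a substring
    "d": {"source": "/home/alice/notes/d.md", "owner": "alice"},
})
removed = remove_directory(col, "/data/docs/")
print(removed, sorted(col.rows))  # 2 ['c', 'd']
```

In this sketch the selection scans all metadata once per removal; in exchange it depends only on `source`, which every chunk already carries.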

No schema change. Prevents future wipes (does not recover already-wiped
vectors). Adds hermetic regression tests at three layers.

Fixes #1660

Co-authored-by: Ethan <23321960+0xLeathery@users.noreply.github.com>
Ethan
2026-06-03 14:29:51 +10:00
committed by GitHub
parent 9964e9f3fb
commit 0e538ecd29
5 changed files with 199 additions and 21 deletions


@@ -105,7 +105,9 @@ async def call_tool(name: str, arguments: dict) -> list[TextContent]:
         directory = _dir.strip() if isinstance(_dir, str) else ""
         if not directory:
             return [TextContent(type="text", text="Error: add_directory needs a directory path")]
-        directory = os.path.expanduser(directory)
+        # Store an absolute path so indexed `source` metadata is absolute and
+        # remove_directory (which abspath-normalizes) can match it later (#1660).
+        directory = os.path.abspath(os.path.expanduser(directory))
         if not os.path.isdir(directory):
             return [TextContent(type="text", text=f"Error: Directory not found: {directory}")]
         if not _rag_manager:
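A small sketch of why the add path now normalizes with abspath, assuming POSIX-style paths; normalize_dir is a hypothetical helper mirroring the changed line, not a function in rag_server.py. A relative directory stored verbatim yields relative `source` values that can never start with the absolute dir that the remove compares against:

```python
import os


def normalize_dir(directory):
    """Hypothetical helper mirroring the add-path change: strip, expand ~, abspath."""
    return os.path.abspath(os.path.expanduser(directory.strip()))


raw = " docs "                             # relative path as a user might type it
before = os.path.expanduser(raw.strip())  # pre-fix: stays relative
after = normalize_dir(raw)                # post-fix: absolute

# `source` of a chunk indexed from each form of the directory
source_before = os.path.join(before, "a.md")
source_after = os.path.join(after, "a.md")

# The remove side matches against the absolute dir plus a separator
prefix = after + os.sep
print(source_before.startswith(prefix), source_after.startswith(prefix))  # False True
```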