Removing one RAG directory destroyed the whole shared ChromaDB collection
(all owners + base index) instead of deleting just that directory's chunks.

Shared root cause: PersonalDocsManager.remove_directory called rebuild_index()
(delete_collection + recreate) and then re-indexed only the remaining tracked
dirs (ownerless, never personal_dir). The targeted VectorRAG.remove_directory
that should have been used was itself broken: its filter
where={"source":{"$contains":dir}} selects nothing on scalar metadata, and a
substring match would over-delete sibling directories even if it did select
something. The dead do_manage_rag path also fired a second unconditional
rebuild.

- VectorRAG.remove_directory: select chunks in Python by a path-boundary match
  on the stored absolute `source` (equal to dir, or starting with dir+os.sep),
  with both sides abspath-normalized. Keys on `source` (always written), never
  on `owner`, so no migration is needed.
- PersonalDocsManager.remove_directory: call the targeted remove instead of
  rebuild_index() + partial reindex.
- do_manage_rag (dead code): drop the second rebuild_index() (hygiene).
- rag_server.py add path: apply abspath so the indexed `source` matches what
  the remove compares against.

No schema change. Prevents future wipes (does not recover already-wiped
vectors). Adds hermetic regression tests at three layers.

Fixes #1660
Co-authored-by: Ethan <23321960+0xLeathery@users.noreply.github.com>
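
The path-boundary match described in the first bullet can be sketched roughly as below. This is an illustration only, not the code from this commit: the function name `is_under_directory` is hypothetical, and the only assumption taken from the message is that `source` is compared against `dir` and `dir+os.sep` after abspath normalization.

```python
import os


def is_under_directory(source: str, directory: str) -> bool:
    """Return True if `source` is `directory` itself or a path inside it.

    Hypothetical helper: the name and signature are illustrative only.
    """
    root = os.path.abspath(directory)
    path = os.path.abspath(source)
    # Appending os.sep is what makes this a boundary match: a sibling such
    # as /data/docs2 does not start with "/data/docs" + os.sep.
    prefix = root if root.endswith(os.sep) else root + os.sep
    return path == root or path.startswith(prefix)
```

A plain substring test such as `"/data/docs" in "/data/docs2/a.txt"` is true, which is the sibling over-delete the message warns about; the boundary match above rejects that path.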
@@ -306,18 +306,17 @@ class PersonalDocsManager:
             # Refresh the index to exclude the removed directory
             self.refresh_index()

-            # If RAG manager is available, we should rebuild the index
-            # This is a simple approach - in production you might want more sophisticated removal
+            # Targeted delete of just this directory's chunks. This previously
+            # called rag_manager.rebuild_index(), which delete+recreates the
+            # entire shared collection (every owner + the base index) and then
+            # re-indexed only the remaining tracked dirs — ownerless and never
+            # personal_dir — a catastrophic wipe (#1660). remove_directory now
+            # removes exactly this directory's chunks and leaves the rest intact.
             if self.rag_manager:
                 try:
-                    logger.info("Rebuilding RAG index after directory removal")
-                    self.rag_manager.rebuild_index()
-                    # Re-index remaining directories
-                    for dir_path in self.indexed_directories:
-                        if os.path.exists(dir_path):
-                            self.rag_manager.index_personal_documents(dir_path)
+                    self.rag_manager.remove_directory(directory)
                 except Exception as e:
-                    logger.error(f"Failed to rebuild RAG index: {e}")
+                    logger.error(f"Failed to remove directory from RAG index: {e}")
         else:
             logger.info(f"Directory not in index: {directory}")
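
The hunk above only shows the caller; the body of VectorRAG.remove_directory is not part of it. A minimal sketch of what a targeted, Python-side delete could look like follows. Everything here is an assumption for illustration: `remove_directory_chunks` is a hypothetical name, and `FakeCollection` is an in-memory stand-in that only mimics the `get(include=[...])` / `delete(ids=[...])` shape of a ChromaDB collection so the example runs without the library.

```python
import os


class FakeCollection:
    """Hypothetical in-memory stand-in for a vector-store collection."""

    def __init__(self, rows):
        self._rows = dict(rows)  # chunk id -> metadata dict

    def get(self, include=None):
        # Return ids and metadatas as parallel lists.
        return {"ids": list(self._rows), "metadatas": list(self._rows.values())}

    def delete(self, ids):
        for chunk_id in ids:
            self._rows.pop(chunk_id, None)


def remove_directory_chunks(collection, directory):
    """Delete only chunks whose `source` is `directory` or lies below it.

    Returns the number of chunks deleted. The collection is never dropped.
    """
    root = os.path.abspath(directory)
    prefix = root if root.endswith(os.sep) else root + os.sep
    found = collection.get(include=["metadatas"])
    doomed = []
    for chunk_id, meta in zip(found["ids"], found["metadatas"]):
        source = (meta or {}).get("source")
        if not source:
            continue  # no usable source recorded: leave the chunk alone
        path = os.path.abspath(source)
        if path == root or path.startswith(prefix):
            doomed.append(chunk_id)
    if doomed:
        collection.delete(ids=doomed)
    return len(doomed)
```

Selecting ids in Python and deleting by id means chunks from other directories, other owners, and the base index are never touched, and a second call for the same directory is a no-op.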