Commit Graph

7 Commits

Author SHA1 Message Date
ghreprimand
8c4ea484a9 Cap inline attachment context across files (#1498)
Co-authored-by: ghreprimand <203024559+ghreprimand@users.noreply.github.com>
2026-06-03 14:23:43 +09:00
Wes Huber
49885ff9e7 fix(documents): use strip_pdf_content_marker instead of lstrip for PDF auto-open (#1727)
lstrip("\n[PDF content]:") treats the argument as a character set,
not a prefix, so it chews into the following [Page N text]: marker —
e.g. turning [Page 1 text]: into "age 1 text]:". The correct helper
strip_pdf_content_marker (which uses removeprefix) already exists in
the same file and is used by other call sites.

Fixes #1663

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-06-03 13:30:04 +09:00
SurprisedDuck
78747b56ca Documents: strip PDF marker without corrupting text
_process_pdf prepends "\n\n[PDF content]:" to extracted text, and two
call sites in document_routes.py stripped it with .lstrip("\n[PDF content]:").
str.lstrip(chars) treats its argument as a *set of characters*, so it keeps
eating into the page text that follows the marker — e.g. a body starting
with "to the board" loses its leading "to" because 't'/'o' are in the
marker's character set. Replace both sites with a shared
strip_pdf_content_marker() helper that uses str.removeprefix.
2026-06-02 20:35:27 +09:00
Marius Oppedal Ringsby
f58fbc8b85 Add optional markitdown extraction for Office/EPUB documents (#766)
Office documents were dropped server-side: .docx fell through to
"[Attached document file]", .xlsx/.pptx weren't recognized at all, and
the personal-docs RAG index only covered txt/md/json/pdf.

Wire the optional markitdown dependency (MIT, Microsoft) into both the
chat-attachment path (build_user_content) and the RAG indexer
(personal_docs), converting .docx/.xlsx/.pptx/.xls/.epub to Markdown.
It is lazy-imported with graceful fallback (mirrors src/pdf_runtime.py):
without it those formats show an "install to extract" banner and the
MIT core is unaffected. pypdf stays the default PDF path.

- src/markitdown_runtime.py: optional-dep loader + convert_to_markdown
- upload_handler: recognize Office/EPUB extensions + MIME types
- document_processor: extract Office docs in the chat else-branch
- personal_docs: index Office docs (DEFAULT_EXTENSIONS + dispatch)
- requirements-optional.txt + ACKNOWLEDGMENTS.md: pinned markitdown 0.1.5
- tests: markitdown_runtime + office index coverage

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-02 11:28:52 +09:00
Duarte Antunes
e77d87fa80 Enforce owner checks for upload attachments 2026-06-01 16:47:48 +09:00
Juan Pablo Jiménez
4a04068818 Fix vision attachment timeout and stale cache
Increase local vision model timeout and avoid caching transient VL failure placeholders.\n\nCloses #202.
2026-06-01 02:04:46 +00:00
pewdiepie-archdaemon
e5c99a5eee Odysseus v1.0 2026-05-31 23:58:26 +09:00