odysseus/requirements.txt
Joeseph Grey fa1fe7f866 security: sanitize rendered research-report HTML (#364)
The visual research report is assembled from LLM output over crawled web
pages (untrusted content) and served under a relaxed `script-src
'unsafe-inline'` CSP. Two values reached that HTML without sanitization:

- `_md_to_html` rendered the report markdown via python-markdown, which
  passes raw HTML through verbatim, so `<script>` / `<img onerror>` /
  `<svg onload>` / `javascript:` links carried in crawled content ran in
  the app origin.
- `category` (from the /api/research/start request body, no enum check) was
  interpolated raw into `<body class="category-{category}">`.
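A minimal sketch of the second bullet and of the escape fix this commit applies. It assumes a hypothetical helper, `render_body_open`, that builds the opening tag; the real code in src/visual_report.py is not shown here. It shows why raw interpolation is exploitable and how `html.escape` keeps a crafted value inside the `class` attribute:

```python
import html

payload = 'x" onload="alert(1)'

# Raw interpolation: the quote in the payload closes the class attribute,
# so the browser sees a new onload attribute.
unsafe = f'<body class="category-{payload}">'
print(unsafe)  # <body class="category-x" onload="alert(1)">


def render_body_open(category: str) -> str:
    # Hypothetical helper: html.escape (quote=True is the default) turns
    # &, <, >, " and ' into entities, so the value stays inside the attribute.
    return f'<body class="category-{html.escape(category, quote=True)}">'


print(render_body_open("science"))  # <body class="category-science">
print(render_body_open(payload))  # <body class="category-x&quot; onload=&quot;alert(1)">
```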

Allowlist-sanitize the rendered markdown with nh3, keeping the formatting
the report emits (tables, code, details/summary, toc anchors, codehilite
classes, external-link target/rel) while dropping active content, and
`html.escape` the category. Adds regression tests.
2026-06-04 13:42:49 +01:00


fastapi
uvicorn
python-multipart
python-dotenv
httpx
pydantic>=2.0
pydantic-settings>=2.0
SQLAlchemy
pypdf
beautifulsoup4
charset-normalizer
numpy
# Vector store + local embeddings for RAG, semantic memory, and tool
# selection. Used on core agent paths, so installed by default; the app
# still degrades to a keyword fallback if they're ever missing.
# chromadb-client is the lightweight HTTP client (talks to a standalone
# ChromaDB service); fastembed runs local ONNX embeddings.
chromadb-client
fastembed
youtube-transcript-api
# Markdown rendering for research reports (src/visual_report.py).
# Imported at module top level, so it's a hard core dependency, not optional.
markdown
# HTML sanitizer for rendered research reports (src/visual_report.py). Report
# content is untrusted (LLM output over crawled pages) and report pages run
# under a relaxed CSP, so the rendered HTML is allowlist-sanitized.
nh3
# Calendar .ics import/export (routes/calendar_routes.py).
icalendar
# Recurrence rule expansion for calendar events (routes/calendar_routes.py).
# Imported directly as dateutil.rrule — make it explicit even though caldav
# pulls it in transitively.
python-dateutil
# CalDAV sync (src/caldav_sync.py). Handles PROPFIND discovery + REPORT
# fetch across Radicale, Nextcloud, Apple, Fastmail; we'd be reinventing
# the protocol without it.
caldav
cryptography
bcrypt
mcp
pyotp
qrcode[pil]
croniter
pytest
pytest-asyncio
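The comment on chromadb-client and fastembed says the app degrades to a keyword fallback when they are missing. Below is a minimal sketch of that pattern. The names `load_embedder` and `rank_documents` are hypothetical, as is the token-overlap score; the app's actual fallback is not shown in this file:

```python
import importlib.util
import math
import re
from typing import Callable, List, Optional, Sequence

# Maps a batch of texts to one vector per text.
Embedder = Callable[[Sequence[str]], List[List[float]]]


def load_embedder() -> Optional[Embedder]:
    # Hypothetical loader: returns None when fastembed is not installed,
    # which makes rank_documents use its keyword fallback.
    if importlib.util.find_spec("fastembed") is None:
        return None
    from fastembed import TextEmbedding

    model = TextEmbedding()  # fetches its default ONNX model when created
    return lambda texts: [[float(x) for x in vec] for vec in model.embed(list(texts))]


def _tokens(text: str) -> set:
    return set(re.findall(r"\w+", text.lower()))


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_documents(
    query: str, docs: Sequence[str], embed: Optional[Embedder] = None
) -> List[str]:
    if embed is None:
        # Keyword fallback: score by the number of shared lowercase word tokens.
        q = _tokens(query)
        scores = [float(len(q & _tokens(d))) for d in docs]
    else:
        # Semantic path: cosine similarity between query and document vectors.
        vectors = embed([query, *docs])
        scores = [_cosine(vectors[0], v) for v in vectors[1:]]
    # sorted() is stable, so tied documents keep their input order.
    order = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return [docs[i] for i in order]


docs = ["bake bread", "calendar sync via caldav", "sync notes"]
print(rank_documents("calendar sync", docs))
# ['calendar sync via caldav', 'sync notes', 'bake bread']
```

A caller would pass `load_embedder()` as `embed`. When fastembed is absent, that returns None and ranking falls back to keyword overlap, so the feature keeps working with lower quality rather than failing at import time.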