The visual research report is assembled from LLM output over crawled web
pages (untrusted content) and served under a relaxed `script-src
'unsafe-inline'` CSP. Two values reached that HTML without sanitization:
- `_md_to_html` rendered the report markdown via python-markdown, which
passes raw HTML through verbatim, so `<script>`, `<img onerror>`,
`<svg onload>` and `javascript:` links carried in crawled content could
execute in the app origin.
- `category` (from the /api/research/start request body, no enum check) was
interpolated raw into `<body class="category-{category}">`.
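A minimal sketch of the escaping for the second bullet. It assumes the body tag is built with an f-string. `render_body_open` is a hypothetical helper name, not the function in src/visual_report.py. With its default `quote=True`, `html.escape` converts `&`, `<`, `>`, `"` and `'` to entities, so a crafted category cannot close the `class` attribute or open a new tag.

```python
import html


def render_body_open(category: str) -> str:
    # Hypothetical helper: escape the untrusted category before it is
    # interpolated into the class attribute. quote=True (the default)
    # also escapes " and ', so the value cannot break out of the attribute.
    return f'<body class="category-{html.escape(category)}">'


print(render_body_open("news"))
# prints <body class="category-news">
print(render_body_open('x"><script>alert(1)</script>'))
# prints <body class="category-x&quot;&gt;&lt;script&gt;alert(1)&lt;/script&gt;">
```

Escaping keeps the markup well-formed for any input. It does not restrict `category` to known values.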
Allowlist-sanitize the rendered HTML with nh3, keeping the formatting
the report emits (tables, code, details/summary, toc anchors, codehilite
classes, external-link target/rel) while dropping active content, and
`html.escape` the category. Add regression tests.
fastapi
uvicorn
python-multipart
python-dotenv
httpx
pydantic>=2.0
pydantic-settings>=2.0
SQLAlchemy
pypdf
beautifulsoup4
charset-normalizer
numpy
# Vector store + local embeddings for RAG, semantic memory, and tool
# selection. Used on core agent paths, so installed by default; the app
# still degrades to keyword fallback if they're ever missing.
# chromadb-client is the lightweight HTTP client (talks to a standalone
# ChromaDB service); fastembed runs local ONNX embeddings.
chromadb-client
fastembed
youtube-transcript-api
# Markdown rendering for research reports (src/visual_report.py).
# Imported at module-top so it's a hard core dep, not optional.
markdown
# HTML sanitizer for rendered research reports (src/visual_report.py). Report
# content is untrusted (LLM output over crawled pages) and report pages run
# under a relaxed CSP, so the rendered HTML is allowlist-sanitized.
nh3
# Calendar .ics import/export (routes/calendar_routes.py).
icalendar
# Recurrence rule expansion for calendar events (routes/calendar_routes.py).
# Imported directly as dateutil.rrule; make it explicit even though caldav
# pulls it in transitively.
python-dateutil
# CalDAV sync (src/caldav_sync.py). Handles PROPFIND discovery + REPORT
# fetch across Radicale, Nextcloud, Apple, Fastmail; we'd be reinventing
# the protocol without it.
caldav
cryptography
bcrypt
mcp
pyotp
qrcode[pil]
croniter
pytest
pytest-asyncio