security: sanitize rendered research-report HTML (#364)

The visual research report is assembled from LLM output over crawled web
pages (untrusted content) and served under a relaxed `script-src
'unsafe-inline'` CSP. Two values reached that HTML without sanitization:

- `_md_to_html` rendered the report markdown via python-markdown, which
  passes raw HTML through verbatim, so `<script>` / `<img onerror>` /
  `<svg onload>` / `javascript:` links carried in crawled content ran in
  the app origin.
- `category` (from the /api/research/start request body, no enum check) was
  interpolated raw into `<body class="category-{category}">`.

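To make the second bullet concrete, here is a minimal sketch of how an unescaped `category` breaks out of the `class` attribute, and how `html.escape` neutralizes it. The function names and the payload are hypothetical; only the `<body class="category-{category}">` template comes from this commit.

```python
import html


def body_open_raw(category: str) -> str:
    # Pre-fix behaviour: the request value lands in the attribute untouched.
    return f'<body class="category-{category}">'


def body_open_escaped(category: str) -> str:
    # quote=True (the default) also escapes " and ', so the value cannot
    # terminate the surrounding class="..." attribute.
    return f'<body class="category-{html.escape(category, quote=True)}">'


# Hypothetical attacker-controlled value for the request body.
PAYLOAD = 'x" onload="alert(1)'

print(body_open_raw(PAYLOAD))
# <body class="category-x" onload="alert(1)">
print(body_open_escaped(PAYLOAD))
# <body class="category-x&quot; onload=&quot;alert(1)">
```

Escaping keeps the value inside the attribute, but it does not restrict `category` to known values; an enum check at the route would be a separate hardening step.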
Allowlist-sanitize the rendered markdown with nh3, keeping the formatting
the report emits (tables, code, details/summary, toc anchors, codehilite
classes, external-link target/rel) while dropping active content, and
`html.escape` the category before it is interpolated. Add regression tests.
@@ -21,6 +21,10 @@ youtube-transcript-api
 # Markdown rendering for research reports (src/visual_report.py).
 # Imported at module-top so it's a hard core dep, not optional.
 markdown
+# HTML sanitizer for rendered research reports (src/visual_report.py). Report
+# content is untrusted (LLM output over crawled pages) and report pages run
+# under a relaxed CSP, so the rendered HTML is allowlist-sanitized.
+nh3
 # Calendar .ics import/export (routes/calendar_routes.py).
 icalendar
 # Recurrence rule expansion for calendar events (routes/calendar_routes.py).