The visual research report is assembled from LLM output over crawled web
pages (untrusted content) and served under a relaxed `script-src
'unsafe-inline'` CSP. Two values reached that HTML without sanitization:
- `_md_to_html` rendered the report markdown via python-markdown, which
passes raw HTML through verbatim, so `<script>` / `<img onerror>` /
`<svg onload>` / `javascript:` links carried in crawled content ran in
the app origin.
- `category` (from the /api/research/start request body, no enum check) was
interpolated raw into `<body class="category-{category}">`.
Allowlist-sanitize the rendered markdown with nh3, keeping the formatting
the report emits (tables, code, details/summary, toc anchors, codehilite
classes, external-link target/rel) while dropping active content, and
html.escape the category. Adds regression tests.
* fix(tests): align broken test assertions with current behavior
- test_readme_native_quickstart_uses_loopback: README warning text
moved from --host prefix to bind-to phrasing; update assertion
- test_sanitize_merges_consecutive_user_messages: consecutive user
messages ARE merged and orphan tool messages ARE dropped by the
adjacency repair pass; update expected counts and values
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(tests): update cookbook status poll assertion for stopped state
The cookbookRunning.js ternary now handles a 'stopped' status
alongside 'error', so the exact string match in the test no longer
holds. Relax the assertion to check for the error branch presence
instead of the full ternary expression.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
#622 reported "I cant even paste that hash pw and granted So auth_en
=false & localbypass= true But then the host still is showing login
page?" — the operator turned auth off in .env and still gets bounced
to /login on every page load. The flow:
The auth middleware in app.py is correctly gated on AUTH_ENABLED, so
the middleware itself does not install when AUTH_ENABLED=false. The
SPA front-end at static/app.js wraps window.fetch and redirects to
/login on ANY 401 response from any API call. So all it takes for the
operator to see a login page is one route-level 401.
src/auth_helpers.require_user — the shared FastAPI dependency mounted
on ~50 routes (email, contacts, personal, …) — was the source. It is
documented as defense-in-depth in case the middleware was bypassed
unexpectedly (SSRF from a sibling service), but the implementation
treated AUTH_ENABLED=false as one of those unexpected bypasses and
401'd anyway. The loopback fall-through that would have admitted the
operator does not fire under docker compose / a reverse proxy because
the container sees the request arriving from the bridge gateway
(172.x.x.x), not 127.0.0.1.
require_user now short-circuits to "" when AUTH_ENABLED=false so the
explicit operator opt-out reaches the route layer too. While in the
file, also mirror LOCALHOST_BYPASS=true the same way for loopback
callers — the middleware already lets them through, and routes 401'ing
the same caller would produce the same /login bounce. Non-loopback
callers under LOCALHOST_BYPASS are still rejected, matching the
middleware's _is_trusted_loopback check.
Add three focused regression tests in tests/test_security_regressions.py:
docker-bridge caller is admitted under AUTH_ENABLED=false, loopback
caller is admitted under LOCALHOST_BYPASS=true, LAN caller under
LOCALHOST_BYPASS=true is still rejected. The existing
test_require_user_rejects_unauthenticated and
test_require_user_accepts_loopback_when_unconfigured tests continue to
pass because neither sets AUTH_ENABLED, so the AUTH_ENABLED=true
default path is unchanged.
Closes#622.
Route PDF lookups through UploadHandler.resolve_upload, reject poisoned pdf_source markers on document create/update, and add regression tests.
Co-authored-by: Cursor <cursoragent@cursor.com>
Hardens issues found in a security review of the current tree (separate from
the cookbook SSH PR):
- Email thread rendering (static/js/emailLibrary.js): the flat read path runs
inbound HTML through the allowlist sanitizer, but the two threaded paths
(_renderTurnsAsBubbles / _renderTurnsFromServer — the default view) injected
server-parsed `body_html` raw into the DOM. A crafted inbound email could
inject arbitrary markup (phishing/form/credential-capture/tracking; full XSS
if a deployment relaxes the script CSP). Now sanitized on all paths.
- Attachment extraction (routes/email_routes.py, routes/email_helpers.py): the
on-disk extraction dir was `ATTACHMENTS_DIR / f"{folder}_{uid}"` with
user-controlled folder/uid and no containment, so a folder like `../../tmp`
could escape ATTACHMENTS_DIR. New attachment_extract_dir() flattens both to a
single safe segment and asserts containment.
- Diagnostics routes (routes/diagnostics_routes.py): /api/db/stats,
/api/rag/stats, /api/test/youtube, /api/test-research relied only on the
global session check (any logged-in user). Now require_admin-gated.
- Defense-in-depth HTML escaping: session HTML export escapes the session name
(routes/session_routes.py); the MCP OAuth page escapes the reflected Host
header / server_id (routes/mcp_routes.py).
- Internal-tool token now compared with secrets.compare_digest (constant time)
in core/middleware.py and app.py.
Adds regression tests in tests/test_security_regressions.py.
* feat(web-fetch): add web_fetch tool to read a specific URL's content
* test(web-fetch): add SSRF coverage and fail closed on empty DNS resolution
Add explicit SSRF regression tests for the web_fetch path covering
loopback, private LAN ranges, link-local/metadata, IPv6 private/local,
redirect-into-private, and unsupported schemes. Harden _public_http_url
to fail closed when a hostname resolves to no addresses.