18 Commits

Author SHA1 Message Date
Alexandre Teixeira
3426e0cb5e fix(tests): isolate session route import stubs
Keeps src.request_models real and restores both sys.modules and parent routes.session_routes package attributes after temporary test stubs. Restores one focused part of the Python CI baseline tracked in #2580.
2026-06-04 21:05:52 +01:00
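
Note on 3426e0cb5e: the restore step described above can be sketched as a context manager that snapshots both the `sys.modules` entry and the attribute on the parent package, then puts both back. This is a minimal sketch, not the code from the commit; `stub_module` and the `demo_pkg` names are hypothetical.

```python
import sys
import types
from contextlib import contextmanager

_MISSING = object()  # sentinel: "there was nothing here before the stub"


@contextmanager
def stub_module(name, stub):
    """Temporarily install `stub` as module `name`, then restore prior state.

    Hypothetical helper: restores the sys.modules entry AND the attribute
    on the parent package, since both are visible to later imports.
    """
    parent_name, _, child = name.rpartition(".")
    parent = sys.modules.get(parent_name) if parent_name else None
    saved_module = sys.modules.get(name, _MISSING)
    saved_attr = getattr(parent, child, _MISSING) if parent is not None else _MISSING

    sys.modules[name] = stub
    if parent is not None:
        setattr(parent, child, stub)
    try:
        yield stub
    finally:
        # Put sys.modules back exactly as it was.
        if saved_module is _MISSING:
            sys.modules.pop(name, None)
        else:
            sys.modules[name] = saved_module
        # Put the parent package attribute back as well.
        if parent is not None:
            if saved_attr is _MISSING:
                if hasattr(parent, child):
                    delattr(parent, child)
            else:
                setattr(parent, child, saved_attr)


# Demo with in-memory modules (names are illustrative only).
demo_pkg = types.ModuleType("demo_pkg")
demo_real = types.ModuleType("demo_pkg.child")
demo_pkg.child = demo_real
sys.modules["demo_pkg"] = demo_pkg
sys.modules["demo_pkg.child"] = demo_real
```

Restoring only `sys.modules` would leave `demo_pkg.child` pointing at the stub, which is the kind of leak between tests that the commit message describes.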
nubs
050283c145 fix(mcp): confine oauth file paths (#2272) 2026-06-04 19:10:23 +02:00
Kenny Van de Maele
8bfd79fe8e chore: deduplicate src/search modules (cache, content, query) into shims (#2506)
* chore: dedupe src/search/cache.py into a re-export shim

src/search/cache.py was a byte-identical copy of services/search/cache.py.
Convert it to a sys.modules alias of the canonical services module (matching
src/search/core.py, providers.py, ranking.py) so the two cannot drift, and add
an identity assertion to test_search_module_consolidation.py.

content.py and query.py are intentionally left as-is: the copies have drifted
and services lacks fixes that src has, so services must be reconciled before
they can be shimmed safely.

* chore: dedupe src/search content.py and query.py into shims

Convert src/search/content.py and query.py to sys.modules aliases of the
canonical services/search/* (matching cache.py, core.py, providers.py,
ranking.py) so the duplicate copies cannot drift.

Repoint the two tests that were coupled to the src-copy internals onto the
canonical services surface (behaviour is equivalent):
- test_src_search_query_nonstring.py: import services.search.query instead of
  loading the src file by path.
- test_security_regressions.py::test_web_fetch_guard_blocks_redirect_into_private:
  mock httpx.get (services uses the module-level get, not httpx.Client) and
  assert on the canonical 'Blocked' message.

Drop the now-redundant [src_content, service_content] parametrization in
test_search_content_extraction_parity.py and test_search_content_url_guards.py
(after the shim both params are the same object); add content/query identity
assertions to test_search_module_consolidation.py.
2026-06-04 18:10:55 +02:00
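
Note on 8bfd79fe8e: a `sys.modules` alias makes two import names refer to one module object, so there is nothing left to drift. The sketch below shows the idea with in-memory modules. The `demo_*` names and `install_alias` are hypothetical, and a real shim file would presumably assign `sys.modules[__name__]` to the imported canonical module instead.

```python
import sys
import types

# Stand-in for the canonical module (hypothetical name and contents).
canonical = types.ModuleType("demo_services.search.cache")
canonical.TTL_SECONDS = 300
sys.modules["demo_services.search.cache"] = canonical


def install_alias(alias_name: str, canonical_name: str) -> None:
    """Register `alias_name` as another name for the canonical module object."""
    sys.modules[alias_name] = sys.modules[canonical_name]


install_alias("demo_src.search.cache", "demo_services.search.cache")

shim = sys.modules["demo_src.search.cache"]

# Identity assertion of the kind the commit adds to its consolidation test:
# both names must resolve to the very same object, not an equal copy.
assert shim is canonical

# A change made through one name is visible through the other.
shim.TTL_SECONDS = 600
```

This is also why the `[src_content, service_content]` parametrization became redundant: after aliasing, both parameters are the same object.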
Joeseph Grey
fa1fe7f866 security: sanitize rendered research-report HTML (#364)
The visual research report is assembled from LLM output over crawled web
pages (untrusted content) and served under a relaxed `script-src
'unsafe-inline'` CSP. Two values reached that HTML without sanitization:

- `_md_to_html` rendered the report markdown via python-markdown, which
  passes raw HTML through verbatim, so `<script>` / `<img onerror>` /
  `<svg onload>` / `javascript:` links carried in crawled content ran in
  the app origin.
- `category` (from the /api/research/start request body, no enum check) was
  interpolated raw into `<body class="category-{category}">`.

Allowlist-sanitize the rendered markdown with nh3, keeping the formatting
the report emits (tables, code, details/summary, toc anchors, codehilite
classes, external-link target/rel) while dropping active content, and
html.escape the category. Adds regression tests.
2026-06-04 13:42:49 +01:00
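
Note on fa1fe7f866: the `category` half of the fix can be illustrated with the standard library alone. This is a minimal sketch, assuming a hypothetical `body_open_tag` helper; the nh3 allowlist pass for the rendered markdown is not shown here.

```python
import html


def body_open_tag(category: str) -> str:
    """Build the <body> tag with the category escaped for an attribute value.

    Hypothetical helper: html.escape with quote=True also escapes double and
    single quotes, so the value cannot close the class attribute.
    """
    safe = html.escape(category, quote=True)
    return f'<body class="category-{safe}">'


benign = body_open_tag("science")
hostile = body_open_tag('x"><script>alert(1)</script>')
```

Escaping keeps a hostile value inert but still puts it in the class list, so an enum check on the request body would be a natural complement; the commit message only mentions the escape.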
Wes Huber
b30f02a3f0 fix(tests): align broken test assertions with current behavior (#1791)
* fix(tests): align broken test assertions with current behavior

- test_readme_native_quickstart_uses_loopback: README warning text
  moved from --host prefix to bind-to phrasing; update assertion
- test_sanitize_merges_consecutive_user_messages: consecutive user
  messages ARE merged and orphan tool messages ARE dropped by the
  adjacency repair pass; update expected counts and values

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(tests): update cookbook status poll assertion for stopped state

The cookbookRunning.js ternary now handles a 'stopped' status
alongside 'error', so the exact string match in the test no longer
holds. Relax the assertion to check for the error branch presence
instead of the full ternary expression.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-06-03 14:12:17 +09:00
Refuse
323f027865 Security: sanitize export and gallery filenames
Co-authored-by: RefuseOdd <refuseodd@users.noreply.github.com>
2026-06-02 20:29:56 +09:00
mechramc
493c815371 Chat: scope active document fallbacks by owner 2026-06-02 20:29:27 +09:00
lolwuttav
c99193041a fix(cookbook): default Ollama serve to loopback (#872) 2026-06-02 12:27:04 +09:00
Rasmus
e73f3edc06 fix: scope chat active-document lookup to the session owner (#569) 2026-06-02 11:46:40 +09:00
tanmayraut45
c1df31fda5 Honor AUTH_ENABLED=false in route-level auth gate (#785)
#622 reported "I cant even paste that hash pw and granted So auth_en
=false & localbypass= true But then the host still is showing login
page?" — the operator turned auth off in .env and still gets bounced
to /login on every page load. The flow:

The auth middleware in app.py is correctly gated on AUTH_ENABLED, so
the middleware itself does not install when AUTH_ENABLED=false. The
SPA front-end at static/app.js wraps window.fetch and redirects to
/login on ANY 401 response from any API call. So all it takes for the
operator to see a login page is one route-level 401.

src/auth_helpers.require_user — the shared FastAPI dependency mounted
on ~50 routes (email, contacts, personal, …) — was the source. It is
documented as defense-in-depth in case the middleware was bypassed
unexpectedly (SSRF from a sibling service), but the implementation
treated AUTH_ENABLED=false as one of those unexpected bypasses and
401'd anyway. The loopback fall-through that would have admitted the
operator does not fire under docker compose / a reverse proxy because
the container sees the request arriving from the bridge gateway
(172.x.x.x), not 127.0.0.1.

require_user now short-circuits to "" when AUTH_ENABLED=false so the
explicit operator opt-out reaches the route layer too. While in the
file, also mirror LOCALHOST_BYPASS=true the same way for loopback
callers — the middleware already lets them through, and routes 401'ing
the same caller would produce the same /login bounce. Non-loopback
callers under LOCALHOST_BYPASS are still rejected, matching the
middleware's _is_trusted_loopback check.

Add three focused regression tests in tests/test_security_regressions.py:
docker-bridge caller is admitted under AUTH_ENABLED=false, loopback
caller is admitted under LOCALHOST_BYPASS=true, LAN caller under
LOCALHOST_BYPASS=true is still rejected. The existing
test_require_user_rejects_unauthenticated and
test_require_user_accepts_loopback_when_unconfigured tests continue to
pass because neither sets AUTH_ENABLED, so the AUTH_ENABLED=true
default path is unchanged.

Closes #622.
2026-06-02 11:23:47 +09:00
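
Note on c1df31fda5: the decision order described above can be sketched as a plain function. This is an illustration under stated assumptions, not the FastAPI dependency itself: the signature, the `Unauthorized` exception, and the order of the session check relative to the bypass are all hypothetical.

```python
import ipaddress


class Unauthorized(Exception):
    """Stand-in for the 401 the real dependency would raise."""


def _is_loopback(host: str) -> bool:
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # not an IP literal: treat as untrusted


def require_user(session_user, client_host, *, auth_enabled=True,
                 localhost_bypass=False) -> str:
    # Explicit operator opt-out: admit every caller, including one that
    # arrives from the docker bridge gateway instead of 127.0.0.1.
    if not auth_enabled:
        return ""
    # Mirror the middleware: the bypass only applies to loopback callers.
    if localhost_bypass and _is_loopback(client_host):
        return ""
    if session_user:
        return session_user
    raise Unauthorized("authentication required")
```

The three regression cases from the commit message map onto this sketch directly: a `172.x.x.x` caller with `auth_enabled=False` is admitted, a loopback caller with `localhost_bypass=True` is admitted, and a LAN caller with `localhost_bypass=True` is still rejected.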
Duarte Antunes
448401a0fc Harden PDF document markers against cross-owner upload access (#445)
Route PDF lookups through UploadHandler.resolve_upload, reject poisoned pdf_source markers on document create/update, and add regression tests.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-01 22:38:14 +09:00
Alexander Kenley
3c6b084f08 Secure by default uplift (#511)
Co-authored-by: Alex Kenley <Alex.Kenley@threatvectorsecurity.com>
2026-06-01 22:30:07 +09:00
Jamieson O'Reilly
171c29dcf3 Fix email-thread HTML injection, attachment path traversal, and missing authz (#475)
Hardens issues found in a security review of the current tree (separate from
the cookbook SSH PR):

- Email thread rendering (static/js/emailLibrary.js): the flat read path runs
  inbound HTML through the allowlist sanitizer, but the two threaded paths
  (_renderTurnsAsBubbles / _renderTurnsFromServer — the default view) injected
  server-parsed `body_html` raw into the DOM. A crafted inbound email could
  inject arbitrary markup (phishing/form/credential-capture/tracking; full XSS
  if a deployment relaxes the script CSP). Now sanitized on all paths.

- Attachment extraction (routes/email_routes.py, routes/email_helpers.py): the
  on-disk extraction dir was `ATTACHMENTS_DIR / f"{folder}_{uid}"` with
  user-controlled folder/uid and no containment, so a folder like `../../tmp`
  could escape ATTACHMENTS_DIR. New attachment_extract_dir() flattens both to a
  single safe segment and asserts containment.

- Diagnostics routes (routes/diagnostics_routes.py): /api/db/stats,
  /api/rag/stats, /api/test/youtube, /api/test-research relied only on the
  global session check (any logged-in user). Now require_admin-gated.

- Defense-in-depth HTML escaping: session HTML export escapes the session name
  (routes/session_routes.py); the MCP OAuth page escapes the reflected Host
  header / server_id (routes/mcp_routes.py).

- Internal-tool token now compared with secrets.compare_digest (constant time)
  in core/middleware.py and app.py.

Adds regression tests in tests/test_security_regressions.py.
2026-06-01 22:20:17 +09:00
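
Note on 171c29dcf3: the attachment fix combines two steps, flattening the user-controlled values into one path segment and then asserting containment. The sketch below reuses the name `attachment_extract_dir` from the commit message, but the signature, the character allowlist, and the example base directory are assumptions.

```python
import re
from pathlib import Path


def attachment_extract_dir(base: Path, folder: str, uid: str) -> Path:
    """Return an extraction directory that is a direct child of `base`."""
    # Flatten: anything outside a conservative allowlist becomes "_", so
    # path separators and ".." cannot survive as path components.
    segment = re.sub(r"[^A-Za-z0-9_-]", "_", f"{folder}_{uid}")
    base_resolved = base.resolve()
    target = (base_resolved / segment).resolve()
    # Assert containment as a second, independent check.
    if not target.is_relative_to(base_resolved):  # Python 3.9+
        raise ValueError("attachment directory escapes the base directory")
    return target


base_dir = Path("/srv/attachments")  # illustrative path only
escaped_attempt = attachment_extract_dir(base_dir, "../../tmp", "42")
```

With the pre-fix expression `ATTACHMENTS_DIR / f"{folder}_{uid}"`, the same input would have resolved outside the attachments directory.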
Rifqi Akram
5b1e56407b Add SSRF-guarded web fetch agent tool
* feat(web-fetch): add web_fetch tool to read a specific URL's content

* test(web-fetch): add SSRF coverage and fail closed on empty DNS resolution

Add explicit SSRF regression tests for the web_fetch path covering
loopback, private LAN ranges, link-local/metadata, IPv6 private/local,
redirect-into-private, and unsupported schemes. Harden _public_http_url
to fail closed when a hostname resolves to no addresses.
2026-06-01 16:57:28 +09:00
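
Note on 5b1e56407b: a fail-closed URL check of the kind described above might look like the sketch below. It is not the project's `_public_http_url`: the function name, the injectable `resolve` parameter, and the use of `ipaddress`'s `is_global` as the public-address test are assumptions made for illustration.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def _resolve(host: str) -> list:
    """Resolve a hostname to its address strings using the system resolver."""
    infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
    return [info[4][0] for info in infos]


def is_public_http_url(url: str, resolve=_resolve) -> bool:
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False  # unsupported scheme or no host
    try:
        addresses = resolve(parts.hostname)
    except OSError:
        return False  # resolution failed: fail closed
    if not addresses:
        return False  # hostname resolved to no addresses: fail closed
    for addr in addresses:
        try:
            # Drop any IPv6 zone id ("fe80::1%eth0") before parsing.
            ip = ipaddress.ip_address(addr.split("%", 1)[0])
        except ValueError:
            return False
        if not ip.is_global:
            return False  # loopback, private, link-local, metadata, ...
    return True
```

Every resolved address must pass, so a hostname that maps to one public and one private address is rejected. Covering the redirect-into-private case would mean repeating this check on each redirect hop, which the sketch does not do.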
Duarte Antunes
e77d87fa80 Enforce owner checks for upload attachments 2026-06-01 16:47:48 +09:00
pewdiepie-archdaemon
8f93d44917 Validate internal tool owner attribution 2026-06-01 15:25:15 +09:00
pewdiepie-archdaemon
0888a3b3e6 Add native Windows compatibility layer 2026-06-01 15:09:47 +09:00
pewdiepie-archdaemon
e5c99a5eee Odysseus v1.0 2026-05-31 23:58:26 +09:00