Commit Graph

68 Commits

Author SHA1 Message Date
Mahdi Salmanzade
bc00a9fc7f fix(security): fail closed on null-owner session in sync-chat endpoint (#870)
POST /api/v1/chat (the n8n/Make/Activepieces sync-chat endpoint) verified
session ownership with `_tok_user and _sess_owner and _sess_owner != _tok_user`.
The `_sess_owner and` clause skipped the check entirely whenever the session's
owner was null — so any chat-scoped API token (e.g. a token minted for a paired
mobile device) could pass a legacy/migrated null-owner session id, inject a
message into that session, and read back its conversation history plus reuse
the owner's endpoint credentials.

This is the same `if owner and owner != user` null-owner-bypass pattern that
was already hardened in the gallery, calendar, and notes routes (see
test_null_owner_gates.py) and in session_routes._verify_session_owner. Make
this gate strict and fail closed too: require a resolvable caller and an exact
owner match, mirroring _verify_session_owner. Extract the decision into
_caller_owns_session() and pin it with regression tests.
2026-06-02 11:38:05 +09:00
James Arslan
6776c7d691 Surface silent model fallback instead of masking it (#868)
When the selected model fails before producing output, stream_llm_with_fallback
quietly switches to the next candidate and the reply is shown under the
originally selected model's name, so a misconfigured provider looks like it
works. (Concretely: a Bedrock gateway that 400s every Anthropic/Claude request
appears fine because another model silently answers under the Claude label.)

Emit a `fallback` SSE event ({selected_model, answered_by, reason}) the first
time a non-primary candidate produces output, forward it through the agent loop
and both chat-route paths, stamp the response metrics with the model that
actually answered, and show a notice + relabel the reply in the UI.

Tested: python -m pytest tests/test_llm_core_fallback.py (3 pass);
python -m py_compile src/llm_core.py src/agent_loop.py routes/chat_routes.py;
node --check static/js/chat.js.
2026-06-02 11:37:25 +09:00
Tatlatat
2d6b777799 fix(cookbook): diagnose 'no GGUF file' serve failures clearly (#811) (#866)
When serving with the llama.cpp backend and no .gguf file exists on the host,
the GGUF launcher prelude exits with 'ERROR: No GGUF found on this host', but
_diagnose_serve_output had no matching pattern, so the UI showed a generic
crash instead of explaining the cause. Add a diagnosis pattern for the
no-GGUF case so users are told a .gguf is required and pointed at downloading
a GGUF build, instead of an opaque crash.

Closes #811
2026-06-02 11:36:53 +09:00
Ernest Hysa
360bc83a66 fix(history): scope topic analysis to authenticated owner only (#744)
Two changes close the cross-tenant topic leak in /api/conversations/topics.

The route at routes/history_routes.py:478 used get_current_user, which
returns None when no auth middleware has set request.state.current_user
(loopback-bypass, AUTH_ENABLED=false, or any path that short-circuits the
middleware). It then forwarded owner=None to analyze_topics.

The helper at src/topic_analyzer.py:21 used an 'if owner:' short-circuit
in its owner filter, so the None owner took the no-filter path and the
helper silently aggregated topic frequencies and per-snippet session_id,
session_name, role, and snippet text across every user's sessions.

analyze_topics now returns an empty result when owner is falsy. The
inner short-circuit is removed because the filter is now strict by
construction. The route is switched to require_user, which raises 401
when auth_manager.is_configured is True and the caller is anonymous,
matching the pattern used by calendar_routes, skills_routes, and other
authenticated routes.

The test test_history_topics_owner_scope.py was rewritten to drive the
real route through FastAPI's TestClient with a stub AuthMiddleware that
mirrors the loopback-bypass branch, and now asserts a strict 401 from
the route and an empty result from the helper. The previous version of
the test accepted either a 200-with-empty-topics or a 401; the strict
assertion means a future regression that drops the require_user wrapper
or re-adds the inner short-circuit is caught immediately.
2026-06-02 11:36:01 +09:00
hawktuahs
a2f6183c4a Fix cookbook pip installs in venvs (#723) 2026-06-02 11:31:59 +09:00
tanmayraut45
0e31c38be0 Support in-place endpoint updates and recover empty-model sessions (#786)
The "don't wipe endpoint_url/model on endpoint delete" half of #587 landed
in 6a78b02 (Fix endpoint model preservation for tasks). The three remaining
follow-up pieces from the original PR — flagged in the review on #786 —
are:

- routes/model_routes.py: toggle_model_endpoint (PATCH) now accepts
  api_key and base_url, so the admin UI can rotate a key or fix a typo'd
  URL without going through delete+recreate. base_url is normalized the
  same way the POST handler does (strip /models, /chat/completions,
  /completions, /v1/messages, then _normalize_base). Cache invalidation
  matches the POST/DELETE paths and the response includes base_url so the
  frontend can confirm what was saved.

- routes/chat_routes.py: new _recover_empty_session_model picks
  cached_models[0] from the endpoint that matches sess.endpoint_url and
  persists it onto the Session row before the LLM call goes out. Wired
  into both /api/chat and /api/chat_stream after the existing
  _clear_orphaned_session_endpoint guard, so the order is: drop
  truly-orphaned sessions first, then heal the "picker showed it, session
  never knew" case.

- routes/chat_routes.py: when recovery fails (no endpoint, no cached
  models) raise HTTP 400 with a clear message instead of letting
  model="" reach the upstream as 401/503.

Closes #587.
2026-06-02 11:26:38 +09:00
Tatlatat
63a947d246 fix(cookbook): mark zero-file HF downloads as failed instead of completed (#839) (#865)
A Cookbook download whose repo/quant selector matched no files (e.g. a
':Q4_K_M' tag that does not exist) printed 'Fetching 0 files' and was still
reported as a successful '✓ Downloaded' / completed task. Detect the
zero-file signature in the download snapshot and mark the task as an error
with a clear diagnosis (no matching files — check the repo or quant/filename
pattern) so users know nothing was actually downloaded. Normal multi-file
and fully-cached downloads (which print 'Fetching N files', N>0) are
unaffected.

Closes #839
2026-06-02 11:24:34 +09:00
Ernest Hysa
f4aef0dcf7 fix(skills): scope skill reads to caller owner (#777)
read_skill_md and read_skill_reference walk all skill files via
_iter_skill_files and return the first match by slug, regardless
of owner. In a multi-user deployment where two users have skills
with the same slug under different categories, a caller scoped
to owner='alice' can read Bob's skill content.

This is the same cross-tenant leak class as the update_skill /
delete_skill fix (PR #755, merged), but on the read path.

Changes:
- read_skill_md / read_skill_reference accept owner= param (default
  None = match ownerless only, matching the write-path convention).
- 7 callers updated: tool_implementations.py (view, view_ref, patch),
  builtin_actions.py (test_skills), skills_routes.py (audit, source,
  test routes).
- Tests: read scoping (alice reads hers, not bob's), positive update
  scoping (alice can mutate her own), ownerless-match default.
2026-06-02 11:21:27 +09:00
Mahdi Salmanzade
4a84a895a0 Keep reasoning (thinking) tokens out of the saved chat reply (#856)
Streamed deltas flagged thinking:true (reasoning-model traces) were being folded
into full_response and persisted as part of the assistant message, so saved
replies were polluted with the model's chain-of-thought. Forward those deltas to
the client (for a live thinking indicator) but exclude them from the accumulated
saved reply, in both chat and research-stream paths. Mirrors the existing rewrite
path's handling.
2026-06-02 11:17:41 +09:00
Rolly Calma
f65c89e02e chore: use explicit utf-8 for shell job files (#820) 2026-06-02 11:12:13 +09:00
LittleLlama
54ecfa39cf Provider detection: match by hostname instead of substring (re #768) (#815)
* Dedupe URL routing helpers and tighten adjacent hostname checks

* Match providers by hostname, not substring, in _detect_provider

_detect_provider used `"anthropic.com" in url`-style substring checks, so a URL
that merely contained a provider's domain in its path or query — or a look-alike
host like `anthropic.com.example` — was misclassified and picked the wrong
auth-header/payload shape. Switch it to the existing `_host_match` helper
(hostname exact/subdomain match), the same way the human-readable labels and
curated model lists already work, finishing that migration. Also harden
`_host_match` against trailing-dot FQDNs.

Not a credential-leak fix: _detect_provider only classifies a URL the admin
already configured next to its key, and the URL — not this function — decides
where the request goes. This is a correctness/consistency cleanup.

Adds tests that import the real helpers (test_endpoint_resolver.py tests local
copies, so it can't catch this) covering the substring false-positives.

Refs #768.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Import build_headers under its real name in model_routes

It was imported as `build_headers as _provider_headers`, which collides with
the unrelated llm_core._provider_headers(provider, headers) — same name,
different signature. Use the real name to remove the confusion.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Use hostname matching in URL builders, not raw suffix checks

PR review flagged that _detect_provider() was hardened to match on
hostname, but several helpers still used raw host.endswith("anthropic.com")
/ host.endswith("ollama.com"), which match adjacent hosts like
notanthropic.com / notollama.com.

Route the remaining checks through _host_match(): _is_ollama_native_url
and _ollama_api_root in llm_core, and _anthropic_api_root / _ollama_api_root
in endpoint_resolver. With _detect_provider already hostname-correct, the
trailing "or host.endswith(...)" clauses in build_chat_url / build_models_url
are redundant, so drop them rather than fix the substring match in place.

Add builder-level tests asserting look-alike and domain-in-path hosts route
to the OpenAI-compatible default. They import the real builders and fail on
the pre-fix code.

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-02 11:11:17 +09:00
wundervrc
3f6d630b56 Never resolve to a disabled endpoint model (#861)
Background tasks (e.g. the Email Tags / check_email_urgency action)
resolve their model through resolve_endpoint("utility") → Default Chat.
When the configured model is one the user has since disabled on the
endpoint, the resolver still dispatched to it — on Groq that surfaces as
every email failing with "HTTP 400: model ... requires terms acceptance".

Two paths fed this:
- The auto-pick fallback selected from cached_models without excluding
  the endpoint's hidden_models, so a disabled model listed first won.
- A stale default_model left pointing at a now-disabled model (seeded at
  endpoint registration from raw model_ids[0]) was used verbatim.

Fix resolve_endpoint / resolve_endpoint_by_id to drop a configured model
that's in hidden_models and to pick the first ENABLED chat model. Also
seed default_model on registration via _first_chat_model so we never pin
the global default to an embedding/tts entry a provider lists first.

Checks: python -m pytest tests/test_endpoint_resolver.py
        tests/test_model_routes.py tests/test_model_context.py (all pass);
        python -m py_compile app.py routes/model_routes.py
        src/endpoint_resolver.py.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-02 11:10:43 +09:00
pewdiepie-archdaemon
6a78b02976 Fix endpoint model preservation for tasks 2026-06-02 09:44:24 +09:00
PewDiePie
d60ff44c1b Merge pull request #797 from ErnestHysa/fix/research-path-traversal
fix(research): validate session_id to block path traversal
2026-06-02 09:42:23 +09:00
PewDiePie
7187118aa6 Merge pull request #782 from tanmayraut45/fix/active-streams-toctou
Fix TOCTOU race in chat stream status endpoint
2026-06-02 09:42:07 +09:00
pewdiepie-archdaemon
96618b01c0 Polish task UI slash commands and Ollama serving 2026-06-02 09:36:03 +09:00
pewdiepie-archdaemon
ab0a480f30 Show Ollama models in Cookbook Serve 2026-06-02 07:38:45 +09:00
Ernest Hysa
cb6f6b65ea fix(research): validate session_id to block path traversal
Every research endpoint interpolates session_id into filesystem paths
(Path('data/deep_research') / f'{session_id}.json') without checking
for traversal sequences. A crafted ID like '../../data/auth' reaches
arbitrary JSON files — readable via research_detail (which also leaks
file paths in error messages), writable via research_archive, and
deletable via research_delete.

Add _validate_session_id() which rejects anything outside
[a-zA-Z0-9-]{1,128}. Called before filesystem access in all 12
endpoints that accept a session_id path parameter.
2026-06-01 23:25:38 +01:00
tanmayraut45
2d7d7b2412 Fix TOCTOU race in chat stream status endpoint
The /api/chat/stream_status handler did a membership test against
_active_streams followed by an indexed read of the same key. Between
those two ops, a sibling stream's finally block (or a stop / cleanup
path) can pop the entry, turning the indexed read into a KeyError that
bubbles up as a 500. The race is the exact one _stream_set was already
written to avoid; the comment on the helper at the top of the module
spells out why a single .get() is the right pattern here too.

Collapse the two-step into a single .get() call so the lookup either
returns the live record or None, and report 'detached' / 404 based on
that single read. No behavior change on the happy path; the failure
mode under concurrent stream cleanup is now handled deterministically.

Closes #658.
2026-06-02 03:02:30 +05:30
Alexandre Teixeira
5dd5847d4b Revoke stale sessions after password change
After a successful password change, revoke all browser sessions for the
same user except the one that submitted the request. This prevents stale
sessions on other devices from remaining valid after credentials are
updated.

Keep API-token behavior unchanged. The current browser session is
preserved so the user can continue from the tab that changed the
password.

Add focused regression tests for preserving the current session, revoking
other sessions, persisting revocation, and avoiding revocation when the
current password is incorrect.
2026-06-02 05:59:22 +09:00
Alexandre Teixeira
26483661da Restrict provider discovery to admins
Require admin access before serving provider discovery data from
GET /api/providers. This prevents normal authenticated users from
triggering provider discovery or receiving cached provider host data.

Keep GET /api/models available to normal users and leave the existing
admin-only GET /api/discover behavior unchanged.

Add a focused regression test to ensure unauthorized callers cannot
trigger discovery and cannot receive cached provider data.
2026-06-02 05:54:40 +09:00
Prakhya
a96593a99b Improve Ollama endpoint error messages 2026-06-02 05:53:50 +09:00
Collin
70a71f603c Scope email calendar extraction to account owner
The email auto-calendar pass (settings.email_auto_calendar / the
extract_email_events task) scans recently received mail and lets an LLM
create / update / cancel calendar events. Two problems made it a cross-tenant,
remotely triggerable hole:

1. No owner scoping. _auto_summarize_pass(account_id=None) fans out over EVERY
   enabled account of EVERY user. For each message it fetched an upcoming-events
   snapshot with NO owner filter (all tenants' events) and handed those uids +
   titles to the extraction LLM, then executed the model's ops via
   do_manage_calendar(...) with owner=None. do_manage_calendar only filters by
   owner when owner is not None, so create/update/delete ran across ALL users'
   calendars. Net: every user's event titles/times were disclosed to the model,
   and the model could cancel/move/duplicate any tenant's events by uid.

2. No prompt-injection wrapping. The raw email From/Subject/body were
   interpolated straight into an instruction-shaped extraction prompt (unlike
   the chat path, which wraps external text via src/prompt_security). Anyone
   who can email a user whose instance has auto-calendar enabled could inject
   operations: create attacker-controlled "meeting" events (the path even
   auto-harvests URLs from the body into the event location/description — a
   phishing primitive) or cancel/modify the victim's real events, with zero
   human in the loop.

Fix:
- Add core.database.get_upcoming_events(owner) and use it for the snapshot, so
  the LLM only ever sees the processed account owner's events.
- Look up the EmailAccount owner in _auto_summarize_pass_single and pass owner=
  to every do_manage_calendar call, so create/update/delete are scoped to that
  user (owner=None stays the single-user / legacy escape hatch).
- Tell the extraction model the email is untrusted data and not to follow
  instructions inside it (defense-in-depth against injection).

Add tests/test_calendar_owner_scope.py: get_upcoming_events returns only the
given owner's events (and everything when owner is None). Fails against the old
unscoped query.
2026-06-01 23:12:32 +09:00
Collin
11c2931efb Run auth password work off the event loop
* fix: run bcrypt off the event loop in auth routes

The auth routes are async, but each bcrypt call ran synchronously on the event
loop. bcrypt (checkpw/hashpw) is intentionally CPU-expensive (~100-300 ms), so
every login / signup / setup / change-password froze the single event loop for
that window, stalling all other in-flight requests (chat streams, polling, ...).

/api/auth/login is the worst case: it is reachable unauthenticated, runs bcrypt
twice (verify_password, then create_session re-verifies), and is rate-limited
only per-IP. A burst of login attempts serializes the whole server — cheap
DoS amplification.

Offload the bcrypt-bearing AuthManager calls (setup, signup/create_user,
login's verify_password + create_session, change_password) via
asyncio.to_thread, matching how the codebase already offloads blocking work
(e.g. src/builtin_actions._run_subprocess, email summarize). The event loop
stays responsive while bcrypt runs on a worker thread.

Add tests/test_auth_event_loop.py: asserts login runs verify_password and
create_session on a worker thread, not the loop thread. Fails if those calls
are awaited inline again.

* test: isolate auth event-loop test from heavy core/* import chain

The regression test imported routes.auth_routes, which pulls in
core.auth and so triggers core/__init__.py — transitively importing
src.llm_core (hangs at import under the project venv) and the SQLAlchemy
declarative models (metaclass error on a bare core.database import / under
the conftest sqlalchemy stubs). Reported by the maintainer: collection
failed on system Python and hung under the venv.

Stub core.auth/core.database before the import, mirroring the existing
_ensure_stub pattern in test_auth_regressions.py and test_null_owner_gates.py.
AuthManager is only a type hint here and the handler is exercised with a
MagicMock, so no real core machinery is needed. Test now imports cleanly
and passes in <0.3s without bcrypt/sqlalchemy installed.
2026-06-01 23:12:12 +09:00
kanaru-dev
a51a1fc4fc Deep-scrub secrets from public settings
/api/auth/settings is auth-exempt (the frontend + the pre-login page read it for
keybinds/TTS prefs), so non-admin and unauthenticated callers get a scrubbed
copy. The previous scrub only blanked TOP-LEVEL string values whose key matched a
short suffix list — so a secret nested under a non-secret parent key, or stored
under a key outside the list, would leak. A real exposure when the app is
reachable over a Cloudflare tunnel / reverse proxy.

- src/settings_scrub.py: NEW stdlib-only module with the scrub helpers (deep/
  recursive; broadened secret-key patterns). Kept separate from auth_routes so it
  imports + unit-tests WITHOUT pulling the FastAPI / auth / database chain
  (addresses review: the test no longer fails at collection on the DB import).
- routes/auth_routes.py: import scrub_settings from the module.
- tests/test_settings_scrub.py: import the tiny module directly.

Ran: pytest tests/test_settings_scrub.py (8 passed); verified the test pulls no
db/auth modules into sys.modules; py_compile routes/auth_routes.py.

Co-authored-by: Kanaru92 <107661007+Kanaru92@users.noreply.github.com>
2026-06-01 23:11:50 +09:00
ooovenenoso
5e47e69e99 Allow serving cached local llama.cpp models
Co-authored-by: Kevin <120500656+oooindefatigable@users.noreply.github.com>
2026-06-01 23:10:08 +09:00
Yizreel Schwartz Sipahutar
42380a8693 Keep Cookbook POSIX paths stable on Windows hosts 2026-06-01 23:08:39 +09:00
pewdiepie-archdaemon
f2d55f8726 Fix cached GGUF model metadata in Cookbook Serve 2026-06-01 22:46:54 +09:00
pewdiepie-archdaemon
743c074b2e Harden Cookbook package SSH probe 2026-06-01 22:44:34 +09:00
pewdiepie-archdaemon
e5b927597e Fix Cookbook serve exit code reporting 2026-06-01 22:41:25 +09:00
spooky
15822e91ff fix: keep serve preflight errors visible (#398) 2026-06-01 22:40:06 +09:00
spooky
4b72dd407b fix: report serve dependency readiness (#412) 2026-06-01 22:39:36 +09:00
Duarte Antunes
448401a0fc Harden PDF document markers against cross-owner upload access (#445)
Route PDF lookups through UploadHandler.resolve_upload, reject poisoned pdf_source markers on document create/update, and add regression tests.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-01 22:38:14 +09:00
red person
b2e8d692a4 Scope personal RAG uploads by owner (#446) 2026-06-01 22:36:53 +09:00
red person
d36896c5f7 Gate image editor AI endpoints by privilege (#447) 2026-06-01 22:35:24 +09:00
Filip
92a81480f7 feat: allow memory import without session (#493) 2026-06-01 22:32:17 +09:00
Carlos Arroyo
00320972dc fix: CUDA/GPU detection for vLLM and llama.cpp in Docker (#479)
Two bugs caused GPU inference to silently fall back to CPU inside the
Odysseus Docker container even when the GPU was correctly passed through.

## entrypoint.sh — CUDA_HOME detection only covered CUDA 13.x wheels

The nvcc glob only searched
vidia/cu13, which matches the

vidia-nvcc-cu13 pip wheel layout. CUDA 12.x wheels install nvcc to

vidia/cuda_nvcc/bin/nvcc (nvidia-cuda-nvcc-cu12) or
vidia/cu12
(nvidia-nvcc-cu12) — completely different paths. The glob found nothing,
so CUDA_HOME was never set.

Worse, VLLM_USE_FLASHINFER_SAMPLER=0 was inside the same if-block, so it
was never set either. vLLM then tried to JIT-compile the FlashInfer
sampler at startup, failed with 'Could not find nvcc', and crashed — even
though the GPU was fully visible to the container.

Fix: expand the search to also check nvidia/cu12 and nvidia/cuda_nvcc.
Move VLLM_USE_FLASHINFER_SAMPLER=0 to an unconditional export after the
loop (it is sampler-only, no impact on the attention path, and the correct
setting for any container where CUDA headers may be incomplete).

## cookbook_routes.py — llama.cpp Linux source build silently fell back to CPU

The cmake invocation was:
  cmake -B build -DGGML_CUDA=ON 2>/dev/null || cmake -B build

2>/dev/null suppressed all configure errors. When nvcc is absent (the
slim base image has no CUDA toolkit — intentional), cmake fails silently,
then the || fallback re-runs without -DGGML_CUDA=ON. A CPU-only binary is
produced with no warning. Additionally, a stale CMakeCache.txt from the
failed CUDA attempt was reused (no rm -rf build), poisoning the next
configure run. The macOS branch already did rm -rf build for exactly this
reason; the Linux branch did not.

Fix: before cmake, detect pip-installed nvcc across the same three path
patterns as entrypoint.sh and expose it via CUDA_HOME/PATH. If nvcc is
found, run a clean CUDA build with full error visibility. If not, fall
back to a CPU build with an explicit warning telling the user how to get
a GPU build (install vLLM via Cookbook -> Dependencies, which brings the
CUDA wheels including nvcc, then re-launch).

## .env.example — document Windows COMPOSE_FILE separator

Added a comment showing the semicolon separator required on Windows
Docker Desktop alongside the existing colon-separator (Linux) example.
2026-06-01 22:30:51 +09:00
Alexander Kenley
3c6b084f08 Secure by default uplift (#511)
Co-authored-by: Alex Kenley <Alex.Kenley@threatvectorsecurity.com>
2026-06-01 22:30:07 +09:00
Jamieson O'Reilly
171c29dcf3 Fix email-thread HTML injection, attachment path traversal, and missing authz (#475)
Hardens issues found in a security review of the current tree (separate from
the cookbook SSH PR):

- Email thread rendering (static/js/emailLibrary.js): the flat read path runs
  inbound HTML through the allowlist sanitizer, but the two threaded paths
  (_renderTurnsAsBubbles / _renderTurnsFromServer — the default view) injected
  server-parsed `body_html` raw into the DOM. A crafted inbound email could
  inject arbitrary markup (phishing/form/credential-capture/tracking; full XSS
  if a deployment relaxes the script CSP). Now sanitized on all paths.

- Attachment extraction (routes/email_routes.py, routes/email_helpers.py): the
  on-disk extraction dir was `ATTACHMENTS_DIR / f"{folder}_{uid}"` with
  user-controlled folder/uid and no containment, so a folder like `../../tmp`
  could escape ATTACHMENTS_DIR. New attachment_extract_dir() flattens both to a
  single safe segment and asserts containment.

- Diagnostics routes (routes/diagnostics_routes.py): /api/db/stats,
  /api/rag/stats, /api/test/youtube, /api/test-research relied only on the
  global session check (any logged-in user). Now require_admin-gated.

- Defense-in-depth HTML escaping: session HTML export escapes the session name
  (routes/session_routes.py); the MCP OAuth page escapes the reflected Host
  header / server_id (routes/mcp_routes.py).

- Internal-tool token now compared with secrets.compare_digest (constant time)
  in core/middleware.py and app.py.

Adds regression tests in tests/test_security_regressions.py.
2026-06-01 22:20:17 +09:00
Abhinav
9e8de43f25 fix: clear session headers on endpoint deletion (#477) 2026-06-01 22:19:54 +09:00
pewdiepie-archdaemon
5ed9b74cd0 Polish email tasks and window controls 2026-06-01 20:56:46 +09:00
Miles
df7d32c70c Require document privilege for PDF imports 2026-06-01 18:28:15 +09:00
red person
2f87dbcfbc Show a clear message when PyMuPDF is missing 2026-06-01 18:27:17 +09:00
Rifqi Akram
5b1e56407b Add SSRF-guarded web fetch agent tool
* feat(web-fetch): add web_fetch tool to read a specific URL's content

* test(web-fetch): add SSRF coverage and fail closed on empty DNS resolution

Add explicit SSRF regression tests for the web_fetch path covering
loopback, private LAN ranges, link-local/metadata, IPv6 private/local,
redirect-into-private, and unsupported schemes. Harden _public_http_url
to fail closed when a hostname resolves to no addresses.
2026-06-01 16:57:28 +09:00
Daniel Grzelak
92c2392fd6 Clarify Docker dependency status inside containers
* fix: show docker as N/A inside the container

* test: cover in-container docker detection

* fix: make the N/A dependency chip legible

* refactor: make remote docker applicability explicit and tested
2026-06-01 16:56:42 +09:00
Duarte Antunes
e77d87fa80 Enforce owner checks for upload attachments 2026-06-01 16:47:48 +09:00
Strahil Peykov
3cbfa98c30 Scope session auto-sort changes to current user 2026-06-01 15:18:25 +09:00
Strahil Peykov
8253422832 Allow installing vLLM from cookbook dependencies 2026-06-01 15:18:17 +09:00
pewdiepie-archdaemon
0888a3b3e6 Add native Windows compatibility layer 2026-06-01 15:09:47 +09:00
John Chaplin
f1817fd560 Add macOS Apple Silicon Cookbook support
* Add Apple Silicon (Metal) GPU detection and unified-memory fit tuning

hardware.py detects Apple Silicon locally and over SSH, reporting
backend=metal, the chip name, and a RAM-scaled fraction of unified
memory as the usable GPU budget. fit.py gains an M1-M4 memory-bandwidth
table for realistic tok/s and drops vLLM-only formats (AWQ/GPTQ/FP8)
that can't be served on Metal.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
(cherry picked from commit 32ac81dbc680361463a088dae867d555d5a79c3b)

* Generate macOS/Metal serve commands and surface the Metal GPU

cookbook_routes.py adds a macOS serve path (Ollama, Metal-aware
llama.cpp build using `sysctl hw.ncpu` instead of `nproc`, and a clear
error if vLLM is attempted). The frontend defaults Metal serving to
llama.cpp and offers llama.cpp/Ollama instead of vLLM/SGLang. The
odysseus-cookbook CLI's `gpus` command reports the Metal GPU via
sysctl/vm_stat.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
(cherry picked from commit 4ba01ce25d256ae032029898f361c824a34fcd4b)

* Add launchd LaunchAgent for macOS (systemd equivalent)

com.odysseus.ui.plist + install-service-macos.sh run Odysseus at login
and restart on crash, the macOS counterpart to odysseus-ui.service. The
installer auto-fills paths from the venv, so there's no hand-editing.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
(cherry picked from commit 3d4b6b2c7b8b31af32201ed278115df9a559dea9)

* Document macOS install (brew, Ollama, AirPlay port, launchd)

README + setup.py cover the Homebrew / Apple Silicon path: brew install
python@3.11 tmux ollama, Metal serving via Ollama/llama.cpp, the launchd
service, and the macOS AirPlay Receiver conflict on ports 7000/5000.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
(cherry picked from commit 8dc9a3578a1726f070ed9f75c0958ae291a6d966)

* Add downloadable macOS launcher app builder

build-macos-app.sh generates dist/Odysseus.app and a drag-to-Applications
dist/Odysseus.dmg. The app starts the local server from this repo's venv and
opens the UI in a chrome-less app window (Chromium --app mode, falling back to
the default browser). It's a launcher wrapper — it drives the venv rather than
bundling Python — so the install path is baked in at build time.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
(cherry picked from commit 7927940c3810ee34640803b198d334a6ac93474d)

* Harden macOS Cookbook support: hide MLX, fix Metal build cache

Builds on the adopted PR #213 macOS/Metal work with two fixes and tests:

- fit.py: always drop MLX-quantized models. Odysseus only generates serve
  commands for llama.cpp/Ollama (Metal) and vLLM/SGLang (CUDA); MLX needs the
  mlx_lm runtime and the catalog's MLX repos ship no GGUF alternative, so they
  were surfaced on Apple Silicon but could never be served.
- cookbook_routes.py (macOS branch only): `rm -rf build` before configure so a
  poisoned CMakeCache from a prior failed CUDA attempt can't make every later
  build fail; explicit -DCMAKE_BUILD_TYPE=Release; a clear "brew install cmake"
  hint if cmake is missing. Linux/CUDA path unchanged.
- tests/test_hwfit_macos.py: MLX hidden on metal, MLX still hidden on CUDA
  (regression guard), Metal detection on Apple Silicon, and skipped on
  Linux/Intel (proves non-macOS detection is untouched).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Propagate unified_memory flag and document macOS GPU/Docker caveat

- hardware.py: detect_system now carries the unified_memory flag from GPU
  detection into the system dict (it was set by _detect_apple_silicon / AMD-APU
  detection but dropped during result assembly, so the API always reported
  null). Lets callers distinguish unified from discrete VRAM.
- README: prominent warning that Docker on Apple Silicon can't reach the Metal
  GPU (runs a Linux VM) — Cookbook must run natively for GPU serving; fix stale
  text that said Cookbook recommends MLX models (now hidden as unservable).
- test: detect_system propagates unified_memory.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Put Odysseus's venv bin on PATH for cookbook runners

Native (non-Docker) installs run from a virtualenv whose bin holds the `hf` CLI
and `python3` the cookbook download/serve tmux scripts shell out to. Those
scripts start in a fresh login shell with the venv NOT activated, so on a native
macOS install `hf download` failed with "hf: command not found" — and the
`pip --user` self-heal missed because macOS has no bare `pip` command.

- cookbook_helpers.py: _local_tooling_path_export() — pure helper returning a
  PATH export for the running interpreter's bin dir (escaped for double quotes).
- cookbook_routes.py: download + serve runners prepend that dir on local runs
  (gated off SSH/Windows); swap the `pip` install fallbacks to `python3 -m pip`.
- tests: helper output for normal and spaced paths.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Document macOS llama.cpp serving prerequisites

Clarify the two serving paths on Apple Silicon: the recommended zero-build
route (brew install llama.cpp ships a Metal llama-server Cookbook finds on PATH),
and the from-source fallback, which requires cmake + Xcode Command Line Tools.
Without those the build is skipped and serving silently degrades to a slow CPU
build, so new users now know to install them (or use the prebuilt) up front.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Recommend only GGUF-servable models on Metal

Apple Silicon's only serving engines are llama.cpp and Ollama, both GGUF-only
(vLLM/SGLang are CUDA/ROCm and don't run on macOS). The catalog tags raw
safetensors repos with a default Q4_K_M quant, so the fit-ranking was
recommending ~397/501 models that have no GGUF and fail to serve on Metal with
"No GGUF found" (e.g. microsoft/Phi-mini-MoE-instruct).

Drop any model without a real GGUF (is_gguf/gguf_sources) on Apple Silicon —
subsumes the previous AWQ/GPTQ/FP8 special-case into one rule. On CUDA these
stay visible since vLLM serves safetensors directly. Metal recommendations go
501 -> 104, all actually servable.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Remove macOS launchd LaunchAgent (cherry-picked extra)

Drop the launchd service from the PR #213 cherry-picks: the
install-service-macos.sh installer, the com.odysseus.ui.plist template, and the
README section documenting them. Tangential to the core Cookbook/Metal support
and not wanted. The build-macos-app.sh launcher is kept.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Add one-command macOS quick start (start-macos.sh)

Running Odysseus natively on a Mac previously meant ~7 manual terminal steps
(brew deps, venv, activate, pip, setup.py, uvicorn with the right port) — not
friendly for a generic macOS user, and the native run is required because Docker
on macOS can't reach the Metal GPU.

- start-macos.sh: installs Homebrew deps (python@3.11, tmux, prebuilt Metal
  llama.cpp), creates the venv, installs requirements, runs setup, and launches
  on a non-AirPlay port (7860). Idempotent; re-run to start again.
- README: the Apple Silicon section now leads with this one-command quick start
  and the clickable .app, with engine/port/manual details folded into a
  collapsible block. Added a pointer at the top of the manual-install section.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* macOS quick start: auto-open browser when ready

The "open this URL" line scrolled out of view as uvicorn kept logging after it,
so users missed it. Now start-macos.sh waits (in the background) until the
server accepts connections, prints a boxed "ready" banner at that point (i.e.
after the startup burst, not before), and opens the URL in the default browser
automatically. Skippable with ODYSSEUS_NO_OPEN=1 for headless/SSH use.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Don't assume/force a specific Python version on macOS

The README claimed "system Python is 3.9" — a machine-specific generalization
that's often wrong (macOS ships no recent Python by default; many users already
have 3.11+). Make it generic, and make start-macos.sh detect an existing
Python 3.11+ and use it, only installing python@3.11 when none is found instead
of forcing it on top of the user's Python.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Align start-macos.sh venv path with build-macos-app.sh

start-macos.sh created the environment in .venv/, but build-macos-app.sh and
the manual install steps use venv/ — so the clickable .app wouldn't reuse the
quick-start's environment and would rebuild a second one. Use venv/ everywhere.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* README: state clearly that MLX is unsupported on Apple Silicon

Odysseus has no mlx_lm runtime; it serves GGUF (llama.cpp/Ollama) and CUDA
(vLLM/SGLang) only. MLX-only models can't run on a Mac and are hidden from
Cookbook — make that explicit in both the quick start and the details.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* start-macos.sh: build the venv with an arm64 Python on Apple Silicon

A clean-room run surfaced this: with a universal2/x86 Python (e.g. the
python.org installer under /usr/local), the venv's compiled extensions install
as arm64 but get loaded as x86_64 when launched from the .app bundle, so it
crashes with "incompatible architecture (have arm64, need x86_64)". The terminal
run happened to work only because a universal binary defaults to arm64 there.

On Apple Silicon, look only under /opt/homebrew (arm64-only) for the build
Python, and install Homebrew's python@3.11 if none is present — so the venv is
arm64-only and launches correctly from both the terminal and the .app. Intel
and non-mac paths are unchanged. Verified end-to-end in a clean clone: .app now
boots on Metal with no arch error.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Address dev-exp review: macOS setup robustness + doc/UX fixes

From the voltagent dev-exp review of the branch:
- README: fix broken anchor links (the em-dash heading produced a slug the links
  didn't match); simplify the heading to a stable slug.
- cookbook_routes.py: add /opt/homebrew/bin and /usr/local/bin to the serve PATH
  so a brew-installed llama-server/ollama is found instead of falling back to a
  slow source build.
- start-macos.sh: guard against an empty Python path; fail fast with a clear
  message on port-in-use; ERR trap with a "safe to re-run" message; show pip
  progress (drop --quiet on the slow requirements install); stop the background
  browser-opener cleanly on exit/Ctrl+C (no orphaned poller).
- setup.py: bind hint to 127.0.0.1; suppress the manual run-hint when launched
  by start-macos.sh (ODYSSEUS_SKIP_RUN_HINT) so the URL isn't contradictory.
- build-macos-app.sh: the .app only opens the browser once the server is
  actually ready (not after the readiness timeout).
- cookbookServe.js: drop "Diffusers" from the Metal backend picker —
  diffusion_server.py is CUDA-only, so it was an unservable option on macOS.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: yunggilja <yunggilja@gmail.com>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-01 14:59:19 +09:00