Commit Graph

109 Commits

Author SHA1 Message Date
red person
028a39b42c Fix local Cookbook dependency installs in venvs (#1082) 2026-06-02 22:39:02 +09:00
Afonso Coutinho
5b12bf3f55 fix: ICS export doesn't escape commas/semicolons in event fields (#1161)
* fix: escape SUMMARY/LOCATION per RFC 5545 in ICS export

* fix: escape commas/semicolons in ICS DESCRIPTION, not just newlines

* test: ICS export escapes commas, semicolons, backslashes, newlines
2026-06-02 22:36:12 +09:00
red person
fd89d098a1 Chat: use cached endpoint model ids before probing 2026-06-02 21:00:58 +09:00
ooovenenoso
bd2fa82c1e Cookbook: prefer ROCm for native llama.cpp bootstrap
Co-authored-by: Kevin <120500656+oooindefatigable@users.noreply.github.com>
2026-06-02 20:59:44 +09:00
Robin Fröhlich
3c6ae3713e Models: add Z.AI coding endpoint and GLM vision detection 2026-06-02 20:59:17 +09:00
SurprisedDuck
934bca9e48 Providers: omit temperature for OpenAI reasoning models
* fix: omit temperature for OpenAI reasoning models (o1/o3/o4/gpt-5)

These models only accept the default temperature; sending any explicit
value (even 0.0) returns HTTP 400 "Only the default (1) value is
supported". This broke two paths:

- Endpoint probing in _probe_single_model hardcodes temperature: 0.0, so
  a perfectly valid o3/gpt-5 endpoint is reported as failing in the
  Model Endpoints health check.
- Chat/stream payloads send temperature unconditionally, so a non-default
  temperature preset 400s on these models.

The code already special-cases the same model family for
max_completion_tokens, so this adds a sibling _restricts_temperature()
helper and omits the field for those models, letting the API use its
required default. gpt-4.5 is intentionally excluded (not a reasoning
model; accepts temperature normally).

Adds tests/test_llm_core_temperature.py covering the predicate and the
synchronous payload builder.

* fix: also omit temperature for reasoning models on the direct-POST paths

The first commit only covered llm_call/llm_call_async/stream_llm and the
endpoint probe. Email auto-summary, urgency-less spam classification, the
email reply-summary endpoint, and gallery vision tagging build their
OpenAI payloads inline and POST them directly (requests/httpx), bypassing
llm_core — so a reasoning model configured there would still 400 on the
temperature field. These sites already branch on _uses_max_completion_tokens,
so they're the same class; added the matching _restricts_temperature guard.

gallery_routes also gains the max_completion_tokens branch it was missing,
so gpt-5 vision tagging works end to end.

Note: email_pollers urgency scoring goes through llm_call_async and was
already covered.
2026-06-02 20:58:33 +09:00
Tushar-Projects
c3228f8b59 Background tasks: respect active session model fallback 2026-06-02 20:57:42 +09:00
Georgiy
34c81e5b16 Auth: use require_user for remaining guarded routes 2026-06-02 20:55:50 +09:00
Leo
6c15dc7d33 Chat metrics: surface backend generation speed
* Chat metrics: show backend's true generation t/s, not tokens÷wall-clock

The per-message tokens/sec read low and felt wrong because it was computed as
output_tokens / total_duration, where total_duration is wall-clock including
prefill, tool calls, and network — not pure decode time. llama.cpp already
reports the correct gen speed in its stream (timings.predicted_per_second), but
it was being dropped.

- llm_core.py: when parsing the OpenAI-compatible usage chunk, also read the
  sibling `timings` block llama.cpp includes — pass predicted_per_second through
  as gen_tps and prompt_per_second as prefill_tps on the usage event.
- agent_loop.py: capture backend_gen_tps/backend_prefill_tps from usage events;
  in _compute_final_metrics prefer backend_gen_tps over the wall-clock division
  when present (fall back to computed for cloud APIs that omit timings). Tag the
  result with tps_source ("backend" vs "computed") and surface prefill_tps.

Result: the displayed t/s now matches the model's real decode speed and is
stable regardless of prompt length (a long prefill no longer deflates it).

Checks: py_compile passes; verified extraction against a real llama.cpp final
chunk (gen 79 t/s surfaced vs the deflated wall-clock figure shown before).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Chat metrics: surface true t/s on the direct-chat path too

Follow-up to the gen-tps work: the non-agent direct-chat stream path in
chat_routes turned the raw `usage` event straight into a metrics event but only
copied token counts — it never set tokens_per_second or response_time. So simple
(non-tool) replies showed "Speed: n/a" / "Time: undefineds" and the chip fell
back to a bare token count ("27 tok") instead of t/s.

Map the usage event's gen_tps (llama.cpp timings.predicted_per_second, added in
the prior commit) into tokens_per_second here too, tag tps_source=backend, and
set response_time from wall-clock for the stats popup.

Checks: py_compile passes; verified llama.cpp emits usage+timings on the final
stream chunk (gen ~90 t/s) that this path consumes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Tests: backend gen/prefill t/s passthrough and preference

Cover the two pieces of the true-t/s metric so it can be reviewed on its own:
- stream_llm surfaces llama.cpp's timings.predicted_per_second /
  prompt_per_second as gen_tps / prefill_tps on the usage event (captured
  llama.cpp final-chunk fixture), and omits them when the backend reports no
  timings.
- _compute_final_metrics prefers backend_gen_tps over output/wall-clock,
  tags tps_source ("backend" vs "computed"), and surfaces prefill_tps.

Reuses the fake-client stream harness from test_llm_core_streaming.py.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-02 20:52:08 +09:00
ghreprimand
4cec31d988 Chat: route image sessions only to matching image endpoints
Co-authored-by: ghreprimand <203024559+ghreprimand@users.noreply.github.com>
2026-06-02 20:52:03 +09:00
Shaw
db10c8d95b Sessions: allow deleting memory-only ghost sessions
A session that exists only in the in-memory SessionManager — never persisted,
or whose DB row was removed out-of-band — was listed by GET /api/sessions (the
list is built from the in-memory manager) but 404'd on every per-session
operation, so it could never be deleted.

Two causes, both fixed:

1. _verify_session_owner() only consulted the DB and raised 404 when no row
   existed. It now falls back to the in-memory session's owner when (and only
   when) a session_manager is supplied and the caller actually owns the ghost.
   The DB row stays authoritative when present, and a ghost owned by another
   user still 404s, so the ownership/security model is unchanged. The new
   parameter defaults to None, preserving behavior for all other callers.

2. SessionManager.delete_session() only removed the in-memory entry when a DB
   row was found, so memory-only ghosts survived. It now drops the in-memory
   copy regardless and reports success when either the DB row or the in-memory
   entry was removed.

Added tests/test_session_ghost_delete.py covering both layers, including the
cross-owner 404, the unauthenticated 403, DB-row-wins precedence, and backward
compatibility when no manager is passed.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-02 20:51:26 +09:00
Yavor Ivanov
7cc8fdb2f5 Models: avoid hidden models in default fallback
Both get_default_chat and _recover_empty_session_model picked the
first model from cached_models[0] without checking hidden_models.
If the first cached model was hidden (e.g. minimax-m3), it was
returned as the default or used to repair empty session models,
even though the model list endpoints already filter hidden_models.

- Add _visible_models() helper that filters cached_models by
  hidden_models (mirrors the filtering in list_model_endpoints)
- Use _visible_models() in get_default_chat fallback (when no
  explicit default_model is saved)
- Use _visible_models() in _recover_empty_session_model (when
  repairing a session whose model field is empty before chat send)
- Add regression tests for hidden-model filtering in default chat
  resolution, and unit tests for _visible_models helper
2026-06-02 20:37:14 +09:00
Tatlatat
bd78e1d5c2 Admin: wipe gallery albums with images
The /api/admin/wipe/gallery branch deleted GalleryImage rows but left
every GalleryAlbum row behind (GalleryAlbum wasn't even imported). After
"wipe gallery" the user is left with orphaned, empty albums whose cover_id
points at now-deleted images — inconsistent with the other wipe branches,
which clear both parent and child tables.

Delete GalleryAlbum alongside GalleryImage and include both in the
returned count.

Adds tests/test_admin_wipe_gallery.py: seeds a real in-memory SQLite DB
with an album + image, runs the actual wipe handler, and asserts both
tables are emptied. Fails before this change (albums survive).
2026-06-02 20:35:57 +09:00
SurprisedDuck
78747b56ca Documents: strip PDF marker without corrupting text
_process_pdf prepends "\n\n[PDF content]:" to extracted text, and two
call sites in document_routes.py stripped it with .lstrip("\n[PDF content]:").
str.lstrip(chars) treats its argument as a *set of characters*, so it keeps
eating into the page text that follows the marker — e.g. a body starting
with "to the board" loses its leading "to" because 't'/'o' are in the
marker's character set. Replace both sites with a shared
strip_pdf_content_marker() helper that uses str.removeprefix.
2026-06-02 20:35:27 +09:00
Ernest Hysa
996a2027dd Cookbook: surface pip install failures in logs
_pip_install_fallback_chain silently discarded pip stderr via
2>/dev/null on every attempt. When pip failed (network error, venv
mismatch, disk full), the wrapper exited 0 and the Cookbook UI showed
the download as running — the silent-failure mode from #354.

Extract _pip_install_attempt() which wraps each pip invocation in a
bash -c subshell that captures output to a temp file, prints tail -5
on failure, cleans up, and exits with pip's real exit code. This
avoids the | tail pipefail masking (the first blocker on #363) while
surfacing the last 5 lines of pip output in the tmux log so users
can see what went wrong.

Both local wrapper and remote SSH runner use the same helper through
_pip_install_fallback_chain, so the fix is symmetric.
2026-06-02 20:34:52 +09:00
Hayk Arzumanyan
514050d098 Models: rewrite Docker loopback endpoints to host gateway
In Docker, a model-endpoint URL pointing at loopback (e.g. the LM Studio
default http://localhost:1234/v1) targets the Odysseus container itself, not
the host running the server, so the probe gets a connection error and the
endpoint is rejected with a misleading 'No models found for that provider/key'.
Rewrite loopback to host.docker.internal (which compose already maps to
host-gateway) for the probe and the saved URL, mirroring the existing Ollama
handling. Gated on actually being in a container with the gateway reachable, so
native installs and gateway-less deploys are untouched.

Fixes #25

Co-authored-by: Claude <noreply@anthropic.com>
2026-06-02 20:34:40 +09:00
Tatlatat
67517eaed1 Gallery: match image endpoint URLs with exact v1 suffix
The image-edit endpoint lookup compared stored vs incoming base URLs with
`.rstrip("/v1")`. `str.rstrip(chars)` treats its argument as a character
set, not a suffix, so any URL ending in '/', 'v', or '1' is over-stripped
(e.g. `http://host1/v1` -> `http://host`). Two endpoints that are not the
same can then compare equal, or the real endpoint fails to match its own
stored record, leaving `api_key` unset and sending the upstream image call
unauthenticated.

Use `.removesuffix("/v1")` (exact-suffix removal) with surrounding
`.rstrip("/")` on both sides so only a genuine trailing `/v1` is dropped.

Adds a focused test that parses the actual comparison expression out of
gallery_routes.py via AST and evaluates it — it fails if the fix is
reverted and uses no mocking.
2026-06-02 20:34:05 +09:00
Mahdi Salmanzade
280c29d572 Security: owner-scope v1 chat endpoint fallback
The sync-chat endpoint's Case 3 fallback selected a ModelEndpoint with an
unscoped `query(ModelEndpoint).filter(is_enabled == True).first()` and then
used that row's decrypted `api_key` for the LLM call. ModelEndpoint is a
per-user resource (owner non-null = private to that user), so a chat-scoped
API token for user A that sent no session and no api_key could fall back onto
user B's PRIVATE endpoint — spending B's API key/quota and reaching whatever
internal base_url B configured. This is the same multi-tenant owner-scoping
class already fixed for the session gate on this very endpoint
(_caller_owns_session) and for companion/models.

Scope the fallback to the token owner's own rows plus legacy null-owner
(shared) rows via the existing owner_filter helper, matching
routes/model_routes.py and companion/routes.py. A null/empty owner stays a
no-op, preserving single-user/legacy behaviour.

Add regression tests pinning the scoped fallback (cross-owner, shared-only,
no-visible-row, disabled-owned, and the legacy null-owner no-op).
2026-06-02 20:31:35 +09:00
Refuse
323f027865 Security: sanitize export and gallery filenames
Co-authored-by: RefuseOdd <refuseodd@users.noreply.github.com>
2026-06-02 20:29:56 +09:00
mechramc
493c815371 Chat: scope active document fallbacks by owner 2026-06-02 20:29:27 +09:00
Tatlatat
cd247ed107 Skills: delete owner-scoped skills with owner
The DELETE /api/skills/{skill_id} handler resolves the caller, loads the
skill with skills_manager.load(owner=user), and verifies ownership with
_verify_owner(match, user) — but then calls
skills_manager.delete_skill(match.get("name")) without the owner.

SkillsManager.delete_skill filters candidates with
`(sk.owner or "") != (owner or "")`, so when owner is None an owner-scoped
skill is skipped and the method returns False. The route then raises a
spurious 404 "Skill not found" — meaning a logged-in user can never delete
their own skills through the API.

Pass the resolved owner through to delete_skill so the skill is matched and
removed.

tests/test_skills_delete_owner.py drops a real owner-scoped SKILL.md on disk
and (1) checks the manager directly: delete_skill without owner returns
False (regression lock) while delete_skill(owner="alice") returns True and
removes the dir; (2) drives the real DELETE route handler and asserts it
returns {"ok": True} and deletes the file. The route test fails before this
change (404). Real SkillsManager + real filesystem, no mocking.
2026-06-02 20:28:36 +09:00
tanmayraut45
6c654fb0ef Models: detect bare Ollama URLs as online
_ping_endpoint() is the reachability fallback the model-endpoint POST
handler invokes when _probe_endpoint() returns no model ids. It GETs
base + "/models" and, on any sub-500 response, returns immediately with
`reachable = (status < 400)`. That early return runs before the
Ollama-native /api/version / /api/tags fallback below it.

For an Ollama URL without /v1 (the quickstart accepts both
http://localhost:11434 and http://127.0.0.1:11434, and the reporter
on #1025 explicitly tried both), the OpenAI-style probe target is
http://127.0.0.1:11434/models. Ollama returns 404 there because /models
only lives under /v1. _ping_endpoint then returned reachable=False and
the picker showed "Added (offline — will retry on next load)" on an
install that was running fine. /api/version was never tried.

Same shape for http://127.0.0.1:11434/api (the native Ollama root):
/api/models is also 404, same premature offline verdict.

_probe_endpoint() does fall through to /api/tags on a 4xx (the response
raises via raise_for_status), so the endpoint quietly recovers once
cached_models becomes non-empty on the next background refresh —
matching the second commenter's "had to disconnect manually then
reconnect for it to be detected" note. The bug is most visible while
no models are pulled yet (cached_models stays empty, _ping_endpoint
keeps voting offline).

Fix:

- Hoist the Ollama-shaped-URL test (port == 11434 or "ollama" in
  hostname — the same condition _probe_endpoint already uses) to the
  top of the function so both code paths share it.
- Stop short-circuiting on 4xx when the URL looks like Ollama: fall
  through to the existing /api/version + /api/tags reachability loop
  so an alive Ollama gets recognised even when its OpenAI surface has
  the wrong prefix for the user's input.
- Fix the `root` computation in that loop to strip a trailing /api as
  well as /v1, so http://127.0.0.1:11434/api no longer gets probed at
  /api/api/version.
- 4xx on non-Ollama hosts keeps the current semantics: a 401 from
  api.openai.com/v1/models is still a definitive offline verdict, not
  a reason to GET /api/version on OpenAI.

Closes #1025.
2026-06-02 20:27:41 +09:00
mechramc
9d0a18a5b5 Email: add explicit SMTP security mode 2026-06-02 13:15:06 +09:00
Juan Pablo Jiménez
eda99360d1 Fix Cookbook dependency install completion state
* Fix Cookbook dependency install completion state

Mark Cookbook dependency installs as complete when the background runner
exits successfully, even when HuggingFace-specific download markers are
absent.

* Add focused regression coverage for cookbook dependency completion.

Keep the fix narrowly scoped while carrying env_path through dependency tasks and locking the completion reconciliation behavior with targeted tests.
2026-06-02 12:59:29 +09:00
Mihail Filippov
3d109cbaca Add explicit open-signup state endpoint
* Refactor open registration state switching

* Rename endpoint to open-signup
2026-06-02 12:35:54 +09:00
Leo
6fca7e86b7 Cookbook serve profiles and engine filter
* Cookbook: Engine filter + intelligent hardware-computed serve profiles

Two related Cookbook serving improvements for accurate, hardware-aware model
serving (especially on consumer GPUs that can only run GGUF/llama.cpp).

Engine filter
- New "Engine" dropdown (All / llama.cpp / vLLM / SGLang) beside the quant
  picker. Pure client-side view filter over the fetched list via the same
  _detectBackend() the serve commands use, so what you filter to is exactly what
  would launch. Re-renders from cache (no refetch). Empty-state message + the
  instant-cache-paint path account for it too.

Intelligent serve profiles (Quality / Balanced / Speed)
- services/hwfit/profiles.py: compute_serve_profiles() turns detected VRAM +
  model size into concrete llama.cpp flags (n_gpu_layers, n_cpu_moe, cache-type,
  context). Encodes the by-hand tuning: a too-big MoE offloads experts to CPU
  instead of failing; a model that fits stays fully on GPU; quant tracks profile
  intent; vision models keep image-encoder headroom. Reuses models.py VRAM math
  so filtering and serving agree on what fits. Pure/deterministic (no t/s claims
  — partial-offload speed isn't reliably predictable; fit is what's computed).
- /api/hwfit/profiles endpoint returns the profiles + the model's trained
  context limit, with loose name matching (strips org/ prefix, -GGUF suffix,
  quant tag) so a local GGUF folder name resolves to its catalog entry.
- _buildServeCmd (llama.cpp) now emits --n-cpu-moe / --flash-attn /
  --cache-type-k/v when set, with llama-cpp-python fallback equivalents. It
  previously only set -ngl/-c, which is why it OOM'd or ran slow.
- Serve panel: profile chips that fill the fields on click, plus CPU-MoE / KV
  Cache / Flash Attn fields. Context is clamped to the model's trained limit
  (and an absolute 1M sanity ceiling) on type/blur/profile-load and at launch —
  fixes a crash where a stale 256k/16M preset + quantized KV cache caused an
  amdgpu ErrorDeviceLost.

Tests: tests/test_serve_profiles.py (7) — offload vs full-GPU fit, never exceed
VRAM, context cap, launchable flags, vision headroom, no-GPU empty.
Checks: py_compile + node --check pass; pytest test_serve_profiles + test_hwfit_amd
green; verified live on an RDNA4 box (gfx1200) — Balanced lands ~ncm18 q4 128k,
matching hand-tuning.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Cookbook: make column-header sorting discoverable (incl. Newest)

Sorting in Cookbook is via clickable column headers (pewds' design), but the
headers had no visual cue that they're interactive — so sorting in general, and
the Newest sort on the Model header specifically, was undiscoverable.

- Style sortable headers as interactive: pointer cursor, hover underline, and
  the active sort column bolded/highlighted. There was no CSS for
  .hwfit-sortable / .hwfit-sort-active at all; this helps every existing sort,
  not just Newest.
- The Model column header sorts by release_date (newest first), reusing the
  existing header-click sort wiring and the "newest" SORT_KEY.

No new sort control — uses the existing column-header paradigm.

Checks: node --check passes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Cookbook serve profiles: keep the on-disk file's quant fixed (don't propose Q6/Q2)

In the Serve tab the model is a specific GGUF file already on disk, so its quant
can't change — but the profiles were suggesting "Quality · Q6_K" / "Speed · Q2_K"
as if you could re-quantize it. That's meaningless when serving a fixed file.

- compute_serve_profiles gains serve_weights_gb / serve_quant. When set (SERVE
  mode), the quant is locked to the file's and profiles differ only in the real
  serving knobs — n_cpu_moe, KV-cache type, context. _weights_gb / _cpu_moe_for_budget
  use the file's actual size instead of a quant-derived estimate. DOWNLOAD mode
  (no override) still varies the quant to show download options.
- /api/hwfit/profiles accepts serve_weights_gb & serve_quant.
- The Serve panel parses the file's size (from m.size "20.6 GB") and quant (from
  the repo/file name) and passes them, so profiles match what's actually served.

Result for a 20.6 GB Q4_K_M file: all three profiles stay Q4_K_M and differ by
KV/ctx/offload (Quality q8 KV 128k ncm21, Balanced q4 128k ncm17, Speed q4 32k
ncm15) — no nonsensical quant changes.

Tests: test_serve_mode_keeps_fixed_quant. Full serve-profile suite green (9).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Cookbook serve: Vision toggle (auto-find mmproj) + live VRAM/RAM-spillover monitor

Two serve-panel additions:

1. **Vision toggle.** A "Vision" checkbox that serves the model with its
   multimodal projector so it can read images. The mmproj path is resolved at
   runtime (find mmproj-*.gguf next to the model), so dropping an mmproj file in
   the model folder makes the toggle just work; `--mmproj … --image-max-tokens
   1024` (native) / `--clip_model_path` (llama-cpp-python) only when on + found.

2. **Live GPU-memory monitor.** A readout that polls /api/cookbook/gpus every 4s
   while the panel is open and shows VRAM used/total/%, free, and — crucially on
   a discrete card — **RAM spillover** (AMD gtt_used_mb), with a plain-language
   health hint: green/healthy, amber/tight, red/"spilled to RAM — slow (raise
   CPU MoE or lower context)". Surfaces gtt_used_mb from the gpus endpoint
   (previously read for total only and discarded for 'used').

Lets you see at a glance whether a config fits VRAM (fast) or is paging to system
RAM over PCIe (slow) instead of guessing.

Checks: node --check + py_compile pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-02 12:34:42 +09:00
spooky
8b3c0d8ad4 feat: select cached gguf artifacts for serve (#891) 2026-06-02 12:32:40 +09:00
Dustin
bd3204fe96 Diagnose vLLM device detection failure with actionable suggestion (#778)
Adds a diagnosis pattern for the 'Failed to infer device type' error
vLLM raises when no CUDA or ROCm GPU is found (e.g. systems with only
integrated or Intel Xe graphics). The existing pattern only caught
'No CUDA GPUs are available' which fires later in startup; this new
entry catches the earlier device-probe failure and the NVML/amdsmi
library-not-found messages that precede it.

Surfaces in the Cookbook serve card as: "vLLM could not find a supported
GPU — switch to llama.cpp or Ollama" instead of a raw Python traceback.

Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-06-02 12:30:07 +09:00
IBR-41379
385c3c3cf3 fix: use sys.executable for Cookbook model cache scan on Windows (#627)
Windows has 'App Execution Aliases' that can make shutil.which('python3')
and shutil.which('python') resolve to a Microsoft Store stub instead of
real Python -- even when Python is properly installed. The stub outputs:

  'Python was not found; run without arguments to install from the
   Microsoft Store, or disable this shortcut from Settings > Apps >
   Advanced app settings > App execution aliases.'

and exits 9009, producing empty stdout. The JSON parse of the local
model cache scan then fails with 'Expecting value: line 1 column 1
(char 0)', and the Cookbook model list shows nothing.

Fix: prefer sys.executable as the interpreter for the local scan.
Odysseus already runs inside its own venv, so sys.executable always
points to the real venv Python and bypasses PATH / Store alias lookup
entirely. which_tool() is kept as a fallback.

Cross-platform: sys.executable works identically on Linux and macOS
(returns the real interpreter path), so this change is safe everywhere.
2026-06-02 12:29:40 +09:00
Rolly Calma
32efeeb3a2 chore: use running event loop in async helpers (#821) 2026-06-02 12:28:05 +09:00
lolwuttav
c99193041a fix(cookbook): default Ollama serve to loopback (#872) 2026-06-02 12:27:04 +09:00
Tatlatat
ffb77d7ff2 fix(auth): honor AUTH_ENABLED=false on owner-scoped endpoints (no /login loop) (#880)
When the operator sets AUTH_ENABLED=false, three owner-scoped endpoints still
returned 401 (api/models, api/research/*, api/email/*), so the front-end
redirected the browser to /login and the app was unusable despite auth being
turned off. require_user() in src/auth_helpers.py already documents and honors
this contract (issue #622) via 'if _auth_disabled(): return ""', but these
endpoints did their own get_current_user/is_configured check without it.

Make _require_user (research), the /api/models anti-leak guard, and
email_helpers._require_auth consult _auth_disabled() and let anonymous through
(owner='') only when the operator explicitly disabled auth. The 401 protection
is fully intact when AUTH_ENABLED=true. Verified end-to-end: with
AUTH_ENABLED=false the SPA now loads instead of bouncing to /login.
2026-06-02 12:26:26 +09:00
Mahdi Salmanzade
66cd44b66d fix(research): gate /api/research/spinoff on session ownership (#878)
The spinoff endpoint authenticated the caller (_require_user) but never
verified the research session belonged to them before reading the
persisted report and seeding it into a new chat session owned by the
caller. Any authenticated user who knew or guessed another user's
research session ID could exfiltrate that user's full report into their
own session — a cross-user data disclosure (IDOR).

Every other endpoint in this router gates on _owns_in_memory /
_assert_owns_research right after validating the session ID; spinoff was
the lone exception. Add the same _owns_in_memory check (covers both the
in-memory task and the on-disk JSON) so a non-owner gets a 404 before any
data is read or a session is created.

Add regression tests pinning the anonymous (401) and wrong-owner (404)
cases.
2026-06-02 12:26:12 +09:00
Mahdi Salmanzade
f691537472 fix(security): stop leaking the vault master password via process argv (#879)
The /api/vault/unlock handler ran `bw` as
`_run_bw(["unlock", req.master_password, "--raw"])`. _run_bw launches it with
`asyncio.create_subprocess_exec(bw_path, *args)`, so the master password became
a process argument — readable by any local user through `ps` and
`/proc/<pid>/cmdline` for the lifetime of the unlock subprocess. The Bitwarden
master password decrypts the entire vault, so this is a serious credential
exposure on any multi-user / shared host (CWE-214).

The sibling /login handler already avoids this by feeding the password on
stdin; unlock was the outlier. Hand the password to `bw` through the
environment instead (`--passwordenv BW_PASSWORD`), mirroring how BW_SESSION is
already passed — `/proc/<pid>/environ` is readable only by the process owner,
not other local users. Add regression tests pinning that the secret reaches
the subprocess env and never appears in argv.
2026-06-02 12:25:43 +09:00
Juan Pablo Jiménez
e58e4a185d Expose Cookbook user-install CLIs in Docker (#887)
Ensure pip --user console scripts like vLLM are visible to Docker
runtime and dependency probes by adding the user install bin directory
to PATH.
2026-06-02 12:23:29 +09:00
Tatlatat
9a1893760d fix(cookbook): skip pip --user fallback inside virtualenvs (#388) (#889)
The dependency-install fallback chain unconditionally ran
'pip install --user', which fails inside a virtualenv (and as root in
LXC/containers) with 'Can not perform a --user install. User site-packages
are not visible in this virtualenv.' — even though the function's docstring
already noted --user is invalid in venvs.

Guard the --user fallback with a venv check so it only runs outside a venv
(where --user is actually valid for PEP-668 system Pythons). Derive the venv
probe interpreter from the install command (python for 'pip', python3 for
'pip3'/'python3 -m pip') so the check runs in pip's own environment. System
PEP-668 installs keep the --user fallback; venv/LXC-root installs no longer
hit the --user error. Updated the unit test for the new chain.

Closes #388
2026-06-02 12:23:20 +09:00
pewdiepie-archdaemon
966b53df77 Improve Cookbook serve diagnostics and recommendations 2026-06-02 12:15:47 +09:00
NovaUnboundAi
3319310942 Allow longer deep research extraction timeouts (#651)
Co-authored-by: NovaUnboundAi <NovaUnboundAi@users.noreply.github.com>
2026-06-02 11:50:03 +09:00
Rasmus
e73f3edc06 fix: scope chat active-document lookup to the session owner (#569) 2026-06-02 11:46:40 +09:00
elijaheck
c303a29670 Fix native macOS tailnet launch and Metal GPU probe (#756)
* macOS/Apple Silicon: detect Metal backend, surface MLX models, brew tmux hint

- hardware.py: add _detect_macos() via sysctl/system_profiler; report
  backend=metal + unified_memory on Apple Silicon instead of cpu_arm
- fit.py: add Apple Silicon (M1-M5) unified-memory bandwidths + metal
  FALLBACK_K so throughput estimates use the real bandwidth formula
- setup.py: Mac-specific 'brew install tmux' hint

Verified on M5 Pro 48GB: backend=metal, 273GB/s matched, 6 MLX models now
visible (were hidden), cuda still hides MLX, no new test failures.

* Fix native macOS tailnet launch and Metal GPU probe

---------

Co-authored-by: Elijah (Hermes) <hermes@local>
2026-06-02 11:41:04 +09:00
Mahdi Salmanzade
54ac4a74fb Attribute API-token sessions to the token owner (effective_user) (#871)
Split 2/4 of the companion bridge (#863 was 1/4). A paired bearer-token caller
runs as the sandboxed 'api' pseudo-user, so its sessions were stranded in a
separate 'api'-owned silo, invisible to the owner's desktop UI.

Add effective_user(): for a bearer token it resolves to the token's real owner
(request.state.api_token_owner); for cookie sessions it is identical to
get_current_user, so the swap is a no-op for browser users. Route session
ownership/attribution in routes/session_routes.py through it.

Tests (tests/test_session_owner_attribution.py):
- cookie/browser users are unchanged
- a bearer token attributes to its owner; with no owner it does NOT escalate
- _verify_session_owner: a bearer token for owner A cannot verify owner B's
  session (404); owner verifies their own; missing -> 404; unauth -> 403
2026-06-02 11:39:01 +09:00
Mahdi Salmanzade
bc00a9fc7f fix(security): fail closed on null-owner session in sync-chat endpoint (#870)
POST /api/v1/chat (the n8n/Make/Activepieces sync-chat endpoint) verified
session ownership with `_tok_user and _sess_owner and _sess_owner != _tok_user`.
The `_sess_owner and` clause skipped the check entirely whenever the session's
owner was null — so any chat-scoped API token (e.g. a token minted for a paired
mobile device) could pass a legacy/migrated null-owner session id, inject a
message into that session, and read back its conversation history plus reuse
the owner's endpoint credentials.

This is the same `if owner and owner != user` null-owner-bypass pattern that
was already hardened in the gallery, calendar, and notes routes (see
test_null_owner_gates.py) and in session_routes._verify_session_owner. Make
this gate strict and fail closed too: require a resolvable caller and an exact
owner match, mirroring _verify_session_owner. Extract the decision into
_caller_owns_session() and pin it with regression tests.
2026-06-02 11:38:05 +09:00
James Arslan
6776c7d691 Surface silent model fallback instead of masking it (#868)
When the selected model fails before producing output, stream_llm_with_fallback
quietly switches to the next candidate and the reply is shown under the
originally selected model's name, so a misconfigured provider looks like it
works. (Concretely: a Bedrock gateway that 400s every Anthropic/Claude request
appears fine because another model silently answers under the Claude label.)

Emit a `fallback` SSE event ({selected_model, answered_by, reason}) the first
time a non-primary candidate produces output, forward it through the agent loop
and both chat-route paths, stamp the response metrics with the model that
actually answered, and show a notice + relabel the reply in the UI.

Tested: python -m pytest tests/test_llm_core_fallback.py (3 pass);
python -m py_compile src/llm_core.py src/agent_loop.py routes/chat_routes.py;
node --check static/js/chat.js.
2026-06-02 11:37:25 +09:00
Tatlatat
2d6b777799 fix(cookbook): diagnose 'no GGUF file' serve failures clearly (#811) (#866)
When serving with the llama.cpp backend and no .gguf file exists on the host,
the GGUF launcher prelude exits with 'ERROR: No GGUF found on this host', but
_diagnose_serve_output had no matching pattern, so the UI showed a generic
crash instead of explaining the cause. Add a diagnosis pattern for the
no-GGUF case so users are told a .gguf is required and pointed at downloading
a GGUF build, instead of an opaque crash.

Closes #811
2026-06-02 11:36:53 +09:00
Ernest Hysa
360bc83a66 fix(history): scope topic analysis to authenticated owner only (#744)
Two changes close the cross-tenant topic leak in /api/conversations/topics.

The route at routes/history_routes.py:478 used get_current_user, which
returns None when no auth middleware has set request.state.current_user
(loopback-bypass, AUTH_ENABLED=false, or any path that short-circuits the
middleware). It then forwarded owner=None to analyze_topics.

The helper at src/topic_analyzer.py:21 used an 'if owner:' short-circuit
in its owner filter, so the None owner took the no-filter path and the
helper silently aggregated topic frequencies and per-snippet session_id,
session_name, role, and snippet text across every user's sessions.

analyze_topics now returns an empty result when owner is falsy. The
inner short-circuit is removed because the filter is now strict by
construction. The route is switched to require_user, which raises 401
when auth_manager.is_configured is True and the caller is anonymous,
matching the pattern used by calendar_routes, skills_routes, and other
authenticated routes.

The test test_history_topics_owner_scope.py was rewritten to drive the
real route through FastAPI's TestClient with a stub AuthMiddleware that
mirrors the loopback-bypass branch, and now asserts a strict 401 from
the route and an empty result from the helper. The previous version of
the test accepted either a 200-with-empty-topics or a 401; the strict
assertion means a future regression that drops the require_user wrapper
or re-adds the inner short-circuit is caught immediately.
2026-06-02 11:36:01 +09:00
hawktuahs
a2f6183c4a Fix cookbook pip installs in venvs (#723) 2026-06-02 11:31:59 +09:00
tanmayraut45
0e31c38be0 Support in-place endpoint updates and recover empty-model sessions (#786)
The "don't wipe endpoint_url/model on endpoint delete" half of #587 landed
in 6a78b02 (Fix endpoint model preservation for tasks). The three remaining
follow-up pieces from the original PR — flagged in the review on #786 —
are:

- routes/model_routes.py: toggle_model_endpoint (PATCH) now accepts
  api_key and base_url, so the admin UI can rotate a key or fix a typo'd
  URL without going through delete+recreate. base_url is normalized the
  same way the POST handler does (strip /models, /chat/completions,
  /completions, /v1/messages, then _normalize_base). Cache invalidation
  matches the POST/DELETE paths and the response includes base_url so the
  frontend can confirm what was saved.

- routes/chat_routes.py: new _recover_empty_session_model picks
  cached_models[0] from the endpoint that matches sess.endpoint_url and
  persists it onto the Session row before the LLM call goes out. Wired
  into both /api/chat and /api/chat_stream after the existing
  _clear_orphaned_session_endpoint guard, so the order is: drop
  truly-orphaned sessions first, then heal the "picker showed it, session
  never knew" case.

- routes/chat_routes.py: when recovery fails (no endpoint, no cached
  models) raise HTTP 400 with a clear message instead of letting
  model="" reach the upstream as 401/503.

Closes #587.
2026-06-02 11:26:38 +09:00
Tatlatat
63a947d246 fix(cookbook): mark zero-file HF downloads as failed instead of completed (#839) (#865)
A Cookbook download whose repo/quant selector matched no files (e.g. a
':Q4_K_M' tag that does not exist) printed 'Fetching 0 files' and was still
reported as a successful '✓ Downloaded' / completed task. Detect the
zero-file signature in the download snapshot and mark the task as an error
with a clear diagnosis (no matching files — check the repo or quant/filename
pattern) so users know nothing was actually downloaded. Normal multi-file
and fully-cached downloads (which print 'Fetching N files', N>0) are
unaffected.

Closes #839
2026-06-02 11:24:34 +09:00
Ernest Hysa
f4aef0dcf7 fix(skills): scope skill reads to caller owner (#777)
read_skill_md and read_skill_reference walk all skill files via
_iter_skill_files and return the first match by slug, regardless
of owner. In a multi-user deployment where two users have skills
with the same slug under different categories, a caller scoped
to owner='alice' can read Bob's skill content.

This is the same cross-tenant leak class as the update_skill /
delete_skill fix (PR #755, merged), but on the read path.

Changes:
- read_skill_md / read_skill_reference accept owner= param (default
  None = match ownerless only, matching the write-path convention).
- 7 callers updated: tool_implementations.py (view, view_ref, patch),
  builtin_actions.py (test_skills), skills_routes.py (audit, source,
  test routes).
- Tests: read scoping (alice reads hers, not bob's), positive update
  scoping (alice can mutate her own), ownerless-match default.
2026-06-02 11:21:27 +09:00
Mahdi Salmanzade
4a84a895a0 Keep reasoning (thinking) tokens out of the saved chat reply (#856)
Streamed deltas flagged thinking:true (reasoning-model traces) were being folded
into full_response and persisted as part of the assistant message, so saved
replies were polluted with the model's chain-of-thought. Forward those deltas to
the client (for a live thinking indicator) but exclude them from the accumulated
saved reply, in both chat and research-stream paths. Mirrors the existing rewrite
path's handling.
2026-06-02 11:17:41 +09:00