odysseus

Author	SHA1	Message	Date
red person	028a39b42c	Fix local Cookbook dependency installs in venvs (#1082)	2026-06-02 22:39:02 +09:00
Afonso Coutinho	5b12bf3f55	fix: ICS export doesn't escape commas/semicolons in event fields (#1161) * fix: escape SUMMARY/LOCATION per RFC 5545 in ICS export * fix: escape commas/semicolons in ICS DESCRIPTION, not just newlines * test: ICS export escapes commas, semicolons, backslashes, newlines	2026-06-02 22:36:12 +09:00
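The escaping this commit describes can be sketched as follows. This is a minimal illustration of RFC 5545 TEXT escaping, not the project's code; the function name `escape_ics_text` is hypothetical.

```python
def escape_ics_text(value: str) -> str:
    """Escape a TEXT value (SUMMARY, LOCATION, DESCRIPTION) per RFC 5545."""
    # Backslash goes first, so the backslashes added below are not escaped again.
    value = value.replace("\\", "\\\\")
    value = value.replace(";", "\\;").replace(",", "\\,")
    # Normalise line endings, then encode each newline as the two characters "\n".
    value = value.replace("\r\n", "\n").replace("\r", "\n")
    return value.replace("\n", "\\n")
```

Escaping the backslash before the other characters matters: doing it last would double the backslashes that the comma and semicolon replacements just inserted.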
red person	fd89d098a1	Chat: use cached endpoint model ids before probing	2026-06-02 21:00:58 +09:00
ooovenenoso	bd2fa82c1e	Cookbook: prefer ROCm for native llama.cpp bootstrap Co-authored-by: Kevin <120500656+oooindefatigable@users.noreply.github.com>	2026-06-02 20:59:44 +09:00
Robin Fröhlich	3c6ae3713e	Models: add Z.AI coding endpoint and GLM vision detection	2026-06-02 20:59:17 +09:00
SurprisedDuck	934bca9e48	Providers: omit temperature for OpenAI reasoning models * fix: omit temperature for OpenAI reasoning models (o1/o3/o4/gpt-5) These models only accept the default temperature; sending any explicit value (even 0.0) returns HTTP 400 "Only the default (1) value is supported". This broke two paths: - Endpoint probing in _probe_single_model hardcodes temperature: 0.0, so a perfectly valid o3/gpt-5 endpoint is reported as failing in the Model Endpoints health check. - Chat/stream payloads send temperature unconditionally, so a non-default temperature preset 400s on these models. The code already special-cases the same model family for max_completion_tokens, so this adds a sibling _restricts_temperature() helper and omits the field for those models, letting the API use its required default. gpt-4.5 is intentionally excluded (not a reasoning model; accepts temperature normally). Adds tests/test_llm_core_temperature.py covering the predicate and the synchronous payload builder. * fix: also omit temperature for reasoning models on the direct-POST paths The first commit only covered llm_call/llm_call_async/stream_llm and the endpoint probe. Email auto-summary, urgency-less spam classification, the email reply-summary endpoint, and gallery vision tagging build their OpenAI payloads inline and POST them directly (requests/httpx), bypassing llm_core — so a reasoning model configured there would still 400 on the temperature field. These sites already branch on _uses_max_completion_tokens, so they're the same class; added the matching _restricts_temperature guard. gallery_routes also gains the max_completion_tokens branch it was missing, so gpt-5 vision tagging works end to end. Note: email_pollers urgency scoring goes through llm_call_async and was already covered.	2026-06-02 20:58:33 +09:00
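A rough sketch of the guard described in this commit, assuming a simple prefix match on the model name. The names `restricts_temperature` and `build_payload` and the prefix list are illustrative assumptions based on the commit message, not the actual helper in llm_core.

```python
# Assumed family list, taken from the commit message (o1/o3/o4/gpt-5).
REASONING_PREFIXES = ("o1", "o3", "o4", "gpt-5")


def restricts_temperature(model: str) -> bool:
    """Return True if the model only accepts the default temperature."""
    # Drop an optional "provider/" prefix before matching the family name.
    name = model.lower().rsplit("/", 1)[-1]
    return name.startswith(REASONING_PREFIXES)


def build_payload(model: str, messages: list, temperature: float) -> dict:
    """Build a chat payload, omitting temperature where it would cause a 400."""
    payload = {"model": model, "messages": messages}
    if not restricts_temperature(model):
        payload["temperature"] = temperature
    return payload
```

Because `gpt-4.5` does not start with `gpt-5`, it keeps its temperature field under this sketch, which matches the exclusion the commit calls out.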
Tushar-Projects	c3228f8b59	Background tasks: respect active session model fallback	2026-06-02 20:57:42 +09:00
Georgiy	34c81e5b16	Auth: use require_user for remaining guarded routes	2026-06-02 20:55:50 +09:00
Leo	6c15dc7d33	Chat metrics: surface backend generation speed * Chat metrics: show backend's true generation t/s, not tokens÷wall-clock The per-message tokens/sec read low and felt wrong because it was computed as output_tokens / total_duration, where total_duration is wall-clock including prefill, tool calls, and network — not pure decode time. llama.cpp already reports the correct gen speed in its stream (timings.predicted_per_second), but it was being dropped. - llm_core.py: when parsing the OpenAI-compatible usage chunk, also read the sibling `timings` block llama.cpp includes — pass predicted_per_second through as gen_tps and prompt_per_second as prefill_tps on the usage event. - agent_loop.py: capture backend_gen_tps/backend_prefill_tps from usage events; in _compute_final_metrics prefer backend_gen_tps over the wall-clock division when present (fall back to computed for cloud APIs that omit timings). Tag the result with tps_source ("backend" vs "computed") and surface prefill_tps. Result: the displayed t/s now matches the model's real decode speed and is stable regardless of prompt length (a long prefill no longer deflates it). Checks: py_compile passes; verified extraction against a real llama.cpp final chunk (gen 79 t/s surfaced vs the deflated wall-clock figure shown before). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * Chat metrics: surface true t/s on the direct-chat path too Follow-up to the gen-tps work: the non-agent direct-chat stream path in chat_routes turned the raw `usage` event straight into a metrics event but only copied token counts — it never set tokens_per_second or response_time. So simple (non-tool) replies showed "Speed: n/a" / "Time: undefineds" and the chip fell back to a bare token count ("27 tok") instead of t/s. 
Map the usage event's gen_tps (llama.cpp timings.predicted_per_second, added in the prior commit) into tokens_per_second here too, tag tps_source=backend, and set response_time from wall-clock for the stats popup. Checks: py_compile passes; verified llama.cpp emits usage+timings on the final stream chunk (gen ~90 t/s) that this path consumes. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * Tests: backend gen/prefill t/s passthrough and preference Cover the two pieces of the true-t/s metric so it can be reviewed on its own: - stream_llm surfaces llama.cpp's timings.predicted_per_second / prompt_per_second as gen_tps / prefill_tps on the usage event (captured llama.cpp final-chunk fixture), and omits them when the backend reports no timings. - _compute_final_metrics prefers backend_gen_tps over output/wall-clock, tags tps_source ("backend" vs "computed"), and surfaces prefill_tps. Reuses the fake-client stream harness from test_llm_core_streaming.py. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-02 20:52:08 +09:00
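The preference order described in this commit (backend-reported decode speed first, wall-clock division as a fallback) might look roughly like this. The function name `final_tps` and its signature are assumptions for illustration; only the `tps_source` values come from the commit message.

```python
def final_tps(output_tokens: int, total_duration_s: float, backend_gen_tps=None) -> dict:
    """Pick the tokens/sec figure to display and record where it came from."""
    if backend_gen_tps:
        # llama.cpp reports pure decode speed, unaffected by prefill or tool calls.
        return {"tokens_per_second": backend_gen_tps, "tps_source": "backend"}
    if total_duration_s > 0:
        # Backends that omit timings: fall back to the wall-clock estimate.
        return {
            "tokens_per_second": output_tokens / total_duration_s,
            "tps_source": "computed",
        }
    return {"tokens_per_second": None, "tps_source": "computed"}
```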
ghreprimand	4cec31d988	Chat: route image sessions only to matching image endpoints Co-authored-by: ghreprimand <203024559+ghreprimand@users.noreply.github.com>	2026-06-02 20:52:03 +09:00
Shaw	db10c8d95b	Sessions: allow deleting memory-only ghost sessions A session that exists only in the in-memory SessionManager — never persisted, or whose DB row was removed out-of-band — was listed by GET /api/sessions (the list is built from the in-memory manager) but 404'd on every per-session operation, so it could never be deleted. Two causes, both fixed: 1. _verify_session_owner() only consulted the DB and raised 404 when no row existed. It now falls back to the in-memory session's owner when (and only when) a session_manager is supplied and the caller actually owns the ghost. The DB row stays authoritative when present, and a ghost owned by another user still 404s, so the ownership/security model is unchanged. The new parameter defaults to None, preserving behavior for all other callers. 2. SessionManager.delete_session() only removed the in-memory entry when a DB row was found, so memory-only ghosts survived. It now drops the in-memory copy regardless and reports success when either the DB row or the in-memory entry was removed. Added tests/test_session_ghost_delete.py covering both layers, including the cross-owner 404, the unauthenticated 403, DB-row-wins precedence, and backward compatibility when no manager is passed. Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-02 20:51:26 +09:00
Yavor Ivanov	7cc8fdb2f5	Models: avoid hidden models in default fallback Both get_default_chat and _recover_empty_session_model picked the first model from cached_models[0] without checking hidden_models. If the first cached model was hidden (e.g. minimax-m3), it was returned as the default or used to repair empty session models, even though the model list endpoints already filter hidden_models. - Add _visible_models() helper that filters cached_models by hidden_models (mirrors the filtering in list_model_endpoints) - Use _visible_models() in get_default_chat fallback (when no explicit default_model is saved) - Use _visible_models() in _recover_empty_session_model (when repairing a session whose model field is empty before chat send) - Add regression tests for hidden-model filtering in default chat resolution, and unit tests for _visible_models helper	2026-06-02 20:37:14 +09:00
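A minimal sketch of the filtering idea in this commit, assuming `cached_models` and `hidden_models` are plain lists of model ids. The function names mirror the commit message, but the bodies are illustrative, not the project's implementation.

```python
def visible_models(cached_models, hidden_models):
    """Return cached models in their original order, minus hidden ones."""
    hidden = set(hidden_models or [])
    return [m for m in (cached_models or []) if m not in hidden]


def default_model(cached_models, hidden_models):
    """Pick the first visible model, or None when every model is hidden."""
    visible = visible_models(cached_models, hidden_models)
    return visible[0] if visible else None
```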
Tatlatat	bd78e1d5c2	Admin: wipe gallery albums with images The /api/admin/wipe/gallery branch deleted GalleryImage rows but left every GalleryAlbum row behind (GalleryAlbum wasn't even imported). After "wipe gallery" the user is left with orphaned, empty albums whose cover_id points at now-deleted images — inconsistent with the other wipe branches, which clear both parent and child tables. Delete GalleryAlbum alongside GalleryImage and include both in the returned count. Adds tests/test_admin_wipe_gallery.py: seeds a real in-memory SQLite DB with an album + image, runs the actual wipe handler, and asserts both tables are emptied. Fails before this change (albums survive).	2026-06-02 20:35:57 +09:00
SurprisedDuck	78747b56ca	Documents: strip PDF marker without corrupting text _process_pdf prepends "\n\n[PDF content]:" to extracted text, and two call sites in document_routes.py stripped it with .lstrip("\n[PDF content]:"). str.lstrip(chars) treats its argument as a set of characters, so it keeps eating into the page text that follows the marker — e.g. a body starting with "to the board" loses its leading "to" because 't'/'o' are in the marker's character set. Replace both sites with a shared strip_pdf_content_marker() helper that uses str.removeprefix.	2026-06-02 20:35:27 +09:00
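The difference between the two calls can be reproduced in a few lines. The helper name `strip_pdf_content_marker` comes from the commit message; its body here is an assumed minimal version, and `str.removeprefix` requires Python 3.9 or later.

```python
PDF_MARKER = "\n\n[PDF content]:"


def strip_pdf_content_marker(text: str) -> str:
    """Remove the marker only when it is an exact prefix of the text."""
    return text.removeprefix(PDF_MARKER)


extracted = PDF_MARKER + "to the board"

# lstrip treats its argument as a set of characters, so it keeps stripping
# any leading 't', 'o', or ' ' that follows the marker.
buggy = extracted.lstrip("\n[PDF content]:")   # "he board"
fixed = strip_pdf_content_marker(extracted)    # "to the board"
```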
Ernest Hysa	996a2027dd	Cookbook: surface pip install failures in logs _pip_install_fallback_chain silently discarded pip stderr via 2>/dev/null on every attempt. When pip failed (network error, venv mismatch, disk full), the wrapper exited 0 and the Cookbook UI showed the download as running — the silent-failure mode from #354. Extract _pip_install_attempt() which wraps each pip invocation in a bash -c subshell that captures output to a temp file, prints tail -5 on failure, cleans up, and exits with pip's real exit code. This avoids the \| tail pipefail masking (the first blocker on #363) while surfacing the last 5 lines of pip output in the tmux log so users can see what went wrong. Both local wrapper and remote SSH runner use the same helper through _pip_install_fallback_chain, so the fix is symmetric.	2026-06-02 20:34:52 +09:00
Hayk Arzumanyan	514050d098	Models: rewrite Docker loopback endpoints to host gateway In Docker, a model-endpoint URL pointing at loopback (e.g. the LM Studio default http://localhost:1234/v1) targets the Odysseus container itself, not the host running the server, so the probe gets a connection error and the endpoint is rejected with a misleading 'No models found for that provider/key'. Rewrite loopback to host.docker.internal (which compose already maps to host-gateway) for the probe and the saved URL, mirroring the existing Ollama handling. Gated on actually being in a container with the gateway reachable, so native installs and gateway-less deploys are untouched. Fixes #25 Co-authored-by: Claude <noreply@anthropic.com>	2026-06-02 20:34:40 +09:00
Tatlatat	67517eaed1	Gallery: match image endpoint URLs with exact v1 suffix The image-edit endpoint lookup compared stored vs incoming base URLs with `.rstrip("/v1")`. `str.rstrip(chars)` treats its argument as a character set, not a suffix, so any URL ending in '/', 'v', or '1' is over-stripped (e.g. `http://host1/v1` -> `http://host`). Two endpoints that are not the same can then compare equal, or the real endpoint fails to match its own stored record, leaving `api_key` unset and sending the upstream image call unauthenticated. Use `.removesuffix("/v1")` (exact-suffix removal) with surrounding `.rstrip("/")` on both sides so only a genuine trailing `/v1` is dropped. Adds a focused test that parses the actual comparison expression out of gallery_routes.py via AST and evaluates it — it fails if the fix is reverted and uses no mocking.	2026-06-02 20:34:05 +09:00
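The over-stripping is easy to reproduce. The sketch below contrasts the character-set behaviour with exact-suffix removal; `normalize_base_url` is a hypothetical name for the comparison expression the commit describes, and `str.removesuffix` requires Python 3.9 or later.

```python
def normalize_base_url(url: str) -> str:
    """Drop trailing slashes and one genuine trailing "/v1", nothing more."""
    return url.rstrip("/").removesuffix("/v1").rstrip("/")


# rstrip("/v1") strips any run of '/', 'v', and '1' characters from the right,
# so the "1" in the hostname is eaten as well.
over_stripped = "http://host1/v1".rstrip("/v1")   # "http://host"
exact = normalize_base_url("http://host1/v1/")    # "http://host1"
```

With the character-set version, `http://host1/v1` and `http://host/v1` both collapse to `http://host`, which is how two different endpoints could compare equal.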
Mahdi Salmanzade	280c29d572	Security: owner-scope v1 chat endpoint fallback The sync-chat endpoint's Case 3 fallback selected a ModelEndpoint with an unscoped `query(ModelEndpoint).filter(is_enabled == True).first()` and then used that row's decrypted `api_key` for the LLM call. ModelEndpoint is a per-user resource (owner non-null = private to that user), so a chat-scoped API token for user A that sent no session and no api_key could fall back onto user B's PRIVATE endpoint — spending B's API key/quota and reaching whatever internal base_url B configured. This is the same multi-tenant owner-scoping class already fixed for the session gate on this very endpoint (_caller_owns_session) and for companion/models. Scope the fallback to the token owner's own rows plus legacy null-owner (shared) rows via the existing owner_filter helper, matching routes/model_routes.py and companion/routes.py. A null/empty owner stays a no-op, preserving single-user/legacy behaviour. Add regression tests pinning the scoped fallback (cross-owner, shared-only, no-visible-row, disabled-owned, and the legacy null-owner no-op).	2026-06-02 20:31:35 +09:00
Refuse	323f027865	Security: sanitize export and gallery filenames Co-authored-by: RefuseOdd <refuseodd@users.noreply.github.com>	2026-06-02 20:29:56 +09:00
mechramc	493c815371	Chat: scope active document fallbacks by owner	2026-06-02 20:29:27 +09:00
Tatlatat	cd247ed107	Skills: delete owner-scoped skills with owner The DELETE /api/skills/{skill_id} handler resolves the caller, loads the skill with skills_manager.load(owner=user), and verifies ownership with _verify_owner(match, user) — but then calls skills_manager.delete_skill(match.get("name")) without the owner. SkillsManager.delete_skill filters candidates with `(sk.owner or "") != (owner or "")`, so when owner is None an owner-scoped skill is skipped and the method returns False. The route then raises a spurious 404 "Skill not found" — meaning a logged-in user can never delete their own skills through the API. Pass the resolved owner through to delete_skill so the skill is matched and removed. tests/test_skills_delete_owner.py drops a real owner-scoped SKILL.md on disk and (1) checks the manager directly: delete_skill without owner returns False (regression lock) while delete_skill(owner="alice") returns True and removes the dir; (2) drives the real DELETE route handler and asserts it returns {"ok": True} and deletes the file. The route test fails before this change (404). Real SkillsManager + real filesystem, no mocking.	2026-06-02 20:28:36 +09:00
tanmayraut45	6c654fb0ef	Models: detect bare Ollama URLs as online _ping_endpoint() is the reachability fallback the model-endpoint POST handler invokes when _probe_endpoint() returns no model ids. It GETs base + "/models" and, on any sub-500 response, returns immediately with `reachable = (status < 400)`. That early return runs before the Ollama-native /api/version / /api/tags fallback below it. For an Ollama URL without /v1 (the quickstart accepts both http://localhost:11434 and http://127.0.0.1:11434, and the reporter on #1025 explicitly tried both), the OpenAI-style probe target is http://127.0.0.1:11434/models. Ollama returns 404 there because /models only lives under /v1. _ping_endpoint then returned reachable=False and the picker showed "Added (offline — will retry on next load)" on an install that was running fine. /api/version was never tried. Same shape for http://127.0.0.1:11434/api (the native Ollama root): /api/models is also 404, same premature offline verdict. _probe_endpoint() does fall through to /api/tags on a 4xx (the response raises via raise_for_status), so the endpoint quietly recovers once cached_models becomes non-empty on the next background refresh — matching the second commenter's "had to disconnect manually then reconnect for it to be detected" note. The bug is most visible while no models are pulled yet (cached_models stays empty, _ping_endpoint keeps voting offline). Fix: - Hoist the Ollama-shaped-URL test (port == 11434 or "ollama" in hostname — the same condition _probe_endpoint already uses) to the top of the function so both code paths share it. - Stop short-circuiting on 4xx when the URL looks like Ollama: fall through to the existing /api/version + /api/tags reachability loop so an alive Ollama gets recognised even when its OpenAI surface has the wrong prefix for the user's input. 
- Fix the `root` computation in that loop to strip a trailing /api as well as /v1, so http://127.0.0.1:11434/api no longer gets probed at /api/api/version. - 4xx on non-Ollama hosts keeps the current semantics: a 401 from api.openai.com/v1/models is still a definitive offline verdict, not a reason to GET /api/version on OpenAI. Closes #1025.	2026-06-02 20:27:41 +09:00
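The two pieces of this fix, the Ollama-shaped-URL test and the corrected `root` computation, could be sketched as below. The port and hostname condition is quoted from the commit message; the function names and bodies are assumptions for illustration.

```python
from urllib.parse import urlsplit


def looks_like_ollama(base_url: str) -> bool:
    """Heuristic from the commit: port 11434 or "ollama" in the hostname."""
    parts = urlsplit(base_url)
    return parts.port == 11434 or "ollama" in (parts.hostname or "")


def ollama_root(base_url: str) -> str:
    """Strip a trailing /v1 or /api so native paths are not doubled."""
    root = base_url.rstrip("/")
    for suffix in ("/v1", "/api"):
        root = root.removesuffix(suffix)
    return root.rstrip("/")
```

With this root, `http://127.0.0.1:11434/api` is probed at `/api/version` rather than `/api/api/version`.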
mechramc	9d0a18a5b5	Email: add explicit SMTP security mode	2026-06-02 13:15:06 +09:00
Juan Pablo Jiménez	eda99360d1	Fix Cookbook dependency install completion state * Fix Cookbook dependency install completion state Mark Cookbook dependency installs as complete when the background runner exits successfully, even when HuggingFace-specific download markers are absent. * Add focused regression coverage for cookbook dependency completion. Keep the fix narrowly scoped while carrying env_path through dependency tasks and locking the completion reconciliation behavior with targeted tests.	2026-06-02 12:59:29 +09:00
Mihail Filippov	3d109cbaca	Add explicit open-signup state endpoint * Refactor open registration state switching * Rename endpoint to open-signup	2026-06-02 12:35:54 +09:00
Leo	6fca7e86b7	Cookbook serve profiles and engine filter * Cookbook: Engine filter + intelligent hardware-computed serve profiles Two related Cookbook serving improvements for accurate, hardware-aware model serving (especially on consumer GPUs that can only run GGUF/llama.cpp). Engine filter - New "Engine" dropdown (All / llama.cpp / vLLM / SGLang) beside the quant picker. Pure client-side view filter over the fetched list via the same _detectBackend() the serve commands use, so what you filter to is exactly what would launch. Re-renders from cache (no refetch). Empty-state message + the instant-cache-paint path account for it too. Intelligent serve profiles (Quality / Balanced / Speed) - services/hwfit/profiles.py: compute_serve_profiles() turns detected VRAM + model size into concrete llama.cpp flags (n_gpu_layers, n_cpu_moe, cache-type, context). Encodes the by-hand tuning: a too-big MoE offloads experts to CPU instead of failing; a model that fits stays fully on GPU; quant tracks profile intent; vision models keep image-encoder headroom. Reuses models.py VRAM math so filtering and serving agree on what fits. Pure/deterministic (no t/s claims — partial-offload speed isn't reliably predictable; fit is what's computed). - /api/hwfit/profiles endpoint returns the profiles + the model's trained context limit, with loose name matching (strips org/ prefix, -GGUF suffix, quant tag) so a local GGUF folder name resolves to its catalog entry. - _buildServeCmd (llama.cpp) now emits --n-cpu-moe / --flash-attn / --cache-type-k/v when set, with llama-cpp-python fallback equivalents. It previously only set -ngl/-c, which is why it OOM'd or ran slow. - Serve panel: profile chips that fill the fields on click, plus CPU-MoE / KV Cache / Flash Attn fields. Context is clamped to the model's trained limit (and an absolute 1M sanity ceiling) on type/blur/profile-load and at launch — fixes a crash where a stale 256k/16M preset + quantized KV cache caused an amdgpu ErrorDeviceLost. 
Tests: tests/test_serve_profiles.py (7) — offload vs full-GPU fit, never exceed VRAM, context cap, launchable flags, vision headroom, no-GPU empty. Checks: py_compile + node --check pass; pytest test_serve_profiles + test_hwfit_amd green; verified live on an RDNA4 box (gfx1200) — Balanced lands ~ncm18 q4 128k, matching hand-tuning. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * Cookbook: make column-header sorting discoverable (incl. Newest) Sorting in Cookbook is via clickable column headers (pewds' design), but the headers had no visual cue that they're interactive — so sorting in general, and the Newest sort on the Model header specifically, was undiscoverable. - Style sortable headers as interactive: pointer cursor, hover underline, and the active sort column bolded/highlighted. There was no CSS for .hwfit-sortable / .hwfit-sort-active at all; this helps every existing sort, not just Newest. - The Model column header sorts by release_date (newest first), reusing the existing header-click sort wiring and the "newest" SORT_KEY. No new sort control — uses the existing column-header paradigm. Checks: node --check passes. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * Cookbook serve profiles: keep the on-disk file's quant fixed (don't propose Q6/Q2) In the Serve tab the model is a specific GGUF file already on disk, so its quant can't change — but the profiles were suggesting "Quality · Q6_K" / "Speed · Q2_K" as if you could re-quantize it. That's meaningless when serving a fixed file. - compute_serve_profiles gains serve_weights_gb / serve_quant. When set (SERVE mode), the quant is locked to the file's and profiles differ only in the real serving knobs — n_cpu_moe, KV-cache type, context. _weights_gb / _cpu_moe_for_budget use the file's actual size instead of a quant-derived estimate. DOWNLOAD mode (no override) still varies the quant to show download options. 
- /api/hwfit/profiles accepts serve_weights_gb & serve_quant. - The Serve panel parses the file's size (from m.size "20.6 GB") and quant (from the repo/file name) and passes them, so profiles match what's actually served. Result for a 20.6 GB Q4_K_M file: all three profiles stay Q4_K_M and differ by KV/ctx/offload (Quality q8 KV 128k ncm21, Balanced q4 128k ncm17, Speed q4 32k ncm15) — no nonsensical quant changes. Tests: test_serve_mode_keeps_fixed_quant. Full serve-profile suite green (9). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * Cookbook serve: Vision toggle (auto-find mmproj) + live VRAM/RAM-spillover monitor Two serve-panel additions: 1. Vision toggle. A "Vision" checkbox that serves the model with its multimodal projector so it can read images. The mmproj path is resolved at runtime (find mmproj-.gguf next to the model), so dropping an mmproj file in the model folder makes the toggle just work; `--mmproj … --image-max-tokens 1024` (native) / `--clip_model_path` (llama-cpp-python) only when on + found. 2. Live GPU-memory monitor. A readout that polls /api/cookbook/gpus every 4s while the panel is open and shows VRAM used/total/%, free, and — crucially on a discrete card — RAM spillover (AMD gtt_used_mb), with a plain-language health hint: green/healthy, amber/tight, red/"spilled to RAM — slow (raise CPU MoE or lower context)". Surfaces gtt_used_mb from the gpus endpoint (previously read for total only and discarded for 'used'). Lets you see at a glance whether a config fits VRAM (fast) or is paging to system RAM over PCIe (slow) instead of guessing. Checks: node --check + py_compile pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-02 12:34:42 +09:00
spooky	8b3c0d8ad4	feat: select cached gguf artifacts for serve (#891)	2026-06-02 12:32:40 +09:00
Dustin	bd3204fe96	Diagnose vLLM device detection failure with actionable suggestion (#778) Adds a diagnosis pattern for the 'Failed to infer device type' error vLLM raises when no CUDA or ROCm GPU is found (e.g. systems with only integrated or Intel Xe graphics). The existing pattern only caught 'No CUDA GPUs are available' which fires later in startup; this new entry catches the earlier device-probe failure and the NVML/amdsmi library-not-found messages that precede it. Surfaces in the Cookbook serve card as: "vLLM could not find a supported GPU — switch to llama.cpp or Ollama" instead of a raw Python traceback. Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-06-02 12:30:07 +09:00
IBR-41379	385c3c3cf3	fix: use sys.executable for Cookbook model cache scan on Windows (#627) Windows has 'App Execution Aliases' that can make shutil.which('python3') and shutil.which('python') resolve to a Microsoft Store stub instead of real Python -- even when Python is properly installed. The stub outputs: 'Python was not found; run without arguments to install from the Microsoft Store, or disable this shortcut from Settings > Apps > Advanced app settings > App execution aliases.' and exits 9009, producing empty stdout. The JSON parse of the local model cache scan then fails with 'Expecting value: line 1 column 1 (char 0)', and the Cookbook model list shows nothing. Fix: prefer sys.executable as the interpreter for the local scan. Odysseus already runs inside its own venv, so sys.executable always points to the real venv Python and bypasses PATH / Store alias lookup entirely. which_tool() is kept as a fallback. Cross-platform: sys.executable works identically on Linux and macOS (returns the real interpreter path), so this change is safe everywhere.	2026-06-02 12:29:40 +09:00
Rolly Calma	32efeeb3a2	chore: use running event loop in async helpers (#821)	2026-06-02 12:28:05 +09:00
lolwuttav	c99193041a	fix(cookbook): default Ollama serve to loopback (#872)	2026-06-02 12:27:04 +09:00
Tatlatat	ffb77d7ff2	fix(auth): honor AUTH_ENABLED=false on owner-scoped endpoints (no /login loop) (#880) When the operator sets AUTH_ENABLED=false, three owner-scoped endpoints still returned 401 (api/models, api/research/, api/email/), so the front-end redirected the browser to /login and the app was unusable despite auth being turned off. require_user() in src/auth_helpers.py already documents and honors this contract (issue #622) via 'if _auth_disabled(): return ""', but these endpoints did their own get_current_user/is_configured check without it. Make _require_user (research), the /api/models anti-leak guard, and email_helpers._require_auth consult _auth_disabled() and let anonymous through (owner='') only when the operator explicitly disabled auth. The 401 protection is fully intact when AUTH_ENABLED=true. Verified end-to-end: with AUTH_ENABLED=false the SPA now loads instead of bouncing to /login.	2026-06-02 12:26:26 +09:00
Mahdi Salmanzade	66cd44b66d	fix(research): gate /api/research/spinoff on session ownership (#878) The spinoff endpoint authenticated the caller (_require_user) but never verified the research session belonged to them before reading the persisted report and seeding it into a new chat session owned by the caller. Any authenticated user who knew or guessed another user's research session ID could exfiltrate that user's full report into their own session — a cross-user data disclosure (IDOR). Every other endpoint in this router gates on _owns_in_memory / _assert_owns_research right after validating the session ID; spinoff was the lone exception. Add the same _owns_in_memory check (covers both the in-memory task and the on-disk JSON) so a non-owner gets a 404 before any data is read or a session is created. Add regression tests pinning the anonymous (401) and wrong-owner (404) cases.	2026-06-02 12:26:12 +09:00
Mahdi Salmanzade	f691537472	fix(security): stop leaking the vault master password via process argv (#879) The /api/vault/unlock handler ran `bw` as `_run_bw(["unlock", req.master_password, "--raw"])`. _run_bw launches it with `asyncio.create_subprocess_exec(bw_path, *args)`, so the master password became a process argument — readable by any local user through `ps` and `/proc/<pid>/cmdline` for the lifetime of the unlock subprocess. The Bitwarden master password decrypts the entire vault, so this is a serious credential exposure on any multi-user / shared host (CWE-214). The sibling /login handler already avoids this by feeding the password on stdin; unlock was the outlier. Hand the password to `bw` through the environment instead (`--passwordenv BW_PASSWORD`), mirroring how BW_SESSION is already passed — `/proc/<pid>/environ` is readable only by the process owner, not other local users. Add regression tests pinning that the secret reaches the subprocess env and never appears in argv.	2026-06-02 12:25:43 +09:00
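The shape of this change can be sketched as a helper that builds the argument list and environment separately. `build_unlock_invocation` is a hypothetical name; the `--passwordenv BW_PASSWORD` flag is the one quoted in the commit message.

```python
import os


def build_unlock_invocation(master_password: str, base_env=None):
    """Return (args, env) for `bw unlock` with the secret kept out of argv."""
    # Copy the environment so the secret never leaks into the parent process.
    env = dict(os.environ if base_env is None else base_env)
    env["BW_PASSWORD"] = master_password
    # argv carries only the name of the variable, never the password itself.
    args = ["unlock", "--passwordenv", "BW_PASSWORD", "--raw"]
    return args, env
```

The result would then be passed along the lines of `asyncio.create_subprocess_exec(bw_path, *args, env=env)`, so the password sits in the child's environment rather than on its command line.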
Juan Pablo Jiménez	e58e4a185d	Expose Cookbook user-install CLIs in Docker (#887) Ensure pip --user console scripts like vLLM are visible to Docker runtime and dependency probes by adding the user install bin directory to PATH.	2026-06-02 12:23:29 +09:00
Tatlatat	9a1893760d	fix(cookbook): skip pip --user fallback inside virtualenvs (#388) (#889) The dependency-install fallback chain unconditionally ran 'pip install --user', which fails inside a virtualenv (and as root in LXC/containers) with 'Can not perform a --user install. User site-packages are not visible in this virtualenv.' — even though the function's docstring already noted --user is invalid in venvs. Guard the --user fallback with a venv check so it only runs outside a venv (where --user is actually valid for PEP-668 system Pythons). Derive the venv probe interpreter from the install command (python for 'pip', python3 for 'pip3'/'python3 -m pip') so the check runs in pip's own environment. System PEP-668 installs keep the --user fallback; venv/LXC-root installs no longer hit the --user error. Updated the unit test for the new chain. Closes #388	2026-06-02 12:23:20 +09:00
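A simplified sketch of the guard, assuming the standard `sys.prefix != sys.base_prefix` venv test. The actual change runs the probe in pip's own interpreter and may include further fallback steps; the function names and the two-step chain here are illustrative only.

```python
import sys


def in_virtualenv(prefix=None, base_prefix=None) -> bool:
    """Inside a venv, sys.prefix differs from sys.base_prefix."""
    prefix = sys.prefix if prefix is None else prefix
    base_prefix = sys.base_prefix if base_prefix is None else base_prefix
    return prefix != base_prefix


def fallback_chain(pip_cmd: str, package: str, in_venv: bool) -> list:
    """List install attempts in order; --user is only valid outside a venv."""
    attempts = [f"{pip_cmd} install {package}"]
    if not in_venv:
        attempts.append(f"{pip_cmd} install --user {package}")
    return attempts
```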
pewdiepie-archdaemon	966b53df77	Improve Cookbook serve diagnostics and recommendations	2026-06-02 12:15:47 +09:00
NovaUnboundAi	3319310942	Allow longer deep research extraction timeouts (#651) Co-authored-by: NovaUnboundAi <NovaUnboundAi@users.noreply.github.com>	2026-06-02 11:50:03 +09:00
Rasmus	e73f3edc06	fix: scope chat active-document lookup to the session owner (#569)	2026-06-02 11:46:40 +09:00
elijaheck	c303a29670	Fix native macOS tailnet launch and Metal GPU probe (#756) * macOS/Apple Silicon: detect Metal backend, surface MLX models, brew tmux hint - hardware.py: add _detect_macos() via sysctl/system_profiler; report backend=metal + unified_memory on Apple Silicon instead of cpu_arm - fit.py: add Apple Silicon (M1-M5) unified-memory bandwidths + metal FALLBACK_K so throughput estimates use the real bandwidth formula - setup.py: Mac-specific 'brew install tmux' hint Verified on M5 Pro 48GB: backend=metal, 273GB/s matched, 6 MLX models now visible (were hidden), cuda still hides MLX, no new test failures. * Fix native macOS tailnet launch and Metal GPU probe --------- Co-authored-by: Elijah (Hermes) <hermes@local>	2026-06-02 11:41:04 +09:00
Mahdi Salmanzade	54ac4a74fb	Attribute API-token sessions to the token owner (effective_user) (#871) Split 2/4 of the companion bridge (#863 was 1/4). A paired bearer-token caller runs as the sandboxed 'api' pseudo-user, so its sessions were stranded in a separate 'api'-owned silo, invisible to the owner's desktop UI. Add effective_user(): for a bearer token it resolves to the token's real owner (request.state.api_token_owner); for cookie sessions it is identical to get_current_user, so the swap is a no-op for browser users. Route session ownership/attribution in routes/session_routes.py through it. Tests (tests/test_session_owner_attribution.py): - cookie/browser users are unchanged - a bearer token attributes to its owner; with no owner it does NOT escalate - _verify_session_owner: a bearer token for owner A cannot verify owner B's session (404); owner verifies their own; missing -> 404; unauth -> 403	2026-06-02 11:39:01 +09:00
Mahdi Salmanzade	bc00a9fc7f	fix(security): fail closed on null-owner session in sync-chat endpoint (#870) POST /api/v1/chat (the n8n/Make/Activepieces sync-chat endpoint) verified session ownership with `_tok_user and _sess_owner and _sess_owner != _tok_user`. The `_sess_owner and` clause skipped the check entirely whenever the session's owner was null — so any chat-scoped API token (e.g. a token minted for a paired mobile device) could pass a legacy/migrated null-owner session id, inject a message into that session, and read back its conversation history plus reuse the owner's endpoint credentials. This is the same `if owner and owner != user` null-owner-bypass pattern that was already hardened in the gallery, calendar, and notes routes (see test_null_owner_gates.py) and in session_routes._verify_session_owner. Make this gate strict and fail closed too: require a resolvable caller and an exact owner match, mirroring _verify_session_owner. Extract the decision into _caller_owns_session() and pin it with regression tests.	2026-06-02 11:38:05 +09:00
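The difference between the lenient gate and a fail-closed one fits in two small functions. The first reproduces the condition quoted in the commit message; the second is an assumed minimal form of the strict check, with illustrative names rather than the project's own.

```python
def lenient_gate_blocks(tok_user, sess_owner) -> bool:
    """Old pattern: only blocks when both sides are set and differ."""
    return bool(tok_user and sess_owner and sess_owner != tok_user)


def caller_owns_session(tok_user, sess_owner) -> bool:
    """Strict pattern: needs a resolvable caller and an exact owner match."""
    return bool(tok_user) and sess_owner == tok_user
```

For a null-owner session, `lenient_gate_blocks("alice", None)` is false, so the request goes through, while `caller_owns_session("alice", None)` is also false, so the strict gate denies it.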
James Arslan	6776c7d691	Surface silent model fallback instead of masking it (#868 ) When the selected model fails before producing output, stream_llm_with_fallback quietly switches to the next candidate and the reply is shown under the originally selected model's name, so a misconfigured provider looks like it works. (Concretely: a Bedrock gateway that 400s every Anthropic/Claude request appears fine because another model silently answers under the Claude label.) Emit a `fallback` SSE event ({selected_model, answered_by, reason}) the first time a non-primary candidate produces output, forward it through the agent loop and both chat-route paths, stamp the response metrics with the model that actually answered, and show a notice + relabel the reply in the UI. Tested: python -m pytest tests/test_llm_core_fallback.py (3 pass); python -m py_compile src/llm_core.py src/agent_loop.py routes/chat_routes.py; node --check static/js/chat.js.	2026-06-02 11:37:25 +09:00
Tatlatat	2d6b777799	fix(cookbook): diagnose 'no GGUF file' serve failures clearly (#811 ) (#866 ) When serving with the llama.cpp backend and no .gguf file exists on the host, the GGUF launcher prelude exits with 'ERROR: No GGUF found on this host', but _diagnose_serve_output had no matching pattern, so the UI showed a generic crash instead of explaining the cause. Add a diagnosis pattern for the no-GGUF case so users are told a .gguf is required and pointed at downloading a GGUF build, instead of an opaque crash. Closes #811	2026-06-02 11:36:53 +09:00
Ernest Hysa	360bc83a66	fix(history): scope topic analysis to authenticated owner only (#744 ) Two changes close the cross-tenant topic leak in /api/conversations/topics. The route at routes/history_routes.py:478 used get_current_user, which returns None when no auth middleware has set request.state.current_user (loopback-bypass, AUTH_ENABLED=false, or any path that short-circuits the middleware). It then forwarded owner=None to analyze_topics. The helper at src/topic_analyzer.py:21 used an 'if owner:' short-circuit in its owner filter, so the None owner took the no-filter path and the helper silently aggregated topic frequencies and per-snippet session_id, session_name, role, and snippet text across every user's sessions. analyze_topics now returns an empty result when owner is falsy. The inner short-circuit is removed because the filter is now strict by construction. The route is switched to require_user, which raises 401 when auth_manager.is_configured is True and the caller is anonymous, matching the pattern used by calendar_routes, skills_routes, and other authenticated routes. The test test_history_topics_owner_scope.py was rewritten to drive the real route through FastAPI's TestClient with a stub AuthMiddleware that mirrors the loopback-bypass branch, and now asserts a strict 401 from the route and an empty result from the helper. The previous version of the test accepted either a 200-with-empty-topics or a 401; the strict assertion means a future regression that drops the require_user wrapper or re-adds the inner short-circuit is caught immediately.	2026-06-02 11:36:01 +09:00
hawktuahs	a2f6183c4a	Fix cookbook pip installs in venvs (#723 )	2026-06-02 11:31:59 +09:00
tanmayraut45	0e31c38be0	Support in-place endpoint updates and recover empty-model sessions (#786 ) The "don't wipe endpoint_url/model on endpoint delete" half of #587 landed in `6a78b02` (Fix endpoint model preservation for tasks). The three remaining follow-up pieces from the original PR — flagged in the review on #786 — are: - routes/model_routes.py: toggle_model_endpoint (PATCH) now accepts api_key and base_url, so the admin UI can rotate a key or fix a typo'd URL without going through delete+recreate. base_url is normalized the same way the POST handler does (strip /models, /chat/completions, /completions, /v1/messages, then _normalize_base). Cache invalidation matches the POST/DELETE paths and the response includes base_url so the frontend can confirm what was saved. - routes/chat_routes.py: new _recover_empty_session_model picks cached_models[0] from the endpoint that matches sess.endpoint_url and persists it onto the Session row before the LLM call goes out. Wired into both /api/chat and /api/chat_stream after the existing _clear_orphaned_session_endpoint guard, so the order is: drop truly-orphaned sessions first, then heal the "picker showed it, session never knew" case. - routes/chat_routes.py: when recovery fails (no endpoint, no cached models) raise HTTP 400 with a clear message instead of letting model="" reach the upstream as 401/503. Closes #587.	2026-06-02 11:26:38 +09:00
Tatlatat	63a947d246	fix(cookbook): mark zero-file HF downloads as failed instead of completed (#839 ) (#865 ) A Cookbook download whose repo/quant selector matched no files (e.g. a ':Q4_K_M' tag that does not exist) printed 'Fetching 0 files' and was still reported as a successful '✓ Downloaded' / completed task. Detect the zero-file signature in the download snapshot and mark the task as an error with a clear diagnosis (no matching files — check the repo or quant/filename pattern) so users know nothing was actually downloaded. Normal multi-file and fully-cached downloads (which print 'Fetching N files', N>0) are unaffected. Closes #839	2026-06-02 11:24:34 +09:00
Ernest Hysa	f4aef0dcf7	fix(skills): scope skill reads to caller owner (#777 ) read_skill_md and read_skill_reference walk all skill files via _iter_skill_files and return the first match by slug, regardless of owner. In a multi-user deployment where two users have skills with the same slug under different categories, a caller scoped to owner='alice' can read Bob's skill content. This is the same cross-tenant leak class as the update_skill / delete_skill fix (PR #755, merged), but on the read path. Changes: - read_skill_md / read_skill_reference accept owner= param (default None = match ownerless only, matching the write-path convention). - 7 callers updated: tool_implementations.py (view, view_ref, patch), builtin_actions.py (test_skills), skills_routes.py (audit, source, test routes). - Tests: read scoping (alice reads hers, not bob's), positive update scoping (alice can mutate her own), ownerless-match default.	2026-06-02 11:21:27 +09:00
Mahdi Salmanzade	4a84a895a0	Keep reasoning (thinking) tokens out of the saved chat reply (#856 ) Streamed deltas flagged thinking:true (reasoning-model traces) were being folded into full_response and persisted as part of the assistant message, so saved replies were polluted with the model's chain-of-thought. Forward those deltas to the client (for a live thinking indicator) but exclude them from the accumulated saved reply, in both chat and research-stream paths. Mirrors the existing rewrite path's handling.	2026-06-02 11:17:41 +09:00
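
The null-owner bypass described in bc00a9fc7f (#870) can be illustrated with a minimal sketch. This is not the project's code: the function names and signatures below are hypothetical, and only the quoted condition `_tok_user and _sess_owner and _sess_owner != _tok_user` comes from the commit message. The sketch contrasts that lenient condition with a fail-closed check that requires a resolvable caller and an exact owner match.

```python
from typing import Optional


def lenient_gate_denies(caller: Optional[str], session_owner: Optional[str]) -> bool:
    """Hypothetical rendering of the bypassable pattern quoted in the commit.

    Access is denied only when caller and owner are both set and differ,
    so a null-owner session is never denied.
    """
    return bool(caller and session_owner and session_owner != caller)


def caller_owns_session_sketch(caller: Optional[str], session_owner: Optional[str]) -> bool:
    """Hypothetical fail-closed check: allow only on an exact owner match."""
    # An anonymous or unresolved caller owns nothing.
    if not caller:
        return False
    # A null-owner session matches no caller, so it is rejected too.
    return session_owner == caller


if __name__ == "__main__":
    # The lenient gate does not deny a null-owner session: this is the bypass.
    print(lenient_gate_denies("alice", None))
    # The strict check rejects the same request.
    print(caller_owns_session_sketch("alice", None))
```

With the strict form, a legacy or migrated session with no owner is unreachable through a token until an owner is assigned, which is the trade-off of failing closed.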

109 Commits