176 Commits

Author SHA1 Message Date
red person
028a39b42c Fix local Cookbook dependency installs in venvs (#1082) 2026-06-02 22:39:02 +09:00
Afonso Coutinho
5b12bf3f55 fix: ICS export doesn't escape commas/semicolons in event fields (#1161)
* fix: escape SUMMARY/LOCATION per RFC 5545 in ICS export

* fix: escape commas/semicolons in ICS DESCRIPTION, not just newlines

* test: ICS export escapes commas, semicolons, backslashes, newlines
2026-06-02 22:36:12 +09:00
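
The escaping described in 5b12bf3f55 can be sketched as below. This is a minimal illustration of the RFC 5545 TEXT rules (backslash, semicolon, comma, newline), not the project's code; the helper name `escape_ics_text` is hypothetical.

```python
def escape_ics_text(value: str) -> str:
    """Escape a TEXT property value (SUMMARY, LOCATION, DESCRIPTION)."""
    # Backslash goes first so the escapes added below are not escaped again.
    value = value.replace("\\", "\\\\")
    value = value.replace(";", "\\;").replace(",", "\\,")
    # Normalise line endings, then encode each newline as the two characters "\n".
    value = value.replace("\r\n", "\n").replace("\r", "\n")
    return value.replace("\n", "\\n")


print(escape_ics_text("Lunch, then review; room 4\nBring notes"))
# prints: Lunch\, then review\; room 4\nBring notes
```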
Afonso Coutinho
2e2da2aefe fix: extract_statistics drops large numbers and trailing % signs (#1153)
* fix: extract_statistics misses comma-less numbers and drops trailing %

* fix: same extract_statistics number/percent bug in services copy

* test: extract_statistics captures full numbers and percent signs
2026-06-02 22:35:30 +09:00
Afonso Coutinho
2b2943a7b7 fix: extract_quotes accepts mismatched opening/closing quotes (#1113)
* fix: only extract quotes whose closing quote matches the opening one

* fix: same mismatched-quote bug in the services search copy

* test: extract_quotes requires matching open/close quotes
2026-06-02 22:34:52 +09:00
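
One way to require matching open/close quotes, as 2b2943a7b7 describes, is to build one pattern per quote pair. The pair table and the function body below are assumptions for illustration; the real `extract_quotes` may support different quote styles.

```python
import re

# Hypothetical opener -> closer table (straight, curly double, curly single).
QUOTE_PAIRS = {'"': '"', "\u201c": "\u201d", "\u2018": "\u2019"}


def extract_quotes(text: str) -> list[str]:
    found = []
    for opener, closer in QUOTE_PAIRS.items():
        # Each pattern only accepts the closer that belongs to its opener.
        pattern = re.escape(opener) + r"(.+?)" + re.escape(closer)
        found.extend(re.findall(pattern, text))
    return found
```

Results are grouped by quote style rather than by position in the text, which is enough to show the pairing rule.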
ghreprimand
c075abce5d Search: consolidate core and provider implementations
Co-authored-by: ghreprimand <203024559+ghreprimand@users.noreply.github.com>
2026-06-02 21:02:26 +09:00
Leo
de92bbe47a Cookbook fit: steer consumer AMD to GGUF recommendations
* Cookbook fit: consumer-AMD GGUF recommendations + accurate estimates (core logic)

Split of #746 — the estimate/ranking MATH only, so it can be reviewed with tests
first (UI changes follow separately). Backend files only: no static/js here.

services/hwfit/fit.py, services/hwfit/hardware.py:
- Recommend GGUF/llama.cpp on consumer AMD (RDNA, gfx10/11/12) instead of
  formats that don't run on consumer Radeon — vLLM-only AWQ/GPTQ/FP8 AND
  vendor-specific NVFP4 (NVIDIA) / MLX (Apple). Datacenter Instinct (CDNA) and
  CUDA are left untouched.
- More accurate speed estimates across more GPUs (adds RDNA bandwidth data).
- Detect AMD/RDNA GPUs (gpu_family from rocminfo) so fit/serve can branch on it.

tests/test_hwfit_amd.py: AMD recommendation path, quant/bit matching, estimate
realism, gfx RDNA-vs-CDNA classification.

Rebased onto current main (analyze_model gained a scoring_use_case param there;
kept it). Vision detection intentionally NOT added here — main already ships a
"Vision" type filter + multimodal use-case handling; duplicating it was dropped.

Checks: py_compile clean; pytest tests/test_hwfit_amd.py + hwfit/serve suites
= 28 passed; full suite 0 new failures vs main.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Tests: assert NVFP4/MLX/FP8 formats are filtered on consumer RDNA

Backs the #972 claim with an explicit regression: no NVIDIA NVFP4, Apple MLX,
or vLLM-only FP8/AWQ/GPTQ repos are recommended on a consumer Radeon, and guards
against vacuity by asserting such repos exist in the catalog.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-02 21:01:42 +09:00
red person
fd89d098a1 Chat: use cached endpoint model ids before probing 2026-06-02 21:00:58 +09:00
red person
5029c8570e Chat: prefer active model for new desktop chats 2026-06-02 21:00:50 +09:00
ooovenenoso
bd2fa82c1e Cookbook: prefer ROCm for native llama.cpp bootstrap
Co-authored-by: Kevin <120500656+oooindefatigable@users.noreply.github.com>
2026-06-02 20:59:44 +09:00
SurprisedDuck
934bca9e48 Providers: omit temperature for OpenAI reasoning models
* fix: omit temperature for OpenAI reasoning models (o1/o3/o4/gpt-5)

These models only accept the default temperature; sending any explicit
value (even 0.0) returns HTTP 400 "Only the default (1) value is
supported". This broke two paths:

- Endpoint probing in _probe_single_model hardcodes temperature: 0.0, so
  a perfectly valid o3/gpt-5 endpoint is reported as failing in the
  Model Endpoints health check.
- Chat/stream payloads send temperature unconditionally, so a non-default
  temperature preset 400s on these models.

The code already special-cases the same model family for
max_completion_tokens, so this adds a sibling _restricts_temperature()
helper and omits the field for those models, letting the API use its
required default. gpt-4.5 is intentionally excluded (not a reasoning
model; accepts temperature normally).

Adds tests/test_llm_core_temperature.py covering the predicate and the
synchronous payload builder.

* fix: also omit temperature for reasoning models on the direct-POST paths

The first commit only covered llm_call/llm_call_async/stream_llm and the
endpoint probe. Email auto-summary, urgency-less spam classification, the
email reply-summary endpoint, and gallery vision tagging build their
OpenAI payloads inline and POST them directly (requests/httpx), bypassing
llm_core — so a reasoning model configured there would still 400 on the
temperature field. These sites already branch on _uses_max_completion_tokens,
so they're the same class; added the matching _restricts_temperature guard.

gallery_routes also gains the max_completion_tokens branch it was missing,
so gpt-5 vision tagging works end to end.

Note: email_pollers urgency scoring goes through llm_call_async and was
already covered.
2026-06-02 20:58:33 +09:00
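
The guard described in 934bca9e48 might look roughly like this. The regex is an assumed naming rule for the o1/o3/o4 and gpt-5 families, and `restricts_temperature` / `build_payload` are hypothetical stand-ins for the project's `_restricts_temperature()` helper and payload builders.

```python
import re

# Assumption: reasoning models are named o1*, o3*, o4* or gpt-5*,
# with the family name not followed by another letter or digit.
_REASONING_RE = re.compile(r"^(?:o[134]|gpt-5)(?![a-z0-9])")


def restricts_temperature(model: str) -> bool:
    return _REASONING_RE.match(model.lower()) is not None


def build_payload(model, messages, temperature=None):
    payload = {"model": model, "messages": messages}
    # Omit the field entirely so the API applies its required default.
    if temperature is not None and not restricts_temperature(model):
        payload["temperature"] = temperature
    return payload
```

Under this rule `gpt-4.5-preview` and `gpt-4o` keep their temperature, while `o3-mini` and `gpt-5` never send one, even when the caller passes `0.0`.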
red person
d0c925f6c8 Chat attachments: allow picker to choose any file type 2026-06-02 20:55:30 +09:00
ghreprimand
aa0a9e8b5a Search: align service content extraction
Co-authored-by: ghreprimand <203024559+ghreprimand@users.noreply.github.com>
2026-06-02 20:53:07 +09:00
ghreprimand
eddb9ce6db Search: align service provider guards
Co-authored-by: ghreprimand <203024559+ghreprimand@users.noreply.github.com>
2026-06-02 20:52:13 +09:00
Leo
6c15dc7d33 Chat metrics: surface backend generation speed
* Chat metrics: show backend's true generation t/s, not tokens÷wall-clock

The per-message tokens/sec read low and felt wrong because it was computed as
output_tokens / total_duration, where total_duration is wall-clock including
prefill, tool calls, and network — not pure decode time. llama.cpp already
reports the correct gen speed in its stream (timings.predicted_per_second), but
it was being dropped.

- llm_core.py: when parsing the OpenAI-compatible usage chunk, also read the
  sibling `timings` block llama.cpp includes — pass predicted_per_second through
  as gen_tps and prompt_per_second as prefill_tps on the usage event.
- agent_loop.py: capture backend_gen_tps/backend_prefill_tps from usage events;
  in _compute_final_metrics prefer backend_gen_tps over the wall-clock division
  when present (fall back to computed for cloud APIs that omit timings). Tag the
  result with tps_source ("backend" vs "computed") and surface prefill_tps.

Result: the displayed t/s now matches the model's real decode speed and is
stable regardless of prompt length (a long prefill no longer deflates it).

Checks: py_compile passes; verified extraction against a real llama.cpp final
chunk (gen 79 t/s surfaced vs the deflated wall-clock figure shown before).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Chat metrics: surface true t/s on the direct-chat path too

Follow-up to the gen-tps work: the non-agent direct-chat stream path in
chat_routes turned the raw `usage` event straight into a metrics event but only
copied token counts — it never set tokens_per_second or response_time. So simple
(non-tool) replies showed "Speed: n/a" / "Time: undefineds" and the chip fell
back to a bare token count ("27 tok") instead of t/s.

Map the usage event's gen_tps (llama.cpp timings.predicted_per_second, added in
the prior commit) into tokens_per_second here too, tag tps_source=backend, and
set response_time from wall-clock for the stats popup.

Checks: py_compile passes; verified llama.cpp emits usage+timings on the final
stream chunk (gen ~90 t/s) that this path consumes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Tests: backend gen/prefill t/s passthrough and preference

Cover the two pieces of the true-t/s metric so it can be reviewed on its own:
- stream_llm surfaces llama.cpp's timings.predicted_per_second /
  prompt_per_second as gen_tps / prefill_tps on the usage event (captured
  llama.cpp final-chunk fixture), and omits them when the backend reports no
  timings.
- _compute_final_metrics prefers backend_gen_tps over output/wall-clock,
  tags tps_source ("backend" vs "computed"), and surfaces prefill_tps.

Reuses the fake-client stream harness from test_llm_core_streaming.py.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-02 20:52:08 +09:00
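
The preference order from 6c15dc7d33 (backend-reported speed first, wall-clock division as fallback) can be sketched as follows. The function signature is an assumption; only the `tps_source` values come from the commit message.

```python
def compute_final_metrics(output_tokens, total_duration_s, backend_gen_tps=None):
    # Prefer the decode speed the backend reported (e.g. llama.cpp timings).
    if backend_gen_tps:
        return {"tokens_per_second": backend_gen_tps, "tps_source": "backend"}
    # Fallback for APIs that omit timings: tokens divided by wall-clock time,
    # which also counts prefill, tool calls, and network latency.
    tps = output_tokens / total_duration_s if total_duration_s > 0 else 0.0
    return {"tokens_per_second": tps, "tps_source": "computed"}
```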
ghreprimand
4cec31d988 Chat: route image sessions only to matching image endpoints
Co-authored-by: ghreprimand <203024559+ghreprimand@users.noreply.github.com>
2026-06-02 20:52:03 +09:00
Ernest Hysa
064c1ace91 Uploads: write uploads index atomically
* fix(upload): atomic-rename writes for uploads.json + .bak recovery

UploadHandler.save_upload does a read-modify-write of uploads.json via
two open(..., 'w') + json.dump blocks, with no lock, no temp+rename, and
no recovery. N concurrent inserts lost N-1 entries (last writer wins
after the read snapshot is taken); a SIGKILL/SIGTERM mid-json.dump
truncated the file and the bare 'except Exception: logger.warning(...)'
recovery path returned {}, silently dropping every prior upload.

The handler now serialises the RMW under a per-instance threading.Lock
and writes through _atomic_write_json, which writes to a tempfile in
the same directory, fsyncs, snapshots the previous live to .bak, and
renames the temp onto the target via os.replace. os.replace is atomic
on POSIX, so a reader sees either the old or the new state, never a
half-written file. _load_upload_index tries the live file first, then
falls back to the .bak sibling if the live is corrupt.

Cross-process safety is still on the deployer: gunicorn workers on
the same uploads dir will race the lock, and the atomic-rename is the
kernel-level guarantee that prevents torn reads. If multi-worker
writes are expected, fcntl.flock around the rename is a follow-up;
single-worker and async deployments are correct as-is.

* fix(upload): reload uploads.json inside _index_lock on dedupe path

The duplicate-detection branch in save_upload() was reading uploads.json
*before* taking _index_lock, then writing that stale snapshot under the
lock. A duplicate upload racing with a new-entry insert could clobber
the new entry because the duplicate's snapshot predated the insert.

The new-entry branch already reloaded inside the lock; the duplicate
branch now does the same. It also re-resolves the storage key inside
the lock, because a concurrent insert can have changed the dict's keys.

If the entry has been cleaned up between the outer read and the inner
write, the function falls through to the fresh-insert path instead of
silently writing a stale row.

Boundary note: the _index_lock serialises writers within a single
Python process. Cross-process / multi-worker deployments still need
flock or a database; the inline comment is updated to make this
explicit. The atomic-rename write keeps the on-disk state consistent
but does not serialise writers across processes.

Tests:
- Existing concurrent-insert and partial-write-recovery tests still pass.
- New test_atomic_write_primitives_present_in_production_code asserts
  the production module has at least two 'with self._index_lock:' blocks
  (regression net for this fix).
- New smoke tests: normal upload, duplicate detection, info lookup
  after a backup-recovery scenario.
2026-06-02 20:51:39 +09:00
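
The write path described in 064c1ace91 (temp file in the same directory, fsync, `.bak` snapshot, `os.replace`) can be sketched like this. It is an illustration under assumptions, not the project's `_atomic_write_json`: the function names are hypothetical and the per-instance lock is left out.

```python
import json
import os
import shutil
import tempfile


def atomic_write_json(path, data):
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live on the same filesystem for the rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(data, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())
        if os.path.exists(path):
            shutil.copy2(path, path + ".bak")  # snapshot the previous live file
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def load_index(path):
    # Try the live file first, then fall back to the .bak sibling.
    for candidate in (path, path + ".bak"):
        try:
            with open(candidate, encoding="utf-8") as fh:
                return json.load(fh)
        except (OSError, ValueError):
            continue
    return {}
```

As the commit notes, this keeps the on-disk state consistent but does not serialise writers across processes.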
Shaw
db10c8d95b Sessions: allow deleting memory-only ghost sessions
A session that exists only in the in-memory SessionManager — never persisted,
or whose DB row was removed out-of-band — was listed by GET /api/sessions (the
list is built from the in-memory manager) but 404'd on every per-session
operation, so it could never be deleted.

Two causes, both fixed:

1. _verify_session_owner() only consulted the DB and raised 404 when no row
   existed. It now falls back to the in-memory session's owner when (and only
   when) a session_manager is supplied and the caller actually owns the ghost.
   The DB row stays authoritative when present, and a ghost owned by another
   user still 404s, so the ownership/security model is unchanged. The new
   parameter defaults to None, preserving behavior for all other callers.

2. SessionManager.delete_session() only removed the in-memory entry when a DB
   row was found, so memory-only ghosts survived. It now drops the in-memory
   copy regardless and reports success when either the DB row or the in-memory
   entry was removed.

Added tests/test_session_ghost_delete.py covering both layers, including the
cross-owner 404, the unauthenticated 403, DB-row-wins precedence, and backward
compatibility when no manager is passed.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-02 20:51:26 +09:00
mechramc
8e87d3002b Tasks: clean up queued cancellation state 2026-06-02 20:51:21 +09:00
SurprisedDuck
f975279b26 Notes: parse natural-language due dates on update
The 'add' action runs due_date through parse_due_for_user (natural
language like 'tomorrow at 9am', plus user-tz anchoring for naive ISO),
but 'update' stored the raw value verbatim. A reminder edited with
natural language was saved as an unparseable literal the frontend's
new Date() can't read, so it never fired. Route update's due_date
through the same parser as add.
2026-06-02 20:51:16 +09:00
mechramc
8efd7b3df6 Windows: improve Git Bash detection 2026-06-02 20:45:48 +09:00
red person
4709bb022e Windows: add Docker update script 2026-06-02 20:45:32 +09:00
Tatlatat
7f97ab3032 Topics: hydrate session history before analysis
analyze_topics() iterates session_manager.sessions and reads
session_data.get("history", []) directly. But SessionManager.load_sessions
seeds sessions metadata-only with empty history — messages are loaded
lazily, only when get_session(session_id) is called. So analyze_topics saw
empty history for every session that hadn't been individually opened this
process lifetime and reported total_topics: 0, even when the database held
plenty of matching messages.

Hydrate each candidate session via session_manager.get_session(session_id)
(the existing lazy-load path) before reading its history, after the
owner/archived filters so skipped sessions aren't loaded. Falls back to the
raw cached history when the manager has no get_session (test stubs).

tests/test_topic_analyzer.py: new test_topic_analyzer_hydrates_sessions
seeds a real SQLite DB with a session + message, runs the real
SessionManager (asserting cached history starts empty), then asserts
analyze_topics finds the topic. Fails before this change. The existing
keyword tests now pass an explicit owner to satisfy the owner-required
early return.
2026-06-02 20:44:27 +09:00
SurprisedDuck
d73c0a13f4 YouTube: enforce comment fetch timeout while waiting
asyncio.wait_for wrapped create_subprocess_exec, which returns as soon
as the child is spawned, so the timeout never bounded the actual work.
yt-dlp could hang indefinitely on proc.communicate() and the
except asyncio.TimeoutError branch was unreachable. Bind the wait to
communicate() and kill/reap the child if it overruns.
2026-06-02 20:44:24 +09:00
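
The pattern from d73c0a13f4, sketched with a generic command instead of yt-dlp: the timeout wraps `communicate()`, and the child is killed and reaped if it overruns. The function name and return convention are assumptions.

```python
import asyncio
import sys


async def run_with_timeout(cmd, timeout):
    # Spawning returns almost immediately, so it is not what needs bounding.
    proc = await asyncio.create_subprocess_exec(
        *cmd, stdout=asyncio.subprocess.PIPE, stderr=asyncio.subprocess.PIPE
    )
    try:
        # Bound the actual work: waiting for the child to finish.
        stdout, _ = await asyncio.wait_for(proc.communicate(), timeout=timeout)
    except asyncio.TimeoutError:
        proc.kill()
        await proc.wait()  # reap the child so it does not linger as a zombie
        return None
    return stdout


if __name__ == "__main__":
    slow = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(asyncio.run(run_with_timeout(slow, timeout=0.5)))  # prints None
```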
Tatlatat
e084dc993e Chat: merge consecutive user messages for strict providers
After a non-native tool round, the agent appends tool results as a {role:
'user'} message next to the user's original 'user' prompt, producing two
consecutive 'user' messages. Strict provider APIs (Anthropic/Claude) reject
consecutive same-role messages, so the follow-up generation request fails
silently — search returns sources, then nothing is generated.

_sanitize_llm_messages now merges consecutive 'user' messages (joining their
content). Only user/user is merged; normal chat and agent/tool turns already
alternate and are untouched.

Scoped down per maintainer review: the agent_loop 'output' source-extraction
change is already on main (#898/#901) and the broad-mocking web-sources test
was dropped. Added a focused test that runs consecutive-user messages through
the real _build_anthropic_payload and asserts the payload alternates correctly.
2026-06-02 20:44:13 +09:00
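
The merge step described in e084dc993e could be sketched as follows. This is a simplified stand-in for `_sanitize_llm_messages` that assumes string content and joins with a blank line; the real separator and handling of structured content are not stated in the commit.

```python
def merge_consecutive_user_messages(messages):
    merged = []
    for msg in messages:
        prev = merged[-1] if merged else None
        if (
            prev is not None
            and prev["role"] == "user"
            and msg["role"] == "user"
            and isinstance(prev["content"], str)
            and isinstance(msg["content"], str)
        ):
            # Only user/user pairs are joined; other roles pass through.
            prev["content"] = prev["content"] + "\n\n" + msg["content"]
        else:
            merged.append(dict(msg))  # copy so the caller's list is untouched
    return merged
```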
Tatlatat
51cf63009e TTS: include mp3 files in cache stats
TTSService._put_cache writes .mp3 for MP3 audio (ID3/MPEG-framed bytes) and
.wav otherwise, and the rest of the class treats both as cache entries
(_get_cache iterates (".mp3", ".wav"); eviction globs "*.*"). But
get_stats() enumerated the cache with `glob("*.wav")` only, so both
cache_entries and cache_size_mb undercounted — reporting 0 whenever the
cache held MP3 files, which is the common case for most TTS providers.

Glob both extensions so the reported stats match what's actually cached.

tests/test_tts_cache_stats.py writes an MP3-headed blob via _put_cache and
asserts get_stats() reports one entry with non-zero size. Fails before this
change.
2026-06-02 20:43:29 +09:00
Tatlatat
3885f9fa90 STT: clean temp audio files on transcription failure
STTService._transcribe_local writes the audio to a NamedTemporaryFile
(delete=False) and only unlinks it on the success path, before the except.
If model.transcribe() raises (corrupt audio, model/runtime error, etc.) the
function logs, returns None, and leaves the .webm temp file behind — so
every failed local transcription leaks a file in the system temp dir.

Initialize tmp_path = None up front and move the unlink into a finally
block so the temp file is cleaned up whether transcription succeeds or
raises.

tests/test_stt_leak.py stubs the whisper model to raise during transcribe,
runs _transcribe_local, and asserts it returns None and leaves no new .webm
file in the temp dir. Fails before this change.
2026-06-02 20:43:24 +09:00
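
The cleanup shape from 3885f9fa90, reduced to its essentials. The `transcribe` callable stands in for the whisper model and is an assumption of this sketch.

```python
import os
import tempfile


def transcribe_local(audio_bytes, transcribe):
    tmp_path = None  # initialised up front so the finally block can test it
    try:
        with tempfile.NamedTemporaryFile(suffix=".webm", delete=False) as tmp:
            tmp.write(audio_bytes)
            tmp_path = tmp.name
        return transcribe(tmp_path)
    except Exception:
        return None
    finally:
        # Runs on both the success and the failure path.
        if tmp_path and os.path.exists(tmp_path):
            os.unlink(tmp_path)
```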
Collin
f8e3bfeaff Add endpoint probing behavior tests
ROADMAP "Backend → more tests around endpoint probing and provider setup".
TestSetupProbeSafety already covers _probe_endpoint's keyed/unkeyed curated
fallback; this adds the rest of the probe surface, with httpx faked the same
way (no network):

- _probe_endpoint: OpenAI {"data"} vs native Ollama {"models"} list parsing,
  the /api/tags fallback for Ollama builds lacking /v1/models, and the
  no-models-found result.
- _ping_endpoint (previously untested): 2xx reachable, auth failure (reached
  but not reachable), the /login-redirect "that's Odysseus, not a model
  server" trap, generic redirects, transport errors, and the native Ollama
  /api/version fallback.
- _probe_single_model (previously untested): ok/fail/timeout status mapping,
  dict/string upstream error extraction, and OpenAI vs Anthropic request
  routing (x-api-key, /v1/messages, tool schema).
- _classify_endpoint: the Tailscale CGNAT 100.64.0.0/10 local range and its
  boundaries.
2026-06-02 20:42:48 +09:00
Collin
e8dea7d456 Add provider classification and upstream-error tests
ROADMAP "Backend → more tests around endpoint probing and provider setup"
and the "Provider setup/probing audit" item. test_provider_endpoints.py
covers URL/header building; this adds the provider-identification and
degraded-state error reporting around it, against the real src.llm_core:

- _detect_provider: host-based (not substring) provider matching, with
  look-alike-host and domain-in-path guards, and the OpenAI-compatible
  fallback that xAI / DeepSeek / Gemini correctly use.
- _provider_label: human names used in error messages (incl. native vs
  cloud Ollama and the generic local-endpoint case).
- _format_upstream_error: 401/403/404/429/5xx → provider-aware sentences,
  with JSON / string / plain-text / bytes body detail extraction.
- _uses_max_completion_tokens: gpt-5 / o-series detection (gpt-4o stays
  on plain max_tokens).
2026-06-02 20:42:43 +09:00
Alexandre Teixeira
2e961cee93 tests: cover calendar route owner gates 2026-06-02 20:42:37 +09:00
Alexandre Teixeira
033e7a8f0d tests: cover API token CRUD routes 2026-06-02 20:42:32 +09:00
Alexandre Teixeira
4bbffbfb05 tests: cover upload route owner gates 2026-06-02 20:42:26 +09:00
Alexandre Teixeira
6255852bef tests: cover cleanup owner scope 2026-06-02 20:42:21 +09:00
Alexandre Teixeira
ff8b9e9ab6 tests: cover research route owner gates 2026-06-02 20:42:15 +09:00
Mihail Filippov
d92d6b5e67 Add tests for open-signup endpoint 2026-06-02 20:42:10 +09:00
Yavor Ivanov
7cc8fdb2f5 Models: avoid hidden models in default fallback
Both get_default_chat and _recover_empty_session_model picked the
first model from cached_models[0] without checking hidden_models.
If the first cached model was hidden (e.g. minimax-m3), it was
returned as the default or used to repair empty session models,
even though the model list endpoints already filter hidden_models.

- Add _visible_models() helper that filters cached_models by
  hidden_models (mirrors the filtering in list_model_endpoints)
- Use _visible_models() in get_default_chat fallback (when no
  explicit default_model is saved)
- Use _visible_models() in _recover_empty_session_model (when
  repairing a session whose model field is empty before chat send)
- Add regression tests for hidden-model filtering in default chat
  resolution, and unit tests for _visible_models helper
2026-06-02 20:37:14 +09:00
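
A minimal sketch of the filtering in 7cc8fdb2f5, assuming models are plain string IDs. `default_chat_model` is a hypothetical simplification of `get_default_chat`.

```python
def visible_models(cached_models, hidden_models):
    hidden = set(hidden_models)
    return [m for m in cached_models if m not in hidden]


def default_chat_model(cached_models, hidden_models, saved_default=None):
    if saved_default:
        return saved_default
    # Fall back to the first model the user can actually see.
    visible = visible_models(cached_models, hidden_models)
    return visible[0] if visible else None
```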
Shaw
8115cb01a2 Models: allow API keys for local endpoints
Self-hosted endpoints on a LAN are sometimes protected by an API key. The admin
"Local" add/test form only sent base_url (+ model_type), so such an endpoint
could not be added — it just errored out — even though the backend
POST /api/model-endpoints and /model-endpoints/test already accept an optional
api_key form field (the cloud "API" form already uses it).

Adds an optional masked "API key" input (adm-epLocalApiKey) to the Local form
and wires it into the local Test and Add handlers, sending api_key only when
filled (an empty value is omitted so we never send a blank Bearer). The field
is cleared after a successful add, matching the cloud form.

Tested: tests/test_local_endpoint_api_key_js.py extracts the two click handlers
and runs them under node with mocked DOM/FormData/fetch, asserting api_key is
sent when the field is filled and omitted when blank, plus that the input
exists as a password field. `node --check static/js/admin.js` passes.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-02 20:36:54 +09:00
Tatlatat
dac64f20d9 Text: strip dangling think blocks after visible text
`strip_think` removes a dangling (unclosed) `<think>` block via
`_THINK_OPEN_RE`, but that pattern was anchored to the start of the string
(`^\s*<think>`). An unclosed `<think>` (or `<thinking>`) opener that
appears *after* any leading output was therefore only half-handled: the
stray tag itself was removed by `_THINK_TAG_RE`, but the reasoning content
following it leaked straight to the user.

  strip_think("Hello! <think> I am thinking.")       # -> "Hello! I am thinking."  (leak)
  strip_think("Sure.\n<think>\nLet me reconsider...") # -> leaks the reasoning

`strip_think` feeds user-facing output across research, email replies,
notes, and scheduled tasks, so this leaks chain-of-thought to end users.

Un-anchor `_THINK_OPEN_RE` so a dangling opener anywhere strips from the
opener to end of string, consistent with the existing start-of-string
behavior. Content before the opener, closed `<think>...</think>` blocks,
and tag-free text are all preserved.

tests/test_strip_think.py covers the mid-text leak (fails before this
change), start-anchored unclosed, closed blocks, no-tag passthrough,
content-before-opener, and mixed closed+unclosed. Full existing think
suite still passes.
2026-06-02 20:36:37 +09:00
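
The un-anchored behaviour from dac64f20d9 can be illustrated as below. The two patterns are assumptions written for this sketch and are not the project's `_THINK_OPEN_RE` / `_THINK_TAG_RE`.

```python
import re

_CLOSED_RE = re.compile(r"<think(?:ing)?>.*?</think(?:ing)?>", re.DOTALL | re.IGNORECASE)
# No leading ^ anchor: a dangling opener anywhere strips through end of string.
_DANGLING_RE = re.compile(r"<think(?:ing)?>.*\Z", re.DOTALL | re.IGNORECASE)


def strip_think(text: str) -> str:
    text = _CLOSED_RE.sub("", text)    # remove complete blocks first
    text = _DANGLING_RE.sub("", text)  # then any unclosed opener and what follows
    return text.strip()


print(strip_think("Hello! <think> I am thinking."))  # prints: Hello!
```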
Tatlatat
8ad436d25a DB: enable SQLite foreign key cascades
* fix(db): enable SQLite foreign keys so ondelete cascades actually fire

core/database.py declares DB-level FK actions throughout
(ondelete="CASCADE" / "SET NULL"), but SQLite disables foreign-key
enforcement per connection by default and the engine had no connect-event
listener turning it on. So every one of those ondelete actions was dead.

Concrete impact: cleanup_old_sessions() in src/cleanup_service.py removes
old sessions with a bulk `query(Session).delete()`, which bypasses the
ORM-level relationship cascade and relies solely on the DB-level
ondelete="CASCADE" on ChatMessage.session_id. With foreign keys off, the
messages are never deleted — they pile up as orphaned rows on every
cleanup cycle.

Add the standard SQLAlchemy connect listener issuing `PRAGMA
foreign_keys=ON`, guarded by `isinstance(conn, sqlite3.Connection)` so it
only affects SQLite and leaves other backends untouched.

tests/test_sqlite_foreign_keys.py inserts a Session + ChatMessage, deletes
the Session via bulk `query().delete()`, and asserts the ChatMessage is
cascade-deleted. Fails before this change (orphan remains).

* docs(db): clarify FK pragma scope per review; trim test comments

Address review feedback on the foreign_keys PRAGMA change:
- Note that the class-level connect listener fires for every Engine in the
  process and is a no-op on non-SQLite backends (isinstance guard).
- Warn near init_db() that FK enforcement is now global, so a migration
  that temporarily violates FK constraints must disable foreign_keys around
  that work.
- Drop the step-by-step narration comments from the regression test.

No behavior change.
2026-06-02 20:36:13 +09:00
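
The effect described in 8ad436d25a can be reproduced with the standard-library `sqlite3` module alone. This sketch does not use SQLAlchemy or the project's models; the table names are hypothetical. SQLite normally ships with foreign-key enforcement off, and the sketch sets the pragma explicitly in both cases so the comparison does not depend on the build.

```python
import sqlite3


def orphaned_messages(enable_foreign_keys: bool) -> int:
    conn = sqlite3.connect(":memory:")
    # The pragma is per connection and must be issued outside a transaction.
    conn.execute(f"PRAGMA foreign_keys={'ON' if enable_foreign_keys else 'OFF'}")
    conn.execute("CREATE TABLE sessions (id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE chat_messages ("
        "id INTEGER PRIMARY KEY, "
        "session_id INTEGER REFERENCES sessions(id) ON DELETE CASCADE)"
    )
    conn.execute("INSERT INTO sessions (id) VALUES (1)")
    conn.execute("INSERT INTO chat_messages (session_id) VALUES (1)")
    conn.execute("DELETE FROM sessions WHERE id = 1")
    remaining = conn.execute("SELECT COUNT(*) FROM chat_messages").fetchone()[0]
    conn.close()
    return remaining


print(orphaned_messages(False))  # prints 1: the message row is orphaned
print(orphaned_messages(True))   # prints 0: the cascade fired
```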
Tatlatat
bd78e1d5c2 Admin: wipe gallery albums with images
The /api/admin/wipe/gallery branch deleted GalleryImage rows but left
every GalleryAlbum row behind (GalleryAlbum wasn't even imported). After
"wipe gallery" the user is left with orphaned, empty albums whose cover_id
points at now-deleted images — inconsistent with the other wipe branches,
which clear both parent and child tables.

Delete GalleryAlbum alongside GalleryImage and include both in the
returned count.

Adds tests/test_admin_wipe_gallery.py: seeds a real in-memory SQLite DB
with an album + image, runs the actual wipe handler, and asserts both
tables are emptied. Fails before this change (albums survive).
2026-06-02 20:35:57 +09:00
SurprisedDuck
62f06ab740 Docs: respect path boundary when clearing exclusions
add_directory cleared exclusions with a raw path.startswith(directory)
test, which also matched sibling directories sharing a name prefix —
adding /docs would silently un-exclude files under /docs2. Match the
directory itself or paths under it (directory + os.sep) instead.
2026-06-02 20:35:44 +09:00
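
The boundary check from 62f06ab740, as a standalone sketch. The helper name `is_within` is hypothetical.

```python
import os


def is_within(path: str, directory: str) -> bool:
    base = directory.rstrip(os.sep)
    # Match the directory itself, or anything below it (directory + separator).
    return path == base or path.startswith(base + os.sep)


docs = os.sep + "docs"
print(is_within(os.path.join(docs, "a.md"), docs))          # prints True
print(is_within(os.path.join(docs + "2", "a.md"), docs))    # prints False
print((os.path.join(docs + "2", "a.md")).startswith(docs))  # prints True (the old bug)
```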
SurprisedDuck
78747b56ca Documents: strip PDF marker without corrupting text
_process_pdf prepends "\n\n[PDF content]:" to extracted text, and two
call sites in document_routes.py stripped it with .lstrip("\n[PDF content]:").
str.lstrip(chars) treats its argument as a *set of characters*, so it keeps
eating into the page text that follows the marker — e.g. a body starting
with "to the board" loses its leading "to" because 't'/'o' are in the
marker's character set. Replace both sites with a shared
strip_pdf_content_marker() helper that uses str.removeprefix.
2026-06-02 20:35:27 +09:00
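
The difference between the two string methods in 78747b56ca is easy to demonstrate. The helper body below is an assumed minimal version of `strip_pdf_content_marker()`; `str.removeprefix` needs Python 3.9 or later.

```python
PDF_MARKER = "\n\n[PDF content]:"


def strip_pdf_content_marker(text: str) -> str:
    # removeprefix drops the exact marker once, or nothing at all.
    return text.removeprefix(PDF_MARKER)


raw = PDF_MARKER + "to the board"
# lstrip treats its argument as a set of characters and keeps eating.
print(repr(raw.lstrip("\n[PDF content]:")))   # prints 'he board'
print(repr(strip_pdf_content_marker(raw)))    # prints 'to the board'
```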
Ernest Hysa
996a2027dd Cookbook: surface pip install failures in logs
_pip_install_fallback_chain silently discarded pip stderr via
2>/dev/null on every attempt. When pip failed (network error, venv
mismatch, disk full), the wrapper exited 0 and the Cookbook UI showed
the download as running — the silent-failure mode from #354.

Extract _pip_install_attempt() which wraps each pip invocation in a
bash -c subshell that captures output to a temp file, prints tail -5
on failure, cleans up, and exits with pip's real exit code. This
avoids the | tail pipefail masking (the first blocker on #363) while
surfacing the last 5 lines of pip output in the tmux log so users
can see what went wrong.

Both local wrapper and remote SSH runner use the same helper through
_pip_install_fallback_chain, so the fix is symmetric.
2026-06-02 20:34:52 +09:00
Hayk Arzumanyan
514050d098 Models: rewrite Docker loopback endpoints to host gateway
In Docker, a model-endpoint URL pointing at loopback (e.g. the LM Studio
default http://localhost:1234/v1) targets the Odysseus container itself, not
the host running the server, so the probe gets a connection error and the
endpoint is rejected with a misleading 'No models found for that provider/key'.
Rewrite loopback to host.docker.internal (which compose already maps to
host-gateway) for the probe and the saved URL, mirroring the existing Ollama
handling. Gated on actually being in a container with the gateway reachable, so
native installs and gateway-less deploys are untouched.

Fixes #25

Co-authored-by: Claude <noreply@anthropic.com>
2026-06-02 20:34:40 +09:00
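
A rough sketch of the rewrite in 514050d098. How the project detects a container and a reachable gateway is not described in the commit, so that decision is passed in as the hypothetical `in_container` flag; the sketch also drops any userinfo in the URL.

```python
from urllib.parse import urlsplit, urlunsplit

LOOPBACK_HOSTS = {"localhost", "127.0.0.1", "::1"}


def rewrite_loopback(url, in_container, gateway_host="host.docker.internal"):
    parts = urlsplit(url)
    if not in_container or parts.hostname not in LOOPBACK_HOSTS:
        return url  # native installs and non-loopback URLs are untouched
    netloc = gateway_host if parts.port is None else f"{gateway_host}:{parts.port}"
    return urlunsplit(parts._replace(netloc=netloc))


print(rewrite_loopback("http://localhost:1234/v1", in_container=True))
# prints: http://host.docker.internal:1234/v1
```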
SurprisedDuck
4307cac966 Research: report empty search provider results clearly
Deep Research surfaced 'Error: unknown error' whenever every search
provider returned an empty result set without raising (e.g. SearXNG is
reachable but all its engines fail internally). _last_search_error was
only set on exceptions, so the empty-but-no-exception path left it unset
and the caller fell back to 'unknown error'.

Record an actionable reason on that path naming the providers that were
tried, so users can tell it's a search-backend problem rather than a
model problem. The provider-raised path is unchanged.

Re: #344.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-02 20:34:25 +09:00
Tatlatat
67517eaed1 Gallery: match image endpoint URLs with exact v1 suffix
The image-edit endpoint lookup compared stored vs incoming base URLs with
`.rstrip("/v1")`. `str.rstrip(chars)` treats its argument as a character
set, not a suffix, so any URL ending in '/', 'v', or '1' is over-stripped
(e.g. `http://host1/v1` -> `http://host`). Two endpoints that are not the
same can then compare equal, or the real endpoint fails to match its own
stored record, leaving `api_key` unset and sending the upstream image call
unauthenticated.

Use `.removesuffix("/v1")` (exact-suffix removal) with surrounding
`.rstrip("/")` on both sides so only a genuine trailing `/v1` is dropped.

Adds a focused test that parses the actual comparison expression out of
gallery_routes.py via AST and evaluates it — it fails if the fix is
reverted and uses no mocking.
2026-06-02 20:34:05 +09:00
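
The over-stripping described in 67517eaed1 and the exact-suffix alternative, side by side. `normalize_base_url` is a hypothetical name for the comparison key; `str.removesuffix` needs Python 3.9 or later.

```python
def normalize_base_url(url: str) -> str:
    # Drop trailing slashes, then one genuine "/v1" suffix, then slashes again.
    return url.rstrip("/").removesuffix("/v1").rstrip("/")


print("http://host1/v1".rstrip("/v1"))         # prints http://host (over-stripped)
print(normalize_base_url("http://host1/v1/"))  # prints http://host1
```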
SurprisedDuck
d06b6d87d3 Models: prefer longest known context match
KNOWN_CONTEXT_WINDOWS lists 'o1' (200k) before 'o1-mini' (128k), and
_lookup_known returned on the first substring hit — so "o1-mini" matched
'o1' and reported 200000 instead of 128000. Track the longest matching
key instead, so the most specific entry wins regardless of table order.
2026-06-02 20:33:09 +09:00
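
The longest-match rule from d06b6d87d3, using only the two table entries the commit mentions. The function is a simplified stand-in for `_lookup_known`.

```python
KNOWN_CONTEXT_WINDOWS = {"o1": 200_000, "o1-mini": 128_000}


def lookup_known(model: str):
    name = model.lower()
    best_key = None
    for key in KNOWN_CONTEXT_WINDOWS:
        # Keep the longest matching key so table order no longer matters.
        if key in name and (best_key is None or len(key) > len(best_key)):
            best_key = key
    return KNOWN_CONTEXT_WINDOWS[best_key] if best_key else None


print(lookup_known("o1-mini"))  # prints 128000
```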
mist
0b0be3c339 Email: recognize forwarded message dividers
`_ORIG_RE` (and its JS mirror `_TALON_ORIG_RE`) already recognised the
Japanese forward marker `転送` alongside the "Original Message" delimiters,
but not the English "Forwarded message" one. So Gmail-style forwards —
including the ones Odysseus itself emits (`---------- Forwarded message
----------`, static/js/emailInbox.js) — were not treated as a quote
boundary:

  - with a following Outlook From:/Date: header block, the divider line
    leaked into the level-0 reply bubble as noise;
  - with only the divider marking the forward (no header block), the body
    was not split into turns at all.

Add `Forwarded\s+message` to the same `[-_=]{3,}`-delimited alternation in
both the server-side parser and the JS mirror, so forward dividers are
consumed as an attribution boundary like "----- Original Message -----".
Locale variants of "Forwarded message" can follow the existing pattern.

Tests cover both manifestations plus a negative control (the bare words
"forwarded message" without `[-_=]{3,}` delimiters must not split).

Checks: python -m pytest tests/test_forwarded_message_divider.py (3 passed),
python -m py_compile src/email_thread_parser.py, node --check
static/js/emailLibrary/utils.js, git diff --check.
2026-06-02 20:32:56 +09:00
mist
e249fa4557 Tools: match keyword hints on word boundaries
`get_tools_for_query` force-includes whole tool families when the query
mentions an intent keyword, but matched with a raw substring test
(`kw in ql`). Short hints therefore fired inside unrelated words, bloating
the tool set with irrelevant tools:

  - "fix" matched "prefix"      -> document tools
  - "line" matched "deadline"/"online" -> document tools
  - "serve" matched "observe"/"reserve" -> cookbook serve tools
  - "reply" matched "replying"  -> all email tools
  - "unread" matched "unreadable" -> all email tools

Match each keyword on word boundaries instead
(`re.search(rf"\b{re.escape(kw)}\b", ql)`), the same fix already applied to
the keyword matcher in topic_analyzer.py. Genuine intent keywords
("reply to this email", "edit the document", "serve the model") still match.

This only removes substring-inside-a-word matches; it does not change whole
-word matches (so e.g. an unrelated whole word like "tell" is a separate
keyword-choice question, left untouched here).

Checks: python -m pytest tests/test_tool_index_keyword_boundaries.py (4 passed;
3 of them fail on the pre-fix substring code), python -m py_compile
src/tool_index.py, git diff --check.
2026-06-02 20:32:20 +09:00
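
The word-boundary test quoted in e249fa4557, wrapped in a small helper so the before/after difference is visible. `keyword_matches` is a hypothetical name.

```python
import re


def keyword_matches(kw: str, query: str) -> bool:
    ql = query.lower()
    return re.search(rf"\b{re.escape(kw)}\b", ql) is not None


print("fix" in "add a prefix")                          # prints True (old substring test)
print(keyword_matches("fix", "add a prefix"))           # prints False
print(keyword_matches("reply", "Reply to this email"))  # prints True
```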
mist
8f0518c0ae Presets: fill missing built-in defaults on load
PresetManager.load already heals a forward-incompatible presets.json: the
block just above repairs the legacy `custom` shape and re-saves the file.
But if the file exists and is missing a whole built-in preset (e.g. an older
install written before `reason` existed), load returned it as-is, so that
built-in stayed permanently absent — silently missing from the picker that
GET /api/presets feeds, with no way for the user to get it back.

Extend the same self-heal: after the legacy migration, fill in any built-in
presets the loaded file is missing, defaults-first so user edits win, and
persist the result. This never clobbers an intentional removal — there is no
delete path for the built-in keys (only user_templates entries can be
deleted), and presets are hidden via an `enabled: False` flag, not removal.

Checks: python -m pytest tests/test_preset_fill_missing_defaults.py (3 passed;
2 fail on the pre-fix code), the existing preset cases in
tests/test_review_regressions.py still pass, python -m py_compile
src/preset_manager.py, git diff --check.
2026-06-02 20:32:08 +09:00
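
The defaults-first merge described in 8f0518c0ae might be sketched as follows. The preset names and values in `BUILTIN_DEFAULTS` are invented for illustration, and persisting the result is left to the caller.

```python
import copy

# Hypothetical built-ins; the real defaults live in the preset manager.
BUILTIN_DEFAULTS = {"chat": {"temperature": 0.7}, "reason": {"temperature": 0.2}}


def fill_missing_builtins(loaded: dict):
    # Defaults go first so any key present in the loaded file wins.
    merged = {**copy.deepcopy(BUILTIN_DEFAULTS), **loaded}
    changed = merged.keys() != loaded.keys()  # True when something was filled in
    return merged, changed
```

A caller would re-save the file only when `changed` is true, so an already complete presets.json is not rewritten on every load.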
Mahdi Salmanzade
280c29d572 Security: owner-scope v1 chat endpoint fallback
The sync-chat endpoint's Case 3 fallback selected a ModelEndpoint with an
unscoped `query(ModelEndpoint).filter(is_enabled == True).first()` and then
used that row's decrypted `api_key` for the LLM call. ModelEndpoint is a
per-user resource (owner non-null = private to that user), so a chat-scoped
API token for user A that sent no session and no api_key could fall back onto
user B's PRIVATE endpoint — spending B's API key/quota and reaching whatever
internal base_url B configured. This is the same multi-tenant owner-scoping
class already fixed for the session gate on this very endpoint
(_caller_owns_session) and for companion/models.

Scope the fallback to the token owner's own rows plus legacy null-owner
(shared) rows via the existing owner_filter helper, matching
routes/model_routes.py and companion/routes.py. A null/empty owner stays a
no-op, preserving single-user/legacy behaviour.

Add regression tests pinning the scoped fallback (cross-owner, shared-only,
no-visible-row, disabled-owned, and the legacy null-owner no-op).
2026-06-02 20:31:35 +09:00