odysseus

Author	SHA1	Message	Date
lekt8	87babb58d5	fix: SSRF hardening for the custom embedding endpoint URL (#132 ) (#1206 ) POST /api/embeddings/endpoint takes a user-supplied URL and immediately makes an outbound httpx request to it with no validation. The admin gate added earlier (PR #80) closed the unauthenticated-access part of #132; this addresses the remaining request: validate the URL before fetching it. Odysseus is local-first, so pointing the embedding endpoint at a loopback or LAN server (local vLLM / llama.cpp / Ollama) is a normal setup — a blanket private-IP block would break the primary use case. So the guard: - always rejects non-HTTP(S) schemes (file://, gopher://, ftp:// …), - always rejects the link-local range (169.254.0.0/16, incl. the cloud instance-metadata 169.254.169.254 exfil vector) plus multicast / reserved / unspecified, and IPv4-mapped-IPv6 forms of the above, - keeps loopback/LAN allowed by default, and - adds EMBEDDING_BLOCK_PRIVATE_IPS=true for full SSRF lockdown on exposed multi-tenant deployments. Logic lives in src/url_safety.py (stdlib only, resolver injectable) so it is unit-testable without real DNS; the route calls it before the health-check request. Covered by tests/test_url_safety.py (8 cases). Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-02 23:46:33 +09:00
lekt8	f2f437f4a8	feat: add /api/ready readiness probe (DB, data dir, local-first) (#1200 ) /api/health is a liveness ping. This adds /api/ready as a readiness / integrity self-check that returns 503 unless every critical subsystem is whole, so an orchestrator (Docker/Compose/k8s) can gate traffic on real readiness rather than mere process liveness: - database: opens a connection and runs SELECT 1 - data_dir: confirms the data directory exists and is writable - local_first: reports whether storage stays on the host (informational; a remote database is a valid deployment, so it never fails readiness) The check logic lives in src/readiness.py so it is unit-testable in isolation; the route is a thin wrapper. Covered by tests/test_readiness.py. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-02 23:33:22 +09:00
Mayank Ukey	f96edfe5ca	fix: deepseek-r1 on Ollama returns HTTP 400 when tool schemas are sent (#1169 ) * fix: exclude deepseek from local tool-calling keyword list deepseek-r1 on Ollama returns HTTP 400 when tool schemas are sent. The cloud API (api.deepseek.com) is already caught by the _API_HOSTS check, so the generic 'deepseek' keyword match was only causing false positives for local Ollama-served models. * fix: add model no-tools blocklist and regression tests for deepseek-r1 The previous fix removed 'deepseek' from the keyword allow-list, but _is_api_model is still True for localhost endpoints because 'localhost' appears in _API_HOSTS — so the keyword change had no effect for Ollama. Proper fix: add an explicit _model_no_tools blocklist ('deepseek-r1') that overrides the endpoint URL check. The endpoint's supports_tools DB flag still takes priority either way (True forces tools on, False forces them off), so users can override per-endpoint when needed. Also refined the deepseek allow-list: 'deepseek-v' and 'deepseek-chat' cover the cloud models (v2, v3, chat) that do support tools, without matching deepseek-r1 variants. 13 regression tests cover: - deepseek-r1 on localhost/docker: no tools (was HTTP 400) - deepseek-v3/chat on api.deepseek.com: tools enabled (no regression) - endpoint_supports=True/False overrides both lists - qwen/llama on localhost: unaffected	2026-06-02 23:22:57 +09:00
Ernest Hysa	c12ae79c42	fix(tools): strict path confinement with sensitive-subpath deny list (#1072 ) Rework read_file / write_file confinement after review feedback: - Remove $HOME from default allow roots. Only project data/ and system temp dirs are allowed out of the box. - Add a sensitive-subpath deny list (.ssh, .gnupg, shell rc files, .env, .netrc, SSH key filenames). Checked BEFORE allowlist so it blocks even when a broader root is configured. - Add "tool_path_extra_roots" setting for opt-in broader access. - Sensitive subpaths remain blocked regardless of configured roots. Tests: 24 cases covering /etc/shadow, ~/.ssh/authorized_keys, symlink into .ssh, traversal, shell rc files, key filenames, extra roots, and dispatch-level end-to-end.	2026-06-02 23:13:30 +09:00
RosenTomov	37356d8e3e	Discover LM Studio via host/port scanning and native-API fingerprint (#1126 ) Scan port 1234 and any custom port from LM_STUDIO_URL, add the LM_STUDIO_URL host to the discovery sweep alongside the Ollama env vars, and tag each discovered endpoint with its provider by fingerprinting the native /api/v1/models response (entries carrying key + architecture). Documents LM_STUDIO_URL in .env.example.	2026-06-02 23:04:58 +09:00
Jordan Urbs	c0c1ceb36d	Treat Venice as a tool-capable SOTA cloud provider (#1173 ) Follow-up to the Venice provider PR. Wire api.venice.ai into the three host allowlists so Venice behaves like the other paid OpenAI-compatible clouds: - agent_loop: add api.venice.ai to _API_HOSTS so the agent sends native OpenAI tool-call schemas (Venice supports function calling) instead of degrading to fenced-block parsing. - teacher_escalation: add api.venice.ai to _SOTA_HOSTS so the escalation loop stays OFF for Venice (it's a paid top-tier API; no need to add teacher-model latency). - webhook_routes: add venice to KNOWN_PROVIDERS so the sync chat webhook can auto-resolve base_url from provider=venice. Tests: tests/test_venice_hosts.py pins tool-host matching + SOTA classification for Venice; py_compile on touched modules. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-06-02 23:03:46 +09:00
RosenTomov	a493fb49b0	Use LM Studio-reported vision capability for image passthrough (#1130 ) Read a model's capabilities.vision flag from LM Studio's native /api/v1/models so vision finetunes whose names lack a vision keyword still receive images, falling back to the name heuristic when the endpoint doesn't report it. The probe is short-TTL cached and restricted to local/LAN hosts, so remote/cloud endpoints are never contacted.	2026-06-02 23:01:04 +09:00
ghreprimand	06a3468967	Surface deep research probe errors (#1086 ) Co-authored-by: ghreprimand <203024559+ghreprimand@users.noreply.github.com>	2026-06-02 22:51:25 +09:00
Tatlatat	dc8a882f1f	fix(rag): use a stable hash for document IDs so dedup survives restarts (#1098 ) add_document() and add_documents_batch() derive the persistent ChromaDB document id from Python's built-in hash(): doc_id = f"doc_{hash(text) % 10**16}" str hashing is randomized per process (PYTHONHASHSEED is on by default), so the same document text gets a different doc_id on every restart. The dedup check right after — self._collection.get(ids=[doc_id]) — therefore misses on restart, and identical documents are re-embedded and re-added as duplicates each time the app restarts, bloating the vector store and skewing retrieval. Derive the id from a stable hashlib.sha256 of the text via a shared _generate_doc_id() helper, used by both add paths so they agree. tests/test_rag_vector_id_stability.py runs _generate_doc_id in subprocesses under PYTHONHASHSEED=0/1/random and asserts the id is identical across all of them (and differs for different text). Fails before this change.	2026-06-02 22:42:23 +09:00
pewdiepie-archdaemon	ff93a6c63b	Polish email and cookbook flows	2026-06-02 22:42:07 +09:00
Afonso Coutinho	2e2da2aefe	fix: extract_statistics drops large numbers and trailing % signs (#1153 ) * fix: extract_statistics misses comma-less numbers and drops trailing % * fix: same extract_statistics number/percent bug in services copy * test: extract_statistics captures full numbers and percent signs	2026-06-02 22:35:30 +09:00
Afonso Coutinho	2b2943a7b7	fix: extract_quotes accepts mismatched opening/closing quotes (#1113 ) * fix: only extract quotes whose closing quote matches the opening one * fix: same mismatched-quote bug in the services search copy * test: extract_quotes requires matching open/close quotes	2026-06-02 22:34:52 +09:00
ghreprimand	c075abce5d	Search: consolidate core and provider implementations Co-authored-by: ghreprimand <203024559+ghreprimand@users.noreply.github.com>	2026-06-02 21:02:26 +09:00
Robin Fröhlich	3c6ae3713e	Models: add Z.AI coding endpoint and GLM vision detection	2026-06-02 20:59:17 +09:00
SurprisedDuck	934bca9e48	Providers: omit temperature for OpenAI reasoning models * fix: omit temperature for OpenAI reasoning models (o1/o3/o4/gpt-5) These models only accept the default temperature; sending any explicit value (even 0.0) returns HTTP 400 "Only the default (1) value is supported". This broke two paths: - Endpoint probing in _probe_single_model hardcodes temperature: 0.0, so a perfectly valid o3/gpt-5 endpoint is reported as failing in the Model Endpoints health check. - Chat/stream payloads send temperature unconditionally, so a non-default temperature preset 400s on these models. The code already special-cases the same model family for max_completion_tokens, so this adds a sibling _restricts_temperature() helper and omits the field for those models, letting the API use its required default. gpt-4.5 is intentionally excluded (not a reasoning model; accepts temperature normally). Adds tests/test_llm_core_temperature.py covering the predicate and the synchronous payload builder. * fix: also omit temperature for reasoning models on the direct-POST paths The first commit only covered llm_call/llm_call_async/stream_llm and the endpoint probe. Email auto-summary, urgency-less spam classification, the email reply-summary endpoint, and gallery vision tagging build their OpenAI payloads inline and POST them directly (requests/httpx), bypassing llm_core — so a reasoning model configured there would still 400 on the temperature field. These sites already branch on _uses_max_completion_tokens, so they're the same class; added the matching _restricts_temperature guard. gallery_routes also gains the max_completion_tokens branch it was missing, so gpt-5 vision tagging works end to end. Note: email_pollers urgency scoring goes through llm_call_async and was already covered.	2026-06-02 20:58:33 +09:00
Nikita Rozanov	119075f368	Research: add configurable run timeout Surfaces the research_run_timeout_seconds setting (added in #783) in Settings → Research as a "Max Time" field, and lets 0 disable the wall-clock cap entirely for long deep-research runs. - settings.py: document that 0 disables the cap; default stays 1800s. - research_handler.py: resolve 0 (or negative) to no timeout (asyncio.wait_for timeout=None); other values stay bounded to [60, 86400] as before. - index.html / settings.js: "Max Time" input bound to research_run_timeout_seconds, validated to {0} ∪ [60, 86400], with copy making explicit that 0 = no limit (unbounded model/API cost). Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-02 20:57:57 +09:00
Tushar-Projects	c3228f8b59	Background tasks: respect active session model fallback	2026-06-02 20:57:42 +09:00
LittleLlama	c85da91964	Tasks: ship email boundary task paused by default Co-authored-by: Claude <noreply@anthropic.com>	2026-06-02 20:53:02 +09:00
Leo	6c15dc7d33	Chat metrics: surface backend generation speed * Chat metrics: show backend's true generation t/s, not tokens÷wall-clock The per-message tokens/sec read low and felt wrong because it was computed as output_tokens / total_duration, where total_duration is wall-clock including prefill, tool calls, and network — not pure decode time. llama.cpp already reports the correct gen speed in its stream (timings.predicted_per_second), but it was being dropped. - llm_core.py: when parsing the OpenAI-compatible usage chunk, also read the sibling `timings` block llama.cpp includes — pass predicted_per_second through as gen_tps and prompt_per_second as prefill_tps on the usage event. - agent_loop.py: capture backend_gen_tps/backend_prefill_tps from usage events; in _compute_final_metrics prefer backend_gen_tps over the wall-clock division when present (fall back to computed for cloud APIs that omit timings). Tag the result with tps_source ("backend" vs "computed") and surface prefill_tps. Result: the displayed t/s now matches the model's real decode speed and is stable regardless of prompt length (a long prefill no longer deflates it). Checks: py_compile passes; verified extraction against a real llama.cpp final chunk (gen 79 t/s surfaced vs the deflated wall-clock figure shown before). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * Chat metrics: surface true t/s on the direct-chat path too Follow-up to the gen-tps work: the non-agent direct-chat stream path in chat_routes turned the raw `usage` event straight into a metrics event but only copied token counts — it never set tokens_per_second or response_time. So simple (non-tool) replies showed "Speed: n/a" / "Time: undefineds" and the chip fell back to a bare token count ("27 tok") instead of t/s. 
Map the usage event's gen_tps (llama.cpp timings.predicted_per_second, added in the prior commit) into tokens_per_second here too, tag tps_source=backend, and set response_time from wall-clock for the stats popup. Checks: py_compile passes; verified llama.cpp emits usage+timings on the final stream chunk (gen ~90 t/s) that this path consumes. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * Tests: backend gen/prefill t/s passthrough and preference Cover the two pieces of the true-t/s metric so it can be reviewed on its own: - stream_llm surfaces llama.cpp's timings.predicted_per_second / prompt_per_second as gen_tps / prefill_tps on the usage event (captured llama.cpp final-chunk fixture), and omits them when the backend reports no timings. - _compute_final_metrics prefers backend_gen_tps over output/wall-clock, tags tps_source ("backend" vs "computed"), and surfaces prefill_tps. Reuses the fake-client stream harness from test_llm_core_streaming.py. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-02 20:52:08 +09:00
Ernest Hysa	064c1ace91	Uploads: write uploads index atomically * fix(upload): atomic-rename writes for uploads.json + .bak recovery UploadHandler.save_upload does a read-modify-write of uploads.json via two open(..., 'w') + json.dump blocks, with no lock, no temp+rename, and no recovery. N concurrent inserts lost N-1 entries (last writer wins after the read snapshot is taken); a SIGKILL/SIGTERM mid-json.dump truncated the file and the bare 'except Exception: logger.warning(...)' recovery path returned {}, silently dropping every prior upload. The handler now serialises the RMW under a per-instance threading.Lock and writes through _atomic_write_json, which writes to a tempfile in the same directory, fsyncs, snapshots the previous live to .bak, and renames the temp onto the target via os.replace. os.replace is atomic on POSIX, so a reader sees either the old or the new state, never a half-written file. _load_upload_index tries the live file first, then falls back to the .bak sibling if the live is corrupt. Cross-process safety is still on the deployer: gunicorn workers on the same uploads dir will race the lock, and the atomic-rename is the kernel-level guarantee that prevents torn reads. If multi-worker writes are expected, fcntl.flock around the rename is a follow-up; single-worker and async deployments are correct as-is. * fix(upload): reload uploads.json inside _index_lock on dedupe path The duplicate-detection branch in save_upload() was reading uploads.json before taking _index_lock, then writing that stale snapshot under the lock. A duplicate upload racing with a new-entry insert could clobber the new entry because the duplicate's snapshot predated the insert. The new-entry branch already reloaded inside the lock; the duplicate branch now does the same. It also re-resolves the storage key inside the lock, because a concurrent insert can have changed the dict's keys. 
If the entry has been cleaned up between the outer read and the inner write, the function falls through to the fresh-insert path instead of silently writing a stale row. Boundary note: the _index_lock serialises writers within a single Python process. Cross-process / multi-worker deployments still need flock or a database; the inline comment is updated to make this explicit. The atomic-rename write keeps the on-disk state consistent but does not serialise writers across processes. Tests: - Existing concurrent-insert and partial-write-recovery tests still pass. - New test_atomic_write_primitives_present_in_production_code asserts the production module has at least two 'with self._index_lock:' blocks (regression net for this fix). - New smoke tests: normal upload, duplicate detection, info lookup after a backup-recovery scenario.	2026-06-02 20:51:39 +09:00
mechramc	8e87d3002b	Tasks: clean up queued cancellation state	2026-06-02 20:51:21 +09:00
SurprisedDuck	f975279b26	Notes: parse natural-language due dates on update The 'add' action runs due_date through parse_due_for_user (natural language like 'tomorrow at 9am', plus user-tz anchoring for naive ISO), but 'update' stored the raw value verbatim. A reminder edited with natural language was saved as an unparseable literal the frontend's new Date() can't read, so it never fired. Route update's due_date through the same parser as add.	2026-06-02 20:51:16 +09:00
Tatlatat	7f97ab3032	Topics: hydrate session history before analysis analyze_topics() iterates session_manager.sessions and reads session_data.get("history", []) directly. But SessionManager.load_sessions seeds sessions metadata-only with empty history — messages are loaded lazily, only when get_session(session_id) is called. So analyze_topics saw empty history for every session that hadn't been individually opened this process lifetime and reported total_topics: 0, even when the database held plenty of matching messages. Hydrate each candidate session via session_manager.get_session(session_id) (the existing lazy-load path) before reading its history, after the owner/archived filters so skipped sessions aren't loaded. Falls back to the raw cached history when the manager has no get_session (test stubs). tests/test_topic_analyzer.py: new test_topic_analyzer_hydrates_sessions seeds a real SQLite DB with a session + message, runs the real SessionManager (asserting cached history starts empty), then asserts analyze_topics finds the topic. Fails before this change. The existing keyword tests now pass an explicit owner to satisfy the owner-required early return.	2026-06-02 20:44:27 +09:00
SurprisedDuck	d73c0a13f4	YouTube: enforce comment fetch timeout while waiting asyncio.wait_for wrapped create_subprocess_exec, which returns as soon as the child is spawned, so the timeout never bounded the actual work. yt-dlp could hang indefinitely on proc.communicate() and the except asyncio.TimeoutError branch was unreachable. Bind the wait to communicate() and kill/reap the child if it overruns.	2026-06-02 20:44:24 +09:00
Tatlatat	e084dc993e	Chat: merge consecutive user messages for strict providers After a non-native tool round, the agent appends tool results as a {role: 'user'} message next to the user's original 'user' prompt, producing two consecutive 'user' messages. Strict provider APIs (Anthropic/Claude) reject consecutive same-role messages, so the follow-up generation request fails silently — search returns sources, then nothing is generated. _sanitize_llm_messages now merges consecutive 'user' messages (joining their content). Only user/user is merged; normal chat and agent/tool turns already alternate and are untouched. Scoped down per maintainer review: the agent_loop 'output' source-extraction change is already on main (#898/#901) and the broad-mocking web-sources test was dropped. Added a focused test that runs consecutive-user messages through the real _build_anthropic_payload and asserts the payload alternates correctly.	2026-06-02 20:44:13 +09:00
Tatlatat	dac64f20d9	Text: strip dangling think blocks after visible text `strip_think` removes a dangling (unclosed) `<think>` block via `_THINK_OPEN_RE`, but that pattern was anchored to the start of the string (`^\s<think>`). An unclosed `<think>` (or `<thinking>`) opener that appears after* any leading output was therefore only half-handled: the stray tag itself was removed by `_THINK_TAG_RE`, but the reasoning content following it leaked straight to the user. strip_think("Hello! <think> I am thinking.") # -> "Hello! I am thinking." (leak) strip_think("Sure.\n<think>\nLet me reconsider...") # -> leaks the reasoning `strip_think` feeds user-facing output across research, email replies, notes, and scheduled tasks, so this leaks chain-of-thought to end users. Un-anchor `_THINK_OPEN_RE` so a dangling opener anywhere strips from the opener to end of string, consistent with the existing start-of-string behavior. Content before the opener, closed `<think>...</think>` blocks, and tag-free text are all preserved. tests/test_strip_think.py covers the mid-text leak (fails before this change), start-anchored unclosed, closed blocks, no-tag passthrough, content-before-opener, and mixed closed+unclosed. Full existing think suite still passes.	2026-06-02 20:36:37 +09:00
SurprisedDuck	62f06ab740	Docs: respect path boundary when clearing exclusions add_directory cleared exclusions with a raw path.startswith(directory) test, which also matched sibling directories sharing a name prefix — adding /docs would silently un-exclude files under /docs2. Match the directory itself or paths under it (directory + os.sep) instead.	2026-06-02 20:35:44 +09:00
SurprisedDuck	78747b56ca	Documents: strip PDF marker without corrupting text _process_pdf prepends "\n\n[PDF content]:" to extracted text, and two call sites in document_routes.py stripped it with .lstrip("\n[PDF content]:"). str.lstrip(chars) treats its argument as a set of characters, so it keeps eating into the page text that follows the marker — e.g. a body starting with "to the board" loses its leading "to" because 't'/'o' are in the marker's character set. Replace both sites with a shared strip_pdf_content_marker() helper that uses str.removeprefix.	2026-06-02 20:35:27 +09:00
SurprisedDuck	4307cac966	Research: report empty search provider results clearly Deep Research surfaced 'Error: unknown error' whenever every search provider returned an empty result set without raising (e.g. SearXNG is reachable but all its engines fail internally). _last_search_error was only set on exceptions, so the empty-but-no-exception path left it unset and the caller fell back to 'unknown error'. Record an actionable reason on that path naming the providers that were tried, so users can tell it's a search-backend problem rather than a model problem. The provider-raised path is unchanged. Re: #344. Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-02 20:34:25 +09:00
SurprisedDuck	d06b6d87d3	Models: prefer longest known context match KNOWN_CONTEXT_WINDOWS lists 'o1' (200k) before 'o1-mini' (128k), and _lookup_known returned on the first substring hit — so "o1-mini" matched 'o1' and reported 200000 instead of 128000. Track the longest matching key instead, so the most specific entry wins regardless of table order.	2026-06-02 20:33:09 +09:00
mist	0b0be3c339	Email: recognize forwarded message dividers `_ORIG_RE` (and its JS mirror `_TALON_ORIG_RE`) already recognised the Japanese forward marker `転送` alongside the "Original Message" delimiters, but not the English "Forwarded message" one. So Gmail-style forwards — including the ones Odysseus itself emits (`---------- Forwarded message ----------`, static/js/emailInbox.js) — were not treated as a quote boundary: - with a following Outlook From:/Date: header block, the divider line leaked into the level-0 reply bubble as noise; - with only the divider marking the forward (no header block), the body was not split into turns at all. Add `Forwarded\s+message` to the same `[-_=]{3,}`-delimited alternation in both the server-side parser and the JS mirror, so forward dividers are consumed as an attribution boundary like "----- Original Message -----". Locale variants of "Forwarded message" can follow the existing pattern. Tests cover both manifestations plus a negative control (the bare words "forwarded message" without `[-_=]{3,}` delimiters must not split). Checks: python -m pytest tests/test_forwarded_message_divider.py (3 passed), python -m py_compile src/email_thread_parser.py, node --check static/js/emailLibrary/utils.js, git diff --check.	2026-06-02 20:32:56 +09:00
mist	e249fa4557	Tools: match keyword hints on word boundaries `get_tools_for_query` force-includes whole tool families when the query mentions an intent keyword, but matched with a raw substring test (`kw in ql`). Short hints therefore fired inside unrelated words, bloating the tool set with irrelevant tools: - "fix" matched "prefix" -> document tools - "line" matched "deadline"/"online" -> document tools - "serve" matched "observe"/"reserve" -> cookbook serve tools - "reply" matched "replying" -> all email tools - "unread" matched "unreadable" -> all email tools Match each keyword on word boundaries instead (`re.search(rf"\b{re.escape(kw)}\b", ql)`), the same fix already applied to the keyword matcher in topic_analyzer.py. Genuine intent keywords ("reply to this email", "edit the document", "serve the model") still match. This only removes substring-inside-a-word matches; it does not change whole -word matches (so e.g. an unrelated whole word like "tell" is a separate keyword-choice question, left untouched here). Checks: python -m pytest tests/test_tool_index_keyword_boundaries.py (4 passed; 3 of them fail on the pre-fix substring code), python -m py_compile src/tool_index.py, git diff --check.	2026-06-02 20:32:20 +09:00
mist	8f0518c0ae	Presets: fill missing built-in defaults on load PresetManager.load already heals a forward-incompatible presets.json: the block just above repairs the legacy `custom` shape and re-saves the file. But if the file exists and is missing a whole built-in preset (e.g. an older install written before `reason` existed), load returned it as-is, so that built-in stayed permanently absent — silently missing from the picker that GET /api/presets feeds, with no way for the user to get it back. Extend the same self-heal: after the legacy migration, fill in any built-in presets the loaded file is missing, defaults-first so user edits win, and persist the result. This never clobbers an intentional removal — there is no delete path for the built-in keys (only user_templates entries can be deleted), and presets are hidden via an `enabled: False` flag, not removal. Checks: python -m pytest tests/test_preset_fill_missing_defaults.py (3 passed; 2 fail on the pre-fix code), the existing preset cases in tests/test_review_regressions.py still pass, python -m py_compile src/preset_manager.py, git diff --check.	2026-06-02 20:32:08 +09:00
Refuse	4218bfe71e	Tools: restrict app_api and serve_preset to admins Co-authored-by: RefuseOdd <refuseodd@users.noreply.github.com>	2026-06-02 20:29:47 +09:00
Tatlatat	9389cabed0	API keys: skip undecryptable entries on load APIKeyManager.load() decrypts every stored key with a dict comprehension and no error handling. If the .key file no longer matches the ciphertext in api_keys.json — key rotated, a partial/!mismatched data restore, or a corrupted .key — Fernet.decrypt raises cryptography.fernet.InvalidToken. app_initializer.py calls api_key_manager.load() during startup, so a single undecryptable entry takes down the whole app at boot, and the user can't reach the UI to fix it. Decrypt each key in a loop and, on InvalidToken/ValueError, log a warning and skip that one entry while still returning every key that decrypts cleanly. One bad/stale key no longer blocks startup. tests/test_api_key_manager_resilience.py saves a valid key, then injects an entry encrypted under a different Fernet key (InvalidToken) and a malformed token (ValueError), and asserts load() returns the good key and skips the bad ones without raising. Fails before this change.	2026-06-02 20:28:26 +09:00
Tatlatat	da3876c168	Webhook: block IPv6 SSRF bypasses The webhook URL guard's _ip_is_private() only checks a hardcoded _PRIVATE_NETWORKS list, which misses several addresses that route internally. validate_webhook_url() therefore ALLOWED: - http://[::]/ (IPv6 unspecified, reaches localhost) - http://[::ffff:127.0.0.1]/ (IPv4-mapped IPv6 loopback = 127.0.0.1) - http://[::ffff:169.254.169.254]/ (IPv4-mapped cloud metadata endpoint) The last one is the dangerous case: a webhook pointed at the mapped 169.254.169.254 can pull cloud instance credentials (SSRF -> credential theft). Harden _ip_is_private(): first unwrap IPv4-mapped IPv6 to its embedded IPv4 (addr.ipv4_mapped), then reject via the stdlib address properties (is_private, is_loopback, is_link_local, is_reserved, is_multicast, is_unspecified) in addition to the existing network list. Public addresses still pass. tests/test_webhook_ssrf_resilience.py asserts validate_webhook_url raises for the three IPv6 bypasses plus 127.0.0.1 and 0.0.0.0, and still accepts a public IP literal. The IPv6 cases fail before this change.	2026-06-02 20:28:12 +09:00
Ernest Hysa	a8a34bd22a	Ollama: pass discovered num_ctx in chat requests _build_ollama_payload sends options.temperature and options.num_predict to /api/chat, but never options.num_ctx. Ollama defaults num_ctx to 2048 when the option is omitted, so prompts going to any Ollama backend are silently truncated there regardless of the model's actual capability. Thread the discovered context length through the three call sites (llm_call, llm_call_async, stream_llm) and emit options.num_ctx when it is known and positive. The builder filters out the DEFAULT_CONTEXT fallback (128000) so we don't lie to Ollama about models whose window we couldn't actually discover. The issue's literal 'when > 2048' heuristic is dropped: a model with a real context smaller than 2048 would OOM if Ollama used its default, so we pass the real value regardless of size. Matches how src/context_compactor.py uses the same helper. Sister fix to PR #753 — that PR teaches the compactor the right budget, this one tells Ollama to actually use that budget on the way in.	2026-06-02 20:27:24 +09:00
MohammadYusif	65b5d65059	fix(agent): extract web search sources from output key tool_execution.py returns web search results as {"output": ..., "exit_code": 0}. The sources-extraction block in stream_agent_loop only checked result.get("results") and result.get("stdout"), so _src_text was always "" for every tool-call-mode web search. Two consequences: 1. The SOURCES marker was never parsed and the web_sources SSE event was never emitted -- the sources panel never appeared after agent-mode searches. 2. The marker (a large JSON blob) was left in result["output"] and forwarded verbatim to the LLM in round 2 via format_tool_result, confusing some local models into producing no tokens. Fix: prepend result.get("output") to the lookup chain, and update the cleanup assignment so result["output"] is overwritten with the stripped text. Adds six regression tests in tests/test_agent_loop.py documenting the before/after behaviour and verifying backward compat with the legacy results/stdout paths. Co-authored-by: MohammadYusif <MohammadYusif@users.noreply.github.com>	2026-06-02 13:06:09 +09:00
Tatlatat	acfdcf346c	fix(agent): map native google_search and surface empty rounds Models (notably Gemini) emit a native 'google_search' function call, but the agent loop had no mapping for it, so the call failed to convert, the round produced 0 chars and 0 tool blocks, and generation died silently — the web client hung on 'waiting for first token' with no error (also #443). - Map google_search / google_search_retrieval / google_search_grounding to the web_search tool, and read Gemini's 'queries' array (falling back to 'query'). - In stream_agent_loop, when a round yields no response text and no tool events, emit a visible fallback message instead of leaving the user hanging. - Give the unknown-tool execution branch an explicit exit_code=1 so the failure is logged as an error rather than 'n/a'. Unknown/unconvertible tool names still return None (unchanged) so they are dropped safely rather than executed. Added tests covering the google_search mapping, the queries array, and unknown/invalid-JSON returning None.	2026-06-02 12:57:45 +09:00
ghreprimand	77611f0491	Scope memory consolidation by owner group Co-authored-by: ghreprimand <203024559+ghreprimand@users.noreply.github.com>	2026-06-02 12:40:28 +09:00
Rolly Calma	32efeeb3a2	chore: use running event loop in async helpers (#821)	2026-06-02 12:28:05 +09:00
mist	fca8d68aba	Match host, not substring, when resolving DuckDuckGo redirects (#886) _resolve_ddg_redirect (the DuckDuckGo /l/?uddg= redirect resolver used on every HTML-fallback result href) gated on `"duckduckgo.com" in parsed.hostname`. That substring test also matches look-alike hosts like `duckduckgo.com.evil.com` and `notduckduckgo.com`, so a result link on such a host would be silently rewritten to its embedded `uddg` target. Same substring-vs-hostname pitfall fixed for provider detection in `54ecfa3`. Match the host properly: exactly `duckduckgo.com` or a `.duckduckgo.com` subdomain. Genuine redirects (`//duckduckgo.com/l/...`, and relative `/l/...` hrefs resolved against `html.duckduckgo.com`) keep working. The resolver was a closure inside duckduckgo_search; lifted it (plus the new _is_duckduckgo_host helper) to module scope so it can be unit-tested directly. Adds tests/test_ddg_redirect_resolution.py (red on the look-alike case before this change, green after).	2026-06-02 12:25:56 +09:00
pewdiepie-archdaemon	966b53df77	Improve Cookbook serve diagnostics and recommendations	2026-06-02 12:15:47 +09:00
Prakhya	bdc99d746a	fix: add Browser MCP connection diagnostics (#662)	2026-06-02 11:50:17 +09:00
NovaUnboundAi	3319310942	Allow longer deep research extraction timeouts (#651) Co-authored-by: NovaUnboundAi <NovaUnboundAi@users.noreply.github.com>	2026-06-02 11:50:03 +09:00
nsgds	5645cce6d0	Support vLLM 0.20.2 / NIM reasoning-parser output end-to-end (surface + agent context + render) (#602) * fix(stream): read 'reasoning' SSE field for vLLM 0.20.2 / NIM vLLM 0.20.2 / NVIDIA NIM emit reasoning-parser output in the `reasoning` delta field; older builds use `reasoning_content`. stream_llm() read only the latter, so reasoning from models like Nemotron-3-Nano (--reasoning-parser) was silently dropped and never rendered. Accept either field. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(agent): keep reasoning_content only on the latest assistant turn The agent loop echoed each round's reasoning back as `reasoning_content` on every assistant turn, assuming vendors ignore it. Nemotron's chat template re-injects ALL prior reasoning_content as <think> blocks, and the loop is trimmed only once (before it starts), so reasoning accumulated unbounded across rounds, bloating context and feeding the model its own prior reasoning, which reinforced repetition/looping. Strip reasoning_content from earlier assistant turns so only the most recent round carries it (still satisfies DeepSeek's thinking-mode follow-up requirement). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(agent-ui): wrap each round's reasoning in its own <think> block The streamed think-tag wrapper gated on whole-message substring checks (accumulated.includes('<think>')), which only ever wrapped ONE reasoning block per message. A multi-round agent response has a reasoning phase per round, so once round 1 closed its <think>...</think>, rounds 2+ reasoning was emitted unwrapped and leaked into the visible answer. Replace the substring checks with a stateful open/close flag that toggles per think/answer cycle, so each round's reasoning gets its own collapsible block. Single-turn chat is unchanged (one open, one close).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test(stream): reasoning/reasoning_content delta surfaces as thinking chunk Covers @pewdiepie-archdaemon's requested regression: a streamed {reasoning: ...} delta emits a thinking chunk while {content: ...} streams as normal content; plus the older reasoning_content field for backward compat. Mirrors the #591 scenario. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-02 11:48:17 +09:00
mist	f13d897093	Fix AttributeError on bullet lines in extract_memory_from_chat (#873) The fallback memory extractor (used by routes/memory_routes.py when the LLM extractor fails) matched list items with `r'^[-•]|\d+\.\s(.)'`. Operator precedence makes that `(^[-•]) | (\d+\.\s(.))`, so the capture group only exists on the numbered-list branch. A bullet line ("- foo") matches the first branch, so `group(1)` is None and `text_match.group(1).strip()` raises AttributeError, crashing extraction for any assistant message that contains a bullet list (i.e. most of them). Numbered lists happened to work. Group both markers, as in `r'^(?:[-•]|\d+\.)\s(.*)'`, so the capture applies to bullets and numbers alike. Adds tests/test_memory_bullet_extraction.py (red before, green after).	2026-06-02 11:46:06 +09:00
Kenny Van de Maele	2b39412355	Expand ~ in read_file and write_file paths (#781) read_file/write_file passed the raw path to open(), so a tilde path like ~/notes.txt failed ("not found"): the shell's ~ expansion never happened because there's no shell. Agents then fell back to bash to reach home-dir files. Expand ~ (and ~user) with os.path.expanduser before opening. Checks: python -m py_compile src/tool_execution.py.	2026-06-02 11:45:21 +09:00
Ernest Hysa	7669696bb0	fix(scheduler): push next_run forward on startup to stop restart double-fire (#708) TaskScheduler.start() aborts stale TaskRun rows but never advanced ScheduledTask.next_run. Across a restart the in-process _executing set is empty, so the first post-restart _check_due_tasks() call dispatches every task whose next_run is still in the past, and so does every subsequent poll, until the task's regular _execute_task path finally runs compute_next_run and pushes it forward. start() now queries active tasks with next_run < now and pushes each one to now + 60s. The first poll after restart sees them as not-yet-due, the task runs once normally, and compute_next_run puts the schedule back on its real cadence. Paused and not-yet-due tasks are left alone. The validator test was rewritten as a regression test asserting the opposite of the bug it originally demonstrated, plus two narrower cases to lock down the filter (only active+overdue is touched).	2026-06-02 11:43:30 +09:00
Afonso Coutinho	48d3b7abab	fix: topic analysis false-matches keywords as substrings (e.g. 'ai' in 'email') (#687) * fix: match topic keywords on word boundaries, not substrings * fix: apply word-boundary matching to topic example snippets too * test: topic keywords match whole words, not substrings	2026-06-02 11:42:04 +09:00
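
The word-boundary fix in the last entry (#687) can be illustrated with a minimal sketch. The function names `substring_match` and `keyword_match` are hypothetical and not taken from the repository; the sketch only assumes that topic keywords are plain strings compared case-insensitively.

```python
import re


def substring_match(keyword: str, text: str) -> bool:
    """Naive check: true whenever the keyword appears anywhere in the text."""
    return keyword.lower() in text.lower()


def keyword_match(keyword: str, text: str) -> bool:
    """Whole-word check: the keyword must sit between word boundaries."""
    # re.escape keeps any regex metacharacters in the keyword literal.
    pattern = r"\b" + re.escape(keyword) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


# 'ai' is a substring of 'email', so the naive check reports a false match.
print(substring_match("ai", "please check my email"))
print(keyword_match("ai", "please check my email"))
print(keyword_match("ai", "AI is changing search"))
```

One caveat of this approach: `\b` only marks a transition between word and non-word characters, so keywords that begin or end with punctuation (for example `c++`) would need a different delimiter rule.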


114 Commits
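
The alternation-precedence bug described in f13d897093 (#873) can be reproduced in a few lines. This is a sketch using the two patterns as quoted in that commit message; the helper `captured` and the use of `re.match` are assumptions made for illustration, not the repository's code.

```python
import re

# Patterns as quoted in the commit message.
# In BUGGY the top-level `|` splits the whole pattern in two, so the
# capture group belongs only to the numbered-list alternative.
BUGGY = re.compile(r'^[-•]|\d+\.\s(.)')
# In FIXED the non-capturing group limits `|` to the list marker, so the
# capture group applies to both bullets and numbers.
FIXED = re.compile(r'^(?:[-•]|\d+\.)\s(.*)')


def captured(pattern: re.Pattern, line: str):
    """Return group(1) of a match at the start of the line.

    The result is None when there is no match or when the group did not
    take part in the match.
    """
    m = pattern.match(line)
    return m.group(1) if m else None


print(captured(BUGGY, "- foo"))   # bullet matches, but the group is unset
print(captured(FIXED, "- foo"))
print(captured(FIXED, "1. bar"))
```

Calling `.strip()` on the unset group from the first case is what raises the `AttributeError` the commit message describes.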