odysseus

Author	SHA1	Message	Date
Wes Huber	49885ff9e7	fix(documents): use strip_pdf_content_marker instead of lstrip for PDF auto-open (#1727 ) lstrip("\n[PDF content]:") treats the argument as a character set, not a prefix, so it chews into the following [Page N text]: marker — e.g. turning [Page 1 text]: into "age 1 text]:". The correct helper strip_pdf_content_marker (which uses removeprefix) already exists in the same file and is used by other call sites. Fixes #1663 Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-06-03 13:30:04 +09:00
Ethan	0e538ecd29	Fix RAG remove_directory wiping the entire shared collection (#1660 ) (#1734 ) Removing one RAG directory destroyed the whole shared ChromaDB collection (all owners + base index) instead of just that directory's chunks. Shared root cause: PersonalDocsManager.remove_directory called rebuild_index() (delete_collection + recreate) then re-indexed only the remaining tracked dirs (ownerless, never personal_dir). The targeted VectorRAG.remove_directory that should have been used was itself broken (where={"source":{"$contains":dir}} selects nothing on scalar metadata and would over-delete siblings), and the dead do_manage_rag path fired a second unconditional rebuild. - VectorRAG.remove_directory: select chunks in Python by a path-boundary match on the stored absolute `source` (dir or dir+os.sep), abspath-normalized. Keys on `source` (always written), never `owner` -- no migration. - PersonalDocsManager.remove_directory: call the targeted remove instead of rebuild_index() + partial reindex. - do_manage_rag (dead code): drop the second rebuild_index() (hygiene). - rag_server.py add path: abspath so indexed `source` matches the remove. No schema change. Prevents future wipes (does not recover already-wiped vectors). Adds hermetic regression tests at three layers. Fixes #1660 Co-authored-by: Ethan <23321960+0xLeathery@users.noreply.github.com>	2026-06-03 13:29:51 +09:00
Ethan	b9c382006e	Clamp Anthropic temperature to [0.0, 1.0] in _build_anthropic_payload (#1737 ) Anthropic's Messages API rejects temperature > 1.0 with HTTP 400, but _build_anthropic_payload forwarded it verbatim. The shipped "Nietzsche" preset uses temperature 1.2 and the UI slider allows up to 2.0, so every Claude request under such a preset hard-broke. Clamp into [0.0, 1.0] in the Anthropic builder only (OpenAI keeps its wider 0.0-2.0 range). Covers all three Anthropic call paths, which build through this one function. None is passed through unchanged. Fixes #1615 Co-authored-by: Ethan <23321960+0xLeathery@users.noreply.github.com>	2026-06-03 13:29:36 +09:00
Afonso Coutinho	96a874c604	fix: a non-dict finding silently drops all raw research findings (#1739)	2026-06-03 13:29:29 +09:00
Afonso Coutinho	063e7114e3	fix: youtube transcript formatter crashes on a non-dict segment (#1745)	2026-06-03 13:29:08 +09:00
Afonso Coutinho	133948cc78	fix: uploads with _ or - in the extension become permanently unreadable (#1756)	2026-06-03 13:28:45 +09:00
Afonso Coutinho	51857c9008	fix: chat memory extraction crashes on a non-dict message (#1749)	2026-06-03 13:25:48 +09:00
clockworksquirrel	2625e97f11	Stop conversations crashing during compaction on tool-call turns (#1777 ) context_compactor.maybe_compact built its summary text with msg.get('content', '')[:2000], which raised TypeError: 'NoneType' object is not subscriptable on assistant turns whose content is None (turns that carried only native tool_calls). Once a conversation crossed the 85% compaction threshold — reached after only a few turns on small-context local models plus the large agent prompt — every subsequent message failed ("send more than three messages and it stops working"). Flatten message content to text first via a _content_as_text helper (str passthrough, multimodal list blocks joined, None -> "") and tolerate a missing role. Adds tests/test_context_compactor.py covering the helper and a >=4-message conversation that forces compaction with a None-content tool-call turn (fails before this change, passes after). Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-03 13:25:33 +09:00
Afonso Coutinho	2fa4d50115	fix: is_youtube_url crashes on a non-string url (#1752)	2026-06-03 13:24:33 +09:00
Wes Huber	3abb735200	fix(security): scope send_to_session agent tool by owner (#1757 ) send_to_session was the only agent tool that didn't check session ownership — an agent acting for user A could read from and write into user B's session on a multi-user instance. Add owner parameter and reject access when the target session belongs to a different user, matching the pattern used by create_session, list_sessions, and manage_session. Fixes #1616 Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-06-03 13:24:08 +09:00
Afonso Coutinho	b3da01efd5	fix: ui_control rejects the advertised rag toggle (#1763)	2026-06-03 13:24:00 +09:00
Lucas Daniel	68da800dcb	fix(agent): stop sending tool schemas to native Ollama endpoints (#1765 ) Models like gemma4, qwen3.5, and ministral served via Ollama's native /api/chat respond to OpenAI-style tool schemas by emitting a single native tool_call chunk and then stopping. The agent loop receives 1 token of round_response and no recognised ToolBlock, so the round ends immediately — the user sees a one-token response. Root cause: _is_api_model was True for any endpoint whose host appears in _API_HOSTS (which includes "host.docker.internal" and "localhost") OR whose model name matches a keyword like "gemma". Native Ollama endpoints were never excluded from this path. Fix: import _is_ollama_native_url from llm_core and treat native Ollama endpoints (/api/chat, port 11434) as text-only by default — falling back to the fenced-block tool path the local models are tuned for. The per-endpoint supports_tools=True toggle (Settings → Endpoints) still overrides this for users who have explicitly opted in. Fixes #1567	2026-06-03 13:23:42 +09:00
Afonso Coutinho	d25a860f71	fix: document tidy crashes on a duplicate with NULL timestamps (#1772)	2026-06-03 13:23:01 +09:00
Afonso Coutinho	db1596f3b4	fix: signature learning never skips support@/info@/admin@ senders (#1773)	2026-06-03 13:22:52 +09:00
pewdiepie-archdaemon	ed7956cbd3	Owner-scope RAG doc ids so identical chunks across users don't collide (#1738 , #1760 ) _generate_doc_id hashed only text. add_document / add_documents_batch early-return when the id exists, so the second owner indexing a byte-identical chunk hit the first owner's id, was silently dropped, and never stored under their owner — their owner-filtered search then quietly omitted it. Hash owner + text; empty owner reproduces the legacy id, so the unowned/base index keeps existing ids and isn't re-churned. Same-owner identical chunks still dedupe. Caught by #1738 and #1760 (independent reports of the same bug).	2026-06-03 11:36:31 +09:00
pewdiepie-archdaemon	9960d55a41	Decrypt CalDAV password before write-back (#1731 ) writeback_event read cfg["password"] (the encrypted blob) and passed it straight to DAVClient, so every local create/edit/delete authenticated with the literal ciphertext, the remote rejected it, and the change never reached the server — the exact silent-write-loss this module was built to prevent. The pull path src/caldav_sync.py already decrypts; mirror that. decrypt() is a no-op on legacy plaintext. Caught by #1731.	2026-06-03 11:36:12 +09:00
pewdiepie-archdaemon	6153c5ed68	Close app_api blocklist gap for bare /api/tokens and /api/users The blocklist prefixes had trailing slashes, so path.startswith() only matched /api/tokens/{id} but not /api/tokens itself — the bare GET (list) and POST (mint) endpoints were reachable via app_api. Same gap on /api/users (list/create/delete). Drop trailing slashes so both bare and sub-resource forms are blocked. /api/auth and /api/admin had no bare endpoints today but get the same treatment to prevent future drift. Caught by #1462.	2026-06-03 11:20:39 +09:00
Afonso Coutinho	aa5e3f6884	fix: is_markitdown_format crashes on a non-string path (#1618)	2026-06-03 09:00:10 +09:00
Afonso Coutinho	fc220f760f	fix: inside_base_dir raises TypeError on a non-string path instead of failing closed (#1619)	2026-06-03 09:00:04 +09:00
Afonso Coutinho	2d94e38d23	fix: document_actions title/content helpers crash on non-string input (#1621)	2026-06-03 08:59:55 +09:00
Afonso Coutinho	03ddc5d2c4	fix: check_outbound_url crashes on a truthy non-string URL (#1623)	2026-06-03 08:59:49 +09:00
Afonso Coutinho	3175d7ca21	fix: tool-block parsing crashes on a non-string input (#1628)	2026-06-03 08:59:42 +09:00
Afonso Coutinho	d818117d4c	fix: _extract_skill_json crashes on a truthy non-string teacher response (#1630)	2026-06-03 08:59:36 +09:00
Afonso Coutinho	8783f12c4c	fix: builtin_actions heuristics crash on a truthy non-string input (#1639)	2026-06-03 08:59:16 +09:00
Afonso Coutinho	82c09dd768	fix: split_chunks emits a duplicate trailing chunk for text over size-overlap (#1573)	2026-06-03 08:57:54 +09:00
red person	8051e25c65	Reject CalDAV writeback events without uid (#1582)	2026-06-03 08:57:15 +09:00
red person	f39c87561b	Save only string personal doc paths (#1566)	2026-06-03 08:37:29 +09:00
Afonso Coutinho	382d49d887	fix: validate_caldav_url crashes with TypeError on a non-string URL (#1608)	2026-06-03 08:35:16 +09:00
lekt8	1f743970dd	Don't lose deep-research findings when synthesis times out (#1551 ) (#1562 ) Two problems made deep research report "No information could be gathered" even after it had extracted findings, on slow local models (reporter served a 20B via LM Studio): - _synthesize hard-capped its LLM call at timeout=60, while extraction uses the user's extraction_timeout (300s here) and the final report uses 180s. The slow model needed >60s to synthesize the round's findings, so synthesis timed out after 3 attempts. Raised it to 180s to match the final-report call. - When synthesis produced no report (it returns the unchanged, still-empty report on failure during round 1), the run hit `if not report: return "No information could be gathered…"` and discarded the findings it had already gathered. Now it falls back to a compiled report built from those findings (_fallback_report) so the user keeps the gathered material. Tests stub the LLM (no live model/DB), pin the synthesis timeout >= 180, that the fallback surfaces the findings rather than the give-up message, and that a failed synthesis preserves the previous report. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-03 08:11:44 +09:00
Afonso Coutinho	c9361262df	fix: APIKeyManager.load crashes app startup on a corrupt/wrong-shape api_keys.json (#1565)	2026-06-03 08:11:37 +09:00
lekt8	583df3dd6a	Recognize gemma3/llama4/mistral-small3.1+/multimodal as vision models (#1430 ) is_vision_model() classified several genuinely multimodal families as text-only because their names contain neither "vision" nor "vl": Gemma 3 (4b+), Llama 4, Mistral Small 3.1/3.2, and *-multimodal models (e.g. phi-4-multimodal). For those the attached image was stripped before the request, so the model never saw it — a "can't read the image" report (issue #1274), common with Ollama tags like gemma3:4b. Add those keywords (plus a generic "multimodal"). Per the file's err-toward-True policy (#124), a rare text-only tag treated as vision is the safer failure than dropping a real image. Guard tests confirm the text-only siblings (gemma2, plain gemma, mistral-small, phi-3) are not over-matched. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-03 04:17:40 +09:00
lekt8	0ec8415f0e	Fix multi-file uploads tripping the per-IP concurrency guard (#1346 ) (#1362 ) * Stop multi-file uploads from tripping the per-IP concurrency guard The /api/upload concurrency check summed its condition over `files`, but the condition didn't reference the loop variable — so it collapsed to len(files) whenever the IP had any recent upload. A single multi-file batch sent right after another upload therefore counted itself as N concurrent uploads and hit max_concurrent_uploads (3), returning 429. The browser swallows the 429 (no `files` in the body) and sends the chat with no attachments, so the model "doesn't even see" them (issue #1346). Count genuine recent upload events instead, via a pure count_recent_uploads() helper, independent of the current batch's file count. save_upload still enforces the per-minute sliding-window rate limit per file, so throttling is preserved. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * Also reconcile the per-minute upload rate limit with the batch cap Follow-up within #1346: even after the concurrency-guard fix, a 6+ file batch still failed because save_upload() counts each file against upload_rate_limit (was 5/min) while the composer allows MAX_FILES=10 per batch — the reporter saw "5 attachments work, 6 fail". Raise the per-minute file cap to 60 so a single full batch (and a few of them) isn't self-rejected; burst abuse stays bounded by max_concurrent_uploads. Add a real 6-file regression + a config guard that the cap exceeds the frontend MAX_FILES. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-03 04:04:19 +09:00
red person	fd37ccebae	Ignore invalid personal docs state (#1401)	2026-06-03 04:02:16 +09:00
red person	35c40bce75	Fall back from invalid settings stores (#1416)	2026-06-03 03:53:05 +09:00
Paulo Victor Cordeiro	1f2a06facd	fix: MCP reconnect via tool passes only server_id to connect_server (#1385 ) * fix: MCP reconnect via tool passes only server_id to connect_server connect_server requires name, transport, command, args, env, and url but the reconnect path in do_manage_mcp only passed the server_id, causing a TypeError on every reconnect attempt. Mirror the pattern used in mcp_routes.py reconnect_server. * test: verify MCP reconnect passes full server config to connect_server Mocks the MCP manager and DB to assert that do_manage_mcp reconnect passes name, transport, command, args, env, and url — not just the server_id.	2026-06-03 03:46:07 +09:00
lekt8	b6843c7621	Route "read that report" to manage_research instead of the HTML render (#1375 ) After a deep-research job completes, a follow-up like "check it out" / "read that report" had the agent web_fetch the /api/research/report/{id} HTML render (and then drift into unrelated searches) instead of reading the saved report (issue #1363). The report text is already available via the manage_research tool (action read), and action list returns ids most-recent-first, so the agent can resolve "the recent report" itself. Strengthen the manage_research instructions: read a finished report via action list -> action read; do NOT web_fetch/app_api the report URL (it renders HTML, not clean text) and do NOT start a fresh web_search just to read an existing report. Annotate the app_api endpoint list to say the same. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-03 03:24:09 +09:00
Paulo Victor Cordeiro	c3fd969965	fix: once-schedule comparison uses local time against UTC date (#1349 ) When a timezone is configured, `now` is tz-aware local time. The comparison stripped tzinfo with `.replace(tzinfo=None)`, producing naive local time, but `scheduled_date` is stored as naive UTC. For users east of UTC this causes tasks to appear expired prematurely; for users west they linger past due time. Use `_to_utc_naive(now)` to convert to the same reference frame.	2026-06-03 03:07:00 +09:00
lekt8	ce7f5dbbdd	Inject current date into deep research planning and query prompts (#1347 ) Deep research generated search queries from the LLM's training-cutoff knowledge, so it emitted stale-year queries like "best Python tutorials 2025" when the actual year is later (issue #1341). The chat/agent path already grounds the model with "Today is ..." (src/agent_loop.py); the deep research planning and query-generation prompts had no equivalent. Add a small current_date_context() helper and prepend it at the plan and query-generation prompt sites (and the research_handler plan preview path that reuses RESEARCH_PLAN_PROMPT). System-TZ local, portable strftime. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-03 03:00:52 +09:00
Vykos	b2291fad49	Harden CalDAV credentials and URLs (#1310)	2026-06-03 02:50:02 +09:00
Aaran Lawing	56656de5bc	fix: RRULE added to schema (#1322) * fix: RRULE added to schema * Update tool_schemas.py	2026-06-03 02:47:14 +09:00
Vykos	4771d80eb2	Harden session endpoint owner scope (#1308)	2026-06-03 02:40:22 +09:00
lekt8	80de69ebb0	feat: document rrule in the manage_calendar tool schema (#1320 ) (#1324 ) * feat: document rrule in the manage_calendar tool schema (#1320) The create_event handler already persists `rrule` (a single event carrying an iCalendar RRULE), but the manage_calendar tool schema didn't list it, so the agent had no documented way to make a recurring event and took a roundabout path. Add `rrule?` to the create_event field list with examples (FREQ=WEEKLY;BYDAY=MO etc.) and an explicit note to create ONE event with the rule rather than looping. Covered by tests/test_calendar_rrule.py: do_manage_calendar create_event with an rrule stores one event with that recurrence; without it, the event is single. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test: restore SessionLocal via monkeypatch in #1320 rrule test (review) Per review: the test patched core.database.SessionLocal at module import and never restored it, which could leak the temp DB into later tests in the same process. Move the patch into an autouse monkeypatch fixture so it is restored after each test. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-03 02:37:45 +09:00
Vykos	5ee30cc144	Scope skills usage by owner (#1312)	2026-06-03 02:27:43 +09:00
Vykos	e73545f64f	Keep Bitwarden unlock password off argv (#1311)	2026-06-03 02:13:51 +09:00
Afonso Coutinho	a8395b4e4c	fix: agent_input_token_budget wrongly treated as a secret and unsettable from chat (#1294) * fix: don't classify agent_input_token_budget as a secret (token must be a suffix) * test: agent_input_token_budget is settable from chat	2026-06-03 01:53:47 +09:00
lekt8	adde94e430	fix: closed document stays active & leaks into new chats (#1160 ) (#1238 ) * fix: closed document no longer stays active and leaks into new chats (#1160) Closing a document tab calls _detachDocFromSession: a doc with content is PATCHed to session_id="" (unlinked, session_id -> NULL, is_active stays True), an empty one is DELETEd. But the in-memory active-document pointer (tool_implementations._active_document_id) was never cleared on either path. The chat doc-injection last-resort looks up that pointer by id and injects it when `not cand.session_id or cand.session_id == session`. An unlinked doc has session_id NULL, so the stale pointer re-surfaced a closed document in later, unrelated chats — the agent kept reading/suggesting edits to a doc the user had closed. Fix: add clear_active_document(doc_id) and call it when a document is unlinked (PATCH session_id="") or deleted, so the pointer no longer resurrects a closed document. clear_active_document only clears when the id matches (or no id), so a different active doc is left untouched. Covered by tests/test_active_document_clear.py (4 cases). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test: add route-level regression for #1160 (detach/delete clears active doc) Per review: prove the actual API path, not just the helper. Drives PATCH /api/document/{id} (session_id="") and DELETE /api/document/{id} through TestClient against a temp SQLite DB under real owner routing, and asserts get_active_document() is cleared (and untouched when a different document is closed). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test: make #1160 route regression hang-proof and dev-DB-independent The route test could hang in other environments: it set DATABASE_URL at import time, which is ignored if core.database was already imported, so it fell back to the real dev DB and could contend for its locks (maintainer saw it hang, exit 124). 
Rebind to a DEDICATED temporary SQLite engine (NullPool) and patch the document route module's SessionLocal to it via an autouse fixture — so the test never touches the dev DB and is independent of import order. Runs in ~0.3s. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test: drive #1160 route regression without TestClient (fixes local hang) The route test used Starlette TestClient (middleware app + threadpool), which hung in the maintainer's environment. Rework it to call the async route handlers directly — extracted from the router — with a minimal fake request against a temp-SQLite-patched SessionLocal. Same real coverage (handler + DB + owner routing), but it completes reliably (~0.3s) with no TestClient/threadpool. Verified the maintainer's exact batch now passes: pytest tests/test_document_close_clears_active_route.py \ tests/test_active_document_clear.py \ tests/test_document_tool_owner_scope.py -> 14 passed Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-03 01:47:13 +09:00
lekt8	1507d140b8	feat: CalDAV write-back — push local event create/update/delete to the remote (#800 ) (#1282 ) * feat: CalDAV write-back — push local event create/update/delete to the remote (#800) CalDAV sync was pull-only (src/caldav_sync.py), so events created, edited, or deleted in Odysseus on a CalDAV-backed calendar only changed local SQLite and never reached the server — they silently vanished on the next pull and never appeared on the user's phone (iCloud, etc.). This adds the missing write half: - src/caldav_writeback.py builds the VEVENT, re-discovers the remote calendar by the same URL-hash the local id was derived from (the remote URL isn't stored), and PUTs/DELETEs the event by UID via the caldav lib. The pure pieces (build_event_ical, find_remote_calendar, push_event) take inputs by argument so they unit-test against a fake client with no network. - create/update/delete event handlers (routes/calendar_routes.py) call it best-effort for caldav-sourced calendars only: the local DB stays the source of truth, a remote failure is logged, never fatal, and local calendars are untouched. Tests: tests/test_caldav_writeback.py (9, pure logic incl. iCal serialization, hash discovery, create/update/delete orchestration) and tests/test_caldav_writeback_route.py (3, route-level: a caldav calendar pushes, a local one does not, delete pushes a delete). 12 passed. Note: write-back re-discovers the remote calendar per write (the URL isn't persisted locally); a follow-up could cache it. Live-iCloud verification needs a real account — flagging for a maintainer pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test: drive #800 route regression without TestClient (fixes local hang) Same fix as the document route test: the CalDAV write-back route regression used Starlette TestClient (middleware app + threadpool) which hung in the maintainer's environment. 
Rework it to call the async create/delete calendar handlers directly — extracted from the router — with a minimal fake request, temp-SQLite-patched SessionLocal, and writeback_event stubbed to record calls. Same coverage (a caldav calendar pushes, a local one does not, delete pushes a delete), completes in ~0.3s with no TestClient. Verified the maintainer's exact batch: pytest tests/test_caldav_writeback.py tests/test_caldav_writeback_route.py -> 12 passed Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-03 01:44:02 +09:00
Shreyas S Joshi	7504fedb17	fix: surface reasoning_content when content is empty (thinking models) (#1233 ) Thinking models served via llama.cpp without --reasoning-format none (e.g. Qwen3, DeepSeek-R1) route all tokens into reasoning_content and return content="". Two call paths were silently broken: - llm_call / llm_call_async (non-streaming): hard-keyed data["choices"][0]["message"]["content"] raises KeyError or returns empty string, discarding the entire response. - stream_agent_loop end-of-round fallback: when full_response is empty but round_reasoning has content, the existing code replaced the response with the generic empty-response error message, discarding all reasoning tokens that were correctly accumulated during streaming. Fix: in both non-streaming paths use msg.get("content") or msg.get("reasoning_content") or "". In the streaming fallback, surface round_reasoning as the answer before falling through to the error path.	2026-06-03 01:41:24 +09:00
Afonso Coutinho	257f7ee7b2	fix: manage_tasks create handles an explicit null prompt without crashing (#1290)	2026-06-03 01:40:21 +09:00
nickorlabs	c39d8db12a	fix(agent): make context-budget hard_max configurable via agent_input_token_hard_max setting (#1273 ) Completes the reviewer requirement from PR #1190 review that was carried over but not implemented in #1230: > "The hard max is a function-local constant. For this setting, the ceiling > should be configurable or at least represented as a named setting/default > with tests." — review on #1190 #1230 shipped the adaptive auto-derivation but left `DEFAULT_HARD_MAX = 200_000` as a hardcoded module constant in src/context_budget.py. Admins on premium APIs with large context windows (kimi-k2 / minimax-m3 at 1M, etc.) can use their full window today only by setting `agent_input_token_budget` explicitly — which then takes them off the adaptive auto-path entirely. ## What this PR changes - src/settings.py: register `agent_input_token_hard_max` in DEFAULT_SETTINGS, default 200_000 (matches `DEFAULT_HARD_MAX`). Inline comment documents the no-op semantics in the explicit branch. - src/agent_loop.py: read the setting at the call site and pass it as the `hard_max` kwarg of `compute_input_token_budget`. Defensive parsing — missing / non-int / zero values fall back to `DEFAULT_HARD_MAX`, so a misconfig cannot silently zero the budget. - src/tool_implementations.py: three friendly aliases for `manage_settings`: - "hard max" -> agent_input_token_hard_max - "token budget cap" -> agent_input_token_hard_max - "input budget cap" -> agent_input_token_hard_max Plus the existing "token budget" -> agent_input_token_budget keeps a matching shorter alias "input budget". 
- tests/test_context_budget.py: 6 new tests on top of the existing 6: - hard_max raises the auto ceiling (1M ctx + raised cap -> 85% of ctx) - hard_max lowers the auto ceiling (128K ctx + 50K cap -> 50K) - hard_max has no effect on the explicit branch - DEFAULT_SETTINGS contains the new key - manage_settings aliases are registered - the live get_setting path returns the override value, and malformed values fall back per the agent_loop defensive parsing 12 passed in 0.04s. No changes to the pure helper signature or semantics; #1230's behavior is the default when the new setting is unset. ## How it lets users drop the explicit override Before this PR, on a 1M-context model: agent_input_token_budget = 900_000 (explicit) -> 900K [user override] agent_input_token_budget = <unset> (auto) -> 200K [HARD_MAX] After this PR, same model: agent_input_token_budget = <unset> agent_input_token_hard_max = 900_000 -> min(1M * 0.85, 900K) = 850K [auto, no override needed] The explicit-override path keeps working unchanged for users who prefer it.	2026-06-03 01:36:57 +09:00
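
The budget arithmetic described in c39d8db12a can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in src/context_budget.py: the names sketch_auto_budget and parse_hard_max are hypothetical, the fallback rule is a simplified reading of the "defensive parsing" note, and only the 85% fraction, the 200_000 default, and the worked numbers come from the commit message.

```python
DEFAULT_HARD_MAX = 200_000  # default ceiling named in the commit message


def parse_hard_max(raw):
    """Hypothetical defensive parse of the setting value.

    Anything int() rejects (None, junk strings) or any value <= 0
    falls back to DEFAULT_HARD_MAX, so a misconfig cannot zero the budget.
    """
    try:
        value = int(raw)
    except (TypeError, ValueError):
        return DEFAULT_HARD_MAX
    return value if value > 0 else DEFAULT_HARD_MAX


def sketch_auto_budget(context_window, hard_max=DEFAULT_HARD_MAX):
    """Hypothetical auto path: 85% of the context window, capped at hard_max."""
    # Integer arithmetic avoids float rounding on the 85% step.
    return min(context_window * 85 // 100, hard_max)


print(sketch_auto_budget(1_000_000))                     # 200000 (default cap)
print(sketch_auto_budget(1_000_000, hard_max=900_000))   # 850000
print(sketch_auto_budget(128_000, hard_max=50_000))      # 50000
```

With the setting unset, a 1M-context model stays at the 200K default; raising the cap to 900K lets the 85% rule win, which matches the "min(1M * 0.85, 900K) = 850K" line in the message.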

175 Commits
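
The lstrip pitfall fixed in 49885ff9e7 is easy to reproduce in isolation. The sketch below assumes the marker string quoted in the commit message; strip_marker is a hypothetical stand-in for the project's strip_pdf_content_marker helper, whose actual body is not shown in this log.

```python
MARKER = "\n[PDF content]:"  # marker string quoted in the commit message


def strip_marker(text: str) -> str:
    """Hypothetical stand-in: drop MARKER only when it is a literal prefix."""
    return text.removeprefix(MARKER)  # str.removeprefix needs Python 3.9+


raw = MARKER + "[Page 1 text]: hello"

# lstrip treats its argument as a set of characters, so it keeps eating
# "[" and "P" from the next marker until it reaches "a".
print(repr(raw.lstrip(MARKER)))  # 'age 1 text]: hello'

# removeprefix removes the exact prefix once and leaves the rest intact.
print(repr(strip_marker(raw)))   # '[Page 1 text]: hello'
```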