feat: adapt agent_input_token_budget to the model context window (#1170) (#1230)

The agent soft-trims input context to `agent_input_token_budget` (default 6000). The old computation `min(context_length or budget, budget)` made the 6000 default a hard ceiling for every model, so 128K/1M context models were silently capped at 6000 input tokens — now that num_ctx is sent correctly (#1056), this was the last barrier to actually using a long context window. This derives the default budget from the model's discovered context window (~85%, capped at a generous hard max) while honouring an explicit user setting exactly (clamped to the window). When the window is unknown it falls back to the previous value, so behaviour is unchanged for that case. - src/context_budget.py: pure `compute_input_token_budget()` (unit-testable) - src/settings.py: `is_setting_overridden()` to tell an explicit user value from the merged default (load_settings merges DEFAULT_SETTINGS, so equality alone can't distinguish them) - src/agent_loop.py: use the helper in the soft-trim path Covered by tests/test_context_budget.py (6 cases). Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-02 23:13:53 +08:00
parent 1fda906407
commit 8c376d2b0e
4 changed files with 126 additions and 1 deletions
--- a/src/agent_loop.py
+++ b/src/agent_loop.py
@@ -1487,12 +1487,21 @@ async def stream_agent_loop(
    _t3 = time.time()
    try:
        from src.context_compactor import trim_for_context
+        from src.context_budget import compute_input_token_budget
+        from src.settings import is_setting_overridden

        soft_budget = int(get_setting("agent_input_token_budget", 6000) or 0)
        if soft_budget > 0:
            before_trim_tokens = estimate_tokens(messages)
            reserve_tokens = min(max(max_tokens or 1024, 512), 2048)
-            effective_budget = min(context_length or soft_budget, soft_budget)
+            # Scale the default budget to the model's context window so long-context
+            # models aren't silently capped at 6000; an explicit user setting is
+            # still honoured (clamped to the window). (#1170)
+            effective_budget = compute_input_token_budget(
+                soft_budget,
+                context_length,
+                is_setting_overridden("agent_input_token_budget"),
+            )
            trimmed_messages = trim_for_context(
                messages,
                effective_budget,