feat: adapt agent_input_token_budget to the model context window (#1170) (#1230)

The agent soft-trims input context to `agent_input_token_budget` (default 6000).
The old computation `min(context_length or budget, budget)` made the 6000 default
a hard ceiling for every model, so 128K/1M context models were silently capped at
6000 input tokens — now that num_ctx is sent correctly (#1056), this was the last
barrier to actually using a long context window.

This derives the default budget from the model's discovered context window
(~85%, capped at a generous hard max) while honouring an explicit user setting
exactly (clamped to the window). When the window is unknown it falls back to the
previous value, so behaviour is unchanged for that case.

- src/context_budget.py: pure `compute_input_token_budget()` (unit-testable)
- src/settings.py: `is_setting_overridden()` to tell an explicit user value from
  the merged default (load_settings merges DEFAULT_SETTINGS, so equality alone
  can't distinguish them)
- src/agent_loop.py: use the helper in the soft-trim path

Covered by tests/test_context_budget.py (6 cases).

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
lekt8
2026-06-02 23:13:53 +08:00
committed by GitHub
parent 1fda906407
commit 8c376d2b0e
4 changed files with 126 additions and 1 deletions

View File

@@ -1487,12 +1487,21 @@ async def stream_agent_loop(
_t3 = time.time()
try:
from src.context_compactor import trim_for_context
from src.context_budget import compute_input_token_budget
from src.settings import is_setting_overridden
soft_budget = int(get_setting("agent_input_token_budget", 6000) or 0)
if soft_budget > 0:
before_trim_tokens = estimate_tokens(messages)
reserve_tokens = min(max(max_tokens or 1024, 512), 2048)
effective_budget = min(context_length or soft_budget, soft_budget)
# Scale the default budget to the model's context window so long-context
# models aren't silently capped at 6000; an explicit user setting is
# still honoured (clamped to the window). (#1170)
effective_budget = compute_input_token_budget(
soft_budget,
context_length,
is_setting_overridden("agent_input_token_budget"),
)
trimmed_messages = trim_for_context(
messages,
effective_budget,