Completes the reviewer requirement from PR #1190 review that was carried
over but not implemented in #1230:
> "The hard max is a function-local constant. For this setting, the ceiling
> should be configurable or at least represented as a named setting/default
> with tests."
— review on #1190#1230 shipped the adaptive auto-derivation but left `DEFAULT_HARD_MAX = 200_000`
as a hardcoded module constant in src/context_budget.py. Admins on premium
APIs with large context windows (kimi-k2 / minimax-m3 at 1M, etc.) can use
their full window today only by setting `agent_input_token_budget`
explicitly — which then takes them off the adaptive auto-path entirely.
## What this PR changes
- src/settings.py: register `agent_input_token_hard_max` in
DEFAULT_SETTINGS, default 200_000 (matches `DEFAULT_HARD_MAX`). Inline
comment documents the no-op semantics in the explicit branch.
- src/agent_loop.py: read the setting at the call site and pass it as the
`hard_max` kwarg of `compute_input_token_budget`. Defensive parsing —
missing / non-int / zero values fall back to `DEFAULT_HARD_MAX`, so a
misconfig cannot silently zero the budget.
- src/tool_implementations.py: three friendly aliases for `manage_settings`:
- "hard max" -> agent_input_token_hard_max
- "token budget cap" -> agent_input_token_hard_max
- "input budget cap" -> agent_input_token_hard_max
Plus the existing "token budget" -> agent_input_token_budget keeps a
matching shorter alias "input budget".
- tests/test_context_budget.py: 6 new tests on top of the existing 6:
- hard_max raises the auto ceiling (1M ctx + raised cap -> 85% of ctx)
- hard_max lowers the auto ceiling (128K ctx + 50K cap -> 50K)
- hard_max has no effect on the explicit branch
- DEFAULT_SETTINGS contains the new key
- manage_settings aliases are registered
- the live get_setting path returns the override value, and malformed
values fall back per the agent_loop defensive parsing
12 passed in 0.04s. No changes to the pure helper signature or semantics;
#1230's behavior is the default when the new setting is unset.
## How it lets users drop the explicit override
Before this PR, on a 1M-context model:
agent_input_token_budget = 900_000 (explicit) -> 900K [user override]
agent_input_token_budget = <unset> (auto) -> 200K [HARD_MAX]
After this PR, same model:
agent_input_token_budget = <unset>
agent_input_token_hard_max = 900_000
-> min(1M * 0.85, 900K) = 850K [auto, no override needed]
The explicit-override path keeps working unchanged for users who prefer it.
The agent soft-trims input context to `agent_input_token_budget` (default 6000).
The old computation `min(context_length or budget, budget)` made the 6000 default
a hard ceiling for every model, so 128K/1M context models were silently capped at
6000 input tokens — now that num_ctx is sent correctly (#1056), this was the last
barrier to actually using a long context window.
This derives the default budget from the model's discovered context window
(~85%, capped at a generous hard max) while honouring an explicit user setting
exactly (clamped to the window). When the window is unknown it falls back to the
previous value, so behaviour is unchanged for that case.
- src/context_budget.py: pure `compute_input_token_budget()` (unit-testable)
- src/settings.py: `is_setting_overridden()` to tell an explicit user value from
the merged default (load_settings merges DEFAULT_SETTINGS, so equality alone
can't distinguish them)
- src/agent_loop.py: use the helper in the soft-trim path
Covered by tests/test_context_budget.py (6 cases).
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>