fix(agent): make context-budget hard_max configurable via agent_input_token_hard_max setting (#1273)

Completes the reviewer requirement from the PR #1190 review that was carried over but not implemented in #1230:

> "The hard max is a function-local constant. For this setting, the ceiling
> should be configurable or at least represented as a named setting/default
> with tests." (review on #1190)

#1230 shipped the adaptive auto-derivation but left `DEFAULT_HARD_MAX = 200_000` as a hardcoded module constant in src/context_budget.py. Admins on premium APIs with large context windows (kimi-k2 / minimax-m3 at 1M, etc.) can use their full window today only by setting `agent_input_token_budget` explicitly, which then takes them off the adaptive auto-path entirely.

## What this PR changes

- src/settings.py: register `agent_input_token_hard_max` in DEFAULT_SETTINGS, default 200_000 (matches `DEFAULT_HARD_MAX`). Inline comment documents the no-op semantics in the explicit branch.
- src/agent_loop.py: read the setting at the call site and pass it as the `hard_max` kwarg of `compute_input_token_budget`. Defensive parsing: missing, non-int, or zero values fall back to `DEFAULT_HARD_MAX`, so a misconfig cannot silently zero the budget.
- src/tool_implementations.py: three friendly aliases for `manage_settings`:
  - "hard max" -> agent_input_token_hard_max
  - "token budget cap" -> agent_input_token_hard_max
  - "input budget cap" -> agent_input_token_hard_max

  Plus the existing "token budget" -> agent_input_token_budget keeps a matching shorter alias "input budget".
- tests/test_context_budget.py: 6 new tests on top of the existing 6:
  - hard_max raises the auto ceiling (1M ctx + raised cap -> 85% of ctx)
  - hard_max lowers the auto ceiling (128K ctx + 50K cap -> 50K)
  - hard_max has no effect on the explicit branch
  - DEFAULT_SETTINGS contains the new key
  - manage_settings aliases are registered
  - the live get_setting path returns the override value, and malformed values fall back per the agent_loop defensive parsing

  12 passed in 0.04s.

No changes to the pure helper signature or semantics; #1230's behavior is the default when the new setting is unset.

## How it lets users drop the explicit override

Before this PR, on a 1M-context model:

    agent_input_token_budget = 900_000 (explicit) -> 900K [user override]
    agent_input_token_budget = <unset> (auto)     -> 200K [HARD_MAX]

After this PR, same model:

    agent_input_token_budget   = <unset>
    agent_input_token_hard_max = 900_000
      -> min(1M * 0.85, 900K) = 850K [auto, no override needed]

The explicit-override path keeps working unchanged for users who prefer it.
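
The before/after arithmetic above can be checked with a small stand-in. This is only a sketch: `sketch_input_token_budget` is a hypothetical name, not the real `compute_input_token_budget` in src/context_budget.py, and the 85% auto fraction and the clamp-to-window rule for explicit values are assumptions taken from the test descriptions and code comments in this commit.

```python
DEFAULT_HARD_MAX = 200_000  # default named in this commit message
AUTO_PERCENT = 85           # assumption: taken from the "85% of ctx" test description


def sketch_input_token_budget(soft_budget, context_length, is_overridden,
                              hard_max=DEFAULT_HARD_MAX):
    """Hypothetical stand-in for compute_input_token_budget (not the real helper)."""
    if is_overridden:
        # Explicit branch: honour the user's value, clamped to the window.
        # hard_max is deliberately ignored here.
        return min(soft_budget, context_length)
    # Auto branch: a fixed share of the context window, capped by hard_max.
    # Integer arithmetic avoids float rounding on large windows.
    return min(context_length * AUTO_PERCENT // 100, hard_max)


# 1M-context model, budget unset, default cap
print(sketch_input_token_budget(6000, 1_000_000, False))
# 1M-context model, budget unset, cap raised to 900K
print(sketch_input_token_budget(6000, 1_000_000, False, hard_max=900_000))
```

Under these assumptions the first call is limited by the default cap and the second by the 85% share of the window, which mirrors the two auto-path rows in the comparison above.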
@@ -1487,13 +1487,23 @@ async def stream_agent_loop(
     _t3 = time.time()
     try:
         from src.context_compactor import trim_for_context
-        from src.context_budget import compute_input_token_budget
+        from src.context_budget import compute_input_token_budget, DEFAULT_HARD_MAX
         from src.settings import is_setting_overridden
 
         soft_budget = int(get_setting("agent_input_token_budget", 6000) or 0)
         if soft_budget > 0:
             before_trim_tokens = estimate_tokens(messages)
             reserve_tokens = min(max(max_tokens or 1024, 512), 2048)
+            # Honour the configurable ceiling for the auto-derived budget path.
+            # No-op when the user has an explicit `agent_input_token_budget`
+            # (that branch ignores hard_max). Falls back to DEFAULT_HARD_MAX
+            # on missing/malformed values so misconfig can't zero the budget.
+            try:
+                hard_max = int(get_setting("agent_input_token_hard_max", DEFAULT_HARD_MAX) or DEFAULT_HARD_MAX)
+            except (TypeError, ValueError):
+                hard_max = DEFAULT_HARD_MAX
+            if hard_max <= 0:
+                hard_max = DEFAULT_HARD_MAX
             # Scale the default budget to the model's context window so long-context
             # models aren't silently capped at 6000; an explicit user setting is
             # still honoured (clamped to the window). (#1170)
@@ -1501,6 +1511,7 @@ async def stream_agent_loop(
                 soft_budget,
                 context_length,
                 is_setting_overridden("agent_input_token_budget"),
+                hard_max=hard_max,
             )
             trimmed_messages = trim_for_context(
                 messages,
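
The defensive parsing added at the call site can also be exercised in isolation. The sketch below pulls the same try/except and non-positive check into a helper; `parse_hard_max` is a hypothetical name introduced for illustration, since the commit keeps this logic inline in `stream_agent_loop`.

```python
DEFAULT_HARD_MAX = 200_000  # same default as src/context_budget.py per this commit


def parse_hard_max(raw, default=DEFAULT_HARD_MAX):
    """Hypothetical helper mirroring the inline parsing in the diff."""
    try:
        # `raw or default` maps None, 0, and "" to the default before int().
        value = int(raw or default)
    except (TypeError, ValueError):
        # Non-numeric strings raise ValueError; lists, dicts, etc. raise TypeError.
        return default
    # A negative value would otherwise shrink the budget to nothing.
    return value if value > 0 else default


for raw in (None, "900000", "not-a-number", 0, -5, [1]):
    print(repr(raw), "->", parse_hard_max(raw))
```

Only the well-formed `"900000"` input overrides the default; every other value in the loop falls back to `DEFAULT_HARD_MAX`.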