Don't force-include the email toolset on every "tell me" query (#1707) (#1735)

The agent tool-RAG force-includes a keyword hint's tools whenever any of its
keywords appears in the query (word-boundary match). The email-intent hint listed
"tell", which matches a huge fraction of requests — e.g. "visit <url> and tell
me the title" — so the whole email toolset was force-included and crowded out the
relevant tools. The model then saw a prompt dominated by email tools and reported
it had no web search / could not visit the URL.

Remove "tell" from the email keyword set. Genuine email intent still fires on
email/mail/gmail/inbox/unread/message/send/reply.

Test drives get_tools_for_query directly with retrieval stubbed (the keyword
hints are deterministic, no embeddings needed): a "...tell me..." web query no
longer pulls in email tools, a real email request still does.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
lekt8
2026-06-03 12:33:43 +08:00
committed by GitHub
parent a8a5d6f56e
commit 77614e9feb
2 changed files with 62 additions and 1 deletions

View File

@@ -293,7 +293,11 @@ class ToolIndex:
# Keyword hints: if the query mentions these words, force-include the tools. # Keyword hints: if the query mentions these words, force-include the tools.
_KEYWORD_HINTS = { _KEYWORD_HINTS = {
frozenset({"email", "mail", "gmail", "googlemail", "message", "send", "reply", "inbox", "unread", "tell"}): # NOTE: "tell" was removed from this set. It fired on any "tell me ..."
# request (e.g. "visit <url> and tell me the title"), force-including the
# whole email toolset and crowding out the relevant tools — the model then
# believed it had only email tools and refused web/other tasks (#1707).
frozenset({"email", "mail", "gmail", "googlemail", "message", "send", "reply", "inbox", "unread"}):
{"list_email_accounts", "list_emails", "read_email", "send_email", "reply_to_email", "bulk_email", "delete_email", "archive_email", "mark_email_read", "resolve_contact", "ui_control"}, {"list_email_accounts", "list_emails", "read_email", "send_email", "reply_to_email", "bulk_email", "delete_email", "archive_email", "mark_email_read", "resolve_contact", "ui_control"},
frozenset({"calendar", "event", "meeting", "schedule", "appointment"}): frozenset({"calendar", "event", "meeting", "schedule", "appointment"}):
{"manage_calendar"}, {"manage_calendar"},

View File

@@ -0,0 +1,57 @@
"""Regression for issue #1707 — the agent tool-RAG force-included the entire
email toolset on any "tell me ..." query, crowding out the relevant tools so the
model believed it only had email tools and refused web/other tasks.
Root cause: `_KEYWORD_HINTS` in src/tool_index.py listed "tell" under the email
intent, and `get_tools_for_query` force-includes a hint's tools whenever any of
its keywords appears (word-boundary match). "tell" appears in a huge fraction of
requests (the reporter's was "visit <url> and tell me the title"), so email tools
were force-included for non-email queries.
These hints are deterministic string matching — no embeddings — so we can test
`get_tools_for_query` directly with retrieval stubbed out (no ChromaDB needed).
"""
from src.tool_index import ToolIndex, ALWAYS_AVAILABLE
_EMAIL_TOOLS = {
"list_emails", "read_email", "send_email", "reply_to_email",
"bulk_email", "delete_email", "archive_email", "mark_email_read",
}
def _index_without_embeddings():
"""A ToolIndex whose retrieval returns nothing, so get_tools_for_query
exercises only the deterministic base + keyword-hint logic."""
ti = ToolIndex.__new__(ToolIndex) # skip __init__ (no ChromaDB/fastembed)
ti.retrieve = lambda query, k=8: []
return ti
def test_tell_in_web_query_does_not_force_email_tools():
"""The #1707 repro: a web request that merely contains the word 'tell' must
NOT drag in the email toolset."""
ti = _index_without_embeddings()
q = "visit https://www.youtube.com/user/PewDiePie and tell me the title of his latest video"
tools = ti.get_tools_for_query(q)
leaked = _EMAIL_TOOLS & tools
assert not leaked, f"'tell me' must not force-include email tools, got {sorted(leaked)}"
# web_search / web_fetch are always-available and must remain present.
assert "web_search" in tools and "web_fetch" in tools
def test_genuine_email_query_still_gets_email_tools():
"""Removing 'tell' must not break real email intent — the actual email
keywords still force-include the toolset."""
ti = _index_without_embeddings()
tools = ti.get_tools_for_query("reply to the unread email in my inbox")
assert {"reply_to_email", "send_email", "read_email"} <= tools
def test_plain_tell_request_stays_minimal():
"""A bare 'tell me a joke' must not pull in email tools either."""
ti = _index_without_embeddings()
tools = ti.get_tools_for_query("tell me a joke")
assert not (_EMAIL_TOOLS & tools)
# Always-available baseline is still there.
assert set(ALWAYS_AVAILABLE) <= tools