Text: strip dangling think blocks after visible text

`strip_think` removes a dangling (unclosed) `<think>` block via
`_THINK_OPEN_RE`, but that pattern was anchored to the start of the string
(`^\s*<think>`). An unclosed `<think>` (or `<thinking>`) opener that
appears *after* any leading output was therefore only half-handled: the
stray tag itself was removed by `_THINK_TAG_RE`, but the reasoning content
following it leaked straight to the user.

  strip_think("Hello! <think> I am thinking.")       # -> "Hello! I am thinking."  (leak)
  strip_think("Sure.\n<think>\nLet me reconsider...") # -> leaks the reasoning

`strip_think` feeds user-facing output across research, email replies,
notes, and scheduled tasks, so this leaks chain-of-thought to end users.

Un-anchor `_THINK_OPEN_RE` so a dangling opener anywhere strips from the
opener to end of string, consistent with the existing start-of-string
behavior. Content before the opener, closed `<think>...</think>` blocks,
and tag-free text are all preserved.

tests/test_strip_think.py covers the mid-text leak (fails before this
change), start-anchored unclosed, closed blocks, no-tag passthrough,
content-before-opener, and mixed closed+unclosed. Full existing think
suite still passes.
This commit is contained in:
Tatlatat
2026-06-02 18:36:37 +07:00
committed by GitHub
parent 8ad436d25a
commit dac64f20d9
2 changed files with 28 additions and 3 deletions

View File

@@ -20,9 +20,9 @@ import re
_THINK_CLOSED_RE = re.compile(r"<think(?:ing)?>[\s\S]*?</think(?:ing)?>\s*", re.IGNORECASE)
# Orphan opening or closing tags that survive after the closed-pass.
_THINK_TAG_RE = re.compile(r"</?think(?:ing)?[^>]*>\s*", re.IGNORECASE)
# Dangling opener at the top of the response with no closer — strip everything
# from `<think>` up to either `</think>` (if it ever shows) or end of string.
_THINK_OPEN_RE = re.compile(r"^\s*<think(?:ing)?>.*?(?:</think(?:ing)?>|$)", re.DOTALL | re.IGNORECASE)
# Dangling opener anywhere in the response with no closer — strip everything
# from `<think>` to the end of string.
_THINK_OPEN_RE = re.compile(r"<think(?:ing)?>[\s\S]*$", re.IGNORECASE)
# Streaming models occasionally emit `<thinking time="0.42">`-style attributes.
# Normalize to a plain `<think>` so the regexes above catch them.
_THINK_ATTR_RE = re.compile(r"<think(?:ing)?\s+[^>]*>", re.IGNORECASE)