`strip_think` removes a dangling (unclosed) `<think>` block via
`_THINK_OPEN_RE`, but that pattern was anchored to the start of the string
(`^\s*<think>`). An unclosed `<think>` (or `<thinking>`) opener that
appears *after* any leading output was therefore only half-handled: the
stray tag itself was removed by `_THINK_TAG_RE`, but the reasoning content
following it leaked straight to the user.
strip_think("Hello! <think> I am thinking.") # -> "Hello! I am thinking." (leak)
strip_think("Sure.\n<think>\nLet me reconsider...") # -> leaks the reasoning
`strip_think` feeds user-facing output across research, email replies,
notes, and scheduled tasks, so this leaks chain-of-thought to end users.
Un-anchor `_THINK_OPEN_RE` so a dangling opener anywhere strips from the
opener to end of string, consistent with the existing start-of-string
behavior. Content before the opener, closed `<think>...</think>` blocks,
and tag-free text are all preserved.
tests/test_strip_think.py covers the mid-text leak (fails before this
change), start-anchored unclosed, closed blocks, no-tag passthrough,
content-before-opener, and mixed closed+unclosed. Full existing think
suite still passes.
26 lines
1.1 KiB
Python
26 lines
1.1 KiB
Python
import pytest
|
|
from src.text_helpers import strip_think
|
|
|
|
def test_strip_think_cases():
|
|
# 1. Mid-text unclosed leak (fails before fix)
|
|
assert strip_think("Hello! <think> I am thinking.") == "Hello!"
|
|
assert strip_think("Sure.\n<think>\nLet me reconsider...") == "Sure."
|
|
assert strip_think("Sure.\n<thinking>\nLet me reconsider...") == "Sure."
|
|
|
|
# 2. Start-anchored unclosed
|
|
assert strip_think("<think> unclosed from start") == ""
|
|
assert strip_think(" <thinking> thinking at start") == ""
|
|
|
|
# 3. Closed block
|
|
assert strip_think("Hello! <think> closed </think> Here is the answer.") == "Hello! Here is the answer."
|
|
assert strip_think("Hello! <thinking> closed </thinking> Here is the answer.") == "Hello! Here is the answer."
|
|
|
|
# 4. No-tag passthrough
|
|
assert strip_think("No tags here.") == "No tags here."
|
|
|
|
# 5. Content-before-opener preserved (part of mid-text unclosed)
|
|
assert strip_think("Prefix text <think> trailing thoughts") == "Prefix text"
|
|
|
|
# 6. Multiple blocks (closed + unclosed)
|
|
assert strip_think("Hello! <think> closed </think> Here is the answer. <think> unclosed") == "Hello! Here is the answer."
|