Files
odysseus/tests/test_strip_think.py
Tatlatat dac64f20d9 Text: strip dangling think blocks after visible text
`strip_think` removes a dangling (unclosed) `<think>` block via
`_THINK_OPEN_RE`, but that pattern was anchored to the start of the string
(`^\s*<think>`). An unclosed `<think>` (or `<thinking>`) opener that
appears *after* any leading output was therefore only half-handled: the
stray tag itself was removed by `_THINK_TAG_RE`, but the reasoning content
following it leaked straight to the user.

  strip_think("Hello! <think> I am thinking.")       # -> "Hello! I am thinking."  (leak)
  strip_think("Sure.\n<think>\nLet me reconsider...") # -> leaks the reasoning

`strip_think` feeds user-facing output across research, email replies,
notes, and scheduled tasks, so this leaks chain-of-thought to end users.

Un-anchor `_THINK_OPEN_RE` so a dangling opener anywhere strips from the
opener to end of string, consistent with the existing start-of-string
behavior. Content before the opener, closed `<think>...</think>` blocks,
and tag-free text are all preserved.

tests/test_strip_think.py covers the mid-text leak (fails before this
change), start-anchored unclosed, closed blocks, no-tag passthrough,
content-before-opener, and mixed closed+unclosed. Full existing think
suite still passes.
2026-06-02 20:36:37 +09:00

26 lines
1.1 KiB
Python

import pytest
from src.text_helpers import strip_think
def test_strip_think_cases():
# 1. Mid-text unclosed leak (fails before fix)
assert strip_think("Hello! <think> I am thinking.") == "Hello!"
assert strip_think("Sure.\n<think>\nLet me reconsider...") == "Sure."
assert strip_think("Sure.\n<thinking>\nLet me reconsider...") == "Sure."
# 2. Start-anchored unclosed
assert strip_think("<think> unclosed from start") == ""
assert strip_think(" <thinking> thinking at start") == ""
# 3. Closed block
assert strip_think("Hello! <think> closed </think> Here is the answer.") == "Hello! Here is the answer."
assert strip_think("Hello! <thinking> closed </thinking> Here is the answer.") == "Hello! Here is the answer."
# 4. No-tag passthrough
assert strip_think("No tags here.") == "No tags here."
# 5. Content-before-opener preserved (part of mid-text unclosed)
assert strip_think("Prefix text <think> trailing thoughts") == "Prefix text"
# 6. Multiple blocks (closed + unclosed)
assert strip_think("Hello! <think> closed </think> Here is the answer. <think> unclosed") == "Hello! Here is the answer."