odysseus/tests/test_search_provider_json.py
Shaw 552bc15067 fix(search): degrade to empty results on non-JSON provider responses (#1129) (#1352)
tavily_search, serper_search and google_pse_search parsed response.json()
inside the network try block, which only caught httpx.RequestError and
RateLimitError. When a provider returned a non-JSON body (an HTML error page, a
truncated/empty body, a gateway 5xx), response.json() raised an UNCAUGHT
json.JSONDecodeError that aborted the search in the background — exactly the
'search engines other than SearXNG fail in the background' symptom.
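The sketch below shows the pre-fix failure mode in miniature. It assumes a simplified provider whose single `try` block wraps both the request and `response.json()`, but whose `except` clause names only a stand-in `NetworkError` (playing the role of `httpx.RequestError`/`RateLimitError`). `NetworkError`, `FakeResponse`, and `buggy_search` are hypothetical names, and the `"results"` key is an assumption. None of them are taken from the real providers module.

```python
import json
from typing import Callable


class NetworkError(Exception):
    """Hypothetical stand-in for httpx.RequestError / RateLimitError."""


class FakeResponse:
    """Hypothetical response whose .json() decodes the raw body with the json module."""

    def __init__(self, body: str) -> None:
        self.body = body

    def json(self):
        return json.loads(self.body)


def buggy_search(fetch: Callable[[], FakeResponse]) -> list:
    # Pre-fix shape: the request and the JSON parse share one try block, but the
    # except clause names only network errors, so JSONDecodeError escapes.
    try:
        response = fetch()
        data = response.json()
    except NetworkError:
        return []
    return data.get("results", [])


# Bodies a provider might send instead of JSON: an HTML error page, an empty
# body, truncated JSON, and a plain-text gateway error.
bodies = ["<html>down</html>", "", '{"results": [', "502 Bad Gateway"]
escaped = []
for body in bodies:
    try:
        buggy_search(lambda: FakeResponse(body))
    except json.JSONDecodeError as exc:
        # Reaching this handler means the exception escaped buggy_search.
        escaped.append(exc.msg)

print(escaped)
```

All four bodies escape as `json.JSONDecodeError`, but the messages differ. The gateway text parses a leading number (`502`) and then fails with "Extra data" rather than "Expecting value". Catching the exception class, not matching its message, is therefore the robust choice.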

brave_search already handles this correctly: it parses JSON in its own try
block and returns [] on json.JSONDecodeError. Mirror that in the other three
providers so a malformed provider response degrades to no-results instead of
propagating an exception.

Adds tests/test_search_provider_json.py: a non-JSON 200 body now yields [] for
tavily, serper, google_pse, and brave (the last guards the reference behaviour).

Co-authored-by: NubsCarson <nubs@nubs.site>
2026-06-03 14:24:23 +09:00


```python
"""Search providers must not raise on a non-JSON response body (issue #1129).

`brave_search` already wraps `response.json()` in its own try/except that catches
`json.JSONDecodeError` and returns []. The Tavily, Serper, and Google PSE
providers parsed JSON inside the network try block, which only caught
`httpx.RequestError`/`RateLimitError`, so a provider returning a non-JSON body
(an HTML error page, a truncated/empty body, a gateway error) raised an
UNCAUGHT `json.JSONDecodeError` that aborted the search in the background. These
tests pin that all four providers degrade to [] on malformed JSON, matching brave.
"""

import json

import pytest

from services.search import providers


class _BadJSONResponse:
    """A 200 response whose body is not valid JSON (e.g. an HTML error page)."""

    status_code = 200

    def raise_for_status(self):
        return None

    def json(self):
        raise json.JSONDecodeError("Expecting value", "<html>down</html>", 0)


@pytest.fixture(autouse=True)
def _offline(monkeypatch):
    # Keep everything offline + deterministic: no settings/DB, keys via env, and
    # both httpx verbs return a body that fails to decode.
    monkeypatch.setattr(providers, "_get_search_settings", lambda: {}, raising=False)
    monkeypatch.setattr(providers, "_safesearch_for", lambda *_a, **_k: None, raising=False)
    monkeypatch.setenv("DATA_BRAVE_API_KEY", "k")
    monkeypatch.setenv("TAVILY_API_KEY", "k")
    monkeypatch.setenv("SERPER_API_KEY", "k")
    monkeypatch.setenv("GOOGLE_API_KEY", "k")
    monkeypatch.setenv("GOOGLE_PSE_CX", "cx")
    monkeypatch.setattr(providers.httpx, "post", lambda *a, **k: _BadJSONResponse())
    monkeypatch.setattr(providers.httpx, "get", lambda *a, **k: _BadJSONResponse())


def test_tavily_malformed_json_returns_empty():
    assert providers.tavily_search("hello") == []


def test_serper_malformed_json_returns_empty():
    assert providers.serper_search("hello") == []


def test_google_pse_malformed_json_returns_empty():
    assert providers.google_pse_search("hello") == []


def test_brave_malformed_json_returns_empty():
    # Already correct on main; guards against regressing the reference behaviour.
    assert providers.brave_search("hello") == []
```
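The `_BadJSONResponse` stub builds its exception by hand as `json.JSONDecodeError(msg, doc, pos)`. The following check assumes that the real `response.json()` decodes the body with the standard-library `json` module, which is what current httpx versions do in `Response.json()`. Decoding the same HTML body for real then yields an exception with the same message, document, and position. If that assumption holds, the stub exercises the same `except json.JSONDecodeError` path that a live provider error page would.

```python
import json

# Exception built by hand, exactly as _BadJSONResponse.json() does in the test.
stub_exc = json.JSONDecodeError("Expecting value", "<html>down</html>", 0)

# Exception produced by actually decoding the same body.
try:
    json.loads("<html>down</html>")
except json.JSONDecodeError as exc:
    real_exc = exc

# JSONDecodeError exposes msg, doc, pos, lineno and colno attributes.
for attr in ("msg", "doc", "pos", "lineno", "colno"):
    print(attr, getattr(stub_exc, attr), getattr(real_exc, attr))

# JSONDecodeError subclasses ValueError, so a broader `except ValueError`
# would also catch it; the providers catch the narrower class.
print(issubclass(json.JSONDecodeError, ValueError))
```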