odysseus/tests/test_search_ranking_recency.py
lekt8 975fd42e32 fix: rank recency by UTC, not local time (#1116) (#1234)
src/search/ranking.py computed result age as `(datetime.now() - dt).days`, where
`dt` is parsed from a UTC-style published date with no timezone. Using local
`datetime.now()` skewed the age by the host's UTC offset (off-by-up-to-a-day near
boundaries), and was a latent crash: once neighbouring code becomes timezone-aware
the naive/aware subtraction raises TypeError (the landmine called out in #1116).
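
Both failure modes can be reproduced in isolation. The snippet below is an illustrative sketch, not code from the repository: the UTC+9 host and the specific timestamps are assumptions chosen to land near a day boundary.

```python
from datetime import datetime, timedelta, timezone

# A published date parsed without tzinfo (naive), implicitly in UTC.
published = datetime(2026, 1, 1, 23, 30)

# Assumed scenario: a host at UTC+9, so its local wall clock runs 9 hours
# ahead of the UTC wall clock at the same instant.
utc_now = datetime(2026, 1, 2, 23, 0)      # naive UTC wall clock
local_now = utc_now + timedelta(hours=9)   # naive local wall clock on that host

# 23h30m have really elapsed, so the age in whole days should be 0.
age_utc = (utc_now - published).days
# Against the local clock the same result looks 32h30m old, i.e. 1 day.
age_local = (local_now - published).days

# Subtracting a naive datetime from an aware one raises TypeError.
aware_now = datetime(2026, 1, 2, 23, 0, tzinfo=timezone.utc)
try:
    aware_now - published
    mixed_ok = True
except TypeError:
    mixed_ok = False

print(age_utc, age_local, mixed_ok)
```

On a host west of UTC the skew runs the other way, making results look younger than they are.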

Recency is now measured against naive UTC. The scoring is also lifted out of the
rank_search_results closure into a module-level, time-injectable `recency_score`
so it's unit-testable, and `_utcnow_naive()` avoids `datetime.utcnow()` (deprecated
since Python 3.12 and slated for removal).

Covered by tests/test_search_ranking_recency.py (5 cases); the existing
tests/test_search_ranking.py still passes.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-03 00:18:15 +09:00


"""Issue #1116 (latent ranking bug) — recency scoring uses UTC, not local time.
`recency_score` measured age with `datetime.now()` (local) against UTC-style
published dates, skewing the age by the host's UTC offset and risking a TypeError
once neighbouring code becomes timezone-aware. It now uses naive UTC and is a
module-level, time-injectable function.
"""
from datetime import datetime, timezone

from src.search.ranking import recency_score, _utcnow_naive


def test_fresh_result_scores_one():
    assert recency_score("2026-01-01", now=datetime(2026, 1, 5)) == 1.0  # 4 days old


def test_old_result_scores_zero():
    assert recency_score("2026-01-01", now=datetime(2026, 3, 1)) == 0.0  # >30 days


def test_mid_range_decays_linearly():
    score = recency_score("2026-01-01", now=datetime(2026, 1, 20))  # 19 days old
    assert score == (30 - 19) / 23


def test_empty_or_unparseable_scores_zero():
    assert recency_score("", now=datetime(2026, 1, 1)) == 0.0
    assert recency_score(None, now=datetime(2026, 1, 1)) == 0.0
    assert recency_score("not-a-date", now=datetime(2026, 1, 1)) == 0.0


def test_default_now_is_naive_utc():
    # Naive (no tzinfo) so it subtracts cleanly from the naive parsed dates,
    # and UTC-based (3.14-safe, no datetime.utcnow()).
    now = _utcnow_naive()
    assert now.tzinfo is None
    reference = datetime.now(timezone.utc).replace(tzinfo=None)
    assert abs((now - reference).total_seconds()) < 5