* fix: omit temperature for OpenAI reasoning models (o1/o3/o4/gpt-5)

  These models only accept the default temperature; sending any explicit value (even 0.0) returns HTTP 400 "Only the default (1) value is supported". This broke two paths:

  - Endpoint probing in _probe_single_model hardcodes temperature: 0.0, so a perfectly valid o3/gpt-5 endpoint is reported as failing in the Model Endpoints health check.
  - Chat/stream payloads send temperature unconditionally, so a non-default temperature preset 400s on these models.

  The code already special-cases the same model family for max_completion_tokens, so this adds a sibling _restricts_temperature() helper and omits the field for those models, letting the API use its required default. gpt-4.5 is intentionally excluded (not a reasoning model; accepts temperature normally).

  Adds tests/test_llm_core_temperature.py covering the predicate and the synchronous payload builder.

* fix: also omit temperature for reasoning models on the direct-POST paths

  The first commit only covered llm_call/llm_call_async/stream_llm and the endpoint probe. Email auto-summary, urgency-less spam classification, the email reply-summary endpoint, and gallery vision tagging build their OpenAI payloads inline and POST them directly (requests/httpx), bypassing llm_core, so a reasoning model configured there would still 400 on the temperature field.

  These sites already branch on _uses_max_completion_tokens, so they're the same class; added the matching _restricts_temperature guard. gallery_routes also gains the max_completion_tokens branch it was missing, so gpt-5 vision tagging works end to end.

  Note: email_pollers urgency scoring goes through llm_call_async and was already covered.
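The behaviour described in the commit message can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the project's actual code: `restricts_temperature`, `build_payload`, and the regular expression are hypothetical stand-ins for `_restricts_temperature` and the real payload builders in llm_core, whose implementations are not shown here. The sketch also reuses one predicate for both the temperature and max_completion_tokens decisions, whereas the commit message describes two separate helpers.

```python
import re

# Assumption: a reasoning model is recognised from the last path segment of
# the model id, case-insensitively, so "openrouter/openai/o3-mini" matches.
_REASONING_RE = re.compile(r"^(?:o[134]|gpt-5)\b", re.IGNORECASE)


def restricts_temperature(model):
    """Return True if the model is assumed to accept only the default temperature."""
    if not model:
        # Empty string or None: treat as a normal model.
        return False
    name = str(model).rsplit("/", 1)[-1]
    return _REASONING_RE.match(name) is not None


def build_payload(model, messages, temperature, max_tokens):
    """Build a chat-completions body, omitting temperature where it is rejected."""
    payload = {"model": model, "messages": messages}
    if restricts_temperature(model):
        # Leave temperature out entirely so the API applies its default.
        # Simplification: the same predicate selects max_completion_tokens.
        payload["max_completion_tokens"] = max_tokens
    else:
        payload["temperature"] = temperature
        payload["max_tokens"] = max_tokens
    return payload
```

The key design point is that the field is omitted rather than set to 1, so the request does not depend on the client knowing the provider's default.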
69 lines
2.3 KiB
Python
"""Regression tests: OpenAI reasoning models reject a non-default temperature.

o1/o3/o4/gpt-5 only accept the default temperature (1); sending an explicit
value, even 0.0, returns HTTP 400 "Only the default (1) value is supported".
The OpenAI-compatible payload builders must omit the temperature field for these
models so chat (with a non-default preset) and endpoint probing don't break.
"""
import httpx
import pytest

from src import llm_core


@pytest.mark.parametrize(
    "model",
    ["o1", "o1-mini", "o3", "o3-mini", "o4-mini", "gpt-5", "gpt-5-mini",
     "openrouter/openai/o3-mini", "OpenAI/GPT-5"],
)
def test_reasoning_models_restrict_temperature(model):
    assert llm_core._restricts_temperature(model) is True


@pytest.mark.parametrize(
    "model",
    ["gpt-4o", "gpt-4.1", "gpt-3.5-turbo", "gpt-4.5-preview",
     "claude-3-5-sonnet", "llama3.1", "", None],
)
def test_normal_models_allow_temperature(model):
    assert llm_core._restricts_temperature(model) is False


def _capture_openai_payload(monkeypatch, model, temperature):
    """Run a synchronous OpenAI-compatible call and return the posted JSON body."""
    llm_core._response_cache.clear()
    seen = {}

    def fake_post(url, headers=None, json=None, timeout=None):
        seen["json"] = json
        request = httpx.Request("POST", url)
        return httpx.Response(
            200,
            request=request,
            json={"choices": [{"message": {"content": "OK"}}]},
        )

    monkeypatch.setattr(llm_core.httpx, "post", fake_post)
    result = llm_core.llm_call(
        "https://api.openai.com/v1/chat/completions",
        model,
        [{"role": "user", "content": "Say OK"}],
        temperature=temperature,
        max_tokens=5,
    )
    assert result == "OK"
    return seen["json"]


def test_reasoning_model_payload_omits_temperature(monkeypatch):
    payload = _capture_openai_payload(monkeypatch, "o3-mini", 0.0)
    assert "temperature" not in payload
    # Reasoning models also use max_completion_tokens, which must survive.
    assert payload["max_completion_tokens"] == 5


def test_normal_model_payload_keeps_temperature(monkeypatch):
    payload = _capture_openai_payload(monkeypatch, "gpt-4o", 0.2)
    assert payload["temperature"] == 0.2
    assert payload["max_tokens"] == 5