fix: surface reasoning_content when content is empty (thinking models) (#1233)
Thinking models served via llama.cpp without --reasoning-format none
(e.g. Qwen3, DeepSeek-R1) route all tokens into reasoning_content and
return content="". Two call paths were silently broken:
- llm_call / llm_call_async (non-streaming): the hard-coded lookup
data["choices"][0]["message"]["content"] raises KeyError when the key is
missing, or returns an empty string, discarding the entire response.
- stream_agent_loop end-of-round fallback: when full_response is empty
but round_reasoning has content, the existing code replaced the
response with the generic empty-response error message, discarding
all reasoning tokens that were correctly accumulated during streaming.
Fix: in both non-streaming paths use msg.get("content") or
msg.get("reasoning_content") or "". In the streaming fallback, surface
round_reasoning as the answer before falling through to the error path.
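
A minimal sketch of the non-streaming fallback, for illustration only. The helper name extract_text and the sample payloads are hypothetical; only the msg.get(...) expression comes from this commit.

```python
from typing import Any, Dict


def extract_text(data: Dict[str, Any]) -> str:
    """Return the assistant text from an OpenAI-style chat completion payload."""
    msg = data["choices"][0]["message"]
    # `or` skips a missing key (None) as well as an empty string, so an empty
    # content falls through to reasoning_content, and finally to "".
    return msg.get("content") or msg.get("reasoning_content") or ""


# Hypothetical payloads shaped like the responses described above.
thinking_only = {"choices": [{"message": {"content": "", "reasoning_content": "2 + 2 = 4"}}]}
normal = {"choices": [{"message": {"content": "4", "reasoning_content": "2 + 2 = 4"}}]}
neither = {"choices": [{"message": {}}]}

for payload in (thinking_only, normal, neither):
    print(repr(extract_text(payload)))
```

When content is non-empty it still wins, so models that populate both fields behave as before.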
@@ -860,7 +860,8 @@ def llm_call(url: str, model: str, messages: List[Dict], temperature: float = LL
         elif provider == "ollama":
             response = _parse_ollama_response(data)
         else:
-            response = data["choices"][0]["message"]["content"]
+            msg = data["choices"][0]["message"]
+            response = msg.get("content") or msg.get("reasoning_content") or ""
         _set_cached_response(cache_key, response)
         return response
     except Exception:
@@ -997,7 +998,8 @@ async def llm_call_async(
         elif provider == "ollama":
             response = _parse_ollama_response(data)
         else:
-            response = data["choices"][0]["message"]["content"]
+            msg = data["choices"][0]["message"]
+            response = msg.get("content") or msg.get("reasoning_content") or ""
         _set_cached_response(cache_key, response)
         return response
     except Exception:
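
The stream_agent_loop change described in the commit message is not among the hunks shown here. A rough sketch of that end-of-round fallback might look like the following. The function finalize_round, the constant EMPTY_RESPONSE_ERROR, and the use of strip() to test for emptiness are assumptions made for illustration; only the names full_response and round_reasoning come from the commit message.

```python
# Placeholder text; the real error message is not shown in this commit.
EMPTY_RESPONSE_ERROR = "[error] model returned an empty response"


def finalize_round(full_response: str, round_reasoning: str) -> str:
    """Pick the text to surface at the end of a streaming round."""
    if full_response.strip():
        # Normal case: the model streamed visible content.
        return full_response
    if round_reasoning.strip():
        # Thinking-model case: only reasoning tokens arrived, so surface
        # them as the answer instead of discarding them.
        return round_reasoning
    # Nothing usable was streamed; fall through to the error path.
    return EMPTY_RESPONSE_ERROR


print(finalize_round("", "step 1... therefore 4"))
```

The ordering mirrors the non-streaming expression: visible content first, reasoning second, error last.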