Provider detection: match by hostname instead of substring (re #768) (#815)

* Dedupe URL routing helpers and tighten adjacent hostname checks

* Match providers by hostname, not substring, in _detect_provider

_detect_provider used `"anthropic.com" in url`-style substring checks, so a URL
that merely contained a provider's domain in its path or query — or a look-alike
host like `anthropic.com.example` — was misclassified and picked the wrong
auth-header/payload shape. Switch it to the existing `_host_match` helper
(hostname exact/subdomain match), the same way the human-readable labels and
curated model lists already work, finishing that migration. Also harden
`_host_match` against trailing-dot FQDNs.

Not a credential-leak fix: _detect_provider only classifies a URL the admin
already configured next to its key, and the URL — not this function — decides
where the request goes. This is a correctness/consistency cleanup.

Adds tests that import the real helpers (test_endpoint_resolver.py tests local
copies, so it can't catch this) covering the substring false-positives.

Refs #768.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Import build_headers under its real name in model_routes

It was imported as `build_headers as _provider_headers`, which collides with
the unrelated llm_core._provider_headers(provider, headers) — same name,
different signature. Use the real name to remove the confusion.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Use hostname matching in URL builders, not raw suffix checks

PR review flagged that _detect_provider() was hardened to match on
hostname, but several helpers still used raw host.endswith("anthropic.com")
/ host.endswith("ollama.com"), which match adjacent hosts like
notanthropic.com / notollama.com.

Route the remaining checks through _host_match(): _is_ollama_native_url
and _ollama_api_root in llm_core, and _anthropic_api_root / _ollama_api_root
in endpoint_resolver. With _detect_provider already hostname-correct, the
trailing "or host.endswith(...)" clauses in build_chat_url / build_models_url
are redundant, so drop them rather than fix the substring match in place.

Add builder-level tests asserting look-alike and domain-in-path hosts route
to the OpenAI-compatible default. They import the real builders and fail on
the pre-fix code.

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
This commit is contained in:
LittleLlama
2026-06-01 19:11:17 -07:00
committed by GitHub
parent 3f6d630b56
commit 54ecfa39cf
4 changed files with 245 additions and 128 deletions

View File

@@ -163,7 +163,7 @@ def _is_ollama_native_url(url: str) -> bool:
return False
host = parsed.hostname or ""
path = (parsed.path or "").rstrip("/")
if host.endswith("ollama.com"):
if _host_match(url, "ollama.com"):
return True
local_ollama_host = host in {"localhost", "127.0.0.1", "0.0.0.0", "::1"} or parsed.port == 11434
return local_ollama_host and (path == "/api" or path.startswith("/api/"))
@@ -173,7 +173,6 @@ def _ollama_api_root(url: str) -> str:
"""Return a native Ollama API root such as https://ollama.com/api."""
url = (url or "").strip().rstrip("/")
parsed = urlparse(url)
host = parsed.hostname or ""
path = (parsed.path or "").rstrip("/")
if path.endswith("/api/chat"):
return url[: -len("/chat")]
@@ -183,7 +182,7 @@ def _ollama_api_root(url: str) -> str:
return url[: -len("/generate")]
if path.endswith("/api"):
return url
if host.endswith("ollama.com"):
if _host_match(url, "ollama.com"):
root = f"{parsed.scheme}://{parsed.netloc}" if parsed.scheme and parsed.netloc else "https://ollama.com"
return root.rstrip("/") + "/api"
return url
@@ -225,16 +224,43 @@ def _parse_ollama_response(data: dict) -> str:
return message.get("content") or data.get("response") or ""
def _host_match(url: str, *domains: str) -> bool:
"""Return True if url's hostname equals any of `domains` or is a subdomain of one.
Used by helpers that want "is this Anthropic?" / "is this OpenRouter?"
style checks. Prefer this over substring matching on the URL: the
substring form gives wrong answers for unrelated paths or query strings
that happen to contain the domain text.
"""
if not url:
return False
try:
# rstrip(".") so a fully-qualified host with a trailing dot
# ("api.anthropic.com.") still matches "anthropic.com".
host = (urlparse(url).hostname or "").lower().rstrip(".")
except Exception:
return False
if not host:
return False
return any(host == d or host.endswith("." + d) for d in domains)
def _detect_provider(url: str) -> str:
"""Detect API provider from URL."""
u = (url or "").lower()
"""Detect the API provider from a configured endpoint URL.
Matches on hostname (exact or subdomain) rather than substring, so a URL
that merely contains a provider's domain in its path or query — or a
look-alike host such as ``anthropic.com.example`` — is not misclassified.
Unknown hosts fall back to the OpenAI-compatible default, which the
majority of providers implement.
"""
if _is_ollama_native_url(url):
return "ollama"
if "anthropic.com" in u:
if _host_match(url, "anthropic.com"):
return "anthropic"
if "openrouter.ai" in u:
if _host_match(url, "openrouter.ai"):
return "openrouter"
if "groq.com" in u:
if _host_match(url, "groq.com"):
return "groq"
return "openai"
@@ -251,26 +277,27 @@ def _provider_headers(provider: str, headers: Optional[Dict] = None) -> Dict[str
def _provider_label(url: str) -> str:
"""Human-friendly provider name for error messages."""
u = (url or "").lower()
if "anthropic.com" in u: return "Anthropic"
if "ollama.com" in u: return "Ollama Cloud"
if "api.x.ai" in u or "x.ai/" in u: return "xAI"
if "openai.com" in u: return "OpenAI"
if "openrouter.ai" in u: return "OpenRouter"
if "groq.com" in u: return "Groq"
if "mistral.ai" in u: return "Mistral"
if "deepseek.com" in u: return "DeepSeek"
if "googleapis.com" in u or "generativelanguage" in u: return "Google"
if "together.xyz" in u or "together.ai" in u: return "Together"
if "fireworks.ai" in u: return "Fireworks"
if "ollama" in u or ":11434" in u: return "Ollama"
if "localhost" in u or "127.0.0.1" in u: return "local endpoint"
if not url:
return "provider"
if _host_match(url, "anthropic.com"): return "Anthropic"
if _host_match(url, "ollama.com"): return "Ollama Cloud"
if _host_match(url, "x.ai"): return "xAI"
if _host_match(url, "openai.com"): return "OpenAI"
if _host_match(url, "openrouter.ai"): return "OpenRouter"
if _host_match(url, "groq.com"): return "Groq"
if _host_match(url, "mistral.ai"): return "Mistral"
if _host_match(url, "deepseek.com"): return "DeepSeek"
if _host_match(url, "googleapis.com"): return "Google"
if _host_match(url, "together.xyz", "together.ai"): return "Together"
if _host_match(url, "fireworks.ai"): return "Fireworks"
if _is_ollama_native_url(url): return "Ollama"
try:
from urllib.parse import urlparse
host = urlparse(url).hostname or "provider"
return host
host = (urlparse(url).hostname or "").lower()
except Exception:
return "provider"
if host in {"localhost", "127.0.0.1", "::1", "0.0.0.0"}:
return "local endpoint"
return host or "provider"
def _format_upstream_error(status: int, body: bytes | str, url: str) -> str: