POST /api/embeddings/endpoint takes a user-supplied URL and immediately
makes an outbound httpx request to it with no validation. The admin gate
added earlier (PR #80) closed the unauthenticated-access part of #132; this PR
addresses the remaining part of that issue: validating the URL before fetching it.

Odysseus is local-first, so pointing the embedding endpoint at a loopback or
LAN server (local vLLM / llama.cpp / Ollama) is a normal setup, and a blanket
private-IP block would break the primary use case. So the guard:
- always rejects non-HTTP(S) schemes (file://, gopher://, ftp:// …),
- always rejects link-local addresses (169.254.0.0/16, including the cloud
  instance-metadata address 169.254.169.254, a common exfiltration vector),
  plus multicast, reserved, and unspecified addresses, and the IPv4-mapped
  IPv6 forms of all of these,
- keeps loopback/LAN allowed by default, and
- adds an opt-in EMBEDDING_BLOCK_PRIVATE_IPS=true setting that also blocks the
  loopback/LAN ranges allowed by default, for full SSRF lockdown on exposed
  multi-tenant deployments.
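
A minimal sketch of the checks listed above, using only the standard library
(urllib.parse, ipaddress, socket). The names check_embedding_url,
UnsafeURLError, system_resolver and the resolver signature (hostname to a
list of IP strings) are hypothetical and not necessarily what
src/url_safety.py uses. It checks every resolved address, not just the first,
and decides loopback before the reserved check, because the stdlib classifies
IPv6 ::1 as reserved (it lies in ::/8).

```python
import ipaddress
import socket
from typing import Callable, Iterable, Optional
from urllib.parse import urlsplit

# Hypothetical resolver signature: hostname -> IP address strings.
Resolver = Callable[[str], Iterable[str]]

ALLOWED_SCHEMES = ("http", "https")


class UnsafeURLError(ValueError):
    """Raised when a URL fails the guard (hypothetical name)."""


def system_resolver(host: str) -> Iterable[str]:
    # getaddrinfo yields (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string for both IPv4 and IPv6.
    return [info[4][0] for info in socket.getaddrinfo(host, None)]


def check_embedding_url(url: str, *, block_private: bool = False,
                        resolver: Optional[Resolver] = None) -> None:
    """Raise UnsafeURLError unless every address `url` resolves to is allowed."""
    parts = urlsplit(url)  # urlsplit already lowercases the scheme
    # 1. Only HTTP(S): file://, gopher://, ftp:// etc. never reach the resolver.
    if parts.scheme not in ALLOWED_SCHEMES:
        raise UnsafeURLError(f"scheme not allowed: {parts.scheme!r}")
    host = parts.hostname  # lowercased, IPv6 brackets stripped
    if not host:
        raise UnsafeURLError("URL has no host")

    resolve = resolver or system_resolver
    try:
        addresses = list(resolve(host))
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        raise UnsafeURLError(f"cannot resolve {host!r}") from exc
    if not addresses:
        raise UnsafeURLError(f"no addresses for {host!r}")

    for raw in addresses:
        ip = ipaddress.ip_address(raw.split("%", 1)[0])  # drop any IPv6 zone id
        # 2. Unwrap ::ffff:a.b.c.d so mapped forms get the IPv4 verdict.
        if isinstance(ip, ipaddress.IPv6Address) and ip.ipv4_mapped is not None:
            ip = ip.ipv4_mapped
        # 3. Loopback first, so ::1 is not rejected by the reserved check below.
        if ip.is_loopback:
            if block_private:
                raise UnsafeURLError(f"loopback address blocked: {ip}")
            continue
        # 4. Always blocked, including 169.254.169.254 (link-local).
        if ip.is_link_local or ip.is_multicast or ip.is_reserved or ip.is_unspecified:
            raise UnsafeURLError(f"address not allowed: {ip}")
        # 5. Opt-in lockdown: LAN ranges are rejected too.
        if block_private and ip.is_private:
            raise UnsafeURLError(f"private address blocked: {ip}")
```

For example, check_embedding_url("http://localhost:11434",
resolver=lambda h: ["::1", "127.0.0.1"]) returns None, while the same call
with block_private=True raises UnsafeURLError.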

The logic lives in src/url_safety.py (stdlib only, with an injectable
resolver), so it can be unit-tested without real DNS; the route calls it
before making the health-check request. Covered by tests/test_url_safety.py
(8 cases).
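
The route-side ordering (validate first, then make the health-check request)
and the opt-in flag can be sketched without httpx or DNS by injecting both
collaborators. probe_endpoint, block_private_from_env, and the set of accepted
"true" spellings are hypothetical; in the real route, validate would be the
src/url_safety.py check and fetch the httpx health-check call.

```python
import os
from typing import Callable, Mapping, TypeVar

T = TypeVar("T")

# Assumption: the PR only documents "=true"; the other spellings are illustrative.
_TRUTHY = {"1", "true", "yes", "on"}


def block_private_from_env(environ: Mapping[str, str] = os.environ) -> bool:
    """Read the opt-in lockdown flag; unset or any other value means off."""
    return environ.get("EMBEDDING_BLOCK_PRIVATE_IPS", "").strip().lower() in _TRUTHY


def probe_endpoint(url: str, *,
                   validate: Callable[..., None],
                   fetch: Callable[[str], T],
                   environ: Mapping[str, str] = os.environ) -> T:
    """Run `validate` first; `fetch` only runs if validation did not raise."""
    validate(url, block_private=block_private_from_env(environ))  # raises on rejection
    return fetch(url)
```

Because both collaborators are parameters, a test can assert that fetch is
never called for a rejected URL, with no network access at all.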

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>