Commit Graph

2 Commits

Author SHA1 Message Date
lekt8
583df3dd6a Recognize gemma3/llama4/mistral-small3.1+/multimodal as vision models (#1430)
is_vision_model() classified several genuinely multimodal families as text-only
because their names contain neither "vision" nor "vl": Gemma 3 (4b+), Llama 4,
Mistral Small 3.1/3.2, and *-multimodal models (e.g. phi-4-multimodal). For those
the attached image was stripped before the request, so the model never saw it —
a "can't read the image" report (issue #1274), common with Ollama tags like
gemma3:4b.

Add those keywords (plus a generic "multimodal"). Per the file's err-toward-True
policy (#124), a rare text-only tag treated as vision is the safer failure than
dropping a real image. Guard tests confirm the text-only siblings (gemma2, plain
gemma, mistral-small, phi-3) are not over-matched.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-03 04:17:40 +09:00
Håkon Julius Størholt
91d3511580 Recognize local vision models so their images aren't dropped (#185)
An image attachment only got through if the model name was on a short
built-in list. Anything else was treated as text-only and the image was
quietly dropped, so the model never saw it. That left out a lot of the
smaller vision models you can run locally (moondream was the one I hit).

Pulled the check into is_vision_model() in chat_helpers, broadened it to
cover those, and added a test. Models that already worked are unaffected.

Fixes #124.
2026-06-01 13:09:21 +09:00