The Ollama platform has announced support for Llama 3.2 Vision, a multimodal LLM that can recognize, reason about, and caption images. The new integration allows users to run Llama 3.2 Vision locally, in both 11B and 90B sizes. The announcement was made through a blog post, which also showcased several examples of using Llama 3.2 Vision for tasks such as OCR, image recognition, Q&A, visual data analysis, and handwriting recognition.
One of the major updates in this release, which shipped in Ollama version 0.4, is the integration of Llama 3.2 Vision with Open WebUI rather than llama.cpp, allowing for a smoother and more user-friendly experience. Because Ollama runs locally on the user's system, the privacy concerns that arise when uploading images to a hosted tool are avoided. The team has also provided detailed instructions on how to integrate and use Llama 3.2 Vision within Ollama.
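Since Ollama serves models through a local REST API (by default on port 11434), a chat request to Llama 3.2 Vision is just a JSON body with the image supplied as a base64 string. The following is a minimal sketch of building such a request; the image bytes are a placeholder standing in for a real file read, and actually sending the payload assumes a running Ollama server.

```python
import base64
import json

# Placeholder image bytes; in practice you would read a real file,
# e.g. image_bytes = open("photo.jpg", "rb").read()
image_bytes = b"\x89PNG\r\n\x1a\n"

# Request body for Ollama's local chat endpoint
# (POST http://localhost:11434/api/chat). Llama 3.2 Vision expects
# images as base64-encoded strings in the message's "images" field.
payload = {
    "model": "llama3.2-vision",  # use "llama3.2-vision:90b" for the larger size
    "messages": [
        {
            "role": "user",
            "content": "What is in this image?",
            "images": [base64.b64encode(image_bytes).decode("ascii")],
        }
    ],
    "stream": False,  # return one complete response instead of a token stream
}

print(json.dumps(payload, indent=2))
```

With a local server running, this payload could be posted with any HTTP client; the response carries the model's caption or answer in its `message.content` field.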
Another notable update in this version is a speed enhancement for follow-on requests to vision models, allowing faster, more efficient processing of multi-turn tasks. Additionally, Ollama can now import models from Safetensors files without requiring a Modelfile, making the process even more convenient for users.
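In practice, the Safetensors import means you can point `ollama create` at a directory of exported weights with no Modelfile alongside them. A minimal sketch of preparing that invocation, where the directory name `my-model-weights` and the model name `my-model` are hypothetical examples:

```python
from pathlib import Path

# Hypothetical directory holding Safetensors weights exported from,
# say, a Hugging Face checkout (model.safetensors, config.json, ...).
model_dir = Path("./my-model-weights")

# As of Ollama 0.4, importing Safetensors weights needs no Modelfile;
# this is the command you would run from inside that directory,
# e.g. via subprocess.run(cmd, cwd=model_dir, check=True).
cmd = ["ollama", "create", "my-model"]

# Sanity-check that the directory actually contains Safetensors files
# before invoking the CLI.
has_weights = (
    any(p.suffix == ".safetensors" for p in model_dir.iterdir())
    if model_dir.is_dir()
    else False
)

print(cmd, has_weights)
```

Once the import finishes, the model is available locally under the chosen name, just like any model pulled from the Ollama registry.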
The release of Llama 3.2 by Meta in September this year was a major milestone for vision-capable open models. Meta claims that Llama 3.2 outperforms models such as Claude 3 Haiku and GPT-4o. The family includes small and medium vision LLMs (11B and 90B) as well as lightweight text models (1B and 3B) optimized for on-device use, with support for both Qualcomm and MediaTek hardware.
Ollama has been a pioneer in providing free infrastructure for running LLMs locally on a computer. Earlier this year, when AIM reviewed the best tools for running large language models, Ollama stood out as the most efficient solution, offering unmatched flexibility. However, a recent report by Oligo's research team revealed six critical flaws in Ollama, four of which were assigned CVEs (Common Vulnerabilities and Exposures) and were promptly patched in an update. The other two flaws were disputed by Ollama's maintainers. These vulnerabilities could potentially allow an attacker to carry out malicious actions such as DoS attacks, model poisoning, and model theft with a single HTTP request.
One of the main advantages of using Ollama is the ability to run LLMs locally, minimizing the risks associated with using them online. However, if more such vulnerabilities are discovered in the future, it could significantly impact Ollama’s popularity and position in the AI ecosystem.