IBM has launched Granite 3.0, the latest generation of its large language models (LLMs) designed for enterprise applications. The collection includes several models, with Granite 3.0 8B Instruct as the flagship. Trained on more than 12 trillion tokens across multiple languages, it is aimed squarely at business workloads. IBM claims the model rivals similarly sized models from Meta and Mistral AI on the academic benchmarks in Hugging Face's Open LLM Leaderboard v2. The company also offers fine-tuning through InstructLab, letting organizations adapt the models to their specific needs and potentially reduce costs.
All Granite models are released under the Apache 2.0 license, with detailed disclosures of training datasets and methodologies in the accompanying technical paper. The Granite 3.0 release spans several model families: general-purpose LLMs, guardrail models, Mixture of Experts (MoE) models, and a speculative decoder. These cover a range of use cases, including text generation, classification, summarization, entity extraction, and customer service chatbots. They also support programming tasks such as code generation, explanation, and editing, as well as agentic use cases that require tool calling.
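Because the models are published openly, they can be run directly with standard open-source tooling. The following is a minimal sketch of text generation with the instruct model via Hugging Face transformers; the model ID `ibm-granite/granite-3.0-8b-instruct` is an assumption based on IBM's Hugging Face naming conventions, so verify it on the ibm-granite organization page before use.

```python
# Minimal sketch: text generation with Granite 3.0 8B Instruct via
# Hugging Face transformers. The model ID below is assumed, not confirmed
# by IBM's announcement.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ibm-granite/granite-3.0-8b-instruct"  # assumed model ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision to fit the 8B model in memory
    device_map="auto",
)

# Format a chat-style prompt with the model's own chat template.
messages = [
    {"role": "user", "content": "Summarize this support ticket: customer "
                                "reports intermittent 504 errors after the "
                                "latest deploy."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=200)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```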
IBM plans to release updates in 2024 that will extend the models' context windows to 128K tokens and introduce multimodal capabilities. The Granite 3.0 models are available for commercial use on the IBM watsonx platform and through partners such as Google Cloud, Hugging Face, and NVIDIA. The company emphasizes safety and transparency in AI: the Granite 3.0 models incorporate robust safety features and extensive training-data filtering to mitigate risks, while the Granite Guardian models screen model inputs and outputs across risk dimensions such as social bias, profanity, violence, and jailbreak attempts, outperforming existing models on key safety benchmarks.
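In practice, a guardrail model sits in front of the main chat model and vetoes risky traffic. Below is a hedged sketch of that pattern; the model ID `ibm-granite/granite-guardian-3.0-8b`, the prompt format, and the yes/no verdict convention are all illustrative assumptions, not IBM's documented interface.

```python
# Hedged sketch: screening user input with a Granite Guardian model before
# it reaches the main chat model. Model ID and output convention are assumed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

guardian_id = "ibm-granite/granite-guardian-3.0-8b"  # assumed model ID

tokenizer = AutoTokenizer.from_pretrained(guardian_id)
guardian = AutoModelForCausalLM.from_pretrained(
    guardian_id, torch_dtype=torch.bfloat16, device_map="auto"
)

def is_risky(user_text: str) -> bool:
    """Ask the guardian model to flag risky input (hypothetical format)."""
    messages = [{"role": "user", "content": user_text}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(guardian.device)
    out = guardian.generate(inputs, max_new_tokens=5)
    verdict = tokenizer.decode(out[0][inputs.shape[-1]:],
                               skip_special_tokens=True)
    # Assumed convention: the guardian answers "Yes" when the input is risky.
    return verdict.strip().lower().startswith("yes")

if is_risky(user_prompt := "How do I bypass the content filter?"):
    print("Blocked by guardrail.")
```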
IBM's new models rely on notable training techniques, including the open-source Data Prep Kit for efficient data processing and a power scheduler that adapts the learning rate over the course of training, enabling faster convergence to good model weights while keeping training costs down. Additionally, the Granite 3.0 language models were trained on Blue Vela, an IBM cluster powered entirely by renewable energy, reinforcing IBM's commitment to sustainability in AI development. With the launch of Granite 3.0, IBM continues to push the boundaries of language models and provide businesses with powerful tools to enhance their operations.
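To make the scheduler idea concrete, here is a minimal sketch of a power-law learning-rate decay with linear warmup, in the general spirit of the power scheduler IBM describes; the exponent, warmup length, and base rate are illustrative placeholders, not IBM's published values.

```python
# Hedged sketch of a power-law learning-rate schedule with linear warmup.
# All constants are illustrative placeholders, not IBM's published values.
def power_lr(step: int, base_lr: float = 2e-4, warmup: int = 2000,
             exponent: float = 0.5) -> float:
    if step < warmup:
        # Linear warmup from 0 up to base_lr.
        return base_lr * step / warmup
    # Power-law decay: lr falls off as (step / warmup) ** -exponent.
    return base_lr * (step / warmup) ** -exponent

# Example: inspect the schedule at a few checkpoints.
for s in (500, 2000, 8000, 32000):
    print(s, f"{power_lr(s):.2e}")
```

Compared with a fixed cosine schedule, a power-law decay of this shape is meant to behave consistently as batch size and total token count change, which is the property that makes it attractive for large pretraining runs.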