Inference speed has become a major focus for companies as they optimize and deploy their own AI models. Discussions around test-time compute have also intensified, with models like OpenAI’s o1 reasoning at length after receiving a prompt, which demands substantial compute even after training is complete. This has fueled the rise of companies like Groq, SambaNova, and Cerebras Systems, which build their own hardware and claim industry-leading inference performance, competing with established players like NVIDIA and AMD.
However, Simplismart, a Bengaluru-based startup founded by former Oracle and Google engineers, has taken a different route to high-performance AI deployment. Unlike its competitors, Simplismart pursues inference speed through software rather than hardware. Its inference engine is designed to optimize performance across model deployments, enabling models like Llama 3.1 8B to reach a throughput of over 343 tokens per second, which the company claims is the fastest in the world, putting Simplismart ahead of hardware companies like Groq, Cerebras, and SambaNova.
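The headline metric here, tokens per second, is simply the number of decoded tokens divided by wall-clock generation time. A minimal sketch of how such a throughput figure might be measured, assuming a streaming decode loop (the generator below is a hypothetical stand-in, not Simplismart's actual API):

```python
import time
from typing import Iterable, Iterator

def measure_throughput(token_stream: Iterable[str]) -> tuple[int, float]:
    """Consume a stream of decoded tokens; return (token count, tokens/sec)."""
    start = time.perf_counter()
    count = 0
    for _ in token_stream:
        count += 1
    elapsed = time.perf_counter() - start
    return count, count / elapsed

def fake_stream(n_tokens: int, delay_s: float) -> Iterator[str]:
    """Stand-in for a real model's streaming decode loop (hypothetical)."""
    for i in range(n_tokens):
        time.sleep(delay_s)  # simulates per-token decode latency
        yield f"token-{i}"

count, tps = measure_throughput(fake_stream(50, 0.001))
print(f"{count} tokens at {tps:.0f} tokens/sec")
```

In practice, published benchmarks also distinguish time-to-first-token from steady-state decode throughput, so a single tokens-per-second number is only part of the picture.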
The platform also supports a wide range of models, including Whisper V3, Mistral 7B, Melo TTS, and SDXL. Unlike Groq and others, which rely on hardware or cloud-based solutions, Simplismart’s innovation lies in its MLOps platform, which is designed for on-premises enterprise deployments and is flexible across different models and cloud platforms.
In an interview with Simplismart’s co-founder and CEO, Amritanshu Jain, he clarified that the company is not looking to enter the hardware race. “Companies like Groq and Cerebras may market their hardware as the fastest in inference, but that’s a battle we don’t want to fight. Hardware is a race to the bottom, where companies constantly outdo each other with new chips. Instead, we’re building a universal engine that’s model-agnostic, chip-agnostic, and cloud-agnostic,” Jain said.
Simplismart’s platform offers a declarative language, similar to Terraform, that lets enterprises fine-tune, deploy, and monitor AI models at scale, providing a flexible solution adaptable to both on-premises and cloud-based environments.
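To illustrate what a declarative, Terraform-style deployment spec can look like in principle, here is a sketch using a Python dataclass. The field names and values are invented for illustration and are not Simplismart's actual language:

```python
from dataclasses import dataclass

@dataclass
class ModelDeployment:
    """Hypothetical declarative deployment spec; all fields are illustrative."""
    model: str         # e.g. "llama-3.1-8b"
    target: str        # "on-prem" or a cloud provider such as "aws"
    accelerator: str   # e.g. "nvidia-a100" or "amd-mi300x"
    replicas: int = 1
    max_replicas: int = 4

    def validate(self) -> None:
        # A declarative tool checks the desired state before reconciling it.
        if not (1 <= self.replicas <= self.max_replicas):
            raise ValueError("replicas must be between 1 and max_replicas")

spec = ModelDeployment(model="llama-3.1-8b", target="on-prem",
                       accelerator="nvidia-a100", replicas=2)
spec.validate()
```

The appeal of this style, as with Terraform, is that the user declares the desired end state and the platform reconciles the infrastructure to match it, rather than scripting each deployment step imperatively.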
Founded by Jain and Devansh Ghatak, the startup has gained attention for claiming to have developed the world’s fastest inference engine, surpassing competitors like TogetherAI and FireworksAI. In October, Simplismart raised $7 million in a Series A funding round led by Accel, with participation from Shastra VC, Titan Capital, and angel investor Akshay Kothari, the co-founder of Notion.
“Our goal is not just to be the fastest, but to provide enterprises with the autonomy they need to make AI work for them on their terms,” said Jain. The company’s platform supports NVIDIA GPUs and AMD chips and can integrate with specialized accelerators as they become publicly available. This adaptability and software-first approach have positioned Simplismart as a notable player in the AI deployment space.