Cerebras, the AI hardware and inference solution provider, has introduced a new technique called CePO (Cerebras Planning and Optimization), designed to significantly enhance the reasoning capabilities of Meta's Llama models. By applying the much-discussed test-time computation technique to the Llama 3.3 70B model, Cerebras has enabled it to outperform the Llama 3.1 405B model on various benchmarks while maintaining an interactive speed of 100 tokens per second. The company has also released detailed technical documentation outlining the capabilities of CePO.
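Cerebras's documentation spells out CePO's exact algorithm, which is not reproduced in this article. The general idea behind test-time computation, however, can be illustrated with a simple best-of-N scheme: spend extra compute at inference by sampling several candidate answers and keeping the most consistent one. The sketch below is a generic illustration of that idea, not CePO itself; the `generate` function is a hypothetical stub standing in for a call to a real inference endpoint.

```python
import collections
import random

def generate(prompt: str, temperature: float = 0.8) -> str:
    """Hypothetical stand-in for an LLM inference call.
    Returns one sampled candidate answer; stubbed here so the
    sketch runs on its own."""
    return random.choice(["42", "42", "41"])  # mock sampled model outputs

def best_of_n(prompt: str, n: int = 8) -> str:
    """Test-time computation via best-of-N sampling: draw n candidate
    answers and return the most frequent one (self-consistency voting)."""
    candidates = [generate(prompt) for _ in range(n)]
    return collections.Counter(candidates).most_common(1)[0][0]

if __name__ == "__main__":
    print(best_of_n("What is 6 * 7? Answer with a number only."))
```

The trade-off is straightforward: each query costs n times more inference compute, which is why Cerebras emphasises that CePO still runs at interactive speeds.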
According to Cerebras, while models like OpenAI o1 and Alibaba QwQ have demonstrated the power of additional computation at inference time, CePO brings these capabilities to Llama, the world's most popular open-source LLM family. The company has also compared its technique with GPT-4 Turbo and Claude 3.5 Sonnet, achieving comparable performance on most benchmarks. However, no comparison is made with the industry-leading reasoning model, OpenAI's o1. For instance, while the CePO-enhanced Llama 3.3 70B model scored 53.3% on the GPQA benchmark, o1 scored a higher 76%. OpenAI has not disclosed o1's parameter count, though it is widely believed to be significantly larger than 70B.
Andrew Feldman, CEO and Co-founder of Cerebras Systems, said, "By bringing these capabilities to the Llama family of models, we're democratizing access to sophisticated reasoning techniques previously limited to closed commercial systems." The company has also announced that it will open-source the CePO framework. Additionally, it aims to develop more advanced prompting frameworks that leverage comparative reasoning, along with synthetic datasets optimized for inference-time computing.
Cerebras is using the latest edition of Meta's Llama, Llama 3.3, which was announced a few days ago. According to Meta, the model delivers leading performance in synthetic data generation and supports an expanded context length of 128k tokens. Recently, Meta also unveiled a new technique called COCONUT (Chain of Continuous Thought), which addresses a limitation of the Chain of Thought (CoT) technique, in which the explicit reasoning process is generated as natural-language tokens. Instead of making the model convert its internal state into words after each step, COCONUT feeds that internal state directly back as the starting point for the subsequent step.
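For readers curious about the mechanics, here is a minimal sketch of that continuous-thought idea, using a toy recurrent module as a stand-in for an LLM's transformer stack. The `TinyDecoder` and `continuous_thought_steps` names are illustrative assumptions, not Meta's actual implementation.

```python
import torch
import torch.nn as nn

class TinyDecoder(nn.Module):
    """Toy stand-in for an LLM decoder: maps input embeddings to
    hidden states. A real model's transformer stack would sit here."""
    def __init__(self, dim: int = 64):
        super().__init__()
        self.core = nn.GRU(dim, dim, batch_first=True)  # toy recurrent core

    def forward(self, embeds: torch.Tensor) -> torch.Tensor:
        hidden, _ = self.core(embeds)
        return hidden  # (batch, seq, dim); the last position is the "thought"

def continuous_thought_steps(model: nn.Module,
                             prompt_embeds: torch.Tensor,
                             num_steps: int = 4) -> torch.Tensor:
    """COCONUT-style latent reasoning: rather than decoding each step
    into a natural-language token and re-embedding it (as in CoT),
    append the last hidden state itself as the next input embedding."""
    seq = prompt_embeds
    for _ in range(num_steps):
        last_hidden = model(seq)[:, -1:, :]       # latent "thought" vector
        seq = torch.cat([seq, last_hidden], dim=1)  # feed it straight back
    return seq

if __name__ == "__main__":
    model = TinyDecoder()
    prompt = torch.randn(1, 5, 64)  # stand-in for embedded prompt tokens
    out = continuous_thought_steps(model, prompt)
    print(out.shape)  # torch.Size([1, 9, 64]) after 4 latent steps
```

The key difference from CoT is visible in the loop: no vocabulary projection or token sampling happens between reasoning steps, so nothing is lost by forcing intermediate thoughts through natural language.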
Reasoning models are the next big thing in the AI ecosystem. While OpenAI has just unveiled the full version of the o1 model, it faces stiff competition from the East. China's DeepSeek R1 Lite is said to offer better reasoning capability than o1 and is also available as an open-source model.