The question of how many R’s are in the word “strawberry” is one that language model enthusiasts have asked for years, but it wasn’t until the release of OpenAI’s o1 series of models that we reliably got the correct answer. This marked a shift in the AI landscape, as almost everyone now had access to “PhD-level intelligence”. In a recent podcast, Diana Hu, general partner at Y Combinator, traced the rise of reasoning models back to OpenAI’s early work on Dota 2, where the team applied reinforcement learning techniques inspired by AlphaGo and AlphaZero. The o1 models use chain-of-thought (CoT) reasoning and dedicated reasoning tokens to work through complex questions, something that was not possible before.
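Why was such a simple question hard in the first place? One common explanation is tokenization: models see tokens, not characters. The sketch below uses the tiktoken library to make this concrete (the encoding name is just one common choice, and the exact token split may differ):

```python
# Illustrative sketch: language models see tokens, not characters, which is
# one common explanation for why letter-counting questions trip them up.
# Requires: pip install tiktoken
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # one common OpenAI encoding

word = "strawberry"
token_ids = enc.encode(word)
tokens = [enc.decode([t]) for t in token_ids]

print(tokens)           # token pieces, e.g. ['str', 'aw', 'berry'] -- no single 'r'
print(word.count("r"))  # 3 -- trivial once you work at the character level
```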
One of the most discussed moments in reasoning models was Reflection 70B, a model trained to surface and correct mistakes in its own reasoning. However, things took a different turn when China entered the reasoning-model race; it now boasts multiple models that are on par with, or even better than, the o1 series. According to Aravind Srinivas, CEO of Perplexity AI, China is ahead of the USA in open-source reasoning models, and America needs to fight back for open-source supremacy.
This can be seen in the multiple open-source reasoning models available on platforms like Hugging Face. So, what fueled this tsunami of reasoning models? The traditional “scaling laws” theory, which suggested that adding more data and computing power would continuously improve AI capabilities, is now being questioned. Major AI labs like OpenAI, Google, and Anthropic are not seeing the same dramatic improvements in their models as they once did. This limitation has pushed researchers to explore new approaches, including reasoning models.
More recently, multiple frameworks have emerged that are becoming standard for developing reasoning models. While the core components of these models remain largely the same, researchers and developers have been exploring hybrid approaches that combine multiple techniques. One example is the integration of neuro-symbolic AI with traditional deep learning, letting systems both learn from data and reason over explicit knowledge, as in the sketch below.
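To make that idea concrete, here is a minimal, hypothetical sketch of a neuro-symbolic loop: a neural component proposes an answer, and a symbolic rule layer checks it against explicit knowledge before it is accepted. Every name here is illustrative and not drawn from any specific framework:

```python
# Hypothetical sketch of a neuro-symbolic loop: a learned model proposes,
# an explicit rule base verifies. Names and rules are illustrative only.
from dataclasses import dataclass

@dataclass
class Proposal:
    answer: str
    confidence: float

def neural_propose(question: str) -> Proposal:
    # Stand-in for a trained model's prediction.
    return Proposal(answer="4", confidence=0.92)

# Explicit symbolic knowledge: facts the system can reason over directly.
KNOWLEDGE = {"2 + 2": "4"}

def symbolic_check(question: str, proposal: Proposal) -> bool:
    # Accept the neural answer only if it is consistent with known facts.
    expected = KNOWLEDGE.get(question)
    return expected is None or expected == proposal.answer

question = "2 + 2"
proposal = neural_propose(question)
if symbolic_check(question, proposal):
    print(f"Accepted: {proposal.answer} (confidence {proposal.confidence:.2f})")
else:
    print("Rejected by symbolic layer; fall back to explicit knowledge.")
```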
One key advancement in reasoning models is the integration of test-time computation with process supervision. This has been demonstrated in the OpenR framework, which unifies data acquisition, reinforcement learning training, and non-autoregressive decoding into a cohesive platform. The approach has shown significant improvements, with process reward models (PRMs) and guided search enhancing test-time reasoning performance by approximately 10%.
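OpenR’s actual interfaces aren’t covered here, so the following is only a schematic sketch of the underlying technique: sample several candidate reasoning chains, score each partial chain with a PRM, and keep the chain whose weakest step scores best. The generator and scorer below are stubs standing in for a real LLM and a trained reward model:

```python
# Schematic sketch of PRM-guided best-of-N selection. This shows the general
# technique, not OpenR's actual API; the generator and scorer are stubs.
import random

def sample_chain(question: str, seed: int) -> list[str]:
    # Stub generator: in practice, an LLM sampled with temperature > 0.
    rng = random.Random(seed)
    n_steps = rng.randint(2, 4)
    return [f"step {i + 1} for {question!r}" for i in range(n_steps)]

def prm_score(question: str, partial_chain: list[str]) -> float:
    # Stub PRM: in practice, a trained model scoring the partial chain.
    rng = random.Random(hash((question, tuple(partial_chain))))
    return rng.random()

def best_of_n(question: str, n: int = 8) -> list[str]:
    best_chain, best_score = [], float("-inf")
    for seed in range(n):
        chain = sample_chain(question, seed)
        # Aggregate per-step scores with min(): a chain is only as sound
        # as its weakest reasoning step (one common aggregation choice).
        score = min(prm_score(question, chain[: i + 1]) for i in range(len(chain)))
        if score > best_score:
            best_chain, best_score = chain, score
    return best_chain

print(best_of_n("How many r's are in 'strawberry'?"))
```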
The recent introduction of Nous Research’s Reasoning API is another milestone in the development of reasoning models: it shows that access to these advanced capabilities is no longer limited to the largest labs, and even small companies can build on them. As reasoning models continue to advance, it is clear that they will play a crucial role in the future of AI and its applications.
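As a sketch of what consuming such an API can look like, here is a minimal example assuming an OpenAI-compatible chat endpoint. The URL, model name, and key below are placeholders, not Nous Research’s documented values; consult the provider’s documentation for the real ones:

```python
# Hypothetical sketch of calling a hosted reasoning model through an
# OpenAI-compatible chat endpoint. The URL, model name, and API key are
# placeholders -- check the provider's documentation for real values.
# Requires: pip install requests
import os
import requests

API_URL = "https://api.example.com/v1/chat/completions"  # placeholder endpoint
API_KEY = os.environ.get("REASONING_API_KEY", "sk-placeholder")

payload = {
    "model": "example-reasoning-model",  # placeholder model id
    "messages": [
        {"role": "user", "content": "How many r's are in 'strawberry'?"}
    ],
}

response = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=30,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```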