Tencent, a Chinese tech giant, has recently unveiled a new open-source model called Hunyuan Large. This model has a whopping 389 billion parameters, with 52 billion active parameters, making it one of the largest open-source models in its category. It also supports a context length of 256,000 tokens, surpassing the 128,000 context length of both Llama 3.1 70B and 405B models. The code and model are now available on GitHub and Hugging Face, making it a strong competitor in the open-source arena.
What sets Hunyuan Large apart is its performance: it outperforms the Llama 3.1 70B model on various benchmarks in both English and Chinese, and shows comparable performance to Meta's flagship Llama 3.1 405B model on tasks involving language understanding, coding, maths, and logical reasoning. Unlike Llama 3.1 405B, Hunyuan Large is not a 'dense' model, meaning it does not use all of its parameters for each input. Instead, it follows Mixture of Experts (MoE) scaling laws to optimize the balance between model size, data volume, and performance. This makes it more efficient, as it activates only a subset of parameters for each input.
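To make the dense-versus-MoE distinction concrete, here is a minimal sketch of top-k expert routing, the core idea behind MoE layers. This is a toy illustration with made-up dimensions, not Tencent's actual architecture; all names (`moe_forward`, `gate_w`, `expert_ws`) are illustrative.

```python
import numpy as np

def moe_forward(x, gate_w, expert_ws, k=2):
    """Toy top-k Mixture-of-Experts layer: a router scores every expert,
    but only the k best-scoring experts actually run for this input."""
    scores = x @ gate_w                      # router logits, one per expert
    topk = np.argsort(scores)[-k:]           # indices of the k best experts
    weights = np.exp(scores[topk])
    weights /= weights.sum()                 # softmax over the selected experts only
    # Experts outside `topk` contribute nothing -- their parameters stay inactive,
    # which is why an MoE model's "active" parameter count is far below its total.
    return sum(w * (x @ expert_ws[i]) for w, i in zip(weights, topk))

# Tiny demo with random weights (dimensions are arbitrary).
rng = np.random.default_rng(0)
d, num_experts = 8, 4
x = rng.normal(size=d)
gate_w = rng.normal(size=(d, num_experts))
expert_ws = [rng.normal(size=(d, d)) for _ in range(num_experts)]
y = moe_forward(x, gate_w, expert_ws, k=2)
print(y.shape)
```

In this sketch, 2 of 4 experts run per input; in Hunyuan Large the same principle leaves 52 billion of the 389 billion parameters active per token.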
Hunyuan Large incorporates several innovative techniques, including training on 1.5 trillion tokens of higher-quality synthetic data and various model structure enhancements to reduce memory usage, increase performance, and balance token usage. Tencent compared Hunyuan Large against leading open-source models in both pre- and post-training stages, and it consistently outperformed other dense and MoE models of similar parameter size.
The release of Hunyuan Large has sparked a debate about the most promising techniques for large language models (LLMs) in the community. Tencent hopes that this will lead to further improvements in their model and contribute to the development of more helpful artificial general intelligence (AGI) in the future.
This announcement comes after the news that China has adopted Meta's open-source models for building a chatbot for military applications. That report sparked a debate between Vinod Khosla and Yann LeCun, with Khosla criticizing Meta for providing easy access to LLMs. LeCun countered that China is competent in generative AI and would not rely solely on Meta's models. With the release of Hunyuan Large, LeCun's statement may hold true.
Interestingly, Meta has also announced that it will make Llama available to the US government and other private organizations working in the interests of national security. This further emphasizes the importance and potential of large language models in various industries.