Salesforce AI Research recently announced Moirai-MoE, which it describes as the first mixture-of-experts time series foundation model. The model aims to improve time series forecasting by using a sparse mixture of experts (MoE), so that specialization happens at the token level and is learned from the data. The research team shared the update on its official blog and social media channels.
Moirai-MoE is a significant upgrade from its predecessor, Moirai, which used multiple input/output layers to handle time series data with different frequencies. The new model simplifies this process by using just one input/output layer and relies on sparse MoE transformers to capture a variety of time series patterns effectively.
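To make the idea of sparse, token-level routing concrete, here is a minimal NumPy sketch of a generic top-k MoE layer. It is an illustration under stated assumptions, not Moirai-MoE's actual implementation: the linear gate, the single-matrix experts, and the names `sparse_moe_layer`, `gate_w`, and `expert_ws` are all hypothetical, and the article does not specify how Moirai-MoE computes its routing scores.

```python
import numpy as np

def sparse_moe_layer(tokens, gate_w, expert_ws, top_k=2):
    """Route each token to its top_k experts and mix their outputs.

    tokens:    (n_tokens, d_model) token embeddings
    gate_w:    (d_model, n_experts) router weights (assumed linear gate)
    expert_ws: (n_experts, d_model, d_model) one linear map per expert
    """
    logits = tokens @ gate_w  # (n_tokens, n_experts) routing scores
    # Indices of the top_k highest-scoring experts for each token.
    top_idx = np.argsort(logits, axis=1)[:, -top_k:]
    top_logits = np.take_along_axis(logits, top_idx, axis=1)
    # Softmax over the selected experts only, so each token's weights sum to 1.
    top_logits = top_logits - top_logits.max(axis=1, keepdims=True)
    weights = np.exp(top_logits)
    weights /= weights.sum(axis=1, keepdims=True)

    out = np.zeros_like(tokens)
    for t in range(tokens.shape[0]):
        for slot in range(top_k):
            e = top_idx[t, slot]
            # Only the chosen experts run for this token; the rest stay idle.
            out[t] += weights[t, slot] * (tokens[t] @ expert_ws[e])
    return out, top_idx

# Toy usage: 6 tokens, 8-dimensional embeddings, 4 experts, 2 active per token.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(6, 8))
gate_w = rng.normal(size=(8, 4))
expert_ws = rng.normal(size=(4, 8, 8))
out, chosen = sparse_moe_layer(tokens, gate_w, expert_ws, top_k=2)
print(out.shape, chosen.shape)
```

Because each token touches only `top_k` of the experts, the number of parameters used per token can be far smaller than the total parameter count, which is the general property that sparse MoE designs exploit.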
The researchers evaluated Moirai-MoE on 29 datasets from the Monash benchmark and found that it outperformed every competing model. The small variant, Moirai-MoE-Small, improved on its dense counterpart Moirai-Small by 17% and also beat the larger Moirai-Base and Moirai-Large by 8% and 7%, respectively.
For zero-shot forecasting, Moirai-MoE was evaluated on 10 datasets and showed a 3%–14% improvement in continuous ranked probability score (CRPS) and an 8%–16% improvement in mean absolute scaled error (MASE) compared to all sizes of Moirai. Moirai-MoE-Base delivered the best zero-shot performance, ahead of other foundation models such as TimesFM and Chronos.
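For readers unfamiliar with the two metrics, the sketch below shows one common way to compute them with NumPy. It assumes a seasonal-naive scaling term for MASE and a sample-based estimator for CRPS; the function names are hypothetical, and the Moirai-MoE evaluation may use a different CRPS approximation or aggregation across series. Lower is better for both.

```python
import numpy as np

def mase(y_true, y_pred, y_train, season=1):
    """Mean absolute scaled error.

    The forecast's mean absolute error is divided by the in-sample mean
    absolute error of a seasonal-naive forecast on the training history.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    scale = np.mean(np.abs(y_train[season:] - y_train[:-season]))
    return np.mean(np.abs(y_true - y_pred)) / scale

def crps_from_samples(samples, y):
    """Sample-based CRPS estimate for one observation y.

    Uses the identity CRPS = E|X - y| - 0.5 * E|X - X'|, where X and X'
    are independent draws from the forecast distribution.
    """
    samples = np.asarray(samples, dtype=float)
    term1 = np.mean(np.abs(samples - y))
    term2 = 0.5 * np.mean(np.abs(samples[:, None] - samples[None, :]))
    return term1 - term2

# Toy usage: a history with constant step 1, and a two-step forecast.
history = [1, 2, 3, 4, 5]
print(mase(y_true=[6, 7], y_pred=[5, 9], y_train=history))
print(crps_from_samples(samples=[0.0, 2.0], y=1.0))
```

MASE scores a point forecast relative to a naive baseline, so values below 1 mean the forecast beats that baseline on average, while CRPS scores the whole predictive distribution and reduces to absolute error when the forecast is a single point.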
The model is also efficient: in its small configuration it has just 11 million activated parameters, about 28 times fewer than Moirai-Large, while still delivering these results.
The work reflects a broader shift in time series forecasting toward universal models that can handle different data types, domains, and prediction lengths without extra training. A single pre-trained model built on a mixture-of-experts transformer makes that process simpler and more efficient.
Overall, Moirai-MoE marks a notable step toward universal forecasting: it handles varied time series data, outperforms its competitors on the reported benchmarks, and does so with a compact model.