Meta has announced the release of its new open-source language model, Meta Spirit LM. The model is designed to integrate speech and text seamlessly, improving on current text-to-speech processes that often overlook the expressive qualities of speech. Because it is trained with a word-level interleaving method, in which speech and text data are mixed within the same sequence, Meta Spirit LM can generate more natural-sounding speech and learn tasks across modalities. The model comes in two versions: Spirit LM Base, which models speech with phonetic tokens, and Spirit LM Expressive, which adds pitch and style tokens to convey tone and emotion. Meta hopes the model will inspire further work on speech and text integration within the research community.
Google has also recently launched NotebookLM, a tool that can convert text into a podcast. The feature is powered by Google’s Gemini 1.5 model: users supply a link, article, or document, and the tool turns it into a podcast in which two AI commentators discuss the topic. OpenAI, for its part, has introduced Advanced Voice Mode in ChatGPT, which users have put to creative uses such as dramatic reenactments and singing duets.