Last week, OpenAI CEO Sam Altman launched Sora and argued that video generation is crucial for achieving AGI. Google has now countered with its own generative AI models, Veo 2 and Imagen 3. While these models are not yet publicly available, Google product lead Logan Kilpatrick said they will be accessible through the API by early next year.

Veo 2 handles reflections and shadows more convincingly, producing clearer and sharper footage, and embeds SynthID watermarking for added safety. Google’s internal testing shows Veo outperforming competitors such as China’s Kling, Meta’s Movie Gen, and OpenAI’s Sora on quality and prompt adherence. Early testers, including a16z partner Justine Moore, have noted the model’s strength in nature and animal clips and its ability to capture detailed movement.

Veo 2 builds on the original Veo, first showcased at Google I/O in May and since integrated into YouTube and Google Cloud. According to Google DeepMind’s Tom Hume, Veo 2 offers greater realism and finer detail thanks to its improved cinematic understanding, though co-lead Shlomi Fruchter acknowledges that the model still struggles with complex physics. Wharton’s Ethan Mollick points out that Sora offers more control options and longer clips, which makes a direct comparison with Veo 2 difficult. Interestingly, Google’s own blog identifies Kling as Veo’s biggest competitor.

A telling test for Veo 2 is generating a gymnast’s routine, a prompt that exposes how well a model grasps human movement and complex motion. In a viral tweet shared by VC Deedy Das, Sora failed on exactly this prompt. Veo 2 supports 4K resolution and can produce videos longer than two minutes, although it is currently limited to 720p and eight-second clips on Google’s experimental platform. Even so, that gives it four times the resolution and six times the video duration of Sora.

This release follows another significant DeepMind development in the GenAI space: the launch of Genie 2, a foundation world model capable of generating interactive 3D environments from simple text prompts. World models like this are crucial for training embodied AI agents, providing a diverse set of environments in which they can learn and generalize across domains.