Google has unveiled its latest generative AI model, Omni, which promises to transform any type of input (photos, videos, or text) into any other format. The initial release, Omni Flash, focuses on video generation and is available through Google's Flow platform. The new model builds on the earlier Veo model, and Google claims improved real-world knowledge and better character consistency. However, a first-hand test reveals a mixed bag of impressive realism and persistent AI glitches.
Testing the Limits: From Stuffed Animals to Deepfakes
To evaluate Omni’s capabilities, a senior reviewer conducted an experiment using a plush deer toy named Buddy. The goal was to see how well the model could generate videos based on text prompts and uploaded images. The results ranged from surprisingly coherent to bafflingly inconsistent. For instance, Buddy’s orientation would suddenly flip during a skydiving scene, and props like a honey jar changed shape and color across clips.
Another test involved creating a montage of Buddy packing for a cruise. The AI cleverly included a jar of honey that later appeared as a sunscreen bottle, adding a playful narrative. Yet the same scene showed the honey container morphing from a jar to a squirt bottle and back again. The final frame appeared to be a chaotic mix of previous elements, suggesting the model struggles with long-term consistency.
Deepfakes That Fool Even Close Family
The most striking test involved deepfaking the reviewer herself. Starting from a selfie video with a neutral expression, Omni generated clips of her eating spaghetti, sitting in an airplane seat, and posing in front of the Eiffel Tower with a baguette. The results were startlingly realistic. When she showed the clips to her husband, who sees her daily, he believed the pasta-eating video was genuine and questioned only the unfamiliar bowl. The deepfakes had subtle tells: overly manufactured sound effects, a duplicate background character, and a slightly uncanny head turn that revealed an AI-generated ponytail. But overall, they were convincing enough to fool social media audiences.
Credit Costs and Accessibility
While the technology is powerful, it is not free. Generating videos consumes credits: 15 to 40 credits per clip depending on length and inputs, with edits costing 40 credits each. The reviewer’s $20-per-month Pro plan provides 1,000 credits per month. After generating about 20 clips with some edits, only 145 credits remained. Because every generation and every edit draws down the monthly allotment, iterative refinement could get expensive for users with a specific creative vision.
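To make the credit arithmetic concrete, here is a minimal Python sketch built only on the figures quoted above. The function names and the "edits per clip" scenario are illustrative assumptions, not part of Google's pricing or any Flow API, and the clip count is the article's approximate "about 20."

```python
# Figures quoted in the article; the helper names below are hypothetical.
MONTHLY_PRICE_USD = 20
MONTHLY_CREDITS = 1_000
CLIP_COST_MIN, CLIP_COST_MAX = 15, 40   # credits per generated clip
EDIT_COST = 40                          # credits per edit
CREDITS_REMAINING = 145
CLIPS_GENERATED = 20                    # "about 20" in the article


def credits_used(monthly: int, remaining: int) -> int:
    """Credits spent so far in the billing period."""
    return monthly - remaining


def edit_bounds(used: int, clips: int) -> tuple[float, float]:
    """Lower and upper bounds on the number of edits consistent with the spend.

    Fewest edits: every clip cost the maximum. Most edits: every clip cost
    the minimum. The true value lies somewhere in between.
    """
    fewest = max(0, used - clips * CLIP_COST_MAX) / EDIT_COST
    most = max(0, used - clips * CLIP_COST_MIN) / EDIT_COST
    return fewest, most


def finished_clips_per_month(clip_cost: int, edits_per_clip: int) -> int:
    """Whole clips a monthly allotment covers if each clip needs N edits."""
    return MONTHLY_CREDITS // (clip_cost + edits_per_clip * EDIT_COST)


def usd_per_credit() -> float:
    """Effective dollar price of one credit on the Pro plan."""
    return MONTHLY_PRICE_USD / MONTHLY_CREDITS


if __name__ == "__main__":
    used = credits_used(MONTHLY_CREDITS, CREDITS_REMAINING)
    print("credits used:", used)
    print("edit bounds:", edit_bounds(used, CLIPS_GENERATED))
    print("clips/month at 40 credits + 2 edits:", finished_clips_per_month(40, 2))
    print("USD per 40-credit clip:", round(usd_per_credit() * 40, 2))
```

Under these assumptions, the reviewer spent 855 credits, and a user who needs two edits on every maximum-cost clip would get only eight finished clips out of a month's allotment.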
Comparison with Previous Models
Omni represents a significant step forward from Veo 3, which was difficult to edit and often produced poor results. Omni is more responsive to text-based edits, but the outcomes are still hit-or-miss. For example, when asked to emphasize Buddy’s facial reactions, the model made the deer look strange and occasionally added antlers to the antler-less baby deer. Attempts to remove antlers from one scene caused them to appear in all others.
The Broader Implications of Accessible Deepfakes
The ease with which Omni creates convincing deepfakes raises important questions about misinformation and trust. The reviewer noted that the element of surprise has worn off, as each new generation of AI tools pushes realism further. Google claims Omni integrates more real-world knowledge to improve coherence, but the test shows that fundamental issues like object permanence and logical consistency remain unresolved. The model excels at short, single-shot scenes but falters when asked to maintain details across multiple shots or complex narratives.
Technical Background and Context
Omni is part of Google’s broader Gemini initiative, which aims to create multimodal AI systems. The name “Omni” reflects the goal of universal input-to-output transformation, though current capabilities are limited to video generation. The model uses a diffusion-based architecture trained on vast datasets of images and videos, allowing it to generate plausible scenes from minimal prompts. However, it lacks a true understanding of physics or causality, leading to the kind of glitches observed in the tests.
Google’s competitors, including OpenAI with Sora and Meta with Make-A-Video, are also racing to dominate generative video. The technology has potential applications in entertainment, education, and marketing, but also poses risks for deepfake scams and political manipulation. Regulatory frameworks are still catching up, with many jurisdictions exploring labeling requirements for AI-generated content.
The testing also highlights the uncanny valley effect: viewers can sense something is off even if they cannot pinpoint it. In the deepfake videos, slight asymmetries in facial movements, unnatural eye blinking, and inconsistent lighting give away the AI origin. Yet as models improve, these tells become less detectable. The reviewer’s husband’s inability to spot the fake pasta video suggests that we are approaching a point where deepfakes can deceive even those who know the subject intimately.
Practical Use Cases and Limitations
Despite its flaws, Omni offers practical tools for content creators. The ability to insert AI-generated elements into real video opens up new possibilities for visual effects, personalized messages, and rapid prototyping. For example, a marketer could generate a realistic product placement without expensive location shoots. However, the credit system and inconsistent quality mean that professional use may require multiple attempts and significant costs.
For casual users, the novelty quickly wears off as glitches accumulate. The reviewer noted that after a few clips, the excitement gave way to exhaustion. The model’s tendency to introduce random elements, such as antlers, undermines user control. Google acknowledges these limitations and promises ongoing improvements, but maintaining narrative and visual consistency remains a hard problem in AI research.
In summary, Google’s Omni model is both impressive and imperfect. It can create videos that are good enough to fool people, but it still makes mistakes that break immersion. The technology is moving fast, and the gap between human intuition and machine generation is narrowing. Whether this leads to creative empowerment or societal harm depends on how we choose to use it.
Source: The Verge News