Natural Language Video Editing
Gemini Omni Flash is Google’s new multimodal creation model for turning text, images, audio and videos into high-quality AI videos — then editing them through natural language, one instruction at a time.
01 / Conversational creation
Edit scene, object, camera, motion, style and materials by simply describing the next change.
02 / Any input
Combine multiple references into one cohesive output instead of switching between separate AI tools.
03 / Scene memory
Each edit builds on the last, helping characters, physics and visual context stay coherent.
04 / World knowledge
Create explainers and meaningful scenes using Gemini’s understanding of science, culture and history.
05 / Physics
Generate effects involving gravity, kinetic energy, liquid ripples, lighting rhythm and reflective surfaces.
06 / Responsible AI
Personal avatar videos and AI-generated content transparency are supported through Google’s responsible AI stack.
Official Gemini Omni examples show how a source video can be transformed through short natural-language prompts. The video components below are playable demo placeholders; replace the local MP4 files with official licensed assets when publishing.
Make the sculpture out of bubbles.
When the person touches the mirror, make the mirror ripple beautifully like liquid, and the person's arm turns into reflective mirror material.
Gemini Omni examples combine object transformation, recursive scene logic and synchronized action into one prompt.
Dim the lights in the room. Put a black and white checkerboard room inside a glass sphere that floats above the hand and tracks it. Inside, it contains a recursive representation of the same hand holding the sphere, creating an infinite recursion of rooms. The camera slowly moves closer into the sphere, creating a video loop.
Gemini Omni can use audio cues as part of the generation instruction, creating visual events synchronized to music or interaction.
The lights of the apartments start turning on in sync with the music.
Add harp sounds synchronized to when I touch each fern leaf. Change the leaf structure to all resemble semi translucent 3d bioluminescent plant life, with bioluminescent fireflies flying around it that react as I play.
Gemini Omni is positioned as video generation grounded in Gemini’s real-world knowledge, including physical intuition and explainable concepts.
A marble rolling fast on a chain reaction style track, continuous smooth shot.
Claymation explainer of protein folding, everything is made out of clay, no hands, stop motion, accurate.
Use Gemini Omni-style prompts for vertical social clips, creator avatars, explainers, remix videos and short promotional assets.
Create videos with your own digital avatar so the generated clip looks and sounds like you. Use it for personalized announcements, social storytelling and short-form content.
Gemini Omni compresses the video workflow: fewer separate tools, more multimodal references and easier natural-language iteration.
Step 1
Choose Gemini Omni-style multimodal video generation from the AI video model area.
Step 2
Add a prompt and optional references such as image, video or audio to guide the final clip.
Step 3
Preview the result, edit with natural language and export for social platforms.
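For readers who think in code, the three steps above can be sketched as a small data model: a prompt, optional image, video or audio references, and an ordered list of follow-up edits. This is only an illustration. Every name here (Reference, EditSession, edit, history) and the file name track_sketch.png are hypothetical and do not correspond to any published Gemini API; the sketch only records instructions and does not generate video.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical data model: these names are illustrative only and do not
# correspond to any published Gemini API.
ALLOWED_KINDS = {"image", "video", "audio"}


@dataclass
class Reference:
    kind: str  # one of ALLOWED_KINDS
    path: str  # path to the reference asset (not checked for existence)

    def __post_init__(self) -> None:
        if self.kind not in ALLOWED_KINDS:
            raise ValueError(f"unsupported reference kind: {self.kind}")


@dataclass
class EditSession:
    prompt: str                                            # Step 2: initial prompt
    references: List[Reference] = field(default_factory=list)  # Step 2: optional references
    edits: List[str] = field(default_factory=list)         # Step 3: follow-up edits

    def edit(self, instruction: str) -> "EditSession":
        """Record one natural-language edit and return self for chaining."""
        instruction = instruction.strip()
        if not instruction:
            raise ValueError("edit instruction must not be empty")
        self.edits.append(instruction)
        return self

    def history(self) -> List[str]:
        """Return the initial prompt followed by every edit, in order."""
        return [self.prompt, *self.edits]


session = EditSession(
    prompt="A marble rolling fast on a chain reaction style track, continuous smooth shot.",
    references=[Reference(kind="image", path="track_sketch.png")],
)
session.edit("Dim the lights in the room.").edit("Make the sculpture out of bubbles.")

for step, text in enumerate(session.history()):
    print(step, text)
```

Keeping the edits as an ordered list mirrors the idea that each instruction builds on the last, so the full history can be replayed or reviewed before export.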