
Gemini Omni
Gemini Omni is Google DeepMind's native multimodal model that creates high-quality videos from any combination of text, image, audio, and video inputs. It delivers advanced world understanding, physics simulation, and natural conversational editing in a single unified system.
Provider: Google DeepMind
Input: text + image + audio + video
Output: video
Overview
Gemini Omni is Google DeepMind's first native any-to-any multimodal foundation model, purpose-built for video generation and editing. It collapses traditional pipelines (text-to-video, image-to-video, video-to-video) into one coherent system that reasons across modalities in a single forward pass.
Key Capabilities
- Multimodal Inputs: Combine text prompts with reference images, audio tracks, or source video clips.
- Conversational Editing: Refine videos through natural language instructions (e.g., "swap the background to a futuristic city" or "change the wardrobe to Victorian style").
- World Understanding: Built-in physics simulation, historical/cultural context, and storytelling intelligence for realistic, meaningful outputs.
- Templates & Remixing: Start from scratch, remix your own media, or apply premade templates directly in the Gemini interface.
Getting Started
- Open the Gemini app or visit gemini.google.com.
- Start a new chat and describe your video concept.
- Attach supporting media (images, audio, or short video clips) as references.
- Generate the video, then continue the conversation to iterate and edit.
Prompting Tips
- Be specific about camera movement, lighting, timing, and style.
- Reference uploaded images for character or scene consistency.
- Use audio inputs for synchronized sound design or voiceover.
- For editing, reference previous generations with phrases like "keep the same characters but...".
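The tips above boil down to filling in a few fixed slots (subject, camera movement, lighting, timing, style, references) and then phrasing edits relative to the previous result. The snippet below is a minimal sketch of that habit as a small local helper. `VideoPrompt` and `edit_instruction` are hypothetical names invented for illustration. They are not part of the Gemini app, Google Flow, or any Gemini SDK, and the sketch makes no API calls. The "Label: value." output format is an assumption, not a documented prompt syntax.

```python
from dataclasses import dataclass, field


@dataclass
class VideoPrompt:
    """Hypothetical helper: gathers the details the prompting tips recommend stating."""

    subject: str
    camera: str = ""
    lighting: str = ""
    timing: str = ""
    style: str = ""
    # File names of uploaded reference media (illustrative labels only).
    references: list = field(default_factory=list)

    def render(self) -> str:
        # Start with the subject as one sentence, normalizing the trailing period.
        parts = [self.subject.strip().rstrip(".") + "."]
        # Append each optional detail only when it was provided.
        for label, value in (
            ("Camera", self.camera),
            ("Lighting", self.lighting),
            ("Timing", self.timing),
            ("Style", self.style),
        ):
            if value:
                parts.append(f"{label}: {value}.")
        # Point the model at attached media for character/scene consistency.
        if self.references:
            parts.append("Match the attached references: " + ", ".join(self.references) + ".")
        return " ".join(parts)


def edit_instruction(change: str, keep: str = "the same characters") -> str:
    """Phrase a follow-up edit that anchors on the previous generation (hypothetical helper)."""
    return f"Keep {keep} but {change.strip().rstrip('.')}."


if __name__ == "__main__":
    prompt = VideoPrompt(
        subject="A professor writes out a proof of a trigonometric identity on a chalkboard",
        camera="slow dolly-in from a wide shot",
        lighting="warm afternoon light through tall windows",
        style="documentary realism",
        references=["professor.jpg"],
    )
    print(prompt.render())
    print(edit_instruction("swap the background to a futuristic city"))
```

You would paste the rendered text into a Gemini chat alongside the referenced uploads, then send `edit_instruction(...)` strings as follow-up turns. Keeping the slots explicit makes it easy to see which detail you left unspecified when a generation drifts.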
Example Use Cases
- Text-to-Video: "A professor writes out a mathematical proof for trigonometric identities on a traditional chalkboard, explaining each step."
- Image-to-Video: Upload a still photo and prompt the model to animate it with realistic motion and context.
- Video Remixing: Upload a clip and request style transfers or background changes.
- Audio-Driven: Provide voiceover audio and generate matching lip-synced video.
Availability
Available now in the Gemini app, Google Flow, and YouTube Shorts. Developers can access it via the Gemini API for integration into creative workflows.
