
text-to-video
text + image + audio + video -> video
Gemini Omni
Gemini Omni is Google DeepMind's native multimodal model that creates high-quality videos from any combination of text, image, audio, and video inputs. It delivers advanced world understanding, physics simulation, and natural conversational editing in a single unified system.
Provider
Google DeepMind
Input
text + image + audio + video
Output
video
Overview
Gemini Omni is Google DeepMind's first native any-to-video multimodal foundation model, purpose-built for video generation and editing. It collapses the traditionally separate pipelines (text-to-video, image-to-video, video-to-video) into one coherent system that reasons across modalities in a single forward pass.
Key Capabilities
- Multimodal Inputs: Combine text prompts with reference images, audio tracks, or source video clips.
- Conversational Editing: Refine videos through natural language instructions (e.g., "swap the background to a futuristic city" or "change the wardrobe to Victorian style").
- World Understanding: Built-in physics simulation, historical/cultural context, and storytelling intelligence for realistic, meaningful outputs.
- Templates & Remixing: Start from scratch, remix your own media, or apply premade templates directly in the Gemini interface.
Getting Started
- Open the Gemini app or visit gemini.google.com.
- Start a new chat and describe your video concept.
- Attach supporting media (images, audio, or short video clips) as references.
- Generate the video, then continue the conversation to iterate and edit.
Prompting Tips
- Be specific about camera movement, lighting, timing, and style.
- Reference uploaded images for character or scene consistency.
- Use audio inputs for synchronized sound design or voiceover.
- For editing, reference previous generations with phrases like "keep the same characters but...".
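The tips above can be applied mechanically when prompts are assembled in code. The following is a minimal sketch, not part of any Gemini tooling: `ShotSpec`, `build_prompt`, and `build_edit` are hypothetical names introduced here for illustration, and the output format is just one reasonable way to state camera, lighting, timing, and style explicitly.

```python
from dataclasses import dataclass


@dataclass
class ShotSpec:
    """Hypothetical container for the details the tips recommend stating."""
    subject: str
    camera: str = ""
    lighting: str = ""
    timing: str = ""
    style: str = ""


def build_prompt(spec: ShotSpec) -> str:
    """Join the subject with any non-empty details into one prompt string."""
    parts = [spec.subject.strip().rstrip(".")]
    for label, value in (
        ("Camera", spec.camera),
        ("Lighting", spec.lighting),
        ("Timing", spec.timing),
        ("Style", spec.style),
    ):
        # Skip details the author left blank rather than emitting empty labels.
        if value.strip():
            parts.append(f"{label}: {value.strip().rstrip('.')}")
    return ". ".join(parts) + "."


def build_edit(change: str, keep: str = "the same characters") -> str:
    """Phrase a follow-up edit that anchors to the previous generation."""
    return f"Keep {keep} but {change.strip().rstrip('.')}."
```

For example, `build_edit("swap the background to a futuristic city")` produces a follow-up instruction in the "keep the same characters but..." form, which can be sent as the next message in the same conversation.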
Example Use Cases
- Text-to-Video: "A professor writes out a mathematical proof for trigonometric identities on a traditional chalkboard, explaining each step."
- Image-to-Video: Upload a still photo and prompt the model to animate it with realistic motion and context.
- Video Remixing: Upload a clip and request a style transfer or a background change.
- Audio-Driven: Provide voiceover audio and generate matching lip-synced video.
Availability
Available now in the Gemini app, Google Flow, and YouTube Shorts. Developers can access it via the Gemini API for integration into creative workflows.
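When integrating into a workflow, it can help to bundle the inputs for one generation and check which modalities are present before handing them to whatever client is used. The sketch below is an assumption-laden illustration only: `VideoRequest` and its fields are hypothetical, it does not call or mirror the Gemini API, and the file names are placeholders.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VideoRequest:
    """Hypothetical bundle of inputs for one generation; not a Gemini API type."""
    prompt: str = ""
    image_paths: List[str] = field(default_factory=list)
    audio_paths: List[str] = field(default_factory=list)
    video_paths: List[str] = field(default_factory=list)

    def modalities(self) -> List[str]:
        """Return the input modalities present, in a fixed order."""
        present = []
        if self.prompt.strip():
            present.append("text")
        if self.image_paths:
            present.append("image")
        if self.audio_paths:
            present.append("audio")
        if self.video_paths:
            present.append("video")
        return present

    def signature(self) -> str:
        """Describe the request in the 'inputs -> video' form."""
        mods = self.modalities()
        if not mods:
            raise ValueError("at least one input (text, image, audio, or video) is required")
        return " + ".join(mods) + " -> video"
```

A request built as `VideoRequest(prompt="Animate this photo", image_paths=["portrait.png"])` reports the signature `text + image -> video`, while an empty request is rejected before any upload or network call is attempted.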
Gemini Omni Prompts
1 example
