text-to-video, text + image -> video

Veo 3.1

Veo 3.1 is Google's state-of-the-art video generation model from DeepMind, capable of creating high-fidelity videos up to 4K resolution with natively generated audio from text prompts and reference images.

Provider: Google

Input: text + image

Output: video

Overview

Veo 3.1 is Google's latest video generation model, designed for filmmakers, storytellers, and developers. It produces realistic, cinematic 4-8 second video clips (extendable via API) with synchronized native audio, including dialogue, sound effects, ambient noise, and music. The model supports both text-to-video and image-to-video workflows, with advanced controls for reference images, scene extensions, first/last frame interpolation, and object insertion/removal.

Key Capabilities

Resolutions: 720p (default), 1080p, and 4K (preview); the higher resolutions support 8-second clips only
Aspect Ratios: 16:9 (landscape) and 9:16 (portrait)
Durations: 4, 6, or 8 seconds (8 seconds required for reference images, high resolution, or extensions)
Frame Rate: 24 FPS
Inputs: Text prompts (up to 1,024 tokens), up to 3 reference images for character/style consistency, or prior Veo-generated videos for extension
Outputs: MP4 video with embedded native audio
Creative Controls: Reference images (portrait/landscape), scene extension, first/last frame transitions, object add/remove, camera movements (dolly, pan, zoom), and cinematic styling
Audio: Natively generated and fully synced to the visuals, supporting dialogue (in quotes), SFX, ambient sound, and music
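Several of these limits interact (for example, choosing 1080p forces an 8-second clip). Below is a minimal Python sketch of a client-side pre-check, assuming the limits exactly as stated on this page. `VideoRequest` and `validate_request` are hypothetical names, not part of any Google SDK, and the service itself remains the authority on what it accepts.

```python
from dataclasses import dataclass, field

# Limits as listed on this page (hypothetical client-side encoding, not an official schema).
RESOLUTIONS = {"720p", "1080p", "4k"}
ASPECT_RATIOS = {"16:9", "9:16"}
DURATIONS = {4, 6, 8}
MAX_REFERENCE_IMAGES = 3


@dataclass
class VideoRequest:
    """Hypothetical request description; field names are illustrative only."""
    prompt: str
    resolution: str = "720p"
    aspect_ratio: str = "16:9"
    duration_seconds: int = 8
    reference_images: list = field(default_factory=list)
    is_extension: bool = False


def validate_request(req: VideoRequest) -> list[str]:
    """Return human-readable problems; an empty list means no stated limit is violated."""
    problems = []
    resolution = req.resolution.lower()  # accept "4K" as well as "4k"
    if resolution not in RESOLUTIONS:
        problems.append(f"unsupported resolution {req.resolution!r}")
    if req.aspect_ratio not in ASPECT_RATIOS:
        problems.append(f"unsupported aspect ratio {req.aspect_ratio!r}")
    if req.duration_seconds not in DURATIONS:
        problems.append(f"unsupported duration {req.duration_seconds}s")
    if len(req.reference_images) > MAX_REFERENCE_IMAGES:
        problems.append(f"at most {MAX_REFERENCE_IMAGES} reference images are allowed")
    # Per this page, reference images, resolutions above 720p, and extensions all need 8 s.
    needs_eight = bool(req.reference_images) or resolution != "720p" or req.is_extension
    if needs_eight and req.duration_seconds != 8:
        problems.append("duration must be 8 seconds for reference images, 1080p/4K, or extensions")
    return problems
```

For example, `validate_request(VideoRequest(prompt="...", resolution="1080p", duration_seconds=6))` returns one problem, because higher resolutions require 8 seconds. The 1,024-token prompt limit is not checked here, since counting tokens needs the model's own tokenizer.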

Strengths

Exceptional realism in physics, lighting, shadows, and natural motion
Superior prompt adherence and cinematic understanding (camera language, composition, styles)
Rich native audio that enhances storytelling without post-production
Strong character and style consistency via reference images
Flexible creative tools for professional workflows (extensions, transitions, object editing)
Available across Google AI Studio, the Gemini API, Vertex AI, and consumer tools such as the Gemini app

Limitations

Short base clip length (4-8 seconds); extensions are possible but only for Veo-generated source videos
Generation latency varies (seconds to minutes depending on load and resolution)
Strict safety filters block harmful or policy-violating content, including some generation of people in certain regions
All outputs carry a visible watermark plus a SynthID digital watermark for AI-content detection
Generated videos are stored only temporarily (2 days in the API) and must be downloaded promptly
Outputs are not guaranteed to be deterministic, even with a seed; dialogue consistency can vary for longer speech
Higher resolutions and extensions have stricter constraints (e.g., some extensions are limited to 720p)
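Because API-generated videos are kept for only about 2 days, it can help to track a download deadline per video. A small sketch, assuming the 2-day window starts at generation time (this page does not say exactly when it starts); `download_deadline` and `time_left` are hypothetical helpers.

```python
from datetime import datetime, timedelta, timezone

# Retention window stated on this page for API-generated videos.
# ASSUMPTION: it is measured from the moment the video is generated.
RETENTION = timedelta(days=2)


def download_deadline(generated_at: datetime) -> datetime:
    """Latest time the video is expected to remain available for download."""
    return generated_at + RETENTION


def time_left(generated_at: datetime, now: datetime) -> timedelta:
    """Remaining time before the video may be deleted; never negative."""
    remaining = download_deadline(generated_at) - now
    return max(remaining, timedelta(0))


if __name__ == "__main__":
    created = datetime.now(timezone.utc)
    print("download before:", download_deadline(created).isoformat())
```

Using timezone-aware UTC timestamps avoids off-by-hours errors when the generating and downloading machines sit in different time zones.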

How to Write Effective Prompts

Veo 3.1 excels with highly descriptive, cinematic prompts. Structure your prompt like a film direction:

Subject + Action: Start with the main scene and motion (e.g., "A wise old owl soaring through moonlit clouds")
Camera & Composition: Specify shots and movement ("follow shot", "slow dolly zoom", "shallow depth of field")
Style & Mood: Add artistic direction ("cinematic, photorealistic", "in the style of Studio Ghibli")
Audio: Describe sound explicitly ("wings flapping loudly", "mellow hip-hop beat with faint city murmurs") and put dialogue in quotes (e.g., 'What manner of magic is this?' the owl hooted)
Reference Images: Upload 1-3 images for characters/objects/styles to lock consistency
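The prompt structure above can be sketched as a tiny prompt builder. This is an illustration only: `ShotSpec` and `build_prompt` are hypothetical names, and the component order (subject and action, camera, audio, dialogue, style) is a stylistic choice that mirrors the example prompt on this page, not a documented requirement. Reference images are passed separately from the text prompt, so they are not part of this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ShotSpec:
    """Hypothetical container for the prompt components listed above."""
    subject_action: str
    camera: str = ""
    style: str = ""
    audio: str = ""
    dialogue: Optional[str] = None
    attribution: str = "someone says"  # e.g. "the owl hooted"


def build_prompt(spec: ShotSpec) -> str:
    """Join the non-empty components into one film-direction style prompt."""
    parts = [spec.subject_action, spec.camera, spec.audio]
    if spec.dialogue:
        # Spoken lines go in quotes, followed by who says them.
        parts.append(f'"{spec.dialogue}" {spec.attribution}')
    parts.append(spec.style)
    sentences = [p.strip() for p in parts if p and p.strip()]
    # End every component with punctuation so they read as separate directions.
    return " ".join(s if s.endswith((".", "!", "?")) else s + "." for s in sentences)
```

Keeping each component as a separate field makes it easy to vary one aspect (say, the camera move) while holding the rest of the shot constant across generations.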

Example Prompt: "A medium shot of a seasoned grey-bearded man in sunglasses and a paisley shirt standing on a bustling city street at dusk. Camera slowly pushes in as he smiles. Faint city murmurs and distant chatter, accompanied by a mellow soulful hip-hop beat. 'The city always got a story,' the older man murmurs thoughtfully. Cinematic lighting, realistic physics."

Negative Prompt Example (negative prompts are supported): "blurry, distorted faces, text, logos, watermarks, low quality, artifacts"

Use the API parameters for fine control: aspect_ratio, resolution, durationSeconds, and reference images.
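A sketch of passing these controls, plus a negative prompt, as a JSON request body. The `instances`/`parameters` layout and the camelCase keys (`aspectRatio`, `negativePrompt`, and so on) are assumptions modeled on a predict-style REST request; check the official Gemini API or Vertex AI reference for the exact field names, and note that an SDK may spell them differently (for example in snake_case). Endpoint, model ID, and authentication are deliberately left out, and `build_request_body` is a hypothetical helper.

```python
import json
from typing import Optional


def build_request_body(prompt: str,
                       negative_prompt: Optional[str] = None,
                       aspect_ratio: str = "16:9",
                       resolution: str = "720p",
                       duration_seconds: int = 8) -> str:
    """Serialize a generation request as JSON.

    ASSUMPTION: the {"instances": [...], "parameters": {...}} layout and the
    camelCase keys are illustrative; verify them against the official API reference.
    """
    parameters = {
        "aspectRatio": aspect_ratio,
        "resolution": resolution,
        "durationSeconds": duration_seconds,
    }
    if negative_prompt:
        # Comma-separated list of things the video should avoid.
        parameters["negativePrompt"] = negative_prompt
    body = {"instances": [{"prompt": prompt}], "parameters": parameters}
    return json.dumps(body, indent=2)
```

Only adding `negativePrompt` when one is given keeps the body minimal, so the service's own defaults apply to anything left unspecified.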

Veo 3.1 Prompts

5 examples

Epic Ninja Showdown Live Stage Battle

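A hyper-realistic handheld smartphone video of a high-energy ninja-versus-samurai stage fight at an outdoor Japanese-themed performance, with shadow clones, energy attacks, dramatic choreography, and excited audience reactions.

Ninja battle, stage performance, Japanese action show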

Desert Wanderer Drone Reveal

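A man in worn clothes walks slowly across a vast desert, shielding his eyes from the sun with his hand. The camera rises smoothly in a dramatic drone lift and transitions to an overhead view, revealing endless sand dunes in the tense heat as thrilling music plays.

Cinematic desert, drone shot, thriller mood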

Oddly Satisfying Soap Peeling & Ice Cream Scoop

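An ultra-close-up ASMR video of creamy soap being gently peeled or rich ice cream being perfectly scooped. Thick textures crumble satisfyingly, and soft sounds play under warm lighting.

ASMR, satisfying textures, soap peeling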

Crystal Glass Apple ASMR Slice

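A satisfying extreme close-up of a perfectly transparent glass apple being sliced slowly and precisely with a sharp knife. Realistic juice droplets, crisp sounds, and ultra-detailed macro footage create an oddly satisfying ASMR experience.

ASMR, glass fruit cutting, macro food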

360° Golden Perfume Orbit

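A macro close-up of a luxury perfume bottle resting on a glossy black reflective surface as the camera makes a smooth, continuous 360-degree orbit. Dramatic spotlighting creates sparkling golden highlights on the glass, with a shallow depth of field, a high-end cosmetics aesthetic, and soft ambient tones.

Luxury product, perfume showcase, cinematic orbit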