Directing the scene, with references and multi-shot.
Stop describing, start directing. Seedance 2.0's @reference system, character and scene images, multi-shot planning, atmosphere layers, and long-form chaining — built around the five planning questions that set your whole approach.
The whole jump from beginner is one idea: references are instructions. Instead of writing more adjectives and hoping, you hand Seedance images, clips, and audio, and tell it exactly what each one is for. The model stops guessing. Same two colors as before: teal blocks ("Paste to your AI chat") go to your AI chat; tungsten blocks ("Paste into the video prompt field") go into the Seedance prompt field.
One rule for the teal blocks
Run all the teal (AI chat) steps in a single ongoing conversation — the planner, the shot list, and each prompt build on the turn before. The tungsten blocks go into Seedance.
Before you build anything
Ask yourself five questions
Your answers decide the whole approach — how many shots, which references, one clip or many. Two minutes here saves an hour of re-rendering. Answer all five, then start.
1
Is there a character who comes back?
Yes: you need 1–3 character reference images. Tag them @image and reuse the exact same file in every shot.
No: skip identity anchors. Describe the subject in plain text and let the first frame carry the look.
2
One moment, or a sequence of beats?
One moment: a single clip, one camera move. Simplest and steadiest.
A sequence: either pack a few beats into one clip, or make separate shots and stitch them (Step 5).
3
How long does it need to be?
Under ~12s: a single generation covers it. Add beats to push toward the top of that range.
Longer: Extend the same shot, or chain separate clips and stitch them in an editor (Step 5).
4
Does a specific camera move or rhythm matter?
Yes: give it a @video motion reference. The model copies the move and pacing more reliably than any wording can describe them.
No: describe one move in plain words (slow, smooth, gentle).
5
Does sound or dialogue carry the scene?
Yes: write the dialogue with delivery notes, add an @audio mood reference, or both. Seedance generates synced sound.
No: let the built-in ambient audio handle it; it comes free with the render.
The workflow
Seven steps to a directed scene
1
Gather your references — and give every file a job
Why: an untagged file makes Seedance guess, and guessing is what makes scenes drift. A tagged file is a direct instruction. This is the single biggest reliability lever you have.
You get three kinds of anchor. Each does one job:
@image: Identity & look
Face, wardrobe, style. Locks how a character or product appears.
Best: mid-body portrait, simple background. Up to 9 images.
@video: Motion & camera
Camera path, pacing, or a specific move like a Hitchcock zoom or an orbit.
Up to 3 clips, 15s total combined.
@audio: Rhythm & mood
The beat to cut on, the music's tone, the feel a voice should match.
Up to 3 files, 15s total combined.
How the numbers work — read this
The number follows upload order: your first uploaded image is @image1, the second is @image2, and so on (same for video and audio). So upload your files in the order your prompt names them — or write the roles to match the order you uploaded. Some tools show an @ menu and auto-label files; others just call them Image 1, Image 2 in plain text. Same contract, same result — use whichever your tool shows.
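To make the numbering contract concrete, here is a small Python sketch. It is not part of Seedance or any tool that hosts it: `assign_tags` is a hypothetical helper that assumes you record each upload as a `(kind, filename)` pair in upload order, and it numbers each kind separately, as described above.

```python
from collections import defaultdict


def assign_tags(uploads: list[tuple[str, str]]) -> dict[str, str]:
    """Map uploads (in upload order) to @image1 / @video1 / @audio1-style tags.

    Each kind keeps its own counter, so the first image is @image1 even if a
    video was uploaded before it. Assumes filenames are unique.
    """
    counters: dict[str, int] = defaultdict(int)
    tags: dict[str, str] = {}
    for kind, filename in uploads:
        if kind not in ("image", "video", "audio"):
            raise ValueError(f"unknown reference kind: {kind}")
        counters[kind] += 1
        tags[filename] = f"@{kind}{counters[kind]}"
    return tags


uploads = [
    ("image", "detective.png"),
    ("video", "dolly_move.mp4"),
    ("image", "alley.png"),
]
print(assign_tags(uploads))
# {'detective.png': '@image1', 'dolly_move.mp4': '@video1', 'alley.png': '@image2'}
```

The alley is @image2 even though a video was uploaded between the two images, because each file type keeps its own count.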
Characters & real people — the rule that trips everyone
Seedance 2.0 does not allow realistic, identifiable real-human faces as references — it's built for scenes, places, products, creatures, and AI-generated characters. Make your character in an AI image tool (an invented face), then reference that. Use a clean mid-body portrait, simple background; for consistency give it 1–3 angles of the same character in the same lighting — mixing mismatched images makes the face morph. Need a specific real person? Seedance won't do it — use Kling 3.0 or Veo.
When slots get tight, fill them in this order: first frame / core look → character (1–3) → camera (1 video) → audio (1) → extra scene details.
More references ≠ better
You can load up to 9 images, but the sweet spot is 3–5 images + 1–2 videos + 1 audio. Fill all 12 slots and the model tries to satisfy every constraint at once and nails none. If it starts blending your references, rank what matters: name a primary identity anchor ("Primary identity: @image1; keep the face and wardrobe") so it doesn't average them together.
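The limits above can be checked before you upload anything. This is a hypothetical Python helper (`ReferenceSet` and `check_budget` are illustrative names, not a platform API); the constants are the figures this guide states for mid-2026 and are assumptions to confirm on your platform. Hard limits come back as errors; going past the 3–5 / 1–2 / 1 sweet spot comes back as a warning.

```python
from dataclasses import dataclass, field

# Figures as stated in this guide (mid-2026). Platforms differ, so confirm
# these on your tool's current page before relying on them.
MAX_IMAGES = 9
MAX_VIDEOS = 3
MAX_AUDIO = 3
MAX_FILES = 12
MAX_COMBINED_SECONDS = 15.0  # applies to video refs and to audio refs separately

# Upper bounds of the guide's recommended "sweet spot".
SWEET_IMAGES, SWEET_VIDEOS, SWEET_AUDIO = 5, 2, 1


@dataclass
class ReferenceSet:
    images: int = 0
    video_seconds: list[float] = field(default_factory=list)  # one entry per video ref
    audio_seconds: list[float] = field(default_factory=list)  # one entry per audio ref


def check_budget(refs: ReferenceSet) -> tuple[list[str], list[str]]:
    """Return (errors, warnings) for a planned set of references."""
    errors: list[str] = []
    warnings: list[str] = []
    n_video, n_audio = len(refs.video_seconds), len(refs.audio_seconds)

    if refs.images > MAX_IMAGES:
        errors.append(f"{refs.images} images exceeds the {MAX_IMAGES}-image limit")
    if n_video > MAX_VIDEOS:
        errors.append(f"{n_video} video refs exceeds the {MAX_VIDEOS}-clip limit")
    if n_audio > MAX_AUDIO:
        errors.append(f"{n_audio} audio refs exceeds the {MAX_AUDIO}-file limit")
    if sum(refs.video_seconds) > MAX_COMBINED_SECONDS:
        errors.append(f"video refs total {sum(refs.video_seconds)}s (max {MAX_COMBINED_SECONDS}s)")
    if sum(refs.audio_seconds) > MAX_COMBINED_SECONDS:
        errors.append(f"audio refs total {sum(refs.audio_seconds)}s (max {MAX_COMBINED_SECONDS}s)")
    total = refs.images + n_video + n_audio
    if total > MAX_FILES:
        errors.append(f"{total} files exceeds the {MAX_FILES}-file total")

    if refs.images > SWEET_IMAGES or n_video > SWEET_VIDEOS or n_audio > SWEET_AUDIO:
        warnings.append("above the 3-5 images / 1-2 videos / 1 audio sweet spot")
    return errors, warnings


errors, warnings = check_budget(
    ReferenceSet(images=9, video_seconds=[10.0, 8.0], audio_seconds=[5.0])
)
print(errors)
print(warnings)
```

Errors are worth fixing before you upload; a warning is only a nudge back toward the lighter setup that tends to behave better.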
2
Plan the shot list with your AI
Why: a sequence needs a map before you write a single prompt. The AI turns your idea and your five answers into numbered shots, each one a single continuous move, plus a shared style line that keeps them all looking like the same film.
Paste to your AI chat
You are my cinematic director and shot planner. Here is my story idea:
[IDEA]
My answers: recurring character = [Y/N + who]; one moment or a
sequence = [ ]; total length = [ ]; specific camera/rhythm =
[Y/N]; sound/dialogue carries it = [Y/N].
Produce a numbered shot list. For each shot give:
(1) a one-line action that is a SINGLE continuous camera move,
(2) which references it uses, written as @image/@video/@audio roles,
(3) a duration in seconds (keep each shot 4–12s).
At the very top, write one shared "GLOBAL STYLE CUE" line I can paste
into every shot so the clips cut together. Output only the global cue
line and the shot list.
3
Write a multi-reference prompt — roles at the top
Why: Seedance reads role assignments first. Listing every @ job before the scene description is the difference between the model knowing your plan and improvising over it.
The shape: roles → scene & action with time beats → one camera move → one lighting line → style & aspect → an "Avoid…" line. Have the AI build each shot:
Paste to your AI chat
Write one Seedance 2.0 prompt for shot [N] from the list above.
Order it exactly like this:
1. Reference role assignments (e.g. "@image1 = character, appearance
only; @video1 = camera path").
2. The scene and action, with time beats (0–3s… 3–8s…).
3. One camera move.
4. One lighting line.
5. Style and aspect ratio.
6. A short "Avoid…" line.
Rules: 60–100 words after the role lines; one camera move; camera and
subject movement in separate sentences; end with "one continuous shot,
no cuts"; let the images carry identity — do NOT re-describe the
character's face. Output only the prompt.
A finished multi-reference prompt looks like this — character + environment + camera, each doing its job:
Paste into the video prompt field
@image1 = the detective (use for appearance only).
@image2 = the rain-soaked alley environment.
@video1 = camera path and pacing.
The detective walks slowly toward the camera, then stops and looks up.
Neon reflections on wet pavement, steam rising from a manhole, deep
night. Follow @video1 for one slow dolly move. High-contrast
film-noir light, hard shadows. Filmic, vertical 9:16. One continuous
shot, no cuts. Avoid jitter, bent limbs, and face warping.
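The ordering rule lends itself to a small assembler. This Python sketch is hypothetical (`build_prompt` is not a Seedance API): it takes each part as a string, emits the role lines first, appends the guide's closing "One continuous shot, no cuts." before the Avoid line, and counts the words after the role lines so you can compare against the 60–100 target. It cannot check that you used only one camera move; that part is still on you.

```python
def build_prompt(
    roles: list[tuple[str, str]],
    scene: list[str],
    camera: str,
    lighting: str,
    style: str,
    avoid: str,
    min_words: int = 60,
    max_words: int = 100,
) -> tuple[str, int, bool]:
    """Assemble: role lines, scene beats, camera, lighting, style,
    the "no cuts" line, then the Avoid line.

    Returns (prompt, body_word_count, within_target); the word count covers
    everything after the role lines.
    """
    role_lines = [f"{tag} = {job}." for tag, job in roles]
    body_parts = [*scene, camera, lighting, style, "One continuous shot, no cuts.", avoid]
    body = " ".join(part.strip() for part in body_parts)
    word_count = len(body.split())
    prompt = "\n".join([*role_lines, body])
    return prompt, word_count, min_words <= word_count <= max_words


prompt, words, in_range = build_prompt(
    roles=[
        ("@image1", "the detective (use for appearance only)"),
        ("@video1", "camera path and pacing"),
    ],
    scene=["The detective walks slowly toward the camera, then stops and looks up."],
    camera="Follow @video1 for one slow dolly move.",
    lighting="High-contrast film-noir light, hard shadows.",
    style="Filmic, vertical 9:16.",
    avoid="Avoid jitter, bent limbs, and face warping.",
)
print(prompt)
print(words, in_range)  # 39 False
```

A prompt this short lands under 60 words; treat the count as a signal to add scene or lighting detail, not as a hard gate.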
The naming rule that saves you
Don't describe what an image already shows. If @image1 is your character, write "appearance only" — never re-describe her face or age. Re-describing fights the reference and can trip the content filter. Let pictures carry looks; let words carry action and camera.
4
Build atmosphere in layers — with restraint
Why: depth and scale read as parallax, where layers at different distances move at different speeds. But busy + fast + quick cuts is exactly what makes a clip jitter, so you layer for depth and keep everything else calm.
Think in a few simple planes. You don't need all of them — pick what the shot needs:
Foreground: one moving layer for depth (drifting haze, sparks, dust, rain). Keep it to a single direction.
Subject: your hero. Highest detail, clearest light. Everything else supports this.
Background: distance via atmospheric haze; paler, softer, slower. This is where scale comes from.
Light: one strong lighting line. It's the highest-leverage sentence in the whole prompt, worth more than ten adjectives.
The jitter rule — memorize it
Write the camera move and the subject's movement in separate sentences. "The dancer spins slowly. The camera holds a fixed frame." — clean. "Spinning camera around a dancing person" — shaky mess. And never stack fast camera + fast cuts + busy scene; if you need speed, make only one thing fast.
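A crude way to catch stacked speed before you render is to count speed words. This Python sketch uses a hypothetical keyword list (`SPEED_WORDS`), so it only flags the words you tell it about. It is a reminder, not a parser, and it does not check whether camera and subject motion share a sentence.

```python
import re

# Hypothetical keyword list: add or remove words to match how you write prompts.
SPEED_WORDS = ("fast", "quick", "rapid", "whip", "frantic")

# \b keeps "breakfast" from matching; \w* lets "quickly" or "faster" match.
_SPEED_PATTERN = re.compile(r"\b(" + "|".join(SPEED_WORDS) + r")\w*", re.IGNORECASE)


def speed_mentions(prompt: str) -> list[str]:
    """Return the keyword behind each match, in order ("quickly" -> "quick")."""
    return [m.group(1).lower() for m in _SPEED_PATTERN.finditer(prompt)]


def stacks_speed(prompt: str) -> bool:
    """True when more than one speed word appears (a rough proxy for stacking)."""
    return len(speed_mentions(prompt)) > 1


print(speed_mentions("Fast whip pan around a quickly spinning dancer."))
# ['fast', 'whip', 'quick']
print(stacks_speed("The dancer spins slowly. The camera holds a fixed frame."))
# False
```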
Want the full five-layer depth system, with motion keywords per plane? That's the clean-source protocol.
5
Go longer — two ways, and they're different
Why: a single clip caps at 15 seconds, and this guide plans shots at 12 or under. How you continue depends on whether you want the same shot to keep going or a new shot to follow.
Extend (same shot, continued): Seedance continues the existing clip seamlessly; the "visual DNA" of the first seconds stays intact.
Use for: a longer single take, a character beat that needs room, a lingering ending, a smooth follow-through.
Chain (new shot, stitched): generate separate clips and join them in an editor. Each has its own framing and move.
Use for: distinct angles or scenes, such as wide → close, hook → demo → call-to-action, scene A → scene B.
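For the chain route, splitting a target length into shots is simple arithmetic: divide by the per-shot ceiling and round up to get the fewest shots, then spread the seconds evenly. The sketch below assumes whole seconds and the 4–12s shot range this guide plans with (raise `max_shot` if your platform allows 15s clips); `plan_chain` is a hypothetical helper.

```python
import math


def plan_chain(total_seconds: int, max_shot: int = 12, min_shot: int = 4) -> list[int]:
    """Split a target length (whole seconds) into near-equal shot durations."""
    if total_seconds < min_shot:
        raise ValueError(f"{total_seconds}s is shorter than the {min_shot}s minimum clip")
    n_shots = math.ceil(total_seconds / max_shot)  # fewest shots that fit the ceiling
    base, extra = divmod(total_seconds, n_shots)
    if base < min_shot:
        raise ValueError("cannot split evenly without a shot below the minimum")
    # Hand the leftover seconds to the first `extra` shots, one second each.
    return [base + 1 if i < extra else base for i in range(n_shots)]


print(plan_chain(30))  # [10, 10, 10]
print(plan_chain(47))  # [12, 12, 12, 11]
```

Taking the fewest shots keeps each clip as long as the range allows; if you'd rather cut more often, lower `max_shot`.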
The trick that makes chained clips cut together with little or no color grading: paste the same global style cue at the very top of every prompt. Let the AI generate the whole set:
Paste to your AI chat
Turn my shot list into [N] separate Seedance 2.0 prompts for chaining.
Put this exact line at the very TOP of every prompt so the clips match:
[PASTE GLOBAL STYLE CUE]
Each prompt = one continuous shot with one camera move, 60–100 words,
role assignments first if it uses references, and a short "Avoid…" line.
Number them Shot 1, Shot 2, … Output only the prompts.
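Prepending the cue is mechanical enough to script if you keep shot prompts in files or a spreadsheet. This Python sketch uses a hypothetical `chain_prompts` helper: it puts the identical cue on the first line of every prompt and keeps the Shot 1, Shot 2 labels separate so they never end up inside the prompt text. The cue and shot text below are placeholders.

```python
def chain_prompts(style_cue: str, shot_prompts: list[str]) -> list[tuple[str, str]]:
    """Return (label, prompt) pairs; every prompt starts with the same style cue."""
    cue = style_cue.strip()
    if not cue:
        raise ValueError("the global style cue is empty")
    return [
        (f"Shot {i}", f"{cue}\n{prompt.strip()}")
        for i, prompt in enumerate(shot_prompts, start=1)
    ]


# Placeholder cue and shots, for illustration only.
cue = "GLOBAL STYLE CUE: 35mm film look, warm tungsten grade, soft contrast, vertical 9:16."
shots = chain_prompts(cue, [
    "Wide shot: the detective enters the alley. One slow dolly in. One continuous shot, no cuts.",
    "Close-up: the detective looks up at a flickering sign. Camera holds. One continuous shot, no cuts.",
])
for label, prompt in shots:
    print(label)
    print(prompt)
    print()
```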
6
Add dialogue and sound
Why: Seedance bakes in synced audio (dialogue matched to lip movement, ambience, music), so written lines plus an @audio mood anchor get you a publish-ready scene with no separate sound pass.
Put the line in quotes and tell it how to be said. Add an audio reference for tone if you have one:
Paste into the video prompt field
@image1 = the two friends (appearance only).
@audio1 = mood reference (warm, intimate).
Two friends sit across a small café table at golden hour. The woman
smiles and says: "I knew you'd come back." Delivery: quiet, warm,
close-mic. The man looks down, then nods. Soft window light, shallow
depth of field. One gentle push-in. Filmic, vertical 9:16.
Avoid jitter and lip-sync drift.
7
Iterate like a director — one variable at a time
Why: with several references in play, changing two things at once hides which one mattered. Move one lever, regenerate, learn. That's how a repeatable style gets built.
Match the symptom, apply the single fix:
Character drifts shot to shot: reuse the identical reference file and add "maintain exact appearance from @image1" to each prompt.
Face morphs mid-clip: your character images disagree. Use 1–3 images of the same character in matching lighting; don't blend mismatched photos.
Camera ignores the reference: describe the move in stages, or lean harder on @video1. Don't fight it with adjectives.
Style or grade doesn't match: name the grade in words ("desaturated blues, crushed blacks"). Pointing at the image isn't enough.
Audio feels off: give it a job and timing ("@audio1 for tension building to the 10s mark").
Two subjects merge or extra limbs appear: too many subjects in one beat. Use fewer beats, one clear action each.
Then make the single change with the AI:
Paste to your AI chat
Here is my Seedance prompt and its references: [PASTE]
The issue is: [ONE ISSUE — e.g. the character looks different from the reference].
Apply ONLY the single fix for this issue and change nothing else.
Output the revised prompt, then one line on what you changed.
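To confirm a revision really changed only one thing, you can diff the old and new prompts sentence by sentence. This sketch uses Python's standard `difflib.SequenceMatcher`; the sentence splitter is a rough regex assumption (it breaks after ., ! or ? followed by whitespace), and counting changed sentences is only a proxy for "one variable".

```python
import difflib
import re


def split_sentences(prompt: str) -> list[str]:
    # Rough splitter: break after ., ! or ? when followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", prompt.strip()) if s]


def changed_sentences(before: str, after: str) -> list[tuple[str, list[str], list[str]]]:
    """List each non-matching region as (operation, old sentences, new sentences)."""
    a, b = split_sentences(before), split_sentences(after)
    matcher = difflib.SequenceMatcher(a=a, b=b)
    return [
        (tag, a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def count_changed_sentences(before: str, after: str) -> int:
    """Count sentences touched; 1 suggests a single-variable revision."""
    return sum(max(len(old), len(new)) for _, old, new in changed_sentences(before, after))


before = ("The detective walks toward the camera. "
          "Follow @video1 for one slow dolly move. Filmic, vertical 9:16.")
after = ("The detective walks toward the camera. Maintain exact appearance from @image1. "
         "Follow @video1 for one slow dolly move. Filmic, vertical 9:16.")
print(changed_sentences(before, after))
# [('insert', [], ['Maintain exact appearance from @image1.'])]
print(count_changed_sentences(before, after))  # 1
```

If the count comes back higher than 1, the AI changed more than you asked; send the original back and repeat the single-fix request.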
Keep this open
Your project checklist
Direct, don't describe
✓ Answer the five questions first; they set the whole approach.
✓ Every uploaded file gets an @role. Untagged = the model guesses.
✓ List all @roles at the top, before the scene. Always.
✓ @image = look · @video = camera · @audio = mood.
✓ @number follows upload order, counted per type: first image = @image1.
✓ Let images carry identity; don't re-describe a referenced face.
✓ Seedance is for scenes, products & invented characters, not real faces.
✓ Reuse the same character file + "maintain exact appearance" for consistency.
✓ Camera and subject motion in separate sentences. Never stack "fast."
✓ Extend = same shot continues · Chain = new shot, share one style cue.
✓ Iterate by one variable at a time.
Seedance 2.0 reference limits
What fits in one generation
9 images: frames, characters, scenes, style
3 video refs: 15s total combined
3 audio refs: 15s total combined
12 files total, across all types
4–15s per clip: Extend or chain for more
60–100 words: the sweet spot (caps vary)
Built for scenes, products, and AI-made characters — realistic real-human faces are restricted (use Kling 3.0 or Veo for real people). Keep total upload size reasonable; very large requests get rejected, so use smaller files or URL-based uploads for video where you can. Audio is generated built-in. Limits, resolution (720p default, 1080p ceiling), and the Extend feature vary by surface — Pippit, Dreamina, fal.ai, Morphic differ. Verified mid-2026; confirm on your platform's current page before a big project.