02 · FIELD GUIDE · 12 MIN · SEEDANCE 2.0 / PIPPIT

The clean-source protocol, for image-to-video.

Here's the single biggest reason people's Seedance and Pippit clips come back broken: they upload the planning image. The one with the arrows and the labels and the little character turnaround in the corner. The model can't tell your director's notes from the scene — so it renders them. You get glowing Chinese characters floating in the sky, your guide-arrows turned into solid metal, your character standing in an A-pose spinning in midair. This guide fixes that for good, with a workflow that's been blown up and rebuilt enough times to be boring and reliable.

Adapted from a battle-tested Mandarin Pippit/Seedance protocol, rewritten for the prompt box. The bilingual template below keeps the Chinese keys — they carry conditioning weight you don't want to translate away.

01 · THE ONE TRAP

The model renders your notes

Anything in the image that can't be read as light or texture gets built as an object. That's the whole rule. Text, arrows, and storyboard frame-lines become floating glowing letters, metal arrows, and literal borders inside the shot. Character three-views and zoomed-in detail thumbnails get collaged into one space — so you get extra arms, doubled people, a tiny picture-in-picture floating in the frame. Color swatches and material balls turn into glowing orbs and abstract sculptures.

There is exactly one fix, and no prompt can substitute for it: don't let the model see them. Annotations live only in the planning stage. If you're forced to reuse the same image as your source, send it through Midjourney or Stable Diffusion first to repaint it clean — strip every word and UI mark, keep the composition and light.

02 · THE METHOD

Two stages, never one

The control board guides the clean frame. The clean frame makes the video. Skip the middle and you gamble.

Stage A — the control board (internal only)

This holds all your director information: composition zones, shot-size labels, character positions, motion arrows, atmosphere notes. It is for your eyes and your image tool only. It NEVER enters the video model.

Don't: upload the annotated layout straight into Pippit and hope it reads the labels as instructions.

Do: build the 9:16 board with arrows and notes. Use it to brief Midjourney/SD. Then set it aside; it has done its job.

Stage B — the clean cinematic source (what Pippit gets)

A single cinematic still, generated from the control board, with zero text, arrows, or UI. This frame decides every visual thing about your video. It is the only image the model is allowed to see.

Don't: a busy reference sheet with three views of the hero and a palette strip down the side.

Do: one complete hero, midground-low at the golden ratio, ≥40% of the frame, native 9:16, no marks in any corner. Any label goes in the filename, e.g. epic_city_clean.png

Use the board to make a clean source, then use the clean source to make the video. This is more than ten times more stable than any amount of prompt-wrangling that tries to convince the model to ignore the text it can plainly see.

03 · IF YOU ONLY GET ONE IMAGE

Source-frame rules

If the pipeline only allows one upload, that one image must be the finished clean source, and it must obey:

- Main subject ≥ 40% of the frame, set in the lower-mid golden-ratio zone. Too small and the model loses the subject and renders "ants in a wide shot."
- No-go zones: no text, icons, swatches, crosshairs, or crop frames in any corner. Labels go on the filename or in the text prompt, never in the pixels.
- Shoot native 9:16 with the subject whole. Don't rely on the model to crop, because cropping makes it drift and push the lens around.
- Never include multi-panel comics, expression sheets, or turnarounds; the model tweens them into one deformed nightmare.
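If you want to sanity-check a frame before you upload it, the geometric rules above can be sketched in a few lines of Python. Treat this as a rough sketch under assumptions, not a Pippit feature: `check_source_frame` and its `subject_box` argument are hypothetical names, the 40% is measured as bounding-box area, and the "lower-mid golden-ratio zone" is approximated as the horizontal line 61.8% of the way down the frame. It cannot see stray text or swatches; that check stays with your eyes.

```python
from fractions import Fraction

MIN_SUBJECT_SHARE = 0.40         # "main subject >= 40% of the frame"
TARGET_ASPECT = Fraction(9, 16)  # native vertical 9:16
GOLDEN_LINE = 0.618              # assumption: golden-ratio line, measured from the top


def check_source_frame(width, height, subject_box):
    """Return a list of rule violations for a candidate source frame.

    subject_box is a hypothetical (left, top, right, bottom) pixel box
    around the hero, drawn by hand or taken from any detector you trust.
    """
    problems = []
    if Fraction(width, height) != TARGET_ASPECT:
        problems.append("not native 9:16")

    left, top, right, bottom = subject_box
    if left < 0 or top < 0 or right > width or bottom > height:
        problems.append("subject is cropped by the frame edge")

    # Assumption: "share of the frame" means bounding-box area over frame area.
    share = ((right - left) * (bottom - top)) / (width * height)
    if share < MIN_SUBJECT_SHARE:
        problems.append(f"subject covers {share:.0%}, below 40%")

    # Assumption: the subject should straddle the lower golden-ratio line.
    golden_y = height * GOLDEN_LINE
    if not top <= golden_y <= bottom:
        problems.append("subject misses the lower-mid golden-ratio zone")
    return problems


print(check_source_frame(1080, 1920, (140, 560, 940, 1720)))
```

An empty list means the frame passes these four checks only. A hero boxed at 280×400 px in the same frame would be flagged as too small, which is the "ants in a wide shot" case.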

04 · SCALE

The five-layer depth stack

Epic scale isn't one big object — it's depth in motion. The source frame and the prompt have to agree on five layers, each with its own job and its own movement:

LAYER	CONTENT	MOTION CUE
Foreground 10%	Sand, smoke, grit, embers	Streaks past fast, motion blur
Near-mid 20%	Marching troops, mech feet, banners	One steady direction, constant pace
Midground 40%	The hero — building, colossus, ship	Slow advance, high detail, clear
Background 20%	Skyline, mountains, smoke columns	Aerial haze, pale blue-grey, drifting
Sky 10%	Cloud, light beams, storm top	Tyndall rays, dramatic roll

- Light: rim your giant with side- or top-backlight, and paint that rim into the source frame rather than hoping the model adds it.
- Smoke and dust: layer it, with fast motes up close and static columns far off.
- Crowds: the ant rule. Small, low-detail, one shared direction.
- Machines: add self-illumination or reflections at key points so the model reads the material.
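One way to keep the source frame and the prompt in agreement is to hold the depth stack as data and generate the layer lines from it. The sketch below is only an illustration: `LAYERS` copies the table above, and `layer_lines` is a hypothetical helper name, not part of any tool mentioned in this guide.

```python
# (name, share of frame in %, content, motion cue), copied from the table above
LAYERS = [
    ("Foreground", 10, "sand, smoke, grit, embers", "streaks past fast, motion blur"),
    ("Near-mid", 20, "marching troops, mech feet, banners", "one steady direction, constant pace"),
    ("Midground", 40, "the hero (building, colossus, ship)", "slow advance, high detail, clear"),
    ("Background", 20, "skyline, mountains, smoke columns", "aerial haze, pale blue-grey, drifting"),
    ("Sky", 10, "cloud, light beams, storm top", "Tyndall rays, dramatic roll"),
]


def layer_lines(layers=LAYERS):
    """Render one prompt line per layer, refusing stacks that don't total 100%."""
    total = sum(share for _, share, _, _ in layers)
    if total != 100:
        raise ValueError(f"layer shares add up to {total}%, expected 100%")
    return [
        f"- {name} ({share}%): {content}; {motion}"
        for name, share, content, motion in layers
    ]


for line in layer_lines():
    print(line)
```

If you drop or resize a layer for a particular scene, the total check is a reminder to rebalance the others instead of leaving a hole in the depth.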

05 · COVERAGE

Three camera moves that hold for 10–15s

One continuous shot, no cuts. Put the move at the very front of the prompt and assign it time. Three structures survive the duration:

1. God's-eye descent. 0–3s drop through cloud, 3–8s punch out into a city reveal, 8–15s slow push past a midground colossus toward the army and storm behind it.

2. Low-angle follow (mortal's view). 0–5s ground-level look up as a mech foot slams down and cracks the earth; 5–10s tilt up and pull back to reveal the whole body and the legion behind it; 10–15s a slow lateral drift, with the burning city visible through the mech's joints.

3. Lateral reveal (the scroll). 0–4s skim a wall or ridge left-to-right, details flicking past; 4–10s pull back while moving to open the full army and smoke; 10–15s settle on the wide — empire in the sandstorm, a tiny sun.
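All three structures share the same shape: contiguous time beats that start at 0s and end somewhere between 10s and 15s. A minimal sketch of that bookkeeping follows; `camera_line` and the `beats` tuple layout are hypothetical names chosen for this example, and the output wording is just one way to phrase the move.

```python
def camera_line(beats, min_total=10, max_total=15):
    """Turn (start_s, end_s, description) beats into one move-first prompt line.

    Rejects gaps, overlaps, and totals outside the 10-15s window, since a
    hole in the timeline invites the model to invent a cut.
    """
    expected_start = 0
    for start, end, _ in beats:
        if start != expected_start or end <= start:
            raise ValueError(f"beat {start}-{end}s leaves a gap or overlap")
        expected_start = end
    if not min_total <= expected_start <= max_total:
        raise ValueError(f"shot runs {expected_start}s, outside {min_total}-{max_total}s")
    parts = [f"{start}-{end}s {text}" for start, end, text in beats]
    return "one continuous shot, no cuts: " + ", ".join(parts)


gods_eye = [
    (0, 3, "drop through cloud"),
    (3, 8, "punch out into a city reveal"),
    (8, 15, "slow push past the midground colossus"),
]
print(camera_line(gods_eye))
```

The returned string is meant to sit at the very front of the prompt, which is where the move belongs.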

06 · THE TEMPLATE

The bilingual prompt blueprint

Author in English, but keep the Chinese keys — on a model trained heavily on Chinese captions, they pull weight that a pure-English translation loses. Fill the braces, keep it one continuous shot, stay well under the character limit.

[镜头运动与节奏 / Camera move + timing]: {one continuous shot, NO cuts. e.g. descend through cloud at 0-3s, punch through, then slow push around the city for 12s, no cut}

[环境与氛围 / Environment + mood]: {vast ancient megacity in a red sandstorm, countless smoke columns linking earth to sky, grim epic atmosphere}

[主体与多层运动 / Subject + layered motion]:
- 前景 / Foreground: {dust, ash, embers streak past fast, motion blur}
- 中景 / Midground: {marching bronze colossus legions, heavy footfalls, armor catching light}
- 背景 / Background: {endless spires and floating fortresses, fading in the sand haze}
- 天空 / Sky: {churning ochre sand-cloud, golden god-rays through the cracks, beams drifting slowly}

[光线与特效 / Light + FX]: {side-backlight rims the colossus in gold, armor glows with faint red sigils, dust motes sparkle in the beams}

[画质要求 / Quality]: cinematic, ARRI ALEXA 65, ultra-wide, vertical 9:16, layered depth of field, high detail, 4K.
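If you fill this blueprint often, assembling it from a small dictionary keeps the Chinese keys intact and the section order fixed. The sketch below is an assumption-laden illustration: `build_prompt` and the field names are hypothetical, and `max_chars` is a placeholder you must set yourself, because this guide does not state the actual character limit.

```python
SECTIONS = [
    ("camera", "镜头运动与节奏 / Camera move + timing"),
    ("environment", "环境与氛围 / Environment + mood"),
    ("layers", "主体与多层运动 / Subject + layered motion"),
    ("light", "光线与特效 / Light + FX"),
    ("quality", "画质要求 / Quality"),
]
LAYER_KEYS = [
    ("foreground", "前景 / Foreground"),
    ("midground", "中景 / Midground"),
    ("background", "背景 / Background"),
    ("sky", "天空 / Sky"),
]


def build_prompt(fields, max_chars):
    """Assemble the bilingual prompt; reject cut language and over-long text."""
    blocks = []
    for key, label in SECTIONS:
        if key == "layers":
            lines = [f"- {lab}: {fields['layers'][k]}" for k, lab in LAYER_KEYS]
            blocks.append(f"[{label}]:\n" + "\n".join(lines))
        else:
            blocks.append(f"[{label}]: {fields[key]}")
    prompt = "\n\n".join(blocks)
    if "then cut" in prompt.lower():
        raise ValueError("prompt asks for a cut; keep it one continuous shot")
    if len(prompt) > max_chars:
        raise ValueError(f"prompt is {len(prompt)} chars, over the {max_chars} limit")
    return prompt


example = {
    "camera": "one continuous shot, no cuts: 0-3s descend through cloud, 3-15s slow push around the city",
    "environment": "vast ancient megacity in a red sandstorm, grim epic atmosphere",
    "layers": {
        "foreground": "dust, ash, embers streak past fast, motion blur",
        "midground": "marching bronze colossus legions, heavy footfalls",
        "background": "endless spires fading in the sand haze",
        "sky": "churning ochre sand-cloud, golden god-rays",
    },
    "light": "side-backlight rims the colossus in gold",
    "quality": "cinematic, ultra-wide, vertical 9:16, high detail",
}
# 2000 is a placeholder, not Pippit's real limit.
prompt = build_prompt(example, max_chars=2000)
print(prompt)
```

Only the bracketed keys are bilingual here; the values stay in English, as the blueprint suggests.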

07 · WHEN IT BREAKS

Failure diagnosis

SYMPTOM	FIX THE IMAGE	FIX THE PROMPT
Too small, not epic	Push subject to midground, force ≥40%, wide-angle in close	“ultra-wide, close, exaggerated perspective, subject fills frame”
Subject deforms, multi-arm	One complete character only, no turnarounds	“keep exact original form, no deformation, single entity”
Twitchy, conflicting motion	Make the still's elements share one direction	Simplify to 1–2 global directions: “all elements drift right”
Text / arrow / UI residue	Repaint clean with an AI image tool or Photoshop; remove every mark	No prompt fixes this. The source must be clean.
Random cuts, axis jumps	No image change needed	One continuous shot only; use “continues / meanwhile,” never “then cut”
Mushy, oil-painting look	Raise source to 1920×3360+, keep it sharp	“extreme detail, sharp focus, no blur, crisp texture”
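The table can also be kept as a lookup so that a review pass always returns fixes in a fixed order. This is a sketch, not a prescribed tool: the symptom keys and `triage` are hypothetical names, and the ordering is an assumption that puts text residue and deformation first, then scale, then stability, with the mushy look last.

```python
# symptom key -> (image fix, prompt fix); None means nothing to do on that side
FIXES = {
    "text_residue": ("repaint the source clean, remove every mark", None),
    "deformation": ("one complete character only, no turnarounds",
                    "keep exact original form, no deformation, single entity"),
    "too_small": ("push subject to midground, force >=40% of the frame",
                  "ultra-wide, close, exaggerated perspective, subject fills frame"),
    "twitchy_motion": ("make the still's elements share one direction",
                       "all elements drift right"),
    "random_cuts": (None, "one continuous shot only; never 'then cut'"),
    "mushy": ("raise source resolution, keep it sharp",
              "extreme detail, sharp focus, no blur, crisp texture"),
}
# Dicts preserve insertion order, so the order above is the review order.
REVIEW_ORDER = list(FIXES)


def triage(symptoms):
    """Return (symptom, image_fix, prompt_fix) tuples in review order."""
    unknown = set(symptoms) - set(FIXES)
    if unknown:
        raise KeyError(f"unknown symptom(s): {sorted(unknown)}")
    ordered = sorted(set(symptoms), key=REVIEW_ORDER.index)
    return [(name, *FIXES[name]) for name in ordered]


for name, image_fix, prompt_fix in triage(["mushy", "text_residue", "too_small"]):
    print(name, "| image:", image_fix, "| prompt:", prompt_fix)
```

Note the `None` in the prompt column for text residue: it encodes the rule that no prompt fixes a dirty source.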

The whole SOP in one breath: build the annotated board (eyes only) → generate a clean 9:16 source from it in Midjourney/SD with the board as a 0.6–0.8 reference and "no text, no arrows, no UI" in the prompt → grade and sharpen to 1080×1920+ → write the move-first bilingual prompt → generate in Pippit at low-to-mid motion → review for text residue and deformation first, scale and stability second. Every shortcut costs more time than it saves. Do it once, properly.
