01 · FIELD MANUAL · 9 MIN

The Five C's, for the prompt box.

In 1965, a cinematographer named Joseph Mascelli sat down and wrote out how movies are actually made: Camera Angles, Continuity, Cutting, Close-ups, Composition. The five C's. I've been recommending the book for years and exactly nobody reads it — fair enough, it's 246 pages and you have videos to make. So here's the part that matters, in plain words. The trick to all of it: amateurs point at what they want. Pros name it. Naming things is a learnable skill, and you're about to learn it.

01 · FIELD MANUAL

The working vocabulary

Each concept gets a plain explanation, then the before-and-after: a prompt that hopes, next to a prompt that knows.

Shot sizes

A scale from extreme long shot (a speck in a landscape) to extreme close-up (just an eye). Each size has a job: long shots establish geography, mediums carry dialogue, close-ups carry feeling. These terms are the most reliable handles a video model has.

Closeup of a woman crying in her car
Medium close-up, shoulders to forehead, seated in the driver's seat, three-quarters to camera. Shallow depth of field, windshield blurred behind her.

The action axis (180° line)

Two people facing each other have an invisible line between them. Keep the camera on one side of it for the whole scene. Cross it and they appear to swap places — the single biggest reason multi-clip AI scenes feel broken.

Two friends arguing in a diner booth
Two-shot, medium, woman frame-left, man frame-right. Establish the 180° axis along the table edge. Every subsequent shot: her eyeline camera-right, his camera-left.
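
If the line feels abstract, here is a rough sketch that makes it concrete: treat the scene as a top-down floor plan and use the sign of a 2D cross product to tell which side of the axis each camera sits on. Everything here is an assumption for illustration. The coordinates are invented, `side_of_axis` and `crosses_line` are hypothetical helper names, and no video model accepts camera positions in this form. It is a planning aid for your own shot list, nothing more.

```python
# Hypothetical top-down floor plan; all coordinates are invented for illustration.
from typing import Sequence, Tuple

Point = Tuple[float, float]

def side_of_axis(a: Point, b: Point, camera: Point) -> int:
    """Return +1 or -1 for the side of the line a->b the camera is on, 0 if on the line."""
    cross = (b[0] - a[0]) * (camera[1] - a[1]) - (b[1] - a[1]) * (camera[0] - a[0])
    return (cross > 0) - (cross < 0)

def crosses_line(a: Point, b: Point, cameras: Sequence[Point]) -> bool:
    """True if the cameras do not all sit on the same side of the axis."""
    sides = {side_of_axis(a, b, c) for c in cameras}
    return len(sides - {0}) > 1

woman, man = (0.0, 0.0), (2.0, 0.0)                 # the axis runs between them
setups = [(1.0, -3.0), (0.5, -1.0), (1.5, -1.0)]    # every setup on the same side
print(crosses_line(woman, man, setups))                  # False
print(crosses_line(woman, man, setups + [(1.0, 2.0)]))   # True
```

The last call adds one camera on the far side of the table, which is exactly the setup that would make the two friends appear to swap places.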

Screen direction

Exit frame-right, enter the next shot frame-left, still moving right. Reverse it and the audience reads a turn-around. Every clip you generate needs its travel direction named, or each generation invents its own.

A woman drives from her apartment to her office
Three shots, all left-to-right: (A) car exits driveway frame-left to frame-right. (B) highway profile, motion continues L→R across the cut. (C) car enters frame-left at the office, parks.
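
A minimal sketch of the same check done on paper before you generate anything: list the shots, name each one's travel direction, and flag every cut where the direction flips. The `Shot` class, the `"L->R"` / `"R->L"` labels, and `direction_breaks` are all hypothetical names made up for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Shot:
    label: str
    direction: str  # "L->R" or "R->L": travel direction across the frame

def direction_breaks(shots: List[Shot]) -> List[Tuple[str, str]]:
    """Return (previous, next) label pairs where travel direction flips across a cut."""
    return [
        (prev.label, nxt.label)
        for prev, nxt in zip(shots, shots[1:])
        if prev.direction != nxt.direction
    ]

commute = [
    Shot("A driveway", "L->R"),
    Shot("B highway", "L->R"),
    Shot("C office", "L->R"),
]
print(direction_breaks(commute))  # []

# Flip the middle shot and both cuts around it break.
commute[1].direction = "R->L"
print(direction_breaks(commute))
# [('A driveway', 'B highway'), ('B highway', 'C office')]
```

An empty list means the trip reads as one continuous journey. Any pair that comes back is a cut where the audience will read a turn-around.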

Cut-in vs. cut-away

A cut-in moves closer to the main action (the hand, the letter). A cut-away leaves it briefly (the clock, the dog). Both compress time and — crucially for AI work — both hide cuts between clips that refuse to match.

Two scientists argue over a discovery
Master two-shot at the lab bench. Cut-ins: a reaction close-up of each scientist at the moment of pushback; the printout sliding across the bench. Cut-aways: the centrifuge spinning; the wall clock advancing.

Master scene technique

Generate the wide master first — it locks characters, wardrobe, lighting, location. Then generate coverage (mediums, close-ups, reactions) described to match it. Without a master, every clip invents its own world.

A dinner party where someone announces they're engaged
Master: wide, full table of eight, warm tungsten practicals, the announcement and the reactions in one take. Lock this look. Coverage: medium two-shot of the couple, close-up of the announcer, two reaction close-ups, cut-in of the ring.

Match on action

Cut during a motion and the motion stitches the seam shut — the eye rides it straight past. Cut between two static frames and every join is visible. When planning paired generations, describe identical motion at the cut point.

A man opens a door and walks into the kitchen
(A) Medium shot, right hand grips the brass knob, begins to turn. (B) Cut on motion: close-up of the knob mid-turn, same hand continuing the rotation at the same speed, door swinging inward.

02 · HOW THE MODEL THINKS

Why your long prompt gets ignored

Ever written a beautiful paragraph of a prompt and gotten back something that ignored half of it? You're not crazy, and the model isn't broken. It just doesn't read the way you do — it weighs every word against every other word, and the math has consequences you can use:

Attention is a budget. Thirty adjectives don't add up; they dilute. Three exact words ("charcoal wool overcoat") outpull a paragraph of vibes ("amazing cinematic stylish outfit"). Specificity concentrates attention; volume scatters it.

There is no memory between generations. Every clip starts from zero. The only consistency mechanism you have is repetition: the same canon block — character look, wardrobe, location, grade — pasted at the head of every prompt, word for word.

Professional terms are dense tokens. "Medium close-up, low angle, shallow depth of field" encodes a whole camera setup in eight words, because those phrases dominated the training data's captions of real cinema. Vague words land in vague regions; craft vocabulary lands where the good footage lives.
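
The canon block idea from the list above is easy to sketch: keep one string, paste it unchanged at the head of every prompt, and vary only the shot description. This is a sketch under assumptions, not anyone's official workflow. The `CANON` text, the shot wording, and the `build_prompt` helper are all invented for illustration.

```python
# A hypothetical canon block; the wording is invented for illustration.
CANON = (
    "Woman, mid-30s, charcoal wool overcoat, short black hair. "
    "Diner interior at night, warm tungsten practicals, muted teal grade."
)

def build_prompt(canon: str, shot: str) -> str:
    """Prepend the canon block, word for word, to a single shot description."""
    return f"{canon.strip()}\n{shot.strip()}"

shots = [
    "Medium close-up, low angle, shallow depth of field. She sets down the coffee cup.",
    "Close-up of her right hand releasing the cup handle.",
]

prompts = [build_prompt(CANON, s) for s in shots]
for p in prompts:
    print(p, end="\n\n")
```

The point of routing every prompt through one function is that nobody retypes the canon from memory. A paraphrased canon is a different canon.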

03 · DOCTRINE

Think in shots, not scripts

Here's the mindset shift that fixes most AI video on the spot: a model gives you one camera setup, a few seconds long. That's a shot. Ask it for a whole scene and you get mush, because nobody — not even Spielberg — shoots a scene in one ask. The method, every time:

1. Break the idea into shots before you prompt anything: master, coverage, inserts.
2. One action per shot. The single beat that happens, in visible physical nouns.
3. Generate short, stitch in post. Shorter clips fail less and cut better.
4. Carry the canon. Same canon block leading every prompt.
5. When two clips won't cut, insert a third. A cut-in or cut-away absorbs almost any mismatch while the eye is elsewhere.
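
The steps above can be sketched as a plain shot list: one entry per shot, one action each, plus a small helper for the last step that drops a bridging insert between two clips that won't cut. The `PlannedShot` class, the `bridge` function, and the shot wording are hypothetical, made up for this example rather than taken from any tool.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class PlannedShot:
    kind: str    # "master", "coverage", or "insert"
    action: str  # the single beat, in visible physical nouns

def bridge(shots: List[PlannedShot], index: int, insert: PlannedShot) -> List[PlannedShot]:
    """Return a new list with `insert` placed between shots[index] and shots[index + 1]."""
    if not 0 <= index < len(shots) - 1:
        raise IndexError("index must point at a shot that has a successor")
    return shots[: index + 1] + [insert] + shots[index + 1 :]

plan = [
    PlannedShot("master", "Wide, table of eight, he stands and raises a glass."),
    PlannedShot("coverage", "Medium two-shot, the couple turn to each other."),
    PlannedShot("coverage", "Close-up, her mother covers her mouth with one hand."),
]

# Suppose the two coverage clips refuse to match: absorb the mismatch with a cut-in.
plan = bridge(plan, 1, PlannedShot("insert", "Cut-in, the ring on her left hand."))

for number, shot in enumerate(plan, start=1):
    print(number, shot.kind, "|", shot.action)
```

Writing the plan down first also makes the "one action per shot" rule easy to police: if an `action` line needs the word "then", it is probably two shots.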

The Tool and the Studio exist to make those five steps the lazy path. Use them until the habit sticks, then notice you don't need them anymore. That's the plan, and I'm not even mad about it.
