What We're Building

A 3- to 4-minute animated storybook video in paper cut-out style. The camera zooms into book pages, scenes come alive with subtle animation, pages turn, and narration carries the story forward.

The example production: "The Mighty Monster Afang," a Welsh folktale from 1921 (public domain). But this workflow adapts to any story you want to tell.

Production stats for this project:

  • 11 scenes (5 single keyframe, 6 with A/B transformation)

  • 17 total keyframes

  • 3 character reference sheets

  • 23 book page spreads

  • Runtime: 4:00
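The keyframe count follows directly from the scene split: a single-keyframe scene needs one image and an A/B scene needs two. A trivial Python check of that arithmetic (the function name is my own, not part of any tool):

```python
def keyframe_count(single_scenes: int, ab_scenes: int) -> int:
    """Single-keyframe scenes need one image; A/B scenes need two (start and end states)."""
    return single_scenes * 1 + ab_scenes * 2


# 5 single-keyframe scenes + 6 A/B scenes, from the production stats above
total = keyframe_count(single_scenes=5, ab_scenes=6)
print(total)  # 17
```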

The Tool Stack

Every platform serves a specific purpose. No redundancy.

@AdobeFirefly handles all visual generation: character sheets, scene keyframes, book spreads, and scene animations, using @NanoBanana and Veo 3.1 Fast in Firefly Boards.

@elevenlabsio covers narration, sound effects, and video generation via @Kling_ai 01 for page turns and zoom transitions.

@suno generates the music.

@Adobe Premiere Pro assembles everything into the final video.

Here's the breakdown by task:

Phase 1: Character Reference Sheets

Character sheets are the foundation. Without them, your protagonist looks different in every scene. The monster changes species halfway through. The oxen become horses.

Reference sheets show each character from multiple angles with detail callouts highlighting their defining features. Every subsequent prompt references these sheets.

The Base Template

All character sheets follow this structure:

8K, hyper-realistic photography of layered paper character reference sheet, hand-cut paper [CHARACTER DESCRIPTION] shown in [VIEWS], [SPECIFIC DETAILS], detail callouts showing [CALLOUT ELEMENTS], watercolor textures in [COLORS], visible paper fibers, plain [BACKGROUND COLOR] background, stop-motion paper cut-out look, soft even diffused lighting, crafted collage aesthetic, character design reference sheet
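If you script your prompt generation, a small helper can fill the bracketed slots and refuse to run when one is left empty. This is a minimal Python sketch, assuming you keep the template as a plain string; `fill_template` and the slot pattern are illustrative, not part of Firefly or any other tool. The example values are a shortened version of the Maiden sheet below.

```python
import re

# Character sheet template from above, with its bracketed slots intact.
CHARACTER_SHEET_TEMPLATE = (
    "8K, hyper-realistic photography of layered paper character reference sheet, "
    "hand-cut paper [CHARACTER DESCRIPTION] shown in [VIEWS], [SPECIFIC DETAILS], "
    "detail callouts showing [CALLOUT ELEMENTS], watercolor textures in [COLORS], "
    "visible paper fibers, plain [BACKGROUND COLOR] background, "
    "stop-motion paper cut-out look, soft even diffused lighting, "
    "crafted collage aesthetic, character design reference sheet"
)

# A slot is an uppercase name (spaces allowed) inside square brackets.
SLOT = re.compile(r"\[([A-Z ]+)\]")


def fill_template(template: str, values: dict[str, str]) -> str:
    """Replace every [SLOT] with its value; raise if any slot has no value."""
    missing = set(SLOT.findall(template)) - set(values)
    if missing:
        raise KeyError(f"unfilled slots: {sorted(missing)}")
    return SLOT.sub(lambda m: values[m.group(1)], template)


# Example: a shortened version of the Maiden sheet.
prompt = fill_template(CHARACTER_SHEET_TEMPLATE, {
    "CHARACTER DESCRIPTION": "young Welsh maiden",
    "VIEWS": "front view and three-quarter view side by side",
    "SPECIFIC DETAILS": "flowing moss-green dress with wildflower embroidery at neckline and hem",
    "CALLOUT ELEMENTS": "flower hair ornaments close-up and embroidery pattern and clay pot design",
    "COLORS": "soft greens and warm skin tones",
    "BACKGROUND COLOR": "warm cream",
})
print(prompt)
```

Failing loudly on a missing slot is deliberate, so a literal `[VIEWS]` never slips into a generation.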

Standard Negative Prompt

Use this for all character sheets:

photorealistic skin, modern clothing, harsh lighting, hard shadows, digital textures, plastic, glossy, metallic, 3D rendered, CGI, neon colors, text, logos, multiple characters, busy background

Example: The Maiden

Here's how I built the protagonist's reference sheet:

8K, hyper-realistic photography of layered paper character reference sheet, hand-cut paper young Welsh maiden shown in front view and three-quarter view side by side, flowing moss-green dress with wildflower embroidery at neckline and hem, long dark wavy hair past shoulders with white meadow blossoms tucked behind each ear, gentle but determined expression, carries small clay pot with Celtic spiral pattern, simple leather shoes, detail callouts showing flower hair ornaments close-up and embroidery pattern and clay pot design, watercolor textures in soft greens and warm skin tones, visible paper fibers, plain warm cream background, stop-motion paper cut-out look, soft even diffused lighting, crafted collage aesthetic, character design reference sheet

Negative prompt:

photorealistic skin, modern clothing, harsh lighting, hard shadows, digital textures, plastic, glossy, metallic, 3D rendered, CGI, neon colors, text, logos, multiple characters, busy background

Consistency markers I tracked:

  • Moss-green dress with wildflower embroidery (ALWAYS visible)

  • White meadow blossoms in dark hair (both sides)

  • Small clay pot with Celtic spirals

  • Gentle but determined expression

  • Long dark wavy hair past shoulders

Write down your consistency markers. Reference them constantly. The moment you forget the clay pot has Celtic spirals, your visual continuity breaks.
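One way to reference markers constantly is to check every scene prompt against them before generating. A rough Python sketch, assuming plain case-insensitive substring matching (so a reworded marker like "green gown" gets flagged, which is arguably what you want). The marker phrases are shortened from the list above, and `missing_markers` is a hypothetical helper.

```python
# Consistency markers for the Maiden, shortened to phrases that should appear verbatim.
MAIDEN_MARKERS = [
    "moss-green dress",
    "wildflower embroidery",
    "white meadow blossoms",
    "clay pot with Celtic spiral",
    "determined expression",
    "long dark wavy hair",
]


def missing_markers(prompt: str, markers: list[str]) -> list[str]:
    """Return the markers that do not appear in the prompt (case-insensitive substring match)."""
    text = prompt.lower()
    return [marker for marker in markers if marker.lower() not in text]


# A scene prompt that forgot the clay pot.
scene_text = (
    "hand-cut paper young Welsh maiden in flowing moss-green dress with wildflower embroidery, "
    "long dark wavy hair with white meadow blossoms, gentle but determined expression"
)
missing = missing_markers(scene_text, MAIDEN_MARKERS)
print(missing)  # ['clay pot with Celtic spiral']
```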

Example: The Monster

Different character, same template logic:

8K, hyper-realistic photography of layered paper character reference sheet, hand-cut paper massive Welsh monster shown in front view and side view, iron-gray armored back plates like overlapping shields, ridge of dark jagged spines running from head to tail, bent grasshopper-like armored front legs with bulging scaled thighs, moss-green and stone-gray coloring throughout, glowing amber eyes, wide jaw with suggestion of teeth, long powerful tail, detail callouts showing armored plate texture close-up and spine ridge detail and amber eye glow, watercolor textures in iron grays and murky greens, visible paper fibers, plain dark slate background, stop-motion paper cut-out look, soft even diffused lighting, crafted collage aesthetic, creature design reference sheet

Negative prompt:

photorealistic, cute, friendly looking, modern elements, harsh lighting, hard shadows, digital textures, plastic, glossy chrome, 3D rendered, CGI, neon colors, text, logos, multiple creatures, red colors, fire breathing

Consistency markers:

  • Iron-gray armored back plates (ALWAYS visible)

  • Ridge of dark spines along back

  • Glowing amber eyes

  • Bent grasshopper-like armored front legs

  • Moss-green and stone-gray coloring

  • Long powerful tail

Notice I adjusted the negative prompt for the monster by adding "cute," "friendly looking," "red colors," and "fire breathing." Negative prompts need to anticipate what the AI might default to for your subject.
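Diffing the two negative prompts makes those adjustments explicit. A small Python sketch that treats each comma-separated phrase as one term (the `terms` helper is illustrative):

```python
STANDARD_NEGATIVE = (
    "photorealistic skin, modern clothing, harsh lighting, hard shadows, digital textures, "
    "plastic, glossy, metallic, 3D rendered, CGI, neon colors, text, logos, "
    "multiple characters, busy background"
)
MONSTER_NEGATIVE = (
    "photorealistic, cute, friendly looking, modern elements, harsh lighting, hard shadows, "
    "digital textures, plastic, glossy chrome, 3D rendered, CGI, neon colors, text, logos, "
    "multiple creatures, red colors, fire breathing"
)


def terms(negative_prompt: str) -> set[str]:
    """Split a comma-separated negative prompt into a set of trimmed terms."""
    return {t.strip() for t in negative_prompt.split(",") if t.strip()}


added = terms(MONSTER_NEGATIVE) - terms(STANDARD_NEGATIVE)
removed = terms(STANDARD_NEGATIVE) - terms(MONSTER_NEGATIVE)
print("added:", sorted(added))
print("removed:", sorted(removed))
```

The diff also surfaces the quieter changes: "photorealistic skin" broadened to "photorealistic," "modern clothing" became "modern elements," "glossy" narrowed to "glossy chrome," "multiple characters" became "multiple creatures," and "metallic" and "busy background" were dropped.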

Phase 2: Scene Keyframes

Scene keyframes are the static images that form the visual foundation of each scene. Think of them as the "paintings" that will come alive with subtle animation.

Some scenes need only one keyframe. Others need two keyframes showing a transformation (A/B). The monster emerging from the water? That's an A/B scene. The peaceful valley establishing shot? Single keyframe.

Base Image Template

8K, hyper-realistic photography of layered paper [SETTING], hand-cut paper [CHARACTERS], watercolor textures, visible paper fibers, dimensional stacked paper layers, stop-motion paper cut-out look, soft diffused lighting, gentle shadows between layers, crafted collage aesthetic, warm storybook tone

Cultural Modifiers

For this Welsh production, I added these to every scene prompt:

  • Celtic knotwork border patterns

  • Mossy greens, stone grays, heather purples

  • Weathered parchment textures

  • Spiral and triskele motifs

  • Misty atmospheric layers

  • Ancient oak and standing stone elements

Your story will have its own cultural vocabulary. Japanese folktale? Different palette, different motifs. West African legend? Completely different visual language. Build your modifier list before you start generating.
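A modifier list built up front is also easy to append automatically. A minimal Python sketch, assuming the base image template above rewritten with Python `{}` placeholders and a hypothetical `CULTURAL_MODIFIERS` lookup keyed by culture; only the Welsh entries come from this production.

```python
# Base image template from above, rewritten with Python format placeholders.
BASE_SCENE_TEMPLATE = (
    "8K, hyper-realistic photography of layered paper {setting}, hand-cut paper {characters}, "
    "watercolor textures, visible paper fibers, dimensional stacked paper layers, "
    "stop-motion paper cut-out look, soft diffused lighting, gentle shadows between layers, "
    "crafted collage aesthetic, warm storybook tone"
)

# Hypothetical lookup keyed by culture; only the Welsh list comes from this production.
CULTURAL_MODIFIERS = {
    "welsh": [
        "Celtic knotwork border patterns",
        "mossy greens, stone grays, heather purples",
        "weathered parchment textures",
        "spiral and triskele motifs",
        "misty atmospheric layers",
        "ancient oak and standing stone elements",
    ],
}


def scene_prompt(setting: str, characters: str, culture: str) -> str:
    """Fill the base template, then append the culture's modifiers as extra phrases."""
    base = BASE_SCENE_TEMPLATE.format(setting=setting, characters=characters)
    return ", ".join([base, *CULTURAL_MODIFIERS[culture]])


valley = scene_prompt(
    setting="Welsh mountain valley with misty purple hills and winding river",
    characters="thatched-roof cottages nestled among green slopes",
    culture="welsh",
)
print(valley)
```

Appending the modifiers after the shared style terms matches where they sit in the Peaceful Valley prompt.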

Standard Scene Negative Prompt

photorealistic, modern objects, harsh lighting, hard shadows, digital textures, plastic, glossy, metallic, 3D rendered, CGI, neon colors, text, logos, fast movement, camera shake, morphing, distortion

Single Keyframe Example: The Peaceful Valley

8K, hyper-realistic photography of layered paper Welsh mountain valley with misty purple hills and winding river, hand-cut paper thatched-roof cottages nestled among green slopes, tiny paper sheep dotting hillsides, smoke wisps rising from cottage chimneys, watercolor textures in mossy greens and heather purples, visible paper fibers, dimensional stacked paper layers creating atmospheric depth, stop-motion paper cut-out look, soft golden morning diffused lighting, gentle shadows between layers, crafted collage aesthetic, warm storybook tone, Celtic knotwork border pattern framing, standing stone silhouettes on distant hilltop, weathered parchment textures

A/B Keyframe Example: The Monster's Lair

Some scenes show transformation. The monster's lair needs two states: before and after the maiden arrives.

Keyframe A (Monster Waiting):

8K, hyper-realistic photography of layered paper dark mountain bog with murky green water and dead twisted trees, hand-cut paper massive Welsh monster partially submerged in water showing only iron-gray armored back and glowing amber eyes watching, mist rising from stagnant water, watercolor textures in murky greens and iron grays, visible paper fibers, dimensional stacked paper layers, stop-motion paper cut-out look, eerie diffused lighting from above, gentle shadows between layers, crafted collage aesthetic, tense atmosphere, Celtic spiral patterns in dark water ripples, weathered parchment textures, ancient standing stones barely visible in background mist

Keyframe B (Monster Emerging):

8K, hyper-realistic photography of layered paper dark mountain bog, hand-cut paper massive Welsh monster rising dramatically from murky water with iron-gray armored plates dripping and ridge of dark spines fully visible and bent grasshopper-like front legs raised menacingly, water splashing in paper layers around creature, amber eyes glowing intensely, same dead twisted trees and standing stones in background, watercolor textures in murky greens and iron grays, visible paper fibers, dimensional stacked paper layers, stop-motion paper cut-out look, dramatic lighting from above, gentle shadows between layers, crafted collage aesthetic, moment of terror, Celtic spiral patterns in disturbed water, weathered parchment textures

The A/B structure lets you animate between states while maintaining the paper cut-out aesthetic.

Phase 3: Scene Animations

Now we bring the keyframes to life. Veo 3.1 Fast in Adobe Firefly Boards handles this.

The goal: subtle movement that feels like stop-motion paper animation. Not cinematic sweeping shots. Not morphing transformations. Gentle, textured, handcrafted movement.

Animation Prompt Structure

For single keyframe scenes, the animation prompt focuses on environmental movement:

{
  "prompt": "Gentle environmental movement only. Paper smoke wisps drift lazily from cottage chimneys. Paper sheep shift slightly on hillsides. Misty layers drift slowly. All elements maintain paper cut-out texture. No character movement. No camera movement.",
  "negative_prompt": "fast movement, camera shake, morphing, distortion, smooth animation, 3D movement, character walking, running, dramatic action",
  "reference_image": "[Scene 1 keyframe]",
  "motion_intensity": "subtle",
  "duration": "5 seconds",
  "sound_design": "soft wind through valley, distant sheep bells, faint birdsong, crackling hearth fire undertone"
}

For A/B scenes, the animation transitions between states:

{
  "prompt": "Dramatic paper cut-out emergence. Monster rises from water in stop-motion style. Water splashes as layered paper shapes. Armored plates catch light as creature surfaces. Maintain paper fiber textures throughout. Movement should feel handcrafted, not fluid.",
  "negative_prompt": "smooth fluid motion, morphing transformation, camera shake, modern effects, CGI movement, fast action",
  "reference_image_start": "[Scene 8A keyframe]",
  "reference_image_end": "[Scene 8B keyframe]",
  "motion_intensity": "moderate",
  "duration": "5 seconds",
  "sound_design": "deep water churning, heavy splashing, low rumbling growl, dripping water echoes"
}
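If you keep these requests as data, one helper can emit either shape. A Python sketch, assuming the field names exactly as they appear in the two examples above; they mirror those examples and should not be read as a documented Veo or Firefly Boards API, so map them onto whatever controls your tool actually exposes. The subtle/moderate intensities follow the two examples.

```python
import json


def animation_request(prompt: str, negative_prompt: str, sound_design: str,
                      keyframes: list[str], duration: str = "5 seconds") -> dict:
    """Build a request in the shape of the JSON examples above.

    One keyframe: single-keyframe scene (subtle, environmental motion).
    Two keyframes: A/B scene (start and end states, moderate motion).
    """
    if len(keyframes) == 1:
        images = {"reference_image": keyframes[0]}
        intensity = "subtle"
    elif len(keyframes) == 2:
        images = {"reference_image_start": keyframes[0], "reference_image_end": keyframes[1]}
        intensity = "moderate"
    else:
        raise ValueError("a scene has either one keyframe or an A/B pair")
    return {
        "prompt": prompt,
        "negative_prompt": negative_prompt,
        **images,
        "motion_intensity": intensity,
        "duration": duration,
        "sound_design": sound_design,
    }


request = animation_request(
    prompt="Dramatic paper cut-out emergence. Monster rises from water in stop-motion style.",
    negative_prompt="smooth fluid motion, morphing transformation, camera shake",
    sound_design="deep water churning, heavy splashing, low rumbling growl",
    keyframes=["[Scene 8A keyframe]", "[Scene 8B keyframe]"],
)
print(json.dumps(request, indent=2))
```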

Critical Animation Guidelines

DO:

  • Keep motion subtle and environmental

  • Maintain paper textures throughout

  • Use stop-motion pacing (slightly stuttery, not smooth)

  • Let the scene breathe

DON'T:

  • Add camera shake

  • Create morphing transitions

  • Make characters move dramatically (save that for A/B transitions)

  • Speed up the motion

Veo 3.1 Fast outputs also include inherent (generated) audio. Keep it and layer it with your sound design later.

Phase 4: Book Page Spreads

This is where the storybook framing comes together. Every scene animation sits inside an open book, viewed from above on a wooden art table.

You need a base book image first: an open storybook with blank pages, lying flat on a textured surface. Then you composite your scene keyframes onto the pages.

Base Book Prompt

8K, hyper-realistic photography of vintage open storybook lying flat on weathered wooden art table, cream-colored aged pages with subtle foxing and worn edges, hand-tooled leather cover with Celtic knotwork embossing visible on spine and corners, book lies completely flat and still, soft diffused overhead lighting, gentle shadows from book thickness, warm nostalgic atmosphere, no text on pages, pages ready for illustration

Book Spread Composite Prompt

{
  "prompt": "Open storybook on wooden art table, pages lying flat and completely still",
  "left_page": "replace with [KEYFRAME A], fill entire left page",
  "right_page": "replace with [KEYFRAME B], fill entire right page"
}

For my 11-scene production, I created 23 book spreads:

  • Opening spread (book closed, then opening)

  • Scene spreads (left page shows previous scene or decorative element, right page shows current scene)

  • Transition spreads

  • Closing spread

The spreads create visual rhythm. Viewers experience the story as pages in a physical book, not as disconnected AI-generated images.
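The scene-spread pattern (previous scene on the left, current scene on the right) is mechanical enough to generate. A Python sketch using the composite prompt above; the decorative placeholder and the helper names are hypothetical, and opening, transition, and closing spreads are not covered.

```python
# The composite prompt from above; left/right values are keyframe placeholders.
SPREAD_PROMPT = "Open storybook on wooden art table, pages lying flat and completely still"


def spread_request(left: str, right: str) -> dict:
    """Build one book-spread composite request in the shape shown above."""
    return {
        "prompt": SPREAD_PROMPT,
        "left_page": f"replace with {left}, fill entire left page",
        "right_page": f"replace with {right}, fill entire right page",
    }


def scene_spreads(scene_keyframes: list[str], decorative: str) -> list[dict]:
    """Right page: current scene. Left page: previous scene, or a decorative element for the first scene."""
    spreads = []
    for i, current in enumerate(scene_keyframes):
        left = scene_keyframes[i - 1] if i > 0 else decorative
        spreads.append(spread_request(left, current))
    return spreads


spreads = scene_spreads(
    ["[Scene 1 keyframe]", "[Scene 2 keyframe]", "[Scene 3 keyframe]"],
    decorative="[Celtic knotwork decoration]",  # hypothetical placeholder
)
for s in spreads:
    print(s["left_page"], "|", s["right_page"])
```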

Phase 5: Page Turn Animations

Page turns are the connective tissue. They signal scene transitions while reinforcing the book metaphor.

Kling 01 through ElevenLabs handles these.

Book Opening

{
  "prompt": "book cover opens to reveal first page"
}

Page Turn

{
  "prompt": "right page turn"
}

Book Closing

{
  "prompt": "book closes"
}

Kling infers the context from your input image (the book spread), so these short prompts are enough to produce natural-looking page motion.

For the production, I needed:

  • 1 book opening animation (cover to first spread)

  • 5 page turns between scenes

  • 1 book closing animation (final spread to closed cover)

Page Turn Guidelines

  • Keep them consistent. Same speed, same lighting, same table surface.

  • Don't over-stylize. The page turn is functional, not a showcase.

  • Audio from Kling captures the paper sound. Keep it.

Phase 6: Zoom Transitions

Zoom transitions move the viewer into and out of each scene. The camera dollies in from the book spread to the full-frame scene animation. After the scene plays, the camera dollies out to reveal the book again.

This creates the "entering the story" and "returning to reality" rhythm that makes storybook videos feel immersive.

Zoom In Prompt

{
  "prompt": "Slow camera dolly in, static scene",
  "motion": "slow",
  "speed": "slow"
}

Zoom Out Prompt

{
  "prompt": "Slow camera dolly out, static scene",
  "motion": "slow",
  "speed": "slow"
}

For 11 scenes, I needed 22 zoom animations (one in, one out per scene).

Zoom Guidelines

  • Start the zoom on the book spread, with the scene visible on the right page

  • End the zoom in on the full-frame scene (no book visible)

  • Reverse for zoom out: start full-frame, end on book spread

  • Keep zoom speed consistent across all transitions

  • The scene should remain static during the zoom (Kling handles the camera movement)

Phase 7: Assembly in Premiere Pro

Now everything comes together. The assembly sequence:

  1. Book opening (closed book → first spread)

  2. Scene 1 sequence:
    • Book spread with Scene 1 keyframe on right page
    • Zoom in to Scene 1
    • Scene 1 animation plays
    • Zoom out to book spread

  3. Page turn (Scene 1 → Scene 2)

  4. Scene 2 sequence (same pattern)

  5. Repeat for all scenes

  6. Book closing (final spread → closed book)

Timeline Structure

[Book Open] → [Spread 1] → [Zoom In] → [Scene 1 Animation] → [Zoom Out] → [Page Turn] → [Spread 2] → [Zoom In] → [Scene 2 Animation] → [Zoom Out] → [Page Turn] → ... → [Book Close]
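The timeline pattern expands mechanically for any scene count. A Python sketch (names are illustrative) that lists the clips in order; by default it places a page turn after every scene except the last, as the pattern shows.

```python
from typing import Optional


def build_timeline(num_scenes: int, page_turn_after: Optional[set[int]] = None) -> list[str]:
    """Expand the timeline pattern into an ordered list of clips.

    page_turn_after holds the scene numbers followed by a page turn. By default every
    scene except the last gets one, as in the pattern above.
    """
    if page_turn_after is None:
        page_turn_after = set(range(1, num_scenes))
    clips = ["Book Open"]
    for n in range(1, num_scenes + 1):
        clips += [f"Spread {n}", "Zoom In", f"Scene {n} Animation", "Zoom Out"]
        if n in page_turn_after:
            clips.append("Page Turn")
    clips.append("Book Close")
    return clips


timeline = build_timeline(11)
print(" → ".join(timeline[:7]))
# Book Open → Spread 1 → Zoom In → Scene 1 Animation → Zoom Out → Page Turn → Spread 2
```

The 11-scene default yields 22 zoom clips, matching the zoom count above. The production list reports 5 page turns across 11 scenes, so if not every scene boundary gets a turn, pass the scene numbers that do via `page_turn_after`.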

Sync Guidelines

Each scene animation should last long enough for its narration plus a beat of breathing room. Don't rush.

  • Start narration after the zoom-in completes

  • End narration before the zoom-out begins

  • Music runs continuous underneath everything

  • Layer scene SFX during narration gaps
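The breathing-room rule reduces to simple arithmetic. A Python sketch, assuming a 150 words-per-minute narration pace and a one-second pause; both are placeholder values, not measurements from this production, so substitute the actual durations of your ElevenLabs clips.

```python
def min_scene_seconds(narration_words: int, words_per_minute: float = 150.0,
                      breathing_room: float = 1.0) -> float:
    """Narration length at an assumed speaking pace, plus a pause before the zoom-out."""
    return narration_words / words_per_minute * 60.0 + breathing_room


def narration_fits(scene_seconds: float, narration_words: int, **pace) -> bool:
    """True if narration plus breathing room fits inside the scene animation, so it can
    start after the zoom-in completes and end before the zoom-out begins."""
    return min_scene_seconds(narration_words, **pace) <= scene_seconds


# A 30-word scene needs 30 / 150 * 60 = 12 s of narration plus 1 s of breathing room.
print(min_scene_seconds(30))
print(narration_fits(10.0, 30), narration_fits(15.0, 30))
```

At that assumed pace, the 353-word narration total would run about 141 seconds.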

Phase 8: Sound Design

Sound design has three layers: music, narration, and sound effects.

Music (Suno)

Generate a single continuous track that matches your story's emotional arc. For the Welsh folktale:

Celtic folk instrumental, gentle harp and wooden flute, building tension in middle section, peaceful resolution, 3 minutes, no vocals, storybook atmosphere, slight melancholy undertone

Don't generate separate tracks per scene. One continuous piece maintains emotional cohesion.

Narration (ElevenLabs)

Write your narration to match scene timing. For my production, the complete narration came to 353 words.

Choose a voice that matches your narrator persona. For folktales, warm and slightly aged voices work well. Avoid overly polished "audiobook narrator" tones. You want fireside storyteller, not studio professional.

Sound Effects

Two sources:

  1. Veo 3.1 inherent audio from scene animations (paper rustling, ambient texture)

  2. ElevenLabs sound effects for specific moments (monster roar, water splash, oxen chains)

Layer SFX during narration gaps. Don't compete with the voice.

Phase 9: Export and Publish

Export Settings

  • Format: H.264

  • Resolution: 4K (3840 x 2160) or 1080p (1920 x 1080)

  • Frame Rate: 24fps or 30fps

  • Audio: AAC, 320kbps

Why This Works

The workflow produces consistent, high-quality storybook animations because it maintains:

Visual consistency through character reference sheets and base prompt templates. Every scene looks like it belongs in the same world.

Smooth transitions through the book framing device. Zoom in, zoom out, page turns. Viewers never feel jarred between scenes.

Narrative cohesion through scene-by-scene narration synced to animations. The story carries the visuals, not the other way around.

Audio depth through layered music, SFX, and inherent video audio. Sound design makes paper feel tactile.

The key to efficiency is template reuse. The book spread prompt, zoom prompts, and base image template stay constant across all productions. Only the scene-specific content changes.

Once you've built this pipeline for one story, the second story takes half the time. The third takes a quarter.

Adapt This to Your Stories

This workflow isn't locked to Welsh folktales or paper cut-out aesthetics. The structure adapts:

Different visual styles:

  • Watercolor illustration (adjust base prompts, remove paper fiber references)

  • Woodblock print (different textures, different cultural modifiers)

  • Stained glass (dramatic lighting changes, color palette shifts)

Different story lengths:

  • Micro-stories (3-5 scenes, under 90 seconds)

  • Full picture books (15-20 scenes, 5-7 minutes)

  • Series episodes (consistent characters across multiple productions)

Different framing devices:

  • Scroll unfurling (instead of book pages)

  • Shadow puppet theater

  • Museum exhibit walk-through

The principles remain: character consistency through reference sheets, scene-by-scene keyframes, subtle animation, and a framing device that grounds the viewer.

Final Thoughts

AI-generated storybook videos sit in an interesting space. They're not quite animation. Not quite illustration. Not quite traditional video production.

That ambiguity is the opportunity.

Viewers don't have expectations yet. There's no established language for what these should look like or feel like. Every production is a chance to define the form.

Use this workflow as a starting point. Break it. Remix it. Find what I haven't discovered yet.

And when you make something that surprises you, I'd love to see it.

Created with Adobe Firefly, ElevenLabs, Suno, and Premiere Pro.
