
Create an AI animation of "3 characters x 1 scene" with Vidu's multi-reference consistency

Hello! Today I want to tackle a complaint I hear all the time.

"When you create an anime-style video with AI, the characters turn into different people from one frame to the next."

Every time I hear this lament, I nod so hard my neck nearly breaks. Today, however, the situation has changed dramatically.

Vidu has officially released Multi-Reference Consistency, which loads up to seven reference images at once and matches the tagged images against each frame, making it easier to keep multiple characters consistent at the same time. However, Vidu's Reference to Video can currently generate only 4 seconds of video per run. So in this article we will walk through the steps for creating a roughly 30-second PV (promotional video) as "8 x 4-second clips", which adds up to 32 seconds.
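
As a quick sanity check on the "8 x 4-second clips" idea, here is a minimal Python sketch that works out how many clips are needed and where each one starts and ends. The function name `plan_clips` and the constants are my own illustration, not part of Vidu.

```python
import math

CLIP_SECONDS = 4     # per-generation limit described in this article
TARGET_SECONDS = 30  # length of PV we are aiming for

def plan_clips(target_seconds, clip_seconds):
    """Return (start, end) second ranges that cover the target length."""
    count = math.ceil(target_seconds / clip_seconds)
    return [(i * clip_seconds, (i + 1) * clip_seconds) for i in range(count)]

clips = plan_clips(TARGET_SECONDS, CLIP_SECONDS)
print(len(clips), "clips, total", clips[-1][1], "seconds")
```

Rounding up is what turns 30 seconds into 8 clips and 32 seconds of footage, which leaves a little room for trimming in the edit.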

Part 1. Set the theme and design characters

1-1 Why a rooftop battle?

The setting is a rooftop adorned with glowing neon signs. There are three reasons for choosing this location:

1. It allows both horizontal and vertical movement

The added height variation brings dynamic action, keeping even 4-second clips visually engaging.

2. The background can be handled with a single image

Once two elements are drawn, a helipad floor and a distant skyline, the camera movement won't feel unnatural. This means more reference slots can be dedicated to characters.

3. Lighting effects are easy to achieve

A nighttime cityscape with neon makes it simple to add rim lighting along the edges of characters’ hair, which pairs beautifully with cel shading.

1-2 Character Design Sheet

| Name | Age | Key Colors | Fixed Items | Silhouette |
| --- | --- | --- | --- | --- |
| Akari | 20 | Hair #C8A2FF + Star Emblem #FFD700 | Silver flight jacket, gold star on left chest | Slim, scar on one eyebrow |
| Ryu | 22 | Hair #FF3030 + Sword #FFD700 | Black trench coat, sword with pulsing red LEDs | Inverted triangle body type |
| Pixie | - | Core #00E0FF + Wings #FFFA00 | 30 cm spherical drone, 4 holographic wings | Sphere with wings |

I am often asked, "Isn't it enough to just name a similar color?" In my experience, though, writing down the HEX code seems to greatly reduce color drift in Midjourney. Roughly speaking, it's like handing over the exact paint number along with the request.
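
If you want to double-check the "paint numbers" before pasting them into a prompt, a small Python sketch like the one below converts each HEX code from the design sheet into RGB values. The `PALETTE` dictionary and `hex_to_rgb` helper are names I made up for illustration.

```python
# Key colors from the character design sheet
PALETTE = {
    "Akari hair": "#C8A2FF",
    "Akari star emblem": "#FFD700",
    "Ryu hair": "#FF3030",
    "Ryu sword": "#FFD700",
    "Pixie core": "#00E0FF",
    "Pixie wings": "#FFFA00",
}

def hex_to_rgb(code):
    """Convert a '#RRGGBB' string into an (R, G, B) tuple of integers."""
    digits = code.lstrip("#")
    if len(digits) != 6:
        raise ValueError(f"expected 6 hex digits, got {code!r}")
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))

for name, code in PALETTE.items():
    print(f"{name}: {code} -> {hex_to_rgb(code)}")
```

Keeping the palette in one place also makes it harder to mistype a code when you reuse it across several prompts.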

Part 2. Creating references with Midjourney Niji 6

Niji 6 is an anime-specialized model known for its clean line art and vibrant colors. While v7.0 is also excellent, its prompt handling can be trickier, so for this tutorial, we’ll use Niji 6.

2-1 Akari (Three-View Reference)

Prompt: imagine PG-13 character reference, young adult woman (age 20) cyber-punk heroine, front view centered, modern TV-anime style, thin colored line art, variable outline, two-tone cel shading, short lavender hair with magenta streaks, silver flight jacket with gold star patch, neutral grey background --ar 3:4 --stylize 150 --niji 6 --seed 777

  • BAN Warning: Initially, using terms like "girl, crop top" triggered an automatic ban for "sexualized depiction of a minor."
  • Workaround: Declaring the character as a young adult and changing clothing to a fitted tee allowed the prompt to go through safely.

Although the goal was to generate a front view, a full three-view reference was output. This image will be used to generate further poses.

Midjourney operates on Discord. Right-click the output image and select “Copy Link” to proceed with further use.

2-2 Akari (Left Profile)

By inserting the previously copied image link into <Akari_front_URL>, you can generate a consistent character design and art style across different views.

Prompt: imagine profile left view, facing right, modern TV anime style, colored thin line art, two-tone cel shading, soft pastel palette, shallow depth of field, neutral grey BG, modest clothing, PG-13 --cref <Akari_front_URL> --sref <Akari_front_URL> --ar 3:4 --stylize 150 --niji 6

  • Cref (Character Reference): Maintains facial features and character form.
  • Sref (Style Reference): Maintains overall art style and visual tone (e.g., oil painting, manga style, etc.).
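
Since every follow-up prompt repeats the same tail of parameters, it can help to assemble them with a small helper. The Python sketch below only builds the text you would paste into Discord; `build_prompt` is a hypothetical name of my own, and the parameters are the ones used in this article (--cref, --sref, --ar, --stylize, --niji 6, --seed).

```python
def build_prompt(description, ref_url=None, aspect="3:4", stylize=150, seed=None):
    """Assemble prompt text plus the parameters used in this tutorial."""
    parts = [description.strip()]
    if ref_url:
        # The same base image is used for both character and style reference
        parts.append(f"--cref {ref_url} --sref {ref_url}")
    parts.append(f"--ar {aspect} --stylize {stylize} --niji 6")
    if seed is not None:
        parts.append(f"--seed {seed}")
    return " ".join(parts)

print(build_prompt("profile left view, facing right, PG-13",
                   ref_url="<Akari_front_URL>"))
```

Replace `<Akari_front_URL>` with the link you copied from Discord before using the output.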

2-3 Ryu (Front Base Pose)

Prompt: imagine PG-13 character reference, young adult man (age 22) cyber-samurai, front view, modern TV anime style, colored thin line art, two-tone cel shading, short crimson hair, black tech trench coat, glowing gold katana held down, neutral grey studio BG, symmetrical stance --ar 3:4 --stylize 150 --niji 6 --seed 888

This prompt generates a symmetrical front-facing base image of Ryu in a cyber-samurai style, featuring clear anime-style line art, bold cel shading, and a neutral background suitable for character reference use.

2-4 Ryu (Action Pose)

Prompt: imagine aerial downward slash pose, coat fluttering, red energy trails, cinematic dusk rooftop, anime style, variable-width outline, two-tone cel shading, gentle bloom, PG-13 --cref <Ryu_front_URL> --sref <Ryu_front_URL> --ar 16:9 --stylize 200 --niji 6

This prompt generates an action scene of Ryu in mid-air performing a downward slash. His coat flows dramatically with red energy trails against a cinematic dusk rooftop background. By using both --cref and --sref, the character design and visual style remain consistent with the base reference image.

2-5 Pixie (Three-View Reference)

Prompt: imagine orthographic reference sheet, hovering spherical drone mascot, diameter 30 cm, teal alloy body, central camera lens, four holographic wings emitting soft yellow light, modern anime cel shading, colored thin line art, neutral grey studio BG --ar 1:1 --stylize 120 --niji 6

This prompt generates a three-view (orthographic) reference sheet of Pixie, a 30 cm-wide floating spherical drone mascot. It features a teal alloy body, central camera lens, and four soft yellow holographic wings, all rendered in clean anime-style cel shading with thin colored lines. Ideal for use in consistent character modeling and animation planning.

2-6 Background (16:9)

Prompt: imagine cinematic rooftop helipad at dusk, neon-lit skyline, soft fog layers, 2.5D anime background, thin colored line art, two-tone cel shading, no characters --ar 16:9 --stylize 250 --niji 6 --seed 12345

Note: Including no characters is essential. If omitted, random passersby may appear in the scene, which can confuse the Multi-Reference system and compromise character consistency.

2-7 4 Directions + Actions

Generate several additional images for each of Akari, Ryu, and Pixie, such as "left profile," "back view," and "running" (or "slashing" or "rotating"). Referring to the base image with --cref <Akari_front_URL> and explicitly stating "hair remains lavender" minimizes color discrepancies.

One set consists of the basic front view plus two auxiliary views (side and back), and the Vidu AI video generator lets you upload a maximum of 3 images at the same time. For running poses, generating at a 16:9 aspect ratio prevents limbs from being clipped at the edge of the frame.

After generating the images, download them and organize the file names like this:

Akari_front.png

Akari_profile.png

Ryu_front.png

BG_rooftop.png

This will make tagging in Vidu easier.
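
To see at a glance which views you already have for each character, you can group the files by the part of the name before the underscore. This is a minimal Python sketch that assumes the `<Subject>_<view>.png` naming shown above; `group_by_subject` is my own helper name.

```python
from collections import defaultdict
from pathlib import Path

def group_by_subject(filenames):
    """Group 'Subject_view.png' file names into {subject: [views]}."""
    groups = defaultdict(list)
    for name in filenames:
        subject, _, view = Path(name).stem.partition("_")
        groups[subject].append(view)
    return dict(groups)

files = ["Akari_front.png", "Akari_profile.png", "Ryu_front.png", "BG_rooftop.png"]
for subject, views in group_by_subject(files).items():
    print(subject, views)
```

A character that shows only "front" in this listing is a hint that a side or back view is still missing.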

Part 3. Create 4-Second Clips in Vidu

3-1 Basic Settings in Vidu

  • Mode: Reference to Video
  • Clip Duration: 4 Seconds
  • Resolution: Speed or 720p

If you select "Speed" for the resolution, you can click the HD button in the top right of the thumbnail after generation to upscale to 1080p.

3-2 How to add My References

1. You can register up to three images at once using the My References button.

2. The order in which they are added automatically determines the reference order, so place the most important image, the front view, at the top.

3. Set the Reference Name.

Enter a name that is easy to distinguish, such as the character name or pose name.

```
@SCENE_BG
@Akari_front
@Ryu_front
@Pixie_front
@Akari_sheet ... (auxiliary)
@Ryu_sheet ... same
```

4. Check the Style.

If you want to add more depth, try changing it to 3D Rendering or 2.5D Animation.
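
Because the order of registration matters and only three images fit in one upload, it can be handy to sort the images before adding them. The Python sketch below puts front views first and splits the list into groups of three. The priority table and the `upload_batches` name are assumptions of mine, not something Vidu provides.

```python
# Lower number = registered earlier; unknown views go last
VIEW_PRIORITY = {"front": 0, "profile": 1, "back": 2}
MAX_PER_UPLOAD = 3  # images per upload, as described in this article

def upload_batches(names):
    """Sort reference names with front views first, then chunk by upload size."""
    def priority(name):
        view = name.rsplit("_", 1)[-1]
        return VIEW_PRIORITY.get(view, len(VIEW_PRIORITY))
    ordered = sorted(names, key=priority)  # stable sort keeps ties in input order
    return [ordered[i:i + MAX_PER_UPLOAD]
            for i in range(0, len(ordered), MAX_PER_UPLOAD)]

names = ["Akari_profile", "Akari_back", "Akari_front", "Akari_running"]
for batch in upload_batches(names):
    print(batch)
```

With the sample list, the first batch holds the front, profile, and back views, and the running pose falls into a second batch.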

3-3 List of 8 Prompts

# C1 0-4s: Introduction

@SCENE_BG drone rise, neon skyline dusk, gentle bloom, PG-13

# C2 4-8s: Akari Running

@SCENE_BG @Akari_front @Akari_profile enters left sprinting, lavender trail, dolly-in, rim light

# C3 8-12s: Ryu Landing

@SCENE_BG @Ryu_front descends from the sky, gold katana sparks, tilt-down, dust puff

# C4 12-16s: Pixie Joins

@SCENE_BG @Pixie_front hovers center, teal holo-wings pulse, zoom-in 120→80 mm

# C5 16-20s: Confrontation

@SCENE_BG @Akari_front @Ryu_front face-off, jackets flutter, static 50 mm lens

# C6 20-24s: Slow-Motion Dramatic Shot

@SCENE_BG slow-motion 0.5×, lavender vs gold energy arcs, handheld 3%

# C7 24-28s: Collision

@SCENE_BG impact shockwave, cyan-magenta grade, handheld 5%, debris particles

# C8 28-32s: Pull-Back End

@SCENE_BG camera pulls back, sunset sky fills frame, silhouettes, gentle bloom
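
Before generating, it is worth confirming that no clip asks for more references than one upload allows. The Python sketch below stores the eight prompts in a dictionary, pulls out the @ tags with a regular expression, and lists any clip that exceeds the limit. The limit of 3 follows this article, and the helper names are my own.

```python
import re

MAX_REFS = 3  # upload limit described in this article

PROMPTS = {
    "C1": "@SCENE_BG drone rise, neon skyline dusk, gentle bloom, PG-13",
    "C2": "@SCENE_BG @Akari_front @Akari_profile enters left sprinting, lavender trail, dolly-in, rim light",
    "C3": "@SCENE_BG @Ryu_front descends from the sky, gold katana sparks, tilt-down, dust puff",
    "C4": "@SCENE_BG @Pixie_front hovers center, teal holo-wings pulse, zoom-in 120→80 mm",
    "C5": "@SCENE_BG @Akari_front @Ryu_front face-off, jackets flutter, static 50 mm lens",
    "C6": "@SCENE_BG slow-motion 0.5×, lavender vs gold energy arcs, handheld 3%",
    "C7": "@SCENE_BG impact shockwave, cyan-magenta grade, handheld 5%, debris particles",
    "C8": "@SCENE_BG camera pulls back, sunset sky fills frame, silhouettes, gentle bloom",
}

def references(prompt):
    """Return the @ tags in a prompt, in order of appearance."""
    return re.findall(r"@(\w+)", prompt)

def over_limit(prompts, limit=MAX_REFS):
    """Return the clip IDs whose prompt uses more tags than the limit."""
    return [clip for clip, text in prompts.items() if len(references(text)) > limit]

for clip, text in PROMPTS.items():
    print(clip, references(text))
print("Over the limit:", over_limit(PROMPTS))
```

All eight prompts stay within three tags, with C2 and C5 sitting exactly at the limit.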

Quick Explanation of Camera Terms

  • dolly-in: The camera itself moves toward the subject, as if on rails (unlike a zoom, which only changes the lens).
  • tilt-down: The camera tilts downward.
  • handheld 3%: Slight camera shake to add realism.

When typing in the prompt input field, entering @ will automatically display reference suggestions.

3-4 Design Concept of Movement and Camerawork

1. Switch between horizontal movement, vertical movement, and rotation every 4 seconds to add contrast.

2. If you include a lens focal length (such as 50 mm) as a rough guide to camera distance, Vidu reproduces the perspective surprisingly faithfully.

3. Specify slow motion as "0.5×" in the prompt. Stretching a clip in post-editing lowers the image quality, so the result is smoother if you ask Vidu for slow motion from the start.
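
To illustrate why stretching in post hurts, here is a rough Python sketch that estimates how many of the output frames are actually unique when a clip is slowed down in an editor. The 24 fps figure is purely an assumption for the example (I have not confirmed Vidu's frame rate), and `unique_frame_ratio` is a name I made up.

```python
def unique_frame_ratio(source_seconds, output_seconds, fps=24):
    """Share of output frames that come from distinct source frames.

    fps=24 is an assumed value for illustration only.
    """
    source_frames = source_seconds * fps
    output_frames = output_seconds * fps
    return min(1.0, source_frames / output_frames)

# 2 seconds of action stretched to fill a 4-second slot in post
print(unique_frame_ratio(2, 4))
# 4 seconds generated as slow motion from the start
print(unique_frame_ratio(4, 4))
```

In the first case only half of the frames are original and the editor has to repeat or blend the rest, while a clip generated in slow motion has a real frame for every slot.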

Part 4. Sound effects and BGM - Vidu AI Sound Effects × Suno AI

4-1. What is Vidu AI Sound Effects?

AI Sound Effects, a feature Vidu added in April 2025, generates sound effects from nothing more than text and timestamps. If you write something like "0-2 s: wind" and "2-4 s: sword clash," it layers the sounds at the specified times.

AI Sound Effects Quick Start Guide
  • Vidu's AI Sound Effects use the format [0-1s]: wind whoosh for time-based instructions in seconds.
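
If you plan the cues in a list first, a few lines of Python can print them in the [start-end s]: description format shown above. This sketch only produces text to paste into Vidu, and `sfx_lines` is a hypothetical helper of my own.

```python
def sfx_lines(cues):
    """Format (start, end, description) cues as '[0-2s]: wind' lines."""
    lines = []
    for start, end, description in cues:
        if not 0 <= start < end:
            raise ValueError(f"invalid time range: {start}-{end}")
        lines.append(f"[{start}-{end}s]: {description}")
    return "\n".join(lines)

cues = [(0, 2, "wind"), (2, 4, "sword clash")]
print(sfx_lines(cues))
```

The range check catches cues where the start and end were typed in the wrong order.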

Example: C7 Collision SE

Click the Timestamp button before entering the text. You can adjust the overall length of the sound effect with the "Sets total duration" control at the bottom of the screen.

https://www.vidu.com/share/2752563312369487/807534

In the past, we would search through sound effect websites and manually match waveforms, but with this feature, you can complete a clip in just a few seconds.

4-2 BGM with Suno AI

Suno AI is a music generation service that creates 2-3 minute songs from just text input.

I'll skip the detailed explanation here, but you can generate background music by turning on "Instrumental" on the Suno AI Create screen and simply entering text in the "Style" field.

Style Example

Japanese anime rock, uplifting Instruments: cinematic synthwave, pulsing bass arpeggio, punchy electronic drums

https://suno.com/song/48c1d3e0-3c4a-40f1-b29a-6910af0553e5?sh=3kMOrCZFOaSiQhbR

About 90% of the audio comes from Vidu's built-in sound effects and Suno AI music. The remaining 10% is polish, adding EQ and a limiter in Filmora and Resolve, and the 30-second animation is finished without any rework.

Part 5. FAQ and Real-Life Failure Stories

1. Hair color changes midway

  • Issue: Only the front view image was registered as a reference.
  • Solution: Register both front and side views, and place the front image at the top of the tag order (Vidu prioritizes based on tag order).

2. Face appears in back shot

  • Issue: Forgot to write "face not visible" in the back view prompt.
  • Solution: Add "face hidden, back view, no facial details" to the back view prompt.

3. Camera speed jumps at the 4-second boundary

  • Issue: Missing camera verb for each clip.
  • Solution: Ensure that every clip includes specific verbs like "dolly-in" or "tilt-down."
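
To catch the third failure before you spend credits, you can scan each prompt for a camera term. The Python sketch below uses the camera wording that appears in this article's prompt list; the `CAMERA_TERMS` tuple and the helper names are my own, and you should extend the tuple with whatever terms you use.

```python
# Camera wording taken from the eight prompts in this article
CAMERA_TERMS = ("dolly-in", "tilt-down", "zoom-in", "handheld",
                "static", "pulls back", "drone rise")

def has_camera_term(prompt):
    """Return True if the prompt contains at least one known camera term."""
    text = prompt.lower()
    return any(term in text for term in CAMERA_TERMS)

def clips_missing_camera(prompts):
    """Return the clip IDs whose prompt has no camera term."""
    return [clip for clip, text in prompts.items() if not has_camera_term(text)]

sample = {
    "C3": "@SCENE_BG @Ryu_front descends from the sky, gold katana sparks, tilt-down, dust puff",
    "C5": "@SCENE_BG @Akari_front @Ryu_front face-off, jackets flutter",  # camera term removed on purpose
}
print(clips_missing_camera(sample))
```

Here C5 is flagged because its camera instruction was deliberately left out for the demonstration.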

Conclusion

The 4-second limit is not a "constraint" but an editing point. If you approach it this way, Vidu becomes a high-speed feedback tool that lets you go from idea to result almost instantly. You can register up to 3 reference images, but a 2-image setup with front and side views is more practical. After generation, a single click on the HD button upscales the clip to 1080p.

That being said, the UI and pricing of AI tools change daily. Please check the official documentation, and don't be afraid to try things out, even if you fail. I look forward to seeing your original AI animations on your timeline next!

By Toki
I do creative things using generative AI. SNS (X): @toki_mwc