Create an AI animation of "3 characters x 1 scene" with Vidu's multi-reference consistency
Hello! Today I would like to write about a complaint I hear all the time:
"When you create an anime-style video with AI, the characters turn into different people from frame to frame."
Every time I hear this lament, I nod so hard that my neck nearly breaks. Today, however, the situation has changed dramatically.
Vidu has officially released Multi-Reference Consistency, which loads up to seven reference images at once and matches the tagged images against each frame, making it easier to keep multiple characters consistent at the same time. However, Vidu's Reference to Video currently generates only 4 seconds of video per run. We will therefore walk through the steps for creating a 30-second PV as "8 x 4-second clips" (32 seconds of footage before trimming).
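The arithmetic behind "8 x 4-second clips" can be checked with a small sketch. The 4-second limit and the 30-second target come from the text above; the function name `plan_clips` is my own and only illustrates the planning step.

```python
import math

CLIP_SECONDS = 4      # per-generation limit stated above
TARGET_SECONDS = 30   # desired PV length


def plan_clips(target_seconds: int, clip_seconds: int) -> list[tuple[int, int]]:
    """Return (start, end) second ranges for the clips needed to cover the target."""
    count = math.ceil(target_seconds / clip_seconds)
    return [(i * clip_seconds, (i + 1) * clip_seconds) for i in range(count)]


clips = plan_clips(TARGET_SECONDS, CLIP_SECONDS)
print(len(clips), "clips,", clips[-1][1], "seconds of raw footage")
```

Rounding up gives 8 clips and 32 seconds of raw footage, which leaves about 2 seconds to trim at the cut points.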
Part 1. Set the theme and design characters
1-1 Why a rooftop battle?
The setting is a rooftop adorned with glowing neon signs. There are three reasons for choosing this location:
1. It allows both horizontal and vertical movement
The added height variation brings dynamic action, keeping even 4-second clips visually engaging.
2. The background can be handled with a single image
A helipad floor and distant skyline—once these two elements are drawn, the camera movement won't feel unnatural. This means more reference slots can be dedicated to characters.
3. Lighting effects are easy to achieve
A nighttime cityscape with neon makes it simple to add rim lighting along the edges of characters' hair, which pairs beautifully with cel shading.
1-2 Character Design Sheet
| Name | Age | Key Colors | Fixed Items | Silhouette |
| --- | --- | --- | --- | --- |
| Akari | 20 | Hair #C8A2FF + Star Emblem #FFD700 | Silver flight jacket, gold star on left chest | Slim, scar on one eyebrow |
| Ryu | 22 | Hair #FF3030 + Sword #FFD700 | Black trench coat, sword with pulsing red LEDs | Inverted triangle body type |
| Pixie | - | Core #00E0FF + Wings #FFFA00 | 30 cm spherical drone, 4 holographic wings | Sphere with wings |

By Toki (SNS (X): @toki_mwc). I do creative things using generative AI.
I am often asked, "Isn't it enough to just give the color a similar name?" In my experience, writing down the HEX code seems to greatly reduce color drift in Midjourney. Roughly speaking, it's like handing over the exact paint number.
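One way to keep the HEX codes identical across every prompt is to store the design sheet as data and generate the color wording from it. The sketch below is only an illustration: the dictionary layout and the `color_clause` helper are my own assumptions, and whether Midjourney honors a HEX code exactly is an observation, not a guarantee.

```python
import re

# Values copied from the character design sheet above.
CHARACTER_COLORS = {
    "Akari": {"hair": "#C8A2FF", "star emblem": "#FFD700"},
    "Ryu": {"hair": "#FF3030", "sword": "#FFD700"},
    "Pixie": {"core": "#00E0FF", "wings": "#FFFA00"},
}

HEX_PATTERN = re.compile(r"^#[0-9A-Fa-f]{6}$")


def color_clause(name: str) -> str:
    """Build a prompt fragment such as 'hair #C8A2FF, star emblem #FFD700'."""
    parts = []
    for item, hex_code in CHARACTER_COLORS[name].items():
        # Catch typos early so a malformed code never reaches a prompt.
        if not HEX_PATTERN.match(hex_code):
            raise ValueError(f"{name}: {item} has an invalid HEX code {hex_code!r}")
        parts.append(f"{item} {hex_code}")
    return ", ".join(parts)


print(color_clause("Akari"))
```

Pasting the generated fragment into each prompt avoids retyping the codes by hand, which is where mismatches tend to creep in.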
Part 2. Creating references with Midjourney Niji 6
Niji 6 is an anime-specialized model known for its clean line art and vibrant colors. While v7.0 is also excellent, its prompt handling can be trickier, so for this tutorial, we'll use Niji 6.
2-1 Akari (Three-View Reference)
Prompt: imagine PG-13 character reference, young adult woman (age 20) cyber-punk heroine,
front view centered, modern TV-anime style, thin colored line art, variable outline,
two-tone cel shading, short lavender hair with magenta streaks, silver flight jacket with gold star patch,
BAN Warning: Initially, using terms like "girl, crop top" triggered an automatic ban for "sexualized depiction of a minor."
Workaround: Declaring the character as a young adult and changing clothing to a fitted tee allowed the prompt to go through safely.
Although the goal was to generate a front view, a full three-view reference was output. This image will be used to generate further poses.
Midjourney operates on Discord. Right-click the output image and select “Copy Link” to proceed with further use.
2-2 Akari (Left Profile)
By inserting the previously copied image link into <Akari_front_URL>, you can generate a consistent character design and art style across different views.
Prompt: imagine profile left view, facing right, modern TV anime style, colored thin line art, two-tone cel shading, soft pastel palette, shallow depth of field, neutral grey BG, modest clothing, PG-13 --cref <Akari_front_URL> --sref <Akari_front_URL> --ar 3:4 --stylize 150 --niji 6
--cref (Character Reference): Maintains facial features and character form.
--sref (Style Reference): Maintains the overall art style and visual tone (e.g., oil painting, manga style).
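Because every follow-up view reuses the same parameters, the prompt text can be assembled by a small helper. This is a minimal sketch: `build_prompt` is a hypothetical name, it only concatenates the flags that already appear in the prompts of this article, and `<Akari_front_URL>` remains a placeholder for the link you copied.

```python
from typing import Optional


def build_prompt(
    description: str,
    cref: Optional[str] = None,
    sref: Optional[str] = None,
    ar: str = "3:4",
    stylize: int = 150,
    niji: int = 6,
) -> str:
    """Append the Midjourney parameters used in this article to a description."""
    flags = []
    if cref:
        flags.append(f"--cref {cref}")  # character reference image URL
    if sref:
        flags.append(f"--sref {sref}")  # style reference image URL
    flags += [f"--ar {ar}", f"--stylize {stylize}", f"--niji {niji}"]
    return description + " " + " ".join(flags)


AKARI_FRONT = "<Akari_front_URL>"  # placeholder: replace with the copied link
profile_prompt = build_prompt(
    "profile left view, facing right, modern TV anime style",
    cref=AKARI_FRONT,
    sref=AKARI_FRONT,
)
print(profile_prompt)
```

Passing the same URL to both `cref` and `sref` mirrors the left-profile prompt above; leaving both out yields a base prompt like the ones used for the first front views.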
2-3 Ryu (Front Base Pose)
Prompt: imagine PG-13 character reference, young adult man (age 22) cyber-samurai, front view, modern TV anime style, colored thin line art, two-tone cel shading, short crimson hair, black tech trench coat, glowing gold katana held down, neutral grey studio BG, symmetrical stance --ar 3:4 --stylize 150 --niji 6 --seed 888
This prompt generates a symmetrical front-facing base image of Ryu in a cyber-samurai style, featuring clear anime-style line art, bold cel shading, and a neutral background suitable for character reference use.
2-4 Ryu (Action Scene)
The prompt for this step generates an action scene of Ryu in mid-air performing a downward slash. His coat flows dramatically, with red energy trails against a cinematic dusk rooftop background. By using both --cref and --sref, the character design and visual style remain consistent with the base reference image.
2-5 Pixie (Three-View Reference)
Prompt: imagine orthographic reference sheet, hovering spherical drone mascot, diameter 30 cm, teal alloy body, central camera lens, four holographic wings emitting soft yellow light, modern anime cel shading, colored thin line art, neutral grey studio BG --ar 1:1 --stylize 120 --niji 6
This prompt generates a three-view (orthographic) reference sheet of Pixie, a 30 cm-wide floating spherical drone mascot. It features a teal alloy body, a central camera lens, and four soft yellow holographic wings, all rendered in clean anime-style cel shading with thin colored lines. The sheet is well suited to keeping the character consistent and to planning the animation.
2-6 Background (16:9)
Prompt: imagine cinematic rooftop helipad at dusk, neon-lit skyline, soft fog layers,
2.5D anime background, thin colored line art, two-tone cel shading,
no characters --ar 16:9 --stylize 250 --niji 6 --seed 12345
Note: Including "no characters" in the prompt is essential. If it is omitted, random passersby may appear in the scene, which can confuse the Multi-Reference system and compromise character consistency.
2-7 4 Directions + Actions
Add several images for each of Akari, Ryu, and Pixie, such as "left profile," "back view," and "running" (or slashing/rotating). Using --cref <Akari_front_URL> to refer to the base image and explicitly stating "hair remains lavender" minimizes color discrepancies.
The basic front view plus two auxiliary views (side and back) make up one set; Vidu's AI video generator lets you upload a maximum of 3 images at a time. For running poses, generating at a 16:9 aspect ratio prevents "limb clipping."
After generating the images, download them and organize the file names like this:
Akari_front.png
Akari_profile.png
Ryu_front.png
…
BG_rooftop.png
This will make tagging in Vidu easier.
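The naming scheme pays off because a reference name can be derived mechanically from each file name. The sketch below assumes you name each Vidu reference after the file stem and call it with a leading "@"; that mapping and the helper names are my own convention, not something Vidu enforces.

```python
from pathlib import Path

# File names from the list above.
files = ["Akari_front.png", "Akari_profile.png", "Ryu_front.png", "BG_rooftop.png"]


def tag_for(filename: str) -> str:
    """Turn 'Akari_front.png' into the reference tag '@Akari_front'."""
    return "@" + Path(filename).stem


def group_by_subject(filenames: list[str]) -> dict[str, list[str]]:
    """Group views by the part before the first underscore (character or BG)."""
    groups: dict[str, list[str]] = {}
    for name in filenames:
        subject, _, view = Path(name).stem.partition("_")
        groups.setdefault(subject, []).append(view)
    return groups


tags = [tag_for(name) for name in files]
groups = group_by_subject(files)
print(tags)
print(groups)
```

The grouped view also makes it easy to spot a character that is still missing a side or back image before you start uploading.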
Part 3. Create 4-Second Clips in Vidu
3-1 Basic Settings in Vidu
Mode: Reference to Video
Clip Duration: 4 Seconds
Resolution: Speed or 720p
If you select "Speed" for the resolution, you can click the HD button in the top right of the thumbnail after generation to upscale to 1080p.
3-2 How to add My References
1. You can register up to three images at once using the My References button.
2. The order in which they are added automatically determines the reference order, so place the most important image, the front view, at the top.
3. Set the Reference Name.
Enter a name that is easy to distinguish, such as the character name or pose name.
```
@SCENE_BG
@Akari_front
@Ryu_front
@Pixie_front
@Akari_sheet ... (auxiliary)
@Ryu_sheet ... same
```
4. Check the Style.
If you want to add more depth, try changing it to 3D Rendering or 2.5D Animation.
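Steps 1 and 2 above (three images per registration, front view first) can be prepared before opening Vidu. The following is a rough sketch under my own assumptions: the priority ranking in `VIEW_PRIORITY`, the helper names, and the example file names for the back and running poses are all hypothetical.

```python
from pathlib import Path

BATCH_SIZE = 3  # images per My References registration, per step 1
VIEW_PRIORITY = {"front": 0, "profile": 1, "back": 2}  # assumed ranking


def view_of(filename: str) -> str:
    """Extract the view suffix, e.g. 'Akari_front.png' -> 'front'."""
    return Path(filename).stem.rpartition("_")[2]


def upload_batches(filenames: list[str]) -> list[list[str]]:
    """Sort front views first, then split into batches of BATCH_SIZE."""
    ordered = sorted(
        filenames,
        # Unknown views (e.g. action poses) sort after the ranked ones.
        key=lambda name: VIEW_PRIORITY.get(view_of(name), len(VIEW_PRIORITY)),
    )
    return [ordered[i:i + BATCH_SIZE] for i in range(0, len(ordered), BATCH_SIZE)]


batches = upload_batches(
    ["Akari_back.png", "Akari_run.png", "Akari_front.png", "Akari_profile.png"]
)
print(batches)
```

Adding the files in the printed order keeps the front view at the top of the reference order, with action poses falling into a later batch.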
dolly-in: The camera moves forward toward the subject, as if on rails.
tilt-down: The camera tilts downward.
handheld 3%: Slight camera shake to add realism.
When typing in the prompt input field, entering @ will automatically display reference suggestions.
3-4 Design Concept of Movement and Camerawork
1. Switch between horizontal movement, vertical movement, and rotation every 4 seconds to add contrast.
2. If you write the camera lens value as a "guideline for distance," Vidu reproduces the perspective surprisingly faithfully.
3. Specify slow motion as "0.5x." Stretching a clip in post-editing lowers the image quality, so the result is smoother if you set slow motion in Vidu from the start.
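The rotation of movement types in point 1 can be laid out as a shot list before any prompt is written. This is a planning sketch only: the dictionary fields and the `shot_list` name are hypothetical, and the camera terms are limited to ones named in this article.

```python
from itertools import cycle

MOVES = ["horizontal", "vertical", "rotation"]  # switched every clip
CAMERA_VERBS = ["dolly-in", "tilt-down"]        # camera terms from this article


def shot_list(clip_count: int, clip_seconds: int = 4) -> list[dict]:
    """Assign a movement type and a camera verb to each clip in turn."""
    shots = []
    for i, move, verb in zip(range(clip_count), cycle(MOVES), cycle(CAMERA_VERBS)):
        shots.append(
            {"clip": i + 1, "start": i * clip_seconds, "move": move, "camera": verb}
        )
    return shots


shots = shot_list(8)
for shot in shots:
    print(shot)
```

Because the movement list has three entries, no two neighboring clips share the same movement type, which is the contrast the first point is after.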
Part 4. Sound effects and BGM - Vidu AI Sound Effects × Suno AI
4-1. What is Vidu AI Sound Effects?
Vidu AI Sound Effects, a feature Vidu added in April 2025, generates sound effects from nothing more than text and timestamps. If you write something like "0-2 s: wind" or "2-4 s: sword clash," it layers the effects along the timeline according to those timestamps.
In the past, we would search through sound effect websites and manually match waveforms, but with this feature, you can complete a clip in just a few seconds.
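Cue lines in the "0-2 s: wind" style can be generated and sanity-checked per clip. The sketch assumes that the format shown in the examples above is what you paste into Vidu; the `format_cues` helper and its range check are my own additions.

```python
def format_cues(cues: list[tuple[int, int, str]], clip_seconds: int = 4) -> str:
    """Render (start, end, label) cues as lines like '0-2 s: wind'."""
    lines = []
    for start, end, label in cues:
        # Reject cues that run backwards or past the end of the clip.
        if not (0 <= start < end <= clip_seconds):
            raise ValueError(f"cue {label!r} ({start}-{end} s) does not fit the clip")
        lines.append(f"{start}-{end} s: {label}")
    return "\n".join(lines)


cue_text = format_cues([(0, 2, "wind"), (2, 4, "sword clash")])
print(cue_text)
```

Checking the ranges against the 4-second clip length up front avoids writing a cue that the clip is too short to play.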
4-2 BGM with Suno AI
Suno AI is a music generation service that creates 2-3 minute songs from just text input.
I will skip the detailed explanation here, but you can generate background music by turning on "Instrumental" on the Suno AI Create screen and simply entering text for the "Style."
Style Example
Japanese anime rock, uplifting Instruments: cinematic synthwave, pulsing bass arpeggio, punchy electronic drums
About 90% of the sound work is done with Vidu's built-in sound effects and Suno AI music. The remaining 10% is seasoning with EQ and limiters in Filmora and Resolve, and the 30-second animation comes together without any rework.
Part 5. FAQ and Real-Life Failure Stories
1. Hair color changes midway
Issue: Only the front view image was registered as a reference.
Solution: Register both front and side views, and place the front image at the top of the tag order (Vidu prioritizes based on tag order).
2. Face appears in a back shot
Issue: Forgot to write "face not visible" in the back view prompt.
Solution: Add "face hidden, back view, no facial details" to the back view prompt.
3. Camera speed jumps at the 4-second boundary
Issue: Missing camera verb for each clip.
Solution: Ensure that every clip includes specific verbs like "dolly-in" or "tilt-down."
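Failures 2 and 3 are both omissions in the prompt text, so they can be caught by a quick check before generating. This is a simple sketch: `lint_prompt` is a hypothetical helper that only looks for the phrases mentioned in the solutions above.

```python
CAMERA_VERBS = ("dolly-in", "tilt-down")  # verbs named in the solution above
BACK_VIEW_GUARD = "face hidden"           # phrase from the back-view solution


def lint_prompt(prompt: str) -> list[str]:
    """Return a list of problems found in a single prompt."""
    problems = []
    text = prompt.lower()
    if not any(verb in text for verb in CAMERA_VERBS):
        problems.append("missing camera verb")
    if "back view" in text and BACK_VIEW_GUARD not in text:
        problems.append("back view without 'face hidden'")
    return problems


print(lint_prompt("@Akari_front runs across the helipad"))
print(lint_prompt("dolly-in, back view of @Ryu_front, face hidden"))
```

Running every clip prompt through a check like this before spending generation credits is cheaper than discovering the problem at the 4-second boundary.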
Conclusion
The 4-second limit is not a "constraint" but rather an editing point. If you approach it this way, Vidu becomes a high-speed feedback tool that lets you "think → instantly see results." You can register up to 3 reference images, but a 2-image setup with front and side views is more practical. After generation, one click on the HD button upscales the clip to 1080p.
That being said, the UI and pricing of AI tools change daily. Please check the official documentation, and don't be afraid to try things out, even if you fail. I look forward to seeing your original AI animations on your timeline next!