Part 1. Production example
First, take a look at the work created with this technique.
(Video embedding assumed: https://youtu.be/aaCH-fbvzXw )
This short animation was created by combining the following tools.
- Idea, composition, image generation instructions: ChatGPT (GPT-4o)
- Character image generation: ChatGPT (DALL-E 3 functions)
- Video animation generation: Vidu AI video generator (Vidu 2.0)
- Special effect (floating): Vidu template "Balloon Me"
- Original BGM generation: SUNO
- Video editing (audio, subtitles, timing adjustment): Filmora
- Final quality enhancement: HitPaw VikPea
- Voice: Myself (Fairy Aya)
Part 2. Summary of the short animation: The full story of a fairy, "Dumplings over flowers"

(Note: "Dumplings over flowers" is a Japanese proverb meaning that dumplings (something practical and tasty) are preferred to flowers (something merely beautiful). It expresses a way of thinking that values practical benefit over aesthetics.)
This video is a slightly surreal, humorous short story about Aya, a gluttonous fairy who is the epitome of "Dumplings over flowers." At a cherry blossom viewing party she becomes so engrossed in the food that she eats until her body swells up like a balloon and finally floats off into the sky.
(Note: "Cherry blossom viewing" is a traditional Japanese custom in which people gather to eat and drink under cherry trees in spring, when the blossoms are in bloom.)
The surprise created by the gap between a cute character's appearance and her behavior is a highly effective device, especially in short animations that must win viewers over quickly on social media. Today, even individuals can realize such concepts relatively easily and at high quality by combining AI tools.
Part 3. Overall creation flow and the role of each tool
This production roughly followed the flow below.
- Planning and composition: The story and cut divisions were decided while communicating with ChatGPT.
- Fixing character settings: ChatGPT memorized character information.
- Image material generation: ChatGPT was instructed to generate the images required for each cut.
- BGM generation: BGM that matched the atmosphere of the video was created with SUNO.
- Animation: Images were loaded into Vidu, and basic movements and special effects were applied using templates.
- Editing: The materials were combined in Filmora; narration, BGM, and subtitles were added; and the timing was adjusted.
- Quality enhancement: HitPaw VikPea was used to improve the quality of the final output.
By combining tools so that each one's strengths complement the others, an individual can now produce animation that previously required specialized skills and a lot of time.
Part 4. The core of ChatGPT usage: from planning to image generation
ChatGPT (GPT-4o) played a particularly central role in this production. It was not just a text generation tool but a planning partner, a database of character settings, and an image generation director, acting as a command center that oversaw the entire production.
Why ChatGPT? The importance of an integrated role
With conventional image generation AI, maintaining consistency was a major challenge when generating multiple images of the same character with different compositions and expressions. The character's face often changed, or the clothing came out slightly different. ChatGPT (especially since GPT-4o), however, understands and retains context better, and is good at giving continuous image generation instructions through dialogue while keeping a specific character's settings.
In addition to image generation, it is also very useful in upstream production processes such as consulting on story composition and coming up with storyboard ideas. This makes it possible to seamlessly proceed through the series of steps from "Planning → Character Setting → Story Composition → Storyboard (Rough) → Image Generation Instructions" through dialogue with ChatGPT.
Step 1: Tips to ensure character consistency
The first thing I did was define the specific characteristics of the character "Fairy Aya" and have ChatGPT remember them clearly. This is the basis for maintaining the character's consistency in all subsequent steps.
An example of a character setting prompt actually conveyed to ChatGPT:
We will now discuss the production of a short animation and generate the necessary images. The main character is an original character called "Fairy Aya". When I say "Aya" in future instructions, be sure to depict and generate a character with the following characteristics.
[Characteristics of Fairy Aya]
- Hairstyle: Brown bob hair that does not reach her shoulders. Slightly inward curls.
- Face: Wears round glasses. Eyes are large, giving the impression of being curious.
- Clothing: Off-the-shoulder chocolate-colored dress. Small ribbon on the chest.
- Accessories: Beret with small flower decoration on the head (color should match the outfit).
- Other: Transparent blue butterfly-like wings on the back.
- Expression/personality: Generally cheerful and energetic. A bit of a foodie whose eyes light up at the sight of delicious food. Also has a slightly clumsy side.
- Style: Cute, super-deformed (chibi) anime-style body.
Please remember this setting and reflect it in future conversations and image generation instructions.
By telling ChatGPT not only the character's specific appearance but also her personality and typical expressions, you get more consistent responses and images. It is important to tell ChatGPT explicitly to remember these settings.
After that, once you have created your illustration using image generation, attach the image and give the following instruction:
"We're going to create a 2D animation based on this Fairy Aya character. Are you ready?"
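If you reuse the same character across several chats or projects, it can help to keep the setting sheet as structured data and render the prompt text from it. The following is a minimal Python sketch under that assumption. `CHARACTER_SHEET` and `build_character_prompt` are hypothetical names, the trait list is abbreviated from the prompt above, and nothing here calls ChatGPT: the output is simply text to paste into the chat.

```python
# Hypothetical helper: keeps the character sheet in one place and renders
# the setting prompt as plain text. It does not talk to ChatGPT itself.

CHARACTER_SHEET = {
    "Hairstyle": "Brown bob hair that does not reach her shoulders. Slightly inward curls.",
    "Face": "Wears round glasses. Eyes are large, giving the impression of being curious.",
    "Clothing": "Off-the-shoulder chocolate-colored dress. Small ribbon on the chest.",
    "Other": "Transparent blue butterfly-like wings on the back.",
}


def build_character_prompt(name: str, sheet: dict) -> str:
    """Render a character sheet as a setting prompt to paste into a chat."""
    lines = [
        f'The main character is an original character called "{name}".',
        f"[Characteristics of {name}]",
    ]
    # One bullet per trait, in the order the sheet was written.
    lines += [f"- {trait}: {description}" for trait, description in sheet.items()]
    lines.append("Please remember this setting and reflect it in future image generation.")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_character_prompt("Fairy Aya", CHARACTER_SHEET))
```

Keeping the sheet in one place means a change to the character (for example, a new outfit) only has to be made once before the prompt is regenerated.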

Step 2: Use ChatGPT to work out the outline of the story
Once ChatGPT had memorized the character settings, I consulted it about the story structure of the anime. I pitched the vague ideas in my head and asked it to break them down into concrete cuts.
An example of the first consultation prompt I gave to ChatGPT:
I want to make a short anime with Fairy Aya, the character you have memorized, as the main character.
The theme is "Dumplings over flowers." I'm thinking of a humorous story in which Aya goes to see the cherry blossoms but ends up ignoring them and eating lots of dumplings and bento boxes, until her stomach swells up like a balloon and she flies away into the sky. I would like to make this into about 10 cuts; what kind of scene division would be effective? Please propose a composition plan with an interesting flow, keeping in mind the beginning, development, twist, and conclusion.
In response to this question, ChatGPT presented the following composition plan.
Suggestions from ChatGPT (summary):
- (Beginning) Introduction: Aya arrives at a cherry blossom viewing venue where the cherry blossoms are in full bloom. Her expression is excited.
- (Beginning) Discovering her purpose: Sitting on a picnic sheet, her eyes light up when she sees the three-colored rice dumplings in front of her.
- (Development) Appetite in full swing: There are rice dumplings and many other delicacies lined up, such as bento boxes and sakura mochi.
- (Development) Start of binge eating: Aya has a happy expression, putting food into her mouth one after another.
- (Development) Accelerating appetite: She is so engrossed in eating that she doesn't even notice the cherry blossoms around her.
- (Twist) Signs of something strange: She notices (or doesn't) that her stomach is bulging a little.
- (Twist) Breaking through the limits: Her stomach is so full that it looks like it might burst. Her expression is one of surprise rather than pain.
- (Twist) Start of levitation: Her body starts to float gently. The moment her feet leave the ground.
- (Conclusion) Into the sky: Aya slowly rises into the sky like a balloon. She flies through a flurry of cherry blossom petals.
- (Conclusion) Punchline: Aya looks troubled or resigned as she floats in the sky. A caption such as "You're eating too much" appears on the screen.
This proposal had a clear beginning, development, twist, and conclusion and included visually interesting changes, so I decided to use it as the base for generating the images for each cut. Because ChatGPT presents a composition plan and explains its intention and flow, you can flesh out your ideas as if you were bouncing them off a screenwriter or director.
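The composition plan can also be kept as a small data structure, so that cut numbers and per-cut instructions stay in sync if the plan changes. This is a minimal Python sketch, not part of the original workflow. `STORYBOARD`, `cuts_per_phase`, and `cut_instruction` are hypothetical names, and the instruction wording is illustrative. The cut titles come from the summary above, and the phase labels follow the beginning, development, twist, and conclusion structure.

```python
from collections import Counter

# (phase, title) for each cut, in order, taken from the plan above.
STORYBOARD = [
    ("beginning", "Introduction"),
    ("beginning", "Discovering her purpose"),
    ("development", "Appetite in full swing"),
    ("development", "Start of binge eating"),
    ("development", "Accelerating appetite"),
    ("twist", "Signs of something strange"),
    ("twist", "Breaking through the limits"),
    ("twist", "Start of levitation"),
    ("conclusion", "Into the sky"),
    ("conclusion", "Punchline"),
]


def cuts_per_phase(storyboard):
    """Count how many cuts each story phase receives."""
    return Counter(phase for phase, _ in storyboard)


def cut_instruction(storyboard, number):
    """Build a short chat instruction for a 1-based cut number (illustrative wording)."""
    phase, title = storyboard[number - 1]
    return f'Please make cut {number} of {len(storyboard)} ({phase}): "{title}".'


if __name__ == "__main__":
    print(dict(cuts_per_phase(STORYBOARD)))
    print(cut_instruction(STORYBOARD, 2))
```

Counting cuts per phase is a quick way to check pacing: here the middle of the story (development and twist) gets six of the ten cuts.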
Step 3: The "Image Generation Director" that materializes the scene
Once the composition is decided, ChatGPT creates prompts to generate images corresponding to each cut.
First, I made a rough composition that connected each scene.

I liked this rough composition more than I expected, so I used it as the cover of this video.
After this, I generated images from the first cut onwards.
The prompt is below.
"I like it. The rough is fine like that, but for the real thing, make it a delicate Japanese anime picture with beautiful light contrast and descriptive power. Now, the first one."
It's a very simple instruction.
Similarly, by continuing the dialogue with instructions such as "Please make the next cut" and "For the second cut, Aya should start eating the dumplings and bring them to her mouth with happy eyes," you can have ChatGPT create prompts one after another and generate a series of scene images that follow the settings. This dramatically streamlines the preparation of sequential illustration materials, which used to be time-consuming.
Part 5. Animate with Vidu: The magic of templates
Vidu is a video generation AI that brings the series of still images prepared with ChatGPT to life. It excels at generating short video clips from text and images, and provides particularly powerful features for character animation.

Overview and basic usage of Vidu 2.0
Vidu creates videos with natural movements and camerawork based on input images and text. The basic usage is simply to upload the generated still images and specify in text what kind of movement you want to add, or select a prepared template.
Example of using the "Balloon Me" template: Easily create complex expressions
The "Balloon Me" template built into Vidu was particularly effective in this production. It automatically generates an animation in which the character in the input image slowly inflates like a balloon and floats up to the top of the screen.

The story's climax, where the character overeats, swells up, and flies into the sky, would require considerable skill and time to express by hand or in 3D. With Vidu's "Balloon Me" template, however, you simply apply a still image of Aya (an illustration of her with a slightly swollen stomach) to the template, and a natural floating animation is completed in about one minute.

Just insert one image and click the "Create" button!
The template not only moves the character but also automatically adds subtle changes in facial expression, body sway, and even camera movement that seems to follow the character, making it easy to create a richer and smoother animation than you might expect.
The possibilities of Vidu templates: Expanding the range of expression

Vidu has templates besides "Balloon Me" that allow you to easily realize various movements. For example, there are "Fly Me", "Orbit", "Push-in", "Nap Me", etc., and by combining these, you can richly express the character's daily scenes, action scenes, emotional expressions, etc.
These templates strongly support the characters' charm and the story's dynamism, which cannot be conveyed by still images alone. I am always amazed at the expressiveness, thinking "Just adding a little movement can make it so lively."
Part 6. Other tools to improve quality
ChatGPT and Vidu are the core of the production, but collaboration with other dedicated tools is also essential to improve the quality of the final work.
SUNO: Original BGM generation by AI

For the BGM, an important element that determines the atmosphere of the video, I used the AI composition tool "SUNO". SUNO generates original music from a text description of the song you have in mind (genre, atmosphere, instruments, tempo, etc.).
Example of prompt actually used with SUNO:
Spring, picnic, cherry blossoms, cheerful, funny
This generated a light and cute Japanese-style fantasy-like BGM that matched the idyllic atmosphere of the cherry blossom viewing and Aya's slightly strange situation, enhancing the texture of the entire video.
Filmora: Actual editing process

The materials at this point were the generated images, the animation clips from Vidu, the BGM from SUNO, and the narration I recorded myself. To combine them into the final video, I used the video editing software "Filmora". In Filmora, I did the following:
- Assembling the sequence: Place still images and video clips for each cut on the timeline according to the proposed composition.
- Audio editing: Adjust the volume balance of the narration and background music and insert them at the appropriate time.
- Inserting titles: Use a template.
- Length adjustment: Adjust the overall tempo and trim the video to a length that is easy to watch.
The intuitive interface makes it relatively easy to produce professional-looking edits.
HitPaw VikPea: Final quality improvement
Images and videos generated by AI sometimes lack resolution or contain noise. Therefore, I used the AI quality enhancement tool "HitPaw VikPea" as the last step before final output.
This tool uses AI algorithms to increase video resolution and remove noise. In this case, I loaded the generated video material and upscaled it to 4K equivalent for clearer, sharper video. This extra step helps give an impression of high quality when publishing on platforms such as YouTube.
Part 7. The whole production flow: Creative process in the age of AI
The process so far can be summarized as follows:
- Planning & dialogue (ChatGPT): Idea generation, story composition, character setting.
- Material generation (ChatGPT + image generation AI): Generate images for each cut based on character settings.
- Sound generation (SUNO): Generate background music that matches the atmosphere of the video.
- Animation (Vidu): Give movement to still images and add special effects with templates.
- Editing (Filmora): Integrate all materials and adjust audio and subtitles.
- Quality improvement (HitPaw VikPea): Improve the image quality of the final output.
- Release: Distribute the completed video on social media, etc.
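For reference, the same flow can be written down as an ordered checklist. This is only a sketch in Python with hypothetical names (`PIPELINE`, `tools_used`, `checklist`). It records which tool handled which stage and does not automate any of the tools.

```python
# Ordered (stage, tool) pairs for the flow summarized above.
# "Release" has no dedicated tool in this article, so it is recorded as None.
PIPELINE = [
    ("Planning & dialogue", "ChatGPT"),
    ("Material generation", "ChatGPT"),
    ("Sound generation", "SUNO"),
    ("Animation", "Vidu"),
    ("Editing", "Filmora"),
    ("Quality improvement", "HitPaw VikPea"),
    ("Release", None),
]


def tools_used(pipeline):
    """Return each distinct tool once, in the order it first appears."""
    seen = []
    for _, tool in pipeline:
        if tool is not None and tool not in seen:
            seen.append(tool)
    return seen


def checklist(pipeline):
    """Render the stages as numbered lines, e.g. for a project notes file."""
    return [
        f"{i}. {stage}" + (f" ({tool})" if tool else "")
        for i, (stage, tool) in enumerate(pipeline, start=1)
    ]


if __name__ == "__main__":
    print("\n".join(checklist(PIPELINE)))
    print("Tools:", ", ".join(tools_used(PIPELINE)))
```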
That this series of steps can now be carried out by an individual, without special expertise or a large team, and in a relatively short time (one to several days once you get used to it), is a major change that AI has brought to the creative field. In fact, I created this work in less than two hours.