Hello! I'm Fairy Aya, an AI creator working in Japan.

I use AI to explore new ways of expression and share the joy of using it. The evolution of AI is remarkable, and I am excited by the new possibilities that are emerging every day.
One day, I received an unbelievable notification on my smartphone: Piko Taro (Kosaka Daimaou), the man who once set off a worldwide frenzy with "PPAP (Pen-Pineapple-Apple-Pen)", had followed me on social media.

For a moment, I wondered if there was some kind of mistake.
Until then, music had played no part in my life. In fact, I'm so tone-deaf that I sometimes wonder whether the concept of pitch is missing from my vocabulary altogether. I never dreamed that someone like me would one day be recognized by a global legend.
What surprised me even more was that Piko Taro recognized me not just as an AI creator, but as "someone who makes music."

Here is the actual music video we produced.
https://youtu.be/cDfTc8Mhidg

This is by no means a personal achievement of mine. It is a clear testament to the "amazing times" we live in. With the help of the almost magical technology of generative AI, anyone can take their first steps as a creator, even without special talent or years of training. That era is already here.
In this article, I'll walk through how I, starting with zero musical experience (or, really, from below zero), created an original music video (MV) with the help of generative AI.
Part 1. Starting point - Even if you're tone-deaf, you can manage with AI

1-1. With AI, you can take on the challenge even if you don't have talent
As I mentioned at the beginning, I have absolutely no confidence in my singing voice. I'm what they call "tone-deaf," and knowing that, I've always been reluctant to sing in front of people. I like karaoke, but only on my own. Naturally, I have no experience composing music either: I can't read music, and I can't play any instruments. In terms of musical talent, I started about as close to below zero as you can get.
Even so, an irresistible urge welled up inside me: the ambition to "make my own original music video!"
In my head, I had a vague image of a world that was sparkling and cute, but also a little dark and addictive.

But how?
How could someone tone-deaf like me, with no composing experience, create something like that?
Generative AI made it possible. My first step was to consult ChatGPT, the AI assistant most familiar to me.
"Hey ChatGPT. I want to make a pop, cute, but slightly poisonous, highly addictive music video. What do you think?"
I asked eagerly, and ChatGPT, like an excellent counselor, listened carefully to my vague image and guided me toward a concrete idea. From that moment, my "Kawaii Monster" project was underway.
1-2. Tools to spread the wings of creativity
A project as ambitious as producing an MV depends on choosing the right tools. For each stage, I selected AI tools that would bring out the best of my creativity and that I could handle without specialist knowledge. Below are the tools I actually used this time, what I used them for, why I chose them, and a few brief notes.
Purpose | Tool | Notes / reasons for selection |
---|---|---|
Brainstorming & lyrics | ChatGPT (GPT-4) | Used to develop ideas through dialogue and to generate lyrics. Because you can talk in a natural, conversational way, it's easy to organize your thoughts and come up with creative ideas. Voice mode is especially appealing because you can bounce ideas around without slowing your thinking down. |
Composition | Suno AI | Generates original music with high-quality audio from a simple text prompt (a short description of the song's mood and genre). The deciding factors were that it can be used intuitively without special musical knowledge and that it lets you prototype songs in many genres. |
Character design & artwork | Midjourney | Generates highly detailed, original images from text prompts. I used it to create hyper-realistic key visuals and character images that matched the song's concept, drawn by its expressiveness and the control it gives over even the smallest details. |
Video generation | Vidu | Mainly used to turn the still images for each scene (generated by Midjourney) into smooth video. Its Image to Video and Reference functions (which mimic the style of a reference image or video) are particularly powerful. |
Video generation (lip sync) | HeyGen | Mainly used for lip syncing, i.e. generating the character's mouth movements to match the lyrics. Its standout feature is how natural the mouth movements look. |
Video generation (effects) | Higgsfield | Inserts your images into existing templates to generate short clips with distinctive effects and transitions. Used as an accent. |
Editing & final adjustments | Filmora (Wondershare) | Used to combine the generated clips, cut and edit, add effects, and give the MV its final touch. Its intuitive interface lets even beginners do fairly advanced editing. |
Each of these tools has remarkable capabilities in its own field. The important thing is not to stick to one tool, but to combine the best tools for your purpose and make the most of each one's strengths. Like an orchestra conductor who understands the character of each instrument (tool) and brings them into harmony, you can create richer, deeper work.
Part 2. Solidifying the concept - The story behind the birth of "Kawaii Monster"

The heart of MV production is the "concept" that sets the direction of the whole work. I aimed for a worldview that was not just cute but hid a dash of "poison" and "madness" in its depths: an addictive song and video you would never forget once you'd seen it.
2-1. Source of inspiration and keyword sketch
I often watch the stylish, sophisticated music videos released by Korean artists (K-POP idols). Their MVs fuse every element, from musicality and visual beauty to costumes, dance, and storyline, at a very high level and can be enjoyed as total works of art. I was especially drawn to the multifaceted charm of groups like BLACKPINK, who combine cuteness, coolness, and sometimes dark elements. That gave me the idea of "characters who aren't just cute, but have something hidden."
To turn this vague image into a concrete concept, I sketched out several keywords and combined them to build a worldview.
- Cute × Grotesque "gap moe" (the appeal of an unexpected contrast):
On the surface, she is a sweet, cute girl. But inside, she hides an unpredictable, somewhat terrifying "monster" side. I thought this strong contrast would be the hook that captures viewers' hearts. It's a wicked charm, like being lured in by sweet nectar only to find it's a terrifying trap.
- Love Game with Reversed Dominance:
In this love game, the girl who appears weak is the one skillfully manipulating and dominating her partner. At first glance, her cuteness makes men think, "I want to protect her," but in reality she has the power to make them fall for her and to control them. This "reversed master-servant" theme was also an important element in the lyrics and the visuals.
- A fusion of the sophisticated and fashionable atmosphere of Korean music videos and Japanese "Kawaii" culture:
Alongside the stylish, cool visual beauty of K-POP music videos, I wanted to weave plenty of "Kawaii" elements unique to Japanese pop culture into the character designs and lyrics. At the same time, I wanted to create something original with my own interpretation rather than a simple imitation.
With these keywords in mind, I turned to ChatGPT again and gave it a set of phrases meant to evoke more specific images.
"Hey ChatGPT, what do you think of some lyrics with this kind of image?
- <Sweet Trap>: A temptation that attracts the listener, but also has a dangerous scent.
- <The true nature of a cute girl as a monster>: An unpredictable and dark side hidden within a girl who appears pure and innocent.
- <A catchy refrain>: An addictive repeating phrase that you can't get out of your head once you hear it."
When I told ChatGPT these keywords, it immediately suggested various ideas. Among them, the one that struck me was the catchy and unforgettable phrase "Suki Suki Love Love / Getchu Getchu".
"Oh, this is cute! And it's really catchy!"
I was intuitively convinced that this would be a powerful hook for the song, and I decided to build out the world of the lyrics around this phrase. With that, the skeleton of the project's core concept, "Kawaii Monster", took on a clear outline: an unpredictable little devil wearing a mask of cuteness. I felt that this lopsided charm was the song's greatest weapon.
Part 3. Lyrics × ChatGPT - The magic of words polished in a conversational format
Once the concept was solidified, the next step was to create the lyrics. Here again, my trusty partner ChatGPT was a big help. What I found particularly useful was ChatGPT's "voice mode (conversational mode)".
3-1. The power of voice mode - a creative flow that keeps pace with your thinking
Typing on a keyboard can sometimes create pressure to "write properly" or "construct logically", which can hinder free thinking.
With voice mode, however, you can pass the words and ideas that come to mind to the AI as if you were chatting with a friend. Words keep flowing at the speed of your thoughts, which makes songwriting faster and more intuitive.
What I actually said to ChatGPT was pretty scattered and not well organized.
"Well, how can I put it... Well, the main character is a cute girl. She's super cute, and all the boys are smitten by her cuteness. But actually, deep down, she's thinking, 'The real me has a more dangerous side to me... but will you still love me?' It's not just the cuteness on the outside, but the alien-like, slightly grotesque side that I hide inside is also part of my true self. I want to express that in the music video. So in the end, I want to include a shocking scene where something like a tentacle breaks through the skin of the cute girl. It's a pop and cute worldview, but it's also a bit grotesque, so that girls can sympathize with it and boys can find it a bit funny. I want to make a pop song like that...! Anyway, it's not just sweet and cute, it's got a poisonous feel to it. Do you understand?"
...Reading it again now, it's incoherent, and I'm just listing all the things I want to say (laughs)
It really is "everything thrown in." Even so, ChatGPT accurately grasped my intentions, responded with empathy, and returned creative suggestions even from my rambling. That reassuring sense that "everything is accepted" drew even more ideas out of me.
3-2. Three steps of songwriting - Collaboration with ChatGPT
The songwriting with ChatGPT was roughly divided into three steps.
- Step 1: Chat-based brainstorming - Just throwing out your ideas
As described above, start by simply telling ChatGPT, using voice input, whatever images, keywords, and story fragments come to mind. At this stage there is no need to worry about logic or completeness. The idea is to sow the seeds of ideas freely: "Maybe it's like this?" or "I want to include that too." ChatGPT helps organize this information and narrow down the theme and direction.
- Step 2: From direct to indirect expressions - ways to avoid being "lame"
Stating the message or emotion you want to convey in direct words inevitably sounds explanatory or preachy and halves the appeal of the lyrics; this is how you end up with so-called "lame" lyrics. For example, rather than saying "I'm actually a monster" outright, using metaphors and scene descriptions that suggest it gives the lyrics depth and dimension.
In this step, I polished the words by asking ChatGPT questions such as "Can this expression be made more poetic?" and "I want to express this emotion not directly, but by comparing it to something else." For example, I avoided the direct word "monster" and replaced it with more evocative phrases such as "alien heart," "monster shadow," and "neon claw."
- Step 3: Use of rhymes and repetitions - Pursuing catchiness and addictiveness
A rhythm that is easy to sing and catchy phrases that stick in your mind play a big part in how a song comes across. So I made a point of using rhymes and repeating memorable phrases (refrains) throughout the lyrics.
ChatGPT is also very good at suggesting rhyming words and creating phrases that work as refrains. If you ask, "Are there any words that rhyme with this phrase?" or "I want to make the chorus more addictive. What kind of repetition would work?", it gives you useful advice. The phrase "Suki Suki Love Love / Getchu Getchu" mentioned earlier was also polished in this very step and became the face of the song.
What I valued most in the songwriting process was Step 2, "avoiding direct expression": keeping a firm hold on the message you want to convey while wrapping it in artistic words that stir the listener's imagination. I believe this is the key to lyrics that strike a deep chord with listeners and invite empathy. In my dialogue with ChatGPT, I worked on this task of "raising the level of abstraction of words" again and again.
3-3. Lyrics for "Kawaii Monster" - The thoughts behind these words
After countless conversations and rounds of trial and error with ChatGPT, the following lyrics were born (excerpt):
Getchu, Getchu, suki suki
The real me — know you can’t see
(Getchu, Getchu, love love — The real me, you can’t see it, can you?)
The phrase “the real me” hints at a hidden true nature.
Every time you say, ‘You’re so cute,’ deep inside
My alien heart goes tick tick tick
Though flattered on the outside, inside there's a strange “alien heart” beating like a time bomb — unstable and unpredictable.
I tilt my head, playing coy
Think you’ve caught me? Nope — you’re the pet.
Flipping the script — while they think they’re in control, it’s actually she who's taming them like a pet.
A monster’s shadow under my skin
But darlin’, wanna touch it? Come in.
A monster lurks beneath the surface. It’s a daring invitation, fully aware of the danger.
These lyrics are sprinkled with the elements that make up the “Kawaii Monster” concept, such as “cute, but don't let your guard down”, “a hidden true nature behind sweet words”, and “a devilish charm that captivates the other person”. They are full of word combinations and expressions I could never have come up with on my own, born only through collaboration with ChatGPT.
Part 4. Composition × Suno - A magical tool that turns anyone into a composer

Once the lyrics are complete, the next step is to create the music. With no musical experience, I saw composing as one of the hardest steps. But here too, generative AI came to my rescue. This time, I used "Suno AI", an AI service that can generate high-quality music with amazing ease.
4-1. Reasons for selecting Suno AI
I tried several other music-generation AIs, but Suno AI impressed me with how well it generates modern pop such as J-POP and K-POP. Another big attraction was that if you enter lyrics, it also generates vocals that sing them.
4-2. Composing process with Suno AI - The magic of prompt engineering
Using Suno AI is very simple.
First, access the Suno AI website and select composition mode. There are two main input fields.
- Lyrics: Paste the lyrics of "Kawaii Monster" that we created in cooperation with ChatGPT here.
- Style of Music / Music Style: This is the core of Suno AI, where you describe the genre, atmosphere, instruments used, tempo, etc. of the song you want to generate in English text prompts. The quality of this prompt greatly affects the quality of the generated song.
Writing the prompt for "Style of Music" is another of ChatGPT's specialties. I consulted it as follows.
"I'm thinking of making a song with Suno AI, and I want you to come up with a melody that matches the lyrics of "Kawaii Monster". I'm imagining a catchy electropop that a K-POP idol might sing. I also want a slightly dark and mysterious atmosphere, but the chorus should be addictive and catchy. Can you suggest some good English prompts?"
ChatGPT came back with several prompt suggestions. I took the one closest to my image as a base and tweaked it further before entering it into Suno AI. Here is an example:
"Energetic K-Pop, dark electro pop, catchy chorus, female vocal, synth-heavy, driving bassline, mysterious vibe, BPM 128, kawaii but edgy"
This prompt is packed with specific elements that are desired in a song, such as "energetic K-Pop," "dark electro pop," "catchy chorus," "female vocals," "heavy use of synthesizers," "driving bass line," "mysterious atmosphere," "BPM 128," and "cute but edgy."
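To make the structure of that prompt easier to see, here is a minimal Python sketch that assembles the same descriptors into one comma-separated string. The helper `build_style_prompt` and its grouping (genre, hook, vocals, instruments, mood, tempo, extras) are my own hypothetical illustration, not part of Suno AI; Suno only ever sees the final text you paste into the "Style of Music" field.

```python
# Hypothetical helper (not a Suno API): joins style descriptors into the
# single comma-separated string pasted into Suno's "Style of Music" field.
def build_style_prompt(genres, hook, vocals, instruments, mood, bpm, extras=()):
    """Order: genre, hook, vocals, instruments, mood, tempo, extras."""
    parts = [*genres, *hook, *vocals, *instruments, *mood, f"BPM {bpm}", *extras]
    # Strip whitespace, skip empty entries, and drop repeats while keeping order.
    cleaned = (p.strip() for p in parts if p.strip())
    return ", ".join(dict.fromkeys(cleaned))


style = build_style_prompt(
    genres=["Energetic K-Pop", "dark electro pop"],
    hook=["catchy chorus"],
    vocals=["female vocal"],
    instruments=["synth-heavy", "driving bassline"],
    mood=["mysterious vibe"],
    bpm=128,
    extras=["kawaii but edgy"],
)
print(style)
```

With these inputs, the printed string is the same prompt quoted above. Keeping the descriptors in separate groups makes trial and error easier: to try a faster version, for example, you would change only `bpm` and regenerate.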
Enter the prompt and lyrics, press the generate button, and within a few tens of seconds to a few minutes Suno AI offers two versions of the song (each about two minutes long). Listening to them, I was overwhelmed with emotion: "This could really work! It's cute! It's amazing!"
Of course, you won't always get a perfect song on the first try. If the result differs a little from what you imagined, you can adjust the prompt (for example, "Make it more up-tempo" or "Bring the vocals more to the front") or use Suno AI's "Continue from this song" feature to generate further variations based on the parts you like. This trial and error is one of the joys of co-creating with AI.
Part 5. Character design × Midjourney - Visualizing the world of the lyrics
Once the song and lyrics are ready, the next step is to design the character who will be the face of the music video. To turn the world of "Kawaii Monster" depicted in the lyrics into a visually striking character, I used the image-generation AI "Midjourney".
5-1. Collaboration with ChatGPT - Prompt generation partnership
Midjourney can generate very high-quality, original images from text prompts, but getting the most out of it requires the skill of writing detailed, precise prompts (prompt engineering). Coming up with a good prompt from scratch, however, is quite difficult.
So once again, I asked ChatGPT for help: "I want to use Midjourney to generate an attractive girl character that fits the world of the 'Kawaii Monster' lyrics. Please come up with a prompt for that." I also told it specifically what traits and atmosphere I wanted the character to have, what the costume should look like, the overall tone of the MV, and so on:
- Song concept: "Cute × Grotesque", "Reversal of roles", "Sophisticated fashionability like a Korean music video"
- Character image: "An idol-like girl who is cute at first glance, but somehow mysterious and devilish", "A dual nature that combines sweetness and poison"
- Visual keywords: "Pastel colors", "Neon", "Sparkly", "But somehow dark"
Based on this information, ChatGPT generated several prompt ideas. Using these as a starting point, I further refined the final prompt while adding my own preferences.
5-2. Key visual prompt example for "Kawaii Monster"
This is one of the prompts I entered into Midjourney to generate a girl character for the MV’s key visual.
Full-body Korean-Japanese female idol, pastel purple & pink gradient twin space buns with pearl hairpins, straight bangs, cream-pink fluffy faux fur jacket, iridescent pleated mini skirt, chunky pink glitter platform boots with fluffy white socks, soft glam makeup, glossy rose lips, silver layered jewelry, hyper-realistic 8K, high-fashion MV style, studio lighting, sharp focus, intricate details

This prompt specifies the following elements in detail:
- full-body: A full-body image.
- Korean-Japanese female idol: A woman with both Korean and Japanese idol vibes. The intention was for the image to have a global appeal without being too biased towards a specific nationality.
- hyper-realistic 8K: Ultra-high resolution and realistic texture.
- studio lighting, sharp focus, intricate details: Professional studio lighting, crisp focus, and finely rendered detail.
When you enter this prompt, Midjourney generates four candidate images. You pick the one closest to what you have in mind, then upscale it and refine the details (for example, with the variation function) in pursuit of the ideal character visual. I often changed a few words in the prompt, or added or removed elements, and regenerated many times. This trial and error is the key to reaching the ideal visual.
5-3. Visual development of various scenes
In a music video, the same character needs to appear with many different expressions and in many situations. So in addition to the key visual, I created multiple images with different moods to match how the lyrics develop and how the melody changes.
Example 1:

half-body front shot of the idol placing one hand over her chest with closed eyes, a translucent neon alien heart floating in front of her body, softly glowing in sync with tick tick tick motion, pastel twin buns, pearl details, soft glam makeup, background fades to soft white void with pink glitch pulses, surreal sci-fi elegance, 8K hyper-realistic
Example 2:

cinematic over-the-shoulder shot of the idol looking into a mirror, but her reflection shows glowing monster eyes, clawed hands, and alien scales creeping from her neck, pastel purple-pink twin space buns, cream-pink fluffy faux fur jacket, silver chains, surreal white bathroom with cracked lighting, faint heartbeat pulsing in reflections, 8K MV realism
Example 3:

mid-shot of the idol whispering with both hands near her mouth, soft pastel fog behind her begins to form a monstrous silhouette mimicking her posture, Pastel purple-pink hair in twin buns, cream-pink fluffy jacket, glinting silver chains, emphasized duality theme, soft dreamlike glow, high-fashion surrealism in 8K
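The example prompts above repeat the same character details (twin buns, fluffy jacket, silver chains) while the shot, action, and background change. Reusing one fixed set of traits in every prompt is one simple way to keep a character recognizable across scenes. Below is a minimal Python sketch of that pattern; the names `CHARACTER`, `STYLE`, and `scene_prompt`, and the shortened scene texts, are hypothetical and only illustrate the idea. Midjourney itself just receives the resulting text.

```python
# Hypothetical sketch: reuse one fixed set of character traits in every scene
# prompt so the idol's look stays consistent while the scene changes.
CHARACTER = [
    "pastel purple-pink twin space buns",
    "cream-pink fluffy faux fur jacket",
    "silver chains",
]
STYLE = "8K hyper-realistic"


def scene_prompt(scene, background, character=CHARACTER, style=STYLE):
    """Scene-specific text first, then shared traits, background, and style tag."""
    return ", ".join([scene, *character, background, style])


# Shortened versions of the example scenes above (illustrative only).
scenes = {
    "mirror": ("cinematic over-the-shoulder shot of the idol looking into a mirror, "
               "her reflection showing glowing monster eyes",
               "surreal white bathroom with cracked lighting"),
    "whisper": ("mid-shot of the idol whispering with both hands near her mouth, "
                "pastel fog forming a monstrous silhouette behind her",
                "soft dreamlike glow"),
}

prompts = {name: scene_prompt(*parts) for name, parts in scenes.items()}
for name, text in prompts.items():
    print(f"{name}: {text}")
```

Adding a new scene then only means writing its shot and background; the character description stays identical everywhere, so a tweak to her outfit is made once rather than in every prompt.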
By finely adjusting the prompts to match the lyrics and the emotions I wanted to convey, and using Midjourney to generate a variety of visuals, I was able to broaden the range of expression throughout the MV. Midjourney is like a magic paintbrush that transforms the vague images in my head into amazingly beautiful visual art.
Part 6. Video generation - AI animation that brings still images to life
Once the music, lyrics, and character visuals are ready, it's time to combine them into a music video. To bring the beautiful still images generated by Midjourney to life as moving video, I strategically combined several video-generation AI tools.
6-1. Strategy for using different video generation AI tools - Selecting the right tool for the right job
Video-generation AI is evolving rapidly, and tools with a wide range of strengths keep appearing. This time, I mainly used the following three, playing to each one's strengths to make the MV more expressive.
◆ HeyGen:
An AI with a reputation for the accuracy of lip syncing.
Upload a generated character still image and the song audio generated by Suno AI to HeyGen, and it generates a video in which the character's mouth moves naturally to the lyrics. Facial expressions and mouth movements are closely synchronized, so the character really looks as if she is singing. Since singing scenes are a crucial element of a music video, I paid particular attention to lip-sync quality.
To use HeyGen, you prepare an audio file and an image that clearly shows the face of the character you want to animate. The operation is relatively simple: upload the files and configure a few settings to generate a lip-sync video. That said, the quality of the source image and the clarity of the audio both affect how well the sync turns out.
◆ Vidu:

AI with powerful Image to Video and Reference functions.
Vidu excels at generating short clips (a few seconds long) from a single still image generated by Midjourney. For example, you can add natural motion such as a character blinking slowly, hair blowing in the wind, or a background shifting subtly. You can also load a specific video or image as a "reference" so that the atmosphere and texture of the generated motion come closer to that reference. I used it to add movement to the key visuals and the representative stills of each scene, making the whole MV more dynamic.
◆ Higgsfield:
AI specialized in template-based effect video generation.
Higgsfield lets you easily create short clips with stylish, distinctive effects (such as sparkling particles, neon-like lines, and psychedelic color shifts) that look professionally made, simply by inserting your own images or videos into one of its ready-made templates. Dropping these clips in as accents, in scenes that need to make a particularly strong impression or in parts that risk becoming monotonous, adds visual interest and a sense of tempo.
For example, it was useful for adding rhythmic effects to character images at the climax of the chorus and generating abstract art-like videos in the interlude.
Rather than using these AI tools alone, understanding the characteristics of each tool and using them appropriately for different purposes will enable more complex and attractive video expression. For example, you can first generate high-quality character still images with Midjourney, then add basic movements to those still images with Vidu, then apply lip sync with HeyGen, and finally combine them with effect videos generated by Higgsfield as accents.
6-2. Editing and Final Adjustments × Filmora - Turning Video into a Story
Once the video clips from each AI tool have been collected, they need to be edited together into a single music video. For this final stage, I used the video editing software Filmora (Wondershare).
Filmora features an intuitive and easy-to-understand interface, making it relatively easy to use even for beginners in video editing. At the same time, it has all the functions necessary for full-scale video editing, such as cutting, adding transitions (scene change effects), inserting subtitles and titles, color correction, applying effects, and adjusting background music and sound effects.
Specific editing tasks included the following:
- Clip placement and length adjustment: Video clips for each scene were placed on the timeline to follow the structure of the Suno AI track (A melody, B melody, chorus, etc.; "A melody" and "B melody" are the Japanese terms for the verse and pre-chorus), and each clip's length was adjusted to fit.
- Cutting and transitions: Unnecessary parts of each clip were cut and transition effects were applied to make the scene change look smooth.
- Effects: Filmora's built-in effects were added to highlight certain scenes.
- Final export: The completed MV was exported in a format and resolution appropriate for the target platform.
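Lining clips up with the song's sections comes down to simple arithmetic once you know the tempo. The sketch below is a rough aid under stated assumptions: it assumes the track really plays at the 128 BPM requested in the Suno prompt (a generated song may not match the requested tempo exactly, so it is worth checking by ear or with a BPM counter), 4/4 time, and hypothetical bar counts for each section. At 128 BPM one beat lasts 60 / 128 = 0.46875 s, so one 4-beat bar lasts 1.875 s and an 8-bar section lasts 15 s.

```python
# Assumptions: 4/4 time, the track really plays at 128 BPM, and the
# section lengths (in bars) below are hypothetical examples.
BPM = 128
BEATS_PER_BAR = 4


def bars_to_seconds(bars, bpm=BPM, beats_per_bar=BEATS_PER_BAR):
    """Length in seconds of `bars` bars: bars * beats per bar * seconds per beat."""
    return bars * beats_per_bar * 60 / bpm


SECTIONS = [("Intro", 4), ("A melody", 8), ("B melody", 8), ("Chorus", 8)]


def build_timeline(sections):
    """Return (section, start_s, end_s) tuples laid end to end from 0 s."""
    timeline, start = [], 0.0
    for name, bars in sections:
        end = start + bars_to_seconds(bars)
        timeline.append((name, start, end))
        start = end
    return timeline


for name, start, end in build_timeline(SECTIONS):
    print(f"{name:<9} {start:6.2f}s -> {end:6.2f}s")
```

Each tuple gives a rough in and out point for a clip on the editing timeline. A real song may start with a pickup or a partial bar, so treat these numbers as starting marks to fine-tune by ear rather than exact cut points.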
Editing is a highly creative stage that breathes the final "soul" into the AI-generated material. Every decision, such as which clips to show in what order and which effects to use to express emotion, greatly affects the overall impression of the MV. AI only supplies the "ingredients"; how to cook them and elevate them into a finished "work" ultimately comes down to human sensibility and editing skill.
I'm tone-deaf and had no musical experience, but with AI as a powerful partner I was able to take on creative tasks I had once thought far beyond me, such as writing lyrics, composing music, designing characters, and producing video, and finally complete a music video of my own.