Narratron is an interactive projector that augments hand shadow puppetry with AI-generated storytelling. Designed for all ages, it transforms traditional physical shadow plays into an immersive, phygital storytelling experience. Hand shadow puppetry is one of the oldest forms of storytelling and has been practiced across cultures. To enhance that experience with multimodal artificial intelligence, Narratron lets users play with hand shadows and responds with AI-generated auditory and visual outputs of the story their hand shadows are telling.
The participation of a large language model in the shadow play storytelling process can also push the boundary of human-AI interaction by expanding the story co-creation experience into multi-sensory immersion. To achieve this, Narratron integrates three major systems: 1. hand gesture recognition through a custom-trained image classifier to create an animal character, 2. a 5-sentence short children's story generated by a pre-trained GPT-3.5 model, 3. visual and audio outputs related to the story, produced by a diffusion model and a speech synthesizer. To allow body engagement and provide the physical affordance of "making stories" for players, Narratron was designed as a toy-like device with a film projector knob and a camera shutter, instead of relying on web or mobile devices.
When users place animated hand forms in front of the on-device camera, the forms are captured as shadow puppets. The embedded algorithms, which draw on multiple AI models, recognise the captured photo as a set of main characters of the story, create the story, generate its visual settings, and produce the narrator's speech. Users are encouraged to develop the story and change the plot at any point by simply posing a different shadow as a new character. At this moment, the human-AI interaction transcends the responder-respondee dichotomy and becomes a tangible form of co-creation.
The history of hand shadow play is nearly untraceable: it was widely practiced long before the existence of the Greek shadow show Karagiozis or the Chinese shadow puppetry Pi Ying Xi. It is a prelinguistic and transcultural form of storytelling that entertains and educates the younger generation; it is also a stimulus for creative production, through mimicking the things we see and telling the stories we relate. Narratron, in that sense, deeply embeds AI into this intelligent collective effort of hands, eyes, and brains as a true "fairytale copilot". Its multimodal nature, which combines visual, auditory, tactile, and textual I/O and is supported by the collaboration of an LLM, an image classifier, a speech synthesizer, and diffusion models, demonstrates how seamlessly we can interact bodily with AI. By bridging the digital and the physical, we are now connecting the ancient and the future.
Narratron is designed to offer users a captivating and immersive experience that merges the art of hand shadow puppetry with cutting-edge AI technology. The user's experience with Narratron begins with the startup screen, which creates a serene and focused ambiance, setting the stage for an immersive journey. As the user turns on Narratron, they are greeted by the instructions, preparing them for the experience that lies ahead. Once the startup screen fades away, the user is free to explore and play with their hand shadows in any way they wish. They can experiment with different shapes, sizes, and movements while Narratron's camera captures the intricate hand shadow shapes created by the user. The captured hand shadow shapes are then analyzed by trained image classifiers integrated into Narratron. These algorithms interpret the hand shadows and translate them into animal keywords. The keywords serve as the foundation for the next step of the process: generating a complete story.
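The step from classifier output to animal keyword can be sketched as follows. This is a minimal illustration, not Narratron's actual code: the label list, the `min_confidence` threshold, and the function name `animal_keyword` are all assumptions, since the description does not specify which animals the classifier recognises or how uncertain predictions are handled.

```python
from typing import Optional, Sequence

# Hypothetical label set; the real classifier's animal classes are not listed.
ANIMAL_LABELS = ["bird", "dog", "rabbit", "snail"]


def animal_keyword(
    probabilities: Sequence[float],
    labels: Sequence[str] = ANIMAL_LABELS,
    min_confidence: float = 0.6,  # assumed threshold, not from the source
) -> Optional[str]:
    """Return the most likely animal label, or None if the classifier is unsure."""
    if len(probabilities) != len(labels):
        raise ValueError("expected one probability per label")
    # Index of the highest-scoring class.
    best = max(range(len(labels)), key=lambda i: probabilities[i])
    if probabilities[best] < min_confidence:
        return None  # ambiguous shadow: no keyword is produced
    return labels[best]
```

Returning `None` for a low-confidence prediction is one possible design choice; it would let the device wait for a clearer shadow rather than start a story about the wrong animal.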
To generate the story, Narratron employs the GPT-3.5 language model. GPT-3.5 processes the animal keyword identified from the user's hand shadow and generates a story that combines plotlines, dialogues, and descriptive elements. While the story is being generated, Narratron simultaneously uses Stable Diffusion to generate a corresponding image that represents the animal associated with the user's hand shadow. This image is then projected onto the surface, adding a visual component to the audio experience and enhancing the user's connection to the narrative.
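One way to run story and image generation at the same time is sketched below. The prompt wording and the function names are hypothetical, and the two models are passed in as plain callables so the sketch stays independent of any particular GPT-3.5 or Stable Diffusion client; the actual prompts and API calls used by Narratron are not given in this description.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Tuple


def build_story_prompt(animal: str, sentences: int = 5) -> str:
    # Assumed prompt wording; only the 5-sentence length comes from the text.
    return (
        f"Write a {sentences}-sentence children's story "
        f"whose main character is this animal: {animal}."
    )


def build_image_prompt(animal: str) -> str:
    # Assumed prompt wording for the diffusion model.
    return f"A storybook illustration of this animal: {animal}"


def generate_story_and_image(
    animal: str,
    story_model: Callable[[str], Any],  # e.g. a wrapper around a GPT-3.5 call
    image_model: Callable[[str], Any],  # e.g. a wrapper around Stable Diffusion
) -> Tuple[Any, Any]:
    """Run both generators concurrently and return (story, image)."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        story_future = pool.submit(story_model, build_story_prompt(animal))
        image_future = pool.submit(image_model, build_image_prompt(animal))
        # result() blocks until each task finishes and re-raises any error.
        return story_future.result(), image_future.result()
```

Threads suit this case because both calls would mostly wait on a remote service or an accelerator rather than on Python code, so neither blocks the other.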
To initiate the storytelling experience and progress to the next chapter, the user spins the knob on Narratron. The knob is reminiscent of vintage movie projectors and is intended to add a sense of nostalgia and tactile engagement to the overall user experience. Each rotation of the knob signifies a progression, unlocking a new chapter and revealing fresh elements of the narrative. The user's active participation in this process creates a sense of agency, allowing them to dictate the pacing and flow of the storytelling experience.
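The knob-driven progression could be modelled as a small state machine. The sketch below is illustrative only: the class name `ChapterReel`, the method `on_knob_rotation`, and the choice to treat each sentence of the story as one chapter are assumptions, since the description does not say how chapters are delimited or how the hardware reports a rotation.

```python
import re
from typing import List, Optional


def split_into_chapters(story: str) -> List[str]:
    """Split a story into chapters, assuming one sentence per chapter."""
    parts = re.split(r"(?<=[.!?])\s+", story.strip())
    return [part for part in parts if part]


class ChapterReel:
    """Tracks which chapter is shown; each knob rotation advances by one."""

    def __init__(self, story: str) -> None:
        self.chapters = split_into_chapters(story)
        self.position = -1  # nothing has been shown yet

    def on_knob_rotation(self) -> Optional[str]:
        """Return the next chapter, or None once the story has ended."""
        if self.position + 1 >= len(self.chapters):
            return None
        self.position += 1
        return self.chapters[self.position]


reel = ChapterReel("A dog woke up. It found a bone! The end.")
first_chapter = reel.on_knob_rotation()  # "A dog woke up."
```

Because the reel only moves when the user turns the knob, the pacing stays entirely in the user's hands, which matches the sense of agency described above.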