MUSIA-2025

Multilingual Story Illustration: Bridging Cultures through AI Artistry

Overview

This track focuses on generating visual illustrations for stories written in multiple languages, including English and Hindi. This task has valuable applications across various domains, such as:

  • Education: Visual storytelling can be used to illustrate concepts in children's books or language learning materials, making content more engaging and easier to understand through visual cues.
  • Entertainment Industry: Narrative-driven media such as comic books and animated series can benefit from automated illustration pipelines, accelerating content production while enhancing creativity.

One current limitation in this domain is the reliance on a small, fixed cast of characters, which often restricts the richness and diversity found in culturally rooted stories like those in the Panchatantra. On the other hand, an unbounded cast of characters can undermine consistency, making it harder for readers to form a connection with the narrative.

To address this, we have collected a dataset from publicly available stories and illustrations (with proper attribution) that is not restricted to a fixed set of characters.

Task

Participants are invited to build AI-powered systems that understand narrative elements from multilingual stories and generate illustrations that reflect key moments in the plot. The aim is to enhance storytelling through coherent and culturally relevant visuals.

  • Narrative Understanding & Illustration Generation: Systems should comprehend multilingual stories and generate visual representations of important plot events.
  • Cultural & Contextual Relevance: Illustrations must be appropriate for the story’s cultural and contextual background.
  • Submission Requirements: Submissions must include a series of images that visually complement the narrative flow.
  • Task Details: Each story will come with a specified number of images to generate. Participants must segment the story and produce contextually relevant visuals for the resulting paragraph-level segments.
  • Evaluation Criteria:
    • Expert Review: Judges will evaluate images based on narrative relevance, visual quality, and consistency.
    • Automated Evaluation: DreamSim will be used to compare generated images against ground truth visuals.
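The segmentation strategy is left open to participants. As one minimal sketch (the function name and the greedy, equal-length strategy are our own illustration, not a requirement of the task), a story could be split at paragraph breaks and merged into the specified number of contiguous segments, one per image:

```python
def segment_story(story: str, num_images: int) -> list[str]:
    """Split a story into paragraphs and merge them into
    num_images contiguous segments of roughly equal length.

    This is a naive baseline; participants may use any
    segmentation scheme (e.g. scene or event detection).
    """
    paragraphs = [p.strip() for p in story.split("\n\n") if p.strip()]
    if num_images >= len(paragraphs):
        # Fewer paragraphs than requested images: one segment each.
        return paragraphs
    # Distribute paragraphs as evenly as possible across segments.
    base, extra = divmod(len(paragraphs), num_images)
    segments, idx = [], 0
    for i in range(num_images):
        size = base + (1 if i < extra else 0)
        segments.append("\n\n".join(paragraphs[idx:idx + size]))
        idx += size
    return segments
```

Each returned segment would then serve as the conditioning text for one generated illustration.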

For more information on the datasets used for training and testing, please refer to the Dataset section.