MCIF Multimodal Cross-Language Instruction Following Dataset


Date

7 hours ago

Organization

Fondazione Bruno Kessler
KIT
Translated

Paper URL

2507.19634

License

CC BY 4.0

MCIF is a multilingual, multimodal, human-annotated evaluation dataset built from scientific talks, released in 2025 by Fondazione Bruno Kessler in collaboration with the Karlsruhe Institute of Technology and Translated. The accompanying paper is "MCIF: Multimodal Crosslingual Instruction-Following Benchmark from Scientific Talks". The dataset aims to evaluate how well multimodal large language models understand and follow instructions in crosslingual scenarios, and how well they integrate speech, visual, and text information when reasoning.

The dataset contains 100 scientific talks, covering approximately 10 hours of video. Inputs are provided in three modalities (text, speech, and video) and four languages (English, German, Italian, and Chinese), in both long and short formats. Of the 100 talks, 21 core talks come with complete, high-quality human transcripts in English, amounting to approximately 15,500 words of text; the remaining talks are used primarily for summarization, each paired with the corresponding paper abstract and aligned audio and video. The tasks cover recognition, translation, question answering, and summarization, all posed as natural-language instructions, and are used to evaluate the crosslingual instruction-following capabilities of multimodal models.
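
To make the evaluation dimensions described above concrete, the following minimal Python sketch enumerates every combination of modality, language, task, and input length. The class name `EvalConfig`, its field names, and the function `enumerate_configs` are hypothetical and are not taken from the dataset's files or the paper. The full cross-product is only an upper bound, since the benchmark may not define every combination (for example, a recognition task on text input).

```python
from dataclasses import dataclass
from itertools import product

# Dimensions as listed in the dataset description.
MODALITIES = ("text", "speech", "video")
LANGUAGES = ("English", "German", "Italian", "Chinese")
TASKS = ("recognition", "translation", "question answering", "summarization")
INPUT_LENGTHS = ("long", "short")


@dataclass(frozen=True)
class EvalConfig:
    """One hypothetical evaluation setting (names are illustrative only)."""
    modality: str
    language: str
    task: str
    input_length: str


def enumerate_configs():
    """Return the full cross-product of the four dimensions.

    Assumption: every combination is listed, even though the actual
    benchmark may only define a subset of them.
    """
    return [
        EvalConfig(modality, language, task, length)
        for modality, language, task, length in product(
            MODALITIES, LANGUAGES, TASKS, INPUT_LENGTHS
        )
    ]


if __name__ == "__main__":
    configs = enumerate_configs()
    # 3 modalities * 4 languages * 4 tasks * 2 lengths = 96 combinations
    print(len(configs))
```

A grid like this can help when planning which settings to report, but the set of valid combinations should be taken from the dataset itself.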
