Date

7 hours ago

Organization

Paper URL

2507.19634

License

CC BY 4.0

Tags

Multimodal

Natural Language Processing

Language

MCIF is a multilingual, multimodal, manually annotated evaluation dataset based on scientific talks, released in 2025 by Fondazione Bruno Kessler in collaboration with the Karlsruhe Institute of Technology and Translated. The related paper is "MCIF: Multimodal Crosslingual Instruction-Following Benchmark from Scientific Talks". The dataset aims to evaluate the ability of multimodal large language models to understand and follow instructions in crosslingual scenarios, as well as their ability to integrate speech, visual, and text information for reasoning.

The dataset contains 100 scientific talks, covering approximately 10 hours of video content. It provides three input modalities (text, speech, and video), covers four languages (English, German, Italian, and Chinese), and includes both long and short input formats. Of the 100 talks, 21 core samples come with complete, high-quality human transcripts in English, totaling approximately 15,500 words; the remaining samples are used primarily for the summarization task, each accompanied by the corresponding paper abstract and aligned audio and video. The dataset covers tasks such as recognition, translation, question answering, and summarization. All tasks are driven by natural language instructions and are used to evaluate the crosslingual instruction-following capabilities of multimodal models.
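To make the evaluation axes described above concrete, the following minimal sketch enumerates every combination of input modality, language, input length, and task named in the description. The `EvalConfig` class, its field names, and the two-letter language codes are hypothetical and chosen only for illustration; they are not the official MCIF schema. The resulting product is an upper bound, since the description does not state that every combination is populated in the released data.

```python
from dataclasses import dataclass
from itertools import product

# Axis values taken from the dataset description; the string labels
# themselves (e.g. "en", "question_answering") are assumptions.
MODALITIES = ("text", "speech", "video")
LANGUAGES = ("en", "de", "it", "zh")  # English, German, Italian, Chinese
CONTEXTS = ("short", "long")
TASKS = ("recognition", "translation", "question_answering", "summarization")


@dataclass(frozen=True)
class EvalConfig:
    """One hypothetical evaluation setting (not the official MCIF schema)."""
    modality: str
    language: str
    context: str
    task: str


def enumerate_configs():
    """Return the full Cartesian product of the four described axes.

    This is an upper bound: the description does not say that every
    combination actually exists in the released data.
    """
    return [
        EvalConfig(modality, language, context, task)
        for modality, language, context, task in product(
            MODALITIES, LANGUAGES, CONTEXTS, TASKS
        )
    ]


if __name__ == "__main__":
    configs = enumerate_configs()
    print(len(configs))  # 3 * 4 * 2 * 4 = 96
```

A grid like this can help when planning which model and input settings to report, but the set of valid combinations should be checked against the released data and the paper.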
