WildScore: Benchmarking MLLMs in-the-Wild Symbolic Music Reasoning
Gagan Mundada Yash Vishe Amit Namburi Xin Xu Zachary Novack Julian McAuley Junda Wu
Abstract
Recent advances in Multimodal Large Language Models (MLLMs) have demonstrated impressive capabilities across various vision-language tasks. However, their reasoning abilities in the multimodal symbolic music domain remain largely unexplored. We introduce WildScore, the first in-the-wild multimodal symbolic music reasoning and analysis benchmark, designed to evaluate MLLMs' capacity to interpret real-world music scores and answer complex musicological queries. Each instance in WildScore is sourced from a genuine musical composition and accompanied by authentic user-generated questions and discussions, capturing the intricacies of practical music analysis. To enable systematic evaluation, we propose a taxonomy comprising both high-level and fine-grained musicological ontologies. Furthermore, we frame complex music reasoning as multiple-choice question answering, enabling controlled and scalable assessment of MLLMs' symbolic music understanding. Empirical benchmarking of state-of-the-art MLLMs on WildScore reveals intriguing patterns in their visual-symbolic reasoning, uncovering both promising directions and persistent challenges for MLLMs in symbolic music reasoning and analysis. We release the dataset and code.
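To illustrate why a multiple-choice framing keeps scoring controlled and scalable, here is a minimal sketch that computes overall and per-category accuracy once each model reply has been reduced to an option letter. It is not WildScore's released evaluation code. The `MCQRecord` fields, the `normalize` and `accuracy_by_category` helpers, and the category labels in the usage example are all hypothetical placeholders, not the benchmark's actual schema or taxonomy.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class MCQRecord:
    # Hypothetical fields for illustration; WildScore's real schema may differ.
    category: str    # taxonomy label the question belongs to (illustrative)
    answer: str      # gold option letter, e.g. "B"
    prediction: str  # option letter extracted from the model's reply


def normalize(choice: str) -> str:
    """Reduce a raw option string such as ' (b) ' to a bare upper-case letter."""
    return choice.strip().strip("().").upper()


def accuracy_by_category(records):
    """Return (overall_accuracy, {category: accuracy}) over MCQ records."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        total[r.category] += 1
        if normalize(r.prediction) == normalize(r.answer):
            correct[r.category] += 1
    n = sum(total.values())
    overall = sum(correct.values()) / n if n else 0.0
    per_category = {c: correct[c] / total[c] for c in total}
    return overall, per_category


if __name__ == "__main__":
    # Made-up records with hypothetical category names, for demonstration only.
    demo = [
        MCQRecord("Harmony", "B", "b"),
        MCQRecord("Harmony", "C", "A"),
        MCQRecord("Rhythm", "A", "(A)"),
    ]
    print(accuracy_by_category(demo))
```

Because every question has a single gold letter, scoring reduces to exact matching after light normalization, so results can be aggregated along any level of a taxonomy without human grading.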