A Structured Framework for Evaluating and Enhancing Interpretive Capabilities of Multimodal LLMs in Culturally Situated Tasks

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Abstract

This study tests and evaluates the capabilities and characteristics of current mainstream Visual Language Models (VLMs) in generating critiques of traditional Chinese painting. To this end, we first developed a quantitative framework for Chinese painting critique. We built it by applying a zero-shot classification model to human expert critiques to extract multi-dimensional evaluative features covering evaluative stance, feature focus, and commentary quality. Based on these features, we defined and quantified several representative critic personas. We then used this framework to evaluate selected VLMs, including Llama, Qwen, and Gemini, using persona-guided prompting to assess each model's ability to generate critiques from diverse perspectives. Our findings reveal the current performance levels, strengths, and areas for improvement of VLMs in art critique, offering insights into their potential and limitations in complex semantic understanding and content generation tasks.
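The abstract does not specify how persona features are compared with model-generated critiques. The following is a minimal sketch under stated assumptions, not the authors' method. It assumes each critique has already been scored by a zero-shot classifier into a fixed-length vector of label probabilities. It then treats a persona as the mean vector of its expert critiques and measures a generated critique's alignment with that persona by cosine similarity. The dimension names, the "formalist" persona, the numeric scores, and the `persona_alignment` helper are all hypothetical.

```python
import math
from typing import List

# Hypothetical feature dimensions, loosely named after the three feature groups in the
# abstract (evaluative stance, feature focus, commentary quality). Not the paper's label set.
DIMENSIONS = ["stance_positive", "focus_brushwork", "focus_composition", "quality_depth"]

Vector = List[float]


def mean_vector(vectors: List[Vector]) -> Vector:
    """Average equal-length feature vectors element-wise (used here as a persona centroid)."""
    if not vectors:
        raise ValueError("need at least one vector")
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def persona_alignment(generated: Vector, persona_examples: List[Vector]) -> float:
    """Hypothetical helper: similarity between a generated critique's feature vector
    and the centroid of one persona's expert-critique feature vectors."""
    return cosine(generated, mean_vector(persona_examples))


# Illustrative, made-up zero-shot scores (one row per critique, columns follow DIMENSIONS).
formalist_examples = [
    [0.6, 0.9, 0.8, 0.7],
    [0.5, 0.8, 0.9, 0.6],
]
generated_critique = [0.7, 0.85, 0.75, 0.4]
print(round(persona_alignment(generated_critique, formalist_examples), 3))
```

Cosine similarity ignores overall magnitude, so this sketch compares critiques by the shape of their score profile rather than by absolute score levels. Other distance measures would be equally plausible choices.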
Original language: English
Title of host publication: Findings of the Association for Computational Linguistics
Subtitle of host publication: EMNLP 2025
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Place of Publication: China
Publisher: Association for Computational Linguistics
Pages: 1945-1971
Number of pages: 27
ISBN (Print): 9798891763357
DOIs
Publication status: Published, Nov 2025
Event: The 2025 Conference on Empirical Methods in Natural Language Processing, Suzhou International Expo Centre (SuzhouExpo), Suzhou, China
Duration: 4 Nov 2025 to 9 Nov 2025
https://2025.emnlp.org/

Conference

Conference: The 2025 Conference on Empirical Methods in Natural Language Processing
Abbreviated title: EMNLP 2025
Country/Territory: China
City: Suzhou
Period: 4/11/25 to 9/11/25
