M3Exam is a multilingual, multimodal, and multilevel benchmark designed for evaluating Large Language Models (LLMs). Unlike traditional benchmarks, which often focus on specific tasks or datasets, M3Exam takes a more comprehensive approach by sourcing real and official human exam questions. Let's delve into its unique characteristics:
Multilingualism: M3Exam encompasses questions from multiple countries, requiring strong multilingual proficiency and cultural knowledge. It evaluates how well LLMs handle diverse languages.
Multimodality: Many exam questions are multimodal, combining text with images. M3Exam tests the model's ability to understand and process such complex, multimodal content.
Multilevel Structure: M3Exam features exams from three critical educational periods (primary, middle, and high school exit exams), allowing a comprehensive assessment of a model's proficiency at different levels.
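The three dimensions can be pictured as fields on a single question record. Below is a minimal sketch in Python; the field names are illustrative assumptions, not the benchmark's actual data schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ExamQuestion:
    """Hypothetical record combining M3Exam's three dimensions."""
    language: str                      # multilingual: e.g. "en", "th", "sw"
    level: str                         # multilevel: e.g. "primary", "middle", "high"
    question: str                      # question text
    options: list[str]                 # multiple-choice options
    answer: str                        # gold option label, e.g. "C"
    image_path: Optional[str] = None   # multimodal: set when the question includes an image

    @property
    def is_multimodal(self) -> bool:
        # A question is multimodal when it carries an image alongside text.
        return self.image_path is not None

# Example: a Thai high-school question with an accompanying figure
q = ExamQuestion(
    language="th",
    level="high",
    question="...",
    options=["A ...", "B ...", "C ...", "D ..."],
    answer="C",
    image_path="figures/q17.png",
)
print(q.is_multimodal)  # True
```

Keeping language, level, and image on the same record is what lets a single benchmark slice results along any of the three axes.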
Why human exams? Despite the existence of many benchmarks, the M3Exam authors argue that real, official human exams are a more suitable probe of general intelligence in LLMs: passing them inherently demands language understanding, domain knowledge, and problem-solving skills at once.
Top-performing LLMs, including GPT-4, have been assessed on M3Exam. However, they still face challenges with multilingual text, especially in low-resource and non-Latin script languages. Additionally, multimodal LLMs struggle with complex multimodal questions.
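Reporting results like these typically reduces to exact-match accuracy over predicted option labels, broken down by language so that gaps on low-resource languages become visible. A minimal scoring sketch, assuming predictions and gold answers are option letters (this is not the benchmark's official evaluation script):

```python
from collections import defaultdict

def per_language_accuracy(records):
    """Compute exact-match multiple-choice accuracy per language.

    records: iterable of (language, predicted_label, gold_label) tuples.
    Returns {language: accuracy} with accuracies in [0, 1].
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for lang, pred, gold in records:
        total[lang] += 1
        # Normalize case/whitespace so "c" matches "C".
        if pred.strip().upper() == gold.strip().upper():
            correct[lang] += 1
    return {lang: correct[lang] / total[lang] for lang in total}

# Toy example: a high-resource language scoring above a low-resource one
records = [
    ("en", "B", "B"), ("en", "C", "C"), ("en", "A", "D"),
    ("sw", "A", "B"), ("sw", "c", "C"),
]
print(per_language_accuracy(records))  # {'en': 0.666..., 'sw': 0.5}
```

Grouping scores per language (rather than one pooled number) is precisely what exposes the multilingual weaknesses the benchmark reports.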