Vietnamese Multimodal Learning
3 papers with code • 0 benchmarks • 0 datasets
Most implemented papers
OpenViVQA: Task, Dataset, and Multimodal Fusion Models for Visual Question Answering in Vietnamese
The VQA task requires methods that can fuse information from questions and images to produce appropriate answers.
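As a rough illustration of this kind of question-image fusion, the following is a minimal PyTorch sketch of a Hadamard-product fusion baseline. The class name, feature dimensions, answer-vocabulary size, and the choice of encoders are all illustrative assumptions, not the OpenViVQA architecture itself.

```python
import torch
import torch.nn as nn

class SimpleFusionVQA(nn.Module):
    """Toy late-fusion VQA head (hypothetical, for illustration only):
    encode question and image separately, fuse by element-wise product,
    then classify over a fixed answer vocabulary."""
    def __init__(self, q_dim=768, v_dim=2048, hidden=512, num_answers=1000):
        super().__init__()
        self.q_proj = nn.Linear(q_dim, hidden)   # project question features
        self.v_proj = nn.Linear(v_dim, hidden)   # project image features
        self.classifier = nn.Sequential(
            nn.ReLU(),
            nn.Linear(hidden, num_answers),      # scores over answer candidates
        )

    def forward(self, q_feat, v_feat):
        # Hadamard-product fusion, a common simple VQA baseline
        fused = self.q_proj(q_feat) * self.v_proj(v_feat)
        return self.classifier(fused)

# Dummy tensors standing in for outputs of a text encoder and an image encoder
q = torch.randn(4, 768)    # e.g. pooled question embeddings (assumed dim)
v = torch.randn(4, 2048)   # e.g. pooled image features (assumed dim)
logits = SimpleFusionVQA()(q, v)
print(logits.shape)        # torch.Size([4, 1000])
```

Note that OpenViVQA itself goes beyond such a classification baseline; this sketch only shows the fusion idea the snippet refers to.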
ViTextVQA: A Large-Scale Visual Question Answering Dataset for Evaluating Vietnamese Text Comprehension in Images
Visual Question Answering (VQA) is a complex task that requires processing natural language and images simultaneously.
New Benchmark Dataset and Fine-Grained Cross-Modal Fusion Framework for Vietnamese Multimodal Aspect-Category Sentiment Analysis
To address this, we introduce a new Vietnamese multimodal dataset, named ViMACSA, which consists of 4,876 text-image pairs with 14,618 fine-grained annotations for both text and images in the hotel domain.
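To make "fine-grained cross-modal fusion" more concrete, here is a hedged PyTorch sketch in which an aspect embedding attends separately over text tokens and image regions before sentiment classification. All names, dimensions, and the three-way polarity head are assumptions for illustration, not the paper's framework.

```python
import torch
import torch.nn as nn

class AspectCrossModalFusion(nn.Module):
    """Toy fine-grained fusion (hypothetical): an aspect embedding attends
    over text tokens and image regions, and the attended features are
    concatenated to predict an aspect-level sentiment polarity."""
    def __init__(self, dim=256, num_heads=4, num_polarities=3):
        super().__init__()
        self.text_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.image_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # assumed 3-way head: negative / neutral / positive
        self.classifier = nn.Linear(2 * dim, num_polarities)

    def forward(self, aspect, text_tokens, image_regions):
        # aspect: (B, 1, dim); text_tokens: (B, T, dim); image_regions: (B, R, dim)
        t, _ = self.text_attn(aspect, text_tokens, text_tokens)      # aspect-conditioned text
        v, _ = self.image_attn(aspect, image_regions, image_regions) # aspect-conditioned image
        fused = torch.cat([t, v], dim=-1).squeeze(1)                 # (B, 2 * dim)
        return self.classifier(fused)

# Dummy inputs: batch of 2, 12 text tokens, 9 image regions, pre-projected to dim=256
model = AspectCrossModalFusion()
logits = model(torch.randn(2, 1, 256), torch.randn(2, 12, 256), torch.randn(2, 9, 256))
print(logits.shape)  # torch.Size([2, 3])
```

Attending with the aspect as the query is one standard way to get per-aspect text and image evidence; the actual ViMACSA framework should be taken from the paper.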