Vietnamese Multimodal Learning
3 papers with code • 0 benchmarks • 0 datasets
Most implemented papers
OpenViVQA: Task, Dataset, and Multimodal Fusion Models for Visual Question Answering in Vietnamese
The VQA task requires methods that can fuse information from questions and images to produce appropriate answers.
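As a rough illustration of this kind of question-image fusion, the following is a minimal PyTorch sketch of a Hadamard-product fusion baseline. The class name, feature dimensions, answer-vocabulary size, and the choice of encoders are all illustrative assumptions, not the OpenViVQA architecture itself.

```python
import torch
import torch.nn as nn

class SimpleFusionVQA(nn.Module):
    """Toy late-fusion VQA head (hypothetical, for illustration only):
    encode question and image separately, fuse by element-wise product,
    then classify over a fixed answer vocabulary."""
    def __init__(self, q_dim=768, v_dim=2048, hidden=512, num_answers=1000):
        super().__init__()
        self.q_proj = nn.Linear(q_dim, hidden)   # project question features
        self.v_proj = nn.Linear(v_dim, hidden)   # project image features
        self.classifier = nn.Sequential(
            nn.ReLU(),
            nn.Linear(hidden, num_answers),      # scores over answer candidates
        )

    def forward(self, q_feat, v_feat):
        # Hadamard-product fusion, a common simple VQA baseline
        fused = self.q_proj(q_feat) * self.v_proj(v_feat)
        return self.classifier(fused)

# Dummy tensors standing in for outputs of a text encoder and an image encoder
q = torch.randn(4, 768)    # e.g. pooled question embeddings (assumed dim)
v = torch.randn(4, 2048)   # e.g. pooled image features (assumed dim)
logits = SimpleFusionVQA()(q, v)
print(logits.shape)        # torch.Size([4, 1000])
```

Note that OpenViVQA itself goes beyond such a classification baseline; this sketch only shows the fusion idea the snippet refers to.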
ViTextVQA: A Large-Scale Visual Question Answering Dataset for Evaluating Vietnamese Text Comprehension in Images
Visual Question Answering (VQA) is a complex task that requires processing natural language and images simultaneously.
New Benchmark Dataset and Fine-Grained Cross-Modal Fusion Framework for Vietnamese Multimodal Aspect-Category Sentiment Analysis
To address this, we introduce a new Vietnamese multimodal dataset, named ViMACSA, which consists of 4,876 text-image pairs with 14,618 fine-grained annotations for both text and images in the hotel domain.
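To make "fine-grained cross-modal fusion" more concrete, here is a hedged PyTorch sketch in which an aspect embedding attends separately over text tokens and image regions before sentiment classification. All names, dimensions, and the three-way polarity head are assumptions for illustration, not the paper's framework.

```python
import torch
import torch.nn as nn

class AspectCrossModalFusion(nn.Module):
    """Toy fine-grained fusion (hypothetical): an aspect embedding attends
    over text tokens and image regions, and the attended features are
    concatenated to predict an aspect-level sentiment polarity."""
    def __init__(self, dim=256, num_heads=4, num_polarities=3):
        super().__init__()
        self.text_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.image_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # assumed 3-way head: negative / neutral / positive
        self.classifier = nn.Linear(2 * dim, num_polarities)

    def forward(self, aspect, text_tokens, image_regions):
        # aspect: (B, 1, dim); text_tokens: (B, T, dim); image_regions: (B, R, dim)
        t, _ = self.text_attn(aspect, text_tokens, text_tokens)      # aspect-conditioned text
        v, _ = self.image_attn(aspect, image_regions, image_regions) # aspect-conditioned image
        fused = torch.cat([t, v], dim=-1).squeeze(1)                 # (B, 2 * dim)
        return self.classifier(fused)

# Dummy inputs: batch of 2, 12 text tokens, 9 image regions, pre-projected to dim=256
model = AspectCrossModalFusion()
logits = model(torch.randn(2, 1, 256), torch.randn(2, 12, 256), torch.randn(2, 9, 256))
print(logits.shape)  # torch.Size([2, 3])
```

Attending with the aspect as the query is one standard way to get per-aspect text and image evidence; the actual ViMACSA framework should be taken from the paper.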