2 dataset results for Multimodal Intent Recognition AND Images AND English

PhotoChat, the first dataset that casts light on the photo sharing behavior in online messaging. PhotoChat contains 12k dialogues, each of which is paired with a user photo that is shared during the conversation. Based on this dataset, we propose two tasks to facilitate research on image-text modeling: a photo-sharing intent prediction task that predicts whether one intends to share a photo in the next conversation turn, and a photo retrieval task that retrieves the most relevant photo according to the dialogue context.

17 PAPERS • 2 BENCHMARKS

MIntRec

MIntRec is a novel dataset for multimodal intent recognition. It formulates coarse-grained and fine-grained intent taxonomies based on the data collected from the TV series Superstore. The dataset consists of 2,224 high-quality samples with text, video, and audio modalities and has multimodal annotations among twenty intent categories.

5 PAPERS • 1 BENCHMARK

Datasets

2 dataset results for Multimodal Intent Recognition AND Images AND English