Search Results for author: Yingxuan Li

Found 3 papers, 2 papers with code

Zero-Shot Character Identification and Speaker Prediction in Comics via Iterative Multimodal Fusion

no code implementations • 22 Apr 2024 • Yingxuan Li, Ryota Hinami, Kiyoharu Aizawa, Yusuke Matsui

To address this problem, we propose an iterative multimodal framework, the first to employ multimodal information for both character identification and speaker prediction tasks.

On the Road with GPT-4V(ision): Early Explorations of Visual-Language Model on Autonomous Driving

1 code implementation • 9 Nov 2023 • Licheng Wen, Xuemeng Yang, Daocheng Fu, XiaoFeng Wang, Pinlong Cai, Xin Li, Tao Ma, Yingxuan Li, Linran Xu, Dengke Shang, Zheng Zhu, Shaoyan Sun, Yeqi Bai, Xinyu Cai, Min Dou, Shuanglu Hu, Botian Shi, Yu Qiao

This has been a significant bottleneck, particularly in the development of common sense reasoning and nuanced scene understanding necessary for safe and reliable autonomous driving.

Autonomous Driving, Common Sense Reasoning, +4 more

Manga109Dialog: A Large-scale Dialogue Dataset for Comics Speaker Detection

2 code implementations • 30 Jun 2023 • Yingxuan Li, Kiyoharu Aizawa, Yusuke Matsui

For further understanding of comics, an automated approach is needed to link text in comics to characters speaking the words.

Graph Generation, Scene Graph Generation
