no code implementations • 22 Apr 2024 • Yingxuan Li, Ryota Hinami, Kiyoharu Aizawa, Yusuke Matsui
To address this problem, we propose an iterative multimodal framework, the first to employ multimodal information for both character identification and speaker prediction tasks.
1 code implementation • 9 Nov 2023 • Licheng Wen, Xuemeng Yang, Daocheng Fu, XiaoFeng Wang, Pinlong Cai, Xin Li, Tao Ma, Yingxuan Li, Linran Xu, Dengke Shang, Zheng Zhu, Shaoyan Sun, Yeqi Bai, Xinyu Cai, Min Dou, Shuanglu Hu, Botian Shi, Yu Qiao
This has been a significant bottleneck, particularly in developing the common-sense reasoning and nuanced scene understanding necessary for safe and reliable autonomous driving.
2 code implementations • 30 Jun 2023 • Yingxuan Li, Kiyoharu Aizawa, Yusuke Matsui
To further the understanding of comics, an automated approach is needed to link each piece of text in a comic to the character who speaks it.