1 code implementation • 23 Jan 2023 • Adrià Recasens, Jason Lin, Joāo Carreira, Drew Jaegle, Luyu Wang, Jean-Baptiste Alayrac, Pauline Luc, Antoine Miech, Lucas Smaira, Ross Hemsley, Andrew Zisserman
Attention-based models are appealing for multimodal processing because inputs from multiple modalities can be concatenated and fed to a single backbone network - thus requiring very little fusion engineering.