1 code implementation • 4 Apr 2024 • Walid Bousselham, Angie Boggust, Sofian Chaybouti, Hendrik Strobelt, Hilde Kuehne
Vision Transformers (ViTs), with their ability to model long-range dependencies through self-attention mechanisms, have become a standard architecture in computer vision.
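The long-range modeling the abstract refers to comes from self-attention, where every token attends to every other token regardless of spatial distance. A minimal single-head sketch (NumPy, with hypothetical weight names `w_q`, `w_k`, `w_v`; not the paper's implementation):

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Single-head self-attention: each token attends to all tokens,
    which is what lets ViTs capture long-range dependencies."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])      # pairwise token affinities
    attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
    attn /= attn.sum(axis=-1, keepdims=True)     # softmax over all tokens
    return attn @ v

rng = np.random.default_rng(0)
n_tokens, d = 5, 8
x = rng.normal(size=(n_tokens, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)
```

Each output row is a content-dependent mixture of all value vectors, so token 0 can draw on token `n-1` directly in a single layer.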
1 code implementation • 1 Dec 2023 • Walid Bousselham, Felix Petersen, Vittorio Ferrari, Hilde Kuehne
To leverage those capabilities, we propose a Grounding Everything Module (GEM) that generalizes the idea of value-value attention introduced by CLIPSurgery to a self-self attention path.
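The value-value idea mentioned above replaces the usual query-key similarity with similarity between the value projections themselves, so each token attends to tokens with similar content. A minimal sketch of that self-self path next to standard attention (hypothetical function and weight names; a simplification of GEM, not its implementation):

```python
import numpy as np

def softmax(s):
    e = np.exp(s - s.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def qk_attention(x, w_q, w_k, w_v):
    """Standard path: queries score against keys."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    return softmax(q @ k.T / np.sqrt(k.shape[-1])) @ v

def value_value_attention(x, w_v):
    """Self-self path: the value projection scores against itself,
    grouping tokens by content similarity rather than by learned
    query-key correspondence."""
    v = x @ w_v
    return softmax(v @ v.T / np.sqrt(v.shape[-1])) @ v

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
w_v = rng.normal(size=(8, 8))
vv_out = value_value_attention(x, w_v)
```

GEM generalizes this from the value-value case to self-self attention in general (the same pattern applied to queries or keys as well).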
1 code implementation • CVPR 2023 • Aisha Urooj Khan, Hilde Kuehne, Bo Wu, Kim Chheu, Walid Bousselham, Chuang Gan, Niels Lobo, Mubarak Shah
The proposed method is trained end-to-end, optimized with a cross-entropy VQA loss and a Hungarian matching loss for situation graph prediction.
Ranked #6 on Video Question Answering on AGQA 2.0 balanced (Average Accuracy metric)
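A Hungarian matching loss treats prediction as a set problem: each predicted graph element is matched one-to-one with a ground-truth element so that the total pairwise cost is minimal, and the loss is summed over the matched pairs. A minimal sketch using `scipy.optimize.linear_sum_assignment`, with Euclidean distance as a stand-in cost (the paper's actual cost terms are not shown here):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def hungarian_set_loss(pred, target):
    """Optimal one-to-one matching between predicted and ground-truth
    elements, then the summed cost of the matched pairs.
    pred, target: (n, d) arrays of element embeddings (hypothetical)."""
    cost = np.linalg.norm(pred[:, None, :] - target[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(cost)   # Hungarian algorithm
    return cost[rows, cols].sum()
```

Because the matching is recomputed per example, the loss is invariant to the ordering of predictions, which is what makes it suitable for unordered graph elements.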
1 code implementation • arXiv 2021 • Walid Bousselham, Guillaume Thibault, Lucas Pagano, Archana Machireddy, Joe Gray, Young Hwan Chang, Xubo Song
Ensembles of predictions are known to perform better than the individual predictions taken separately.
Ranked #5 on Semantic Segmentation on COCO-Stuff test
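The simplest form of the ensembling the abstract alludes to is averaging the per-class probability maps of several models before taking the argmax; errors that are uncorrelated across members tend to cancel. A minimal sketch for segmentation-style outputs (hypothetical function name; not the paper's specific ensembling scheme):

```python
import numpy as np

def ensemble_predict(prob_maps):
    """Average per-class probabilities over models, then pick the
    most likely class per pixel.
    prob_maps: (n_models, n_pixels, n_classes) array."""
    return np.mean(prob_maps, axis=0).argmax(axis=-1)

# Two models, two pixels, two classes: model 1 is confident,
# model 2 is mildly confident; the average decides each pixel.
prob_maps = np.array([
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.6, 0.4], [0.4, 0.6]],
])
labels = ensemble_predict(prob_maps)
```

Averaging probabilities (soft voting) rather than hard labels preserves each member's confidence, which usually gives a small additional gain.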