no code implementations • 25 Mar 2024 • Binay Kumar Singh, Niels da Vitoria Lobo
In co-occurrence matrix analysis, we set base classes based on the maximum occurrences of the labels and build association rules and generate frequent patterns.
1 code implementation • 5 Jul 2022 • Aisha Urooj Khan, Hilde Kuehne, Chuang Gan, Niels da Vitoria Lobo, Mubarak Shah
Transformers for visual-language representation learning have been getting a lot of interest and shown tremendous performance on visual question answering (VQA) and grounding.
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Aisha Urooj Khan, Amir Mazaheri, Niels da Vitoria Lobo, Mubarak Shah
We present MMFT-BERT(MultiModal Fusion Transformer with BERT encodings), to solve Visual Question Answering (VQA) ensuring individual and combined processing of multiple input modalities.
no code implementations • 8 May 2020 • Aidean Sharghi, Niels da Vitoria Lobo, Mubarak Shah
We Input a set of video shots and the network generates a text description for each shot.
no code implementations • 4 Jan 2015 • Mahdi M. Kalayeh, Stephen Mussmann, Alla Petrakova, Niels da Vitoria Lobo, Mubarak Shah
In the second phase, via a Kmeans clustering approach, we create motion components by clustering the flow vectors with respect to their location and velocity.