no code implementations • 21 Mar 2024 • Mamshad Nayeem Rizve, Fan Fei, Jayakrishnan Unnikrishnan, Son Tran, Benjamin Z. Yao, Belinda Zeng, Mubarak Shah, Trishul Chilimbi
To effectively address this limitation, we instead keep the network architecture simple and use a set of data tokens that operate at different temporal resolutions in a hierarchical manner, accounting for the temporally hierarchical nature of videos.
no code implementations • 12 Oct 2023 • Shashanka Venkataramanan, Mamshad Nayeem Rizve, João Carreira, Yuki M. Asano, Yannis Avrithis
But are we making the best use of data?
1 code implementation • ICCV 2023 • Sarinda Samarasinghe, Mamshad Nayeem Rizve, Navid Kardan, Mubarak Shah
To address this issue, in this work, we propose a novel cross-domain few-shot video action recognition method that leverages self-supervised learning and curriculum learning to balance the information from the source and target domains.
Tasks: Cross-Domain Few-Shot Learning, Few-Shot Action Recognition (+3)
1 code implementation • ICCV 2023 • Swetha Sirnam, Mamshad Nayeem Rizve, Nina Shvetsova, Hilde Kuehne, Mubarak Shah
Self-supervised learning on large-scale multi-modal datasets allows learning semantically meaningful embeddings in a joint multi-modal representation space without relying on human annotations.
1 code implementation • CVPR 2023 • Ishan Rajendrakumar Dave, Mamshad Nayeem Rizve, Chen Chen, Mubarak Shah
We observe that these representations complement each other depending on the nature of the action.
no code implementations • CVPR 2023 • Mamshad Nayeem Rizve, Gaurav Mittal, Ye Yu, Matthew Hall, Sandra Sajeev, Mubarak Shah, Mei Chen
To address this, we present PivoTAL, Prior-driven Supervision for Weakly-supervised Temporal Action Localization, to approach WTAL from a localization-by-localization perspective by learning to localize the action snippets directly.
Tasks: Weakly Supervised Action Localization, Weakly Supervised Temporal Action Localization
1 code implementation • ICCV 2023 • Sabbir Ahmed, Abdullah Al Arafat, Mamshad Nayeem Rizve, Rahim Hossain, Zhishan Guo, Adnan Siraj Rakin
Source-free domain adaptation (SFDA) is a popular unsupervised domain adaptation method where a pre-trained model from a source domain is adapted to a target domain without accessing any source data.
1 code implementation • 5 Jul 2022 • Mamshad Nayeem Rizve, Navid Kardan, Mubarak Shah
We also highlight the flexibility of our approach in solving the novel class discovery task, demonstrate its stability in dealing with imbalanced data, and complement our approach with a technique to estimate the number of novel classes.
Ranked #1 on Open-World Semi-Supervised Learning on CIFAR-100
Tasks: Novel Class Discovery, Open-World Semi-Supervised Learning (+1)
1 code implementation • 5 Jul 2022 • Mamshad Nayeem Rizve, Navid Kardan, Salman Khan, Fahad Shahbaz Khan, Mubarak Shah
In the open-world SSL problem, the objective is to recognize samples of known classes, and simultaneously detect and cluster samples belonging to novel classes present in unlabeled data.
Ranked #1 on Open-World Semi-Supervised Learning on CIFAR-10
1 code implementation • CVPR 2022 • Nazmul Karim, Mamshad Nayeem Rizve, Nazanin Rahnavard, Ajmal Mian, Mubarak Shah
To combat label noise, recent state-of-the-art methods employ some sort of sample selection mechanism to select a possibly clean subset of data.
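The general idea behind such sample selection mechanisms is that, early in training, samples with noisy labels tend to incur higher loss than samples with clean labels. A minimal NumPy sketch of this "small-loss" selection heuristic (a generic illustration; the paper's actual criterion may differ):

```python
import numpy as np

def select_clean_subset(losses, keep_ratio=0.5):
    """Small-loss selection: samples with lower training loss are more
    likely to carry clean labels, so keep the lowest-loss fraction.
    Generic sketch of the heuristic, not the paper's exact method."""
    losses = np.asarray(losses, dtype=float)
    threshold = np.quantile(losses, keep_ratio)
    # Indices of samples treated as "possibly clean"
    return np.flatnonzero(losses <= threshold)

# Toy per-sample losses: low values suggest clean labels
losses = [0.1, 2.3, 0.2, 0.15, 1.9, 0.05]
clean_idx = select_clean_subset(losses, keep_ratio=0.5)
```

In practice the threshold is often replaced by fitting a two-component mixture model over the loss distribution, so the clean/noisy split adapts per epoch rather than using a fixed ratio.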
1 code implementation • CVPR 2021 • Mamshad Nayeem Rizve, Salman Khan, Fahad Shahbaz Khan, Mubarak Shah
Equivariance and invariance have each been employed standalone in previous work; however, to the best of our knowledge, they have not been used jointly.
1 code implementation • 20 Jan 2021 • Ishan Dave, Rohit Gupta, Mamshad Nayeem Rizve, Mubarak Shah
However, prior work on contrastive learning for video data has not explored the effect of explicitly encouraging the features to be distinct across the temporal dimension.
Ranked #9 on Self-supervised Video Retrieval on UCF101
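Encouraging features to be distinct across the temporal dimension is typically done with a contrastive (InfoNCE-style) objective, where each clip's embedding is pulled toward another view of the same clip and pushed away from embeddings of other temporal segments. A NumPy sketch of that generic objective (illustrative only, not the paper's exact loss):

```python
import numpy as np

def temporal_infonce(anchors, positives, temperature=0.1):
    """InfoNCE loss: each anchor clip embedding should match its own
    positive (another view of the same clip), while embeddings of the
    other temporal segments in the batch act as negatives.
    Generic sketch; not the paper's exact formulation."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature               # pairwise similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))           # diagonal = matched pairs

# Two orthogonal clip embeddings matched with identical positives:
# loss should be near zero since each anchor matches only its own positive.
emb = np.array([[1.0, 0.0], [0.0, 1.0]])
loss = temporal_infonce(emb, emb)
```

Minimizing this loss pushes embeddings of different temporal segments apart, which is precisely the "distinct across the temporal dimension" property the excerpt describes.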
2 code implementations • ICLR 2021 • Mamshad Nayeem Rizve, Kevin Duarte, Yogesh S Rawat, Mubarak Shah
Recent research in semi-supervised learning (SSL) is dominated by consistency-regularization-based methods, which achieve strong performance.
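A common alternative to consistency regularization is pseudo-labeling, where the model's own confident predictions on unlabeled data are reused as training targets. A minimal sketch of confidence-thresholded selection (an illustration of the basic idea only; the paper's selection criterion also incorporates prediction uncertainty):

```python
import numpy as np

def select_pseudo_labels(probs, conf_threshold=0.9):
    """Keep only unlabeled samples whose top predicted class probability
    exceeds a confidence threshold; return (indices, pseudo_labels).
    Sketch of plain confidence-based selection, not the paper's full
    uncertainty-aware criterion."""
    probs = np.asarray(probs, dtype=float)
    conf = probs.max(axis=1)                  # top-class confidence
    idx = np.flatnonzero(conf >= conf_threshold)
    return idx, probs[idx].argmax(axis=1)     # selected rows and labels

# Toy softmax outputs for three unlabeled samples (two classes)
probs = np.array([[0.95, 0.05],
                  [0.60, 0.40],
                  [0.10, 0.90]])
idx, labels = select_pseudo_labels(probs, conf_threshold=0.9)
```

The low-confidence middle sample is discarded; filtering out such uncertain predictions is what keeps noisy pseudo-labels from dominating training.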
no code implementations • 23 Apr 2020 • Mamshad Nayeem Rizve, Ugur Demir, Praveen Tirupattur, Aayush Jung Rana, Kevin Duarte, Ishan Dave, Yogesh Singh Rawat, Mubarak Shah
For tubelet extraction, we propose a localization network which takes a video clip as input and spatio-temporally detects potential foreground regions at multiple scales to generate action tubelets.