no code implementations • 1 Apr 2024 • Akshita Gupta, Gaurav Mittal, Ahmed Magooda, Ye Yu, Graham W. Taylor, Mei Chen
Temporal Action Localization (TAL) involves localizing and classifying action snippets in an untrimmed video.
1 code implementation • 24 Jul 2023 • Christopher Clarke, Matthew Hall, Gaurav Mittal, Ye Yu, Sandra Sajeev, Jason Mars, Mei Chen
In this paper, we present Rule By Example (RBE): a novel exemplar-based contrastive learning approach for learning from logical rules for the task of textual content moderation.
no code implementations • 17 May 2023 • Jialin Yuan, Ye Yu, Gaurav Mittal, Matthew Hall, Sandra Sajeev, Mei Chen
There is a rapidly growing need for multimodal content moderation (CM) as more and more content on social media is multimodal in nature.
no code implementations • CVPR 2023 • Mamshad Nayeem Rizve, Gaurav Mittal, Ye Yu, Matthew Hall, Sandra Sajeev, Mubarak Shah, Mei Chen
To address this, we present PivoTAL, Prior-driven Supervision for Weakly-supervised Temporal Action Localization, to approach WTAL from a localization-by-localization perspective by learning to localize the action snippets directly.
Weakly Supervised Action Localization, Weakly Supervised Temporal Action Localization
no code implementations • CVPR 2023 • Lan Wang, Gaurav Mittal, Sandra Sajeev, Ye Yu, Matthew Hall, Vishnu Naresh Boddeti, Mei Chen
We present ProTeGe as the first method to perform untrimmed pretraining based on video temporal grounding (VTG), bridging the gap between backbones pretrained on trimmed videos and downstream VTG tasks.
no code implementations • 1 Aug 2022 • Ye Yu, Jialin Yuan, Gaurav Mittal, Li Fuxin, Mei Chen
It captures object motion in the video via a novel optical flow calibration module that fuses the segmentation mask with optical flow estimation to improve within-object optical flow smoothness and reduce noise at object boundaries.
Ranked #1 on Video Object Segmentation on DAVIS 2017 (test-dev) (using extra training data)
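The calibration idea in the snippet above (using a segmentation mask so that flow is smoothed within an object but not across its boundary) can be illustrated with a minimal, non-learned sketch. It is not the paper's calibration module, which is not reproduced here; it swaps in mask-guided normalized box filtering as a stand-in, and the function names `box_sum` and `mask_guided_flow_smoothing` and the window size `k` are hypothetical.

```python
import numpy as np


def box_sum(x: np.ndarray, k: int) -> np.ndarray:
    """Sum over each k x k neighbourhood of a 2-D array (zero padding, k odd)."""
    assert k % 2 == 1, "k must be odd so the window is centred"
    r = k // 2
    padded = np.pad(x.astype(float), r, mode="constant")
    h, w = x.shape
    out = np.zeros((h, w), dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out


def mask_guided_flow_smoothing(flow: np.ndarray, mask: np.ndarray, k: int = 5) -> np.ndarray:
    """Hypothetical stand-in: average each flow component only over neighbours
    that share the pixel's mask label, so smoothing never crosses the boundary.

    flow: (H, W, 2) optical flow; mask: (H, W) bool, True = object pixel.
    """
    flow = np.asarray(flow, dtype=float)
    out = np.zeros_like(flow)
    for region in (mask, ~mask):               # object pixels, then background pixels
        weight = region.astype(float)
        count = box_sum(weight, k)             # same-label neighbours per pixel
        for c in range(flow.shape[-1]):
            summed = box_sum(flow[..., c] * weight, k)
            # A pixel always counts itself, so count >= 1 inside its own region.
            avg = summed / np.maximum(count, 1.0)
            out[..., c] = np.where(region, avg, out[..., c])
    return out
```

Because each pixel averages only over neighbours with the same mask label, a sharp flow discontinuity at the object boundary is preserved, whereas a plain box filter would blur it across the edge.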
no code implementations • CVPR 2022 • Junwen Chen, Gaurav Mittal, Ye Yu, Yu Kong, Mei Chen
We present GateHUB, Gated History Unit with Background Suppression, which comprises a novel position-guided gated cross-attention mechanism that enhances or suppresses parts of the history according to how informative they are for current frame prediction.
Ranked #1 on Online Action Detection on TVSeries
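To make "gated cross-attention over the history" concrete, here is a minimal NumPy sketch of one common way to gate attention: a sigmoid gate computed from each position-encoded history frame is added in log space to the attention logits, so frames with low gate values are suppressed relative to the rest. This is an illustrative assumption, not GateHUB's published formulation; the weights `w_q`, `w_k`, `w_v`, `w_gate` and the helper names are hypothetical, and background suppression is not modeled.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def sinusoidal_positions(n: int, d: int) -> np.ndarray:
    """Standard sinusoidal position encodings, shape (n, d)."""
    pos = np.arange(n)[:, None]
    i = np.arange(d)[None, :]
    angle = pos / np.power(10000.0, (2 * (i // 2)) / d)
    return np.where(i % 2 == 0, np.sin(angle), np.cos(angle))


def gated_cross_attention(query, history, w_q, w_k, w_v, w_gate):
    """Hypothetical sketch.

    query: (d,) current-frame feature; history: (T, d) past-frame features.
    w_q, w_k, w_v: (d, d_k) projections; w_gate: (d,) gate weights.
    """
    T, d = history.shape
    h = history + sinusoidal_positions(T, d)   # position-aware history
    q = query @ w_q                            # (d_k,)
    k = h @ w_k                                # (T, d_k)
    v = h @ w_v                                # (T, d_k)
    logits = k @ q / np.sqrt(k.shape[-1])      # (T,) scaled dot-product scores
    gate = sigmoid(h @ w_gate)                 # (T,) in (0, 1), one gate per past frame
    # Adding log(gate) multiplies each unnormalized weight by its gate.
    weights = softmax(logits + np.log(gate + 1e-9))
    return weights @ v, weights, gate
```

With `w_gate` set to zero every gate equals 0.5, the log-gate term becomes a constant shift, and the weights reduce to ordinary softmax attention; learned gate weights are what let a model favour some past frames over others.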
no code implementations • 25 Oct 2021 • Yu Gong, Ye Yu, Gaurav Mittal, Greg Mori, Mei Chen
Importantly, we argue and empirically demonstrate that MUSE, compared to other feature discrepancy functions, is a more functional proxy to introduce dependency and effectively improve the expressivity of all features in the knowledge distillation framework.
no code implementations • ICCV 2021 • Jay Patravali, Gaurav Mittal, Ye Yu, Fuxin Li, Mei Chen
We present MetaUVFS as the first Unsupervised Meta-learning algorithm for Video Few-Shot action recognition.
1 code implementation • ACCV 2020 • Jedrzej Kozerawski, Victor Fragoso, Nikolaos Karianakis, Gaurav Mittal, Matthew Turk, Mei Chen
Unfortunately, this imbalance causes a visual recognition system to perform well on head classes but poorly on tail classes.
Ranked #53 on Long-tail Learning on ImageNet-LT
1 code implementation • 16 Jul 2020 • Chaitanya Devaguptapu, Devansh Agarwal, Gaurav Mittal, Pulkit Gopalani, Vineeth N Balasubramanian
We show that neural architecture search (NAS), which is popular for achieving state-of-the-art (SoTA) accuracy, can provide adversarial accuracy as a free add-on without any form of adversarial training.
no code implementations • CVPR 2020 • Gaurav Mittal, Chang Liu, Nikolaos Karianakis, Victor Fragoso, Mei Chen, Yun Fu
To reduce hyperparameter optimization (HPO) time, we present HyperSTAR (System for Task Aware Hyperparameter Recommendation), a task-aware method to warm-start HPO for deep neural networks.
no code implementations • 2 Oct 2019 • Gaurav Mittal, Baoyuan Wang
All previous methods for audio-driven talking head generation assume the input audio to be clean with a neutral tone.
no code implementations • ICLR Workshop DeepGenStruct 2019 • Gaurav Mittal, Shubham Agrawal, Anuva Agarwal, Sushant Mehta, Tanya Marwah
We propose a method to generate an image incrementally based on a sequence of graphs of scene descriptions (scene-graphs).
1 code implementation • ICCV 2017 • Tanya Marwah, Gaurav Mittal, Vineeth N. Balasubramanian
This paper proposes a network architecture to perform variable-length semantic video generation using captions.
1 code implementation • 30 Nov 2016 • Gaurav Mittal, Tanya Marwah, Vineeth N. Balasubramanian
This paper introduces a novel approach for generating videos called Synchronized Deep Recurrent Attentive Writer (Sync-DRAW).