no code implementations • 25 Oct 2023 • Xingchen Zhao, Niluthpol Chowdhury Mithun, Abhinav Rajvanshi, Han-Pang Chiu, Supun Samarasekera
Recent state-of-the-art (SOTA) UDA methods employ a teacher-student self-training approach, where a teacher model is used to generate pseudo-labels for the new data which in turn guide the training process of the student model.
no code implementations • CVPR 2023 • Nazmul Karim, Niluthpol Chowdhury Mithun, Abhinav Rajvanshi, Han-Pang Chiu, Supun Samarasekera, Nazanin Rahnavard
In this regard, source-free domain adaptation (SFDA) excels as access to source data is no longer required during adaptation.
Ranked #3 on Source-Free Domain Adaptation on VisDA-2017
no code implementations • 28 Mar 2023 • Niluthpol Chowdhury Mithun, Kshitij Minhas, Han-Pang Chiu, Taragay Oskiper, Mikhail Sizintsev, Supun Samarasekera, Rakesh Kumar
Precise estimation of global orientation and location is critical to ensure a compelling outdoor Augmented Reality (AR) experience.
no code implementations • 17 May 2022 • Zachary Seymour, Niluthpol Chowdhury Mithun, Han-Pang Chiu, Supun Samarasekera, Rakesh Kumar
Understanding the geometric relationships between objects in a scene is a core capability in enabling both humans and autonomous agents to navigate in new environments.
1 code implementation • 26 Aug 2021 • Muhammad Zubair Irshad, Niluthpol Chowdhury Mithun, Zachary Seymour, Han-Pang Chiu, Supun Samarasekera, Rakesh Kumar
This paper presents a novel approach for the Vision-and-Language Navigation (VLN) task in continuous 3D environments, which requires an autonomous agent to follow natural language instructions in unseen environments.
1 code implementation • 1 Jan 2021 • Junjiao Tian, Niluthpol Chowdhury Mithun, Zachary Seymour, Han-Pang Chiu, Zsolt Kira
Many works have proposed to weigh the standard cross entropy loss function with pre-computed weights based on class statistics such as the number of samples and class margins.
1 code implementation • 12 Sep 2020 • Niluthpol Chowdhury Mithun, Karan Sikka, Han-Pang Chiu, Supun Samarasekera, Rakesh Kumar
To enable large-scale evaluation, we introduce a new dataset containing over 550K pairs (covering 143 km^2 area) of RGB and aerial LIDAR depth images.
no code implementations • 20 Aug 2020 • Sudipta Paul, Niluthpol Chowdhury Mithun, Amit K. Roy-Chowdhury
This task poses a unique challenge as the system is required to perform: (i) retrieval of the relevant video where only a segment of the video corresponds with the queried sentence, and (ii) temporal localization of moment in the relevant video based on sentence query.
no code implementations • IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops 2019 • Ghazal Mazaheri, Niluthpol Chowdhury Mithun, Jawadul H. Bappy, Amit K. Roy-Chowdhury
In order to exploit these traces in localizing the tampered regions, we propose an encoder-decoder based network where we fuse representations from early layers in the encoder (which are richer in low-level spatial cues, like edges) by skip pooling with representations of the last layer of the decoder and use for manipulation detection.
1 code implementation • CVPR 2019 • Niluthpol Chowdhury Mithun, Sujoy Paul, Amit K. Roy-Chowdhury
The weak nature of the supervision is because, during training, we only have access to the video-text pairs rather than the temporal extent of the video to which different text descriptions relate.
no code implementations • Proceedings of the 26th ACM international conference on Multimedia·October 2018 2018 • Niluthpol Chowdhury Mithun, Rameswar Panda, Vagelis Papalexakis, Amit K. Roy-Chowdhury
Inspired by the recent success of web-supervised learning in deep neural networks, we capitalize on readily-available web images with noisy annotations to learn robust image-text joint representation.
no code implementations • 23 Aug 2018 • Niluthpol Chowdhury Mithun, Rameswar Panda, Evangelos E. Papalexakis, Amit K. Roy-Chowdhury
Inspired by the recent success of webly supervised learning in deep neural networks, we capitalize on readily-available web images with noisy annotations to learn robust image-text joint representation.
1 code implementation • ICMR 2018 • Niluthpol Chowdhury Mithun, Juncheng Li, Florian Metze, Amit K. Roy-Chowdhury
Constructing a joint representation invariant across different modalities (e. g., video, language) is of significant importance in many multimedia applications.
Ranked #37 on Video Retrieval on MSR-VTT
no code implementations • 9 Jun 2017 • Rameswar Panda, Niluthpol Chowdhury Mithun, Amit K. Roy-Chowdhury
Most video summarization approaches have focused on extracting a summary from a single video; we propose an unsupervised framework for summarizing a collection of videos.