Search Results for author: Niluthpol Chowdhury Mithun

Found 14 papers, 5 papers with code

Unsupervised Domain Adaptation for Semantic Segmentation with Pseudo Label Self-Refinement

no code implementations 25 Oct 2023 Xingchen Zhao, Niluthpol Chowdhury Mithun, Abhinav Rajvanshi, Han-Pang Chiu, Supun Samarasekera

Recent state-of-the-art (SOTA) UDA methods employ a teacher-student self-training approach, where a teacher model is used to generate pseudo-labels for the new data which in turn guide the training process of the student model.

Pseudo Label, Semantic Segmentation +1

Cross-View Visual Geo-Localization for Outdoor Augmented Reality

no code implementations 28 Mar 2023 Niluthpol Chowdhury Mithun, Kshitij Minhas, Han-Pang Chiu, Taragay Oskiper, Mikhail Sizintsev, Supun Samarasekera, Rakesh Kumar

Precise estimation of global orientation and location is critical to ensure a compelling outdoor Augmented Reality (AR) experience.

Pose Estimation

GraphMapper: Efficient Visual Navigation by Scene Graph Generation

no code implementations 17 May 2022 Zachary Seymour, Niluthpol Chowdhury Mithun, Han-Pang Chiu, Supun Samarasekera, Rakesh Kumar

Understanding the geometric relationships between objects in a scene is a core capability in enabling both humans and autonomous agents to navigate in new environments.

Graph Generation, Navigate +2

SASRA: Semantically-aware Spatio-temporal Reasoning Agent for Vision-and-Language Navigation in Continuous Environments

1 code implementation 26 Aug 2021 Muhammad Zubair Irshad, Niluthpol Chowdhury Mithun, Zachary Seymour, Han-Pang Chiu, Supun Samarasekera, Rakesh Kumar

This paper presents a novel approach for the Vision-and-Language Navigation (VLN) task in continuous 3D environments, which requires an autonomous agent to follow natural language instructions in unseen environments.

Vision and Language Navigation

Recall Loss for Imbalanced Image Classification and Semantic Segmentation

1 code implementation 1 Jan 2021 Junjiao Tian, Niluthpol Chowdhury Mithun, Zachary Seymour, Han-Pang Chiu, Zsolt Kira

Many works have proposed to weigh the standard cross entropy loss function with pre-computed weights based on class statistics such as the number of samples and class margins.

Classification, General Classification +4

RGB2LIDAR: Towards Solving Large-Scale Cross-Modal Visual Localization

1 code implementation 12 Sep 2020 Niluthpol Chowdhury Mithun, Karan Sikka, Han-Pang Chiu, Supun Samarasekera, Rakesh Kumar

To enable large-scale evaluation, we introduce a new dataset containing over 550K pairs (covering 143 km^2 area) of RGB and aerial LIDAR depth images.

Visual Localization

Text-based Localization of Moments in a Video Corpus

no code implementations 20 Aug 2020 Sudipta Paul, Niluthpol Chowdhury Mithun, Amit K. Roy-Chowdhury

This task poses a unique challenge as the system is required to perform: (i) retrieval of the relevant video where only a segment of the video corresponds with the queried sentence, and (ii) temporal localization of moment in the relevant video based on sentence query.

Moment Retrieval, Retrieval +2

A Skip Connection Architecture for Localization of Image Manipulations

no code implementations IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops 2019 Ghazal Mazaheri, Niluthpol Chowdhury Mithun, Jawadul H. Bappy, Amit K. Roy-Chowdhury

In order to exploit these traces in localizing the tampered regions, we propose an encoder-decoder based network where we fuse representations from early layers in the encoder (which are richer in low-level spatial cues, like edges) by skip pooling with representations of the last layer of the decoder and use for manipulation detection.

Image Manipulation, Image Manipulation Detection

Weakly Supervised Video Moment Retrieval From Text Queries

1 code implementation CVPR 2019 Niluthpol Chowdhury Mithun, Sujoy Paul, Amit K. Roy-Chowdhury

The weak nature of the supervision is because, during training, we only have access to the video-text pairs rather than the temporal extent of the video to which different text descriptions relate.

Moment Retrieval, Natural Language Queries +2

Webly Supervised Joint Embedding for Cross-Modal Image-Text Retrieval

no code implementations Proceedings of the 26th ACM International Conference on Multimedia, October 2018 Niluthpol Chowdhury Mithun, Rameswar Panda, Vagelis Papalexakis, Amit K. Roy-Chowdhury

Inspired by the recent success of web-supervised learning in deep neural networks, we capitalize on readily-available web images with noisy annotations to learn robust image-text joint representation.

Cross-Modal Retrieval, Retrieval +1

Webly Supervised Joint Embedding for Cross-Modal Image-Text Retrieval

no code implementations 23 Aug 2018 Niluthpol Chowdhury Mithun, Rameswar Panda, Evangelos E. Papalexakis, Amit K. Roy-Chowdhury

Inspired by the recent success of webly supervised learning in deep neural networks, we capitalize on readily-available web images with noisy annotations to learn robust image-text joint representation.

Cross-Modal Retrieval, Retrieval +1

Learning Joint Embedding with Multimodal Cues for Cross-Modal Video-Text Retrieval

1 code implementation ICMR 2018 Niluthpol Chowdhury Mithun, Juncheng Li, Florian Metze, Amit K. Roy-Chowdhury

Constructing a joint representation invariant across different modalities (e.g., video, language) is of significant importance in many multimedia applications.

Retrieval, Text Retrieval +1

Diversity-aware Multi-Video Summarization

no code implementations 9 Jun 2017 Rameswar Panda, Niluthpol Chowdhury Mithun, Amit K. Roy-Chowdhury

Most video summarization approaches have focused on extracting a summary from a single video; we propose an unsupervised framework for summarizing a collection of videos.

Video Summarization
