Search Results for author: Ram Nevatia

Found 58 papers, 26 papers with code

SPAN: Spatial Pyramid Attention Network for Image Manipulation Localization

1 code implementation ECCV 2020 Xuefeng Hu, Zhihan Zhang, Zhenye Jiang, Syomantak Chaudhuri, Zhenheng Yang, Ram Nevatia

Techniques for manipulating images are advancing rapidly; while these are helpful for many useful tasks, they also pose a threat to society with their ability to create believable misinformation.

Image Manipulation Image Manipulation Detection +3

GaitSTR: Gait Recognition with Sequential Two-stream Refinement

no code implementations 2 Apr 2024 Wanrong Zheng, Haidong Zhu, Zhaoheng Zheng, Ram Nevatia

We demonstrate that with refined skeletons, the performance of the gait recognition model can achieve further improvement on public gait recognition datasets compared with state-of-the-art methods without extra annotations.

Gait Recognition

Large Language Models are Good Prompt Learners for Low-Shot Image Classification

1 code implementation 7 Dec 2023 Zhaoheng Zheng, Jingmin Wei, Xuefeng Hu, Haidong Zhu, Ram Nevatia

Thus, we propose LLaMP, Large Language Models as Prompt learners, that produces adaptive prompts for the CLIP text encoder, establishing it as the connecting bridge.

Classification Few-Shot Image Classification +1

CaesarNeRF: Calibrated Semantic Representation for Few-shot Generalizable Neural Rendering

1 code implementation 27 Nov 2023 Haidong Zhu, Tianyu Ding, Tianyi Chen, Ilya Zharkov, Ram Nevatia, Luming Liang

CaesarNeRF explicitly models pose differences of reference views to combine scene-level semantic representations, providing a calibrated holistic understanding.

Few-Shot Learning Neural Rendering

ShARc: Shape and Appearance Recognition for Person Identification In-the-wild

no code implementations 24 Oct 2023 Haidong Zhu, Wanrong Zheng, Zhaoheng Zheng, Ram Nevatia

PSE encodes the body shape via binarized silhouettes, skeleton motions, and 3-D body shape, while AAE provides two levels of temporal appearance feature aggregation: attention-based feature aggregation and averaging aggregation.

Person Identification

ReCLIP: Refine Contrastive Language Image Pre-Training with Source Free Domain Adaptation

1 code implementation 4 Aug 2023 Xuefeng Hu, Ke Zhang, Lu Xia, Albert Chen, Jiajia Luo, Yuyin Sun, Ken Wang, Nan Qiao, Xiao Zeng, Min Sun, Cheng-Hao Kuo, Ram Nevatia

Large-scale Pre-Training Vision-Language Model such as CLIP has demonstrated outstanding performance in zero-shot classification, e.g. achieving 76.3% top-1 accuracy on ImageNet without seeing any example, which leads to potential benefits to many tasks that have no labeled data.

Image Classification Language Modelling +2

CAILA: Concept-Aware Intra-Layer Adapters for Compositional Zero-Shot Learning

1 code implementation 26 May 2023 Zhaoheng Zheng, Haidong Zhu, Ram Nevatia

In this paper, we study the problem of Compositional Zero-Shot Learning (CZSL), which is to recognize novel attribute-object combinations with pre-existing concepts.

Attribute Compositional Zero-Shot Learning

CAT-NeRF: Constancy-Aware Tx$^2$Former for Dynamic Body Modeling

1 code implementation 16 Apr 2023 Haidong Zhu, Zhaoheng Zheng, Wanrong Zheng, Ram Nevatia

This paper addresses the problem of human rendering in the video with temporal appearance constancy.

Neural Rendering

GaitRef: Gait Recognition with Refined Sequential Skeletons

1 code implementation 16 Apr 2023 Haidong Zhu, Wanrong Zheng, Zhaoheng Zheng, Ram Nevatia

Two common modalities used for representing the walking sequence of a person are silhouettes and joint skeletons.

Multiview Gait Recognition

Efficient Feature Distillation for Zero-shot Annotation Object Detection

2 code implementations 21 Mar 2023 Zhuoming Liu, Xuefeng Hu, Ram Nevatia

We propose a new setting for detecting unseen objects called Zero-shot Annotation object Detection (ZAD).

Object object-detection +1

Gait Recognition Using 3-D Human Body Shape Inference

no code implementations 18 Dec 2022 Haidong Zhu, Zhaoheng Zheng, Ram Nevatia

Gait recognition, which identifies individuals based on their walking patterns, is an important biometric technique since it can be observed from a distance and does not require the subject's cooperation.

Gait Identification Gait Recognition

PatchZero: Defending against Adversarial Patch Attacks by Detecting and Zeroing the Patch

no code implementations 5 Jul 2022 Ke Xu, Yao Xiao, Zhaoheng Zheng, Kaijie Cai, Ram Nevatia

Despite the diversity in attack patterns, adversarial patches tend to be highly textured and different in appearance from natural images.

Image Classification object-detection +3

MixNorm: Test-Time Adaptation Through Online Normalization Estimation

no code implementations 21 Oct 2021 Xuefeng Hu, Gokhan Uzunbas, Sirius Chen, Rui Wang, Ashish Shah, Ram Nevatia, Ser-Nam Lim

We present a simple and effective way to estimate the batch-norm statistics during test time, to fast adapt a source model to target test samples.

Test-time Adaptation Unsupervised Domain Adaptation +1

Testing-Time Adaptation through Online Normalization Estimation

no code implementations 29 Sep 2021 Xuefeng Hu, Mustafa Uzunbas, Bor-Chun Chen, Rui Wang, Ashish Shah, Ram Nevatia, Ser-Nam Lim

We present a simple and effective way to estimate the batch-norm statistics during test time, to fast adapt a source model to target test samples.

Test-time Adaptation Unsupervised Domain Adaptation +1

SimMER: Simple Maximization of Entropy and Rank for Self-supervised Representation Learning

no code implementations 29 Sep 2021 Zhengyu Yang, Zijian Hu, Xuefeng Hu, Ram Nevatia

With both entropy and rank maximization, our method surpasses the state-of-the-art on CIFAR-10 and Mini-ImageNet under the standard linear evaluation protocol.

Contrastive Learning Representation Learning +1

Video Question Answering with Phrases via Semantic Roles

no code implementations NAACL 2021 Arka Sadhu, Kan Chen, Ram Nevatia

Video Question Answering (VidQA) evaluation metrics have been limited to a single-word answer or selecting a phrase from a fixed set of phrases.

Question Answering Video Question Answering

Visual Semantic Role Labeling for Video Understanding

1 code implementation CVPR 2021 Arka Sadhu, Tanmay Gupta, Mark Yatskar, Ram Nevatia, Aniruddha Kembhavi

We propose a new framework for understanding and representing related salient events in a video using visual semantic role labeling.

Semantic Role Labeling Video Recognition +1

SimPLE: Similar Pseudo Label Exploitation for Semi-Supervised Classification

1 code implementation CVPR 2021 Zijian Hu, Zhengyu Yang, Xuefeng Hu, Ram Nevatia

Combining the Pair Loss with the techniques developed by the MixMatch family, our proposed SimPLE algorithm shows significant performance gains over previous algorithms on CIFAR-100 and Mini-ImageNet, and is on par with the state-of-the-art methods on CIFAR-10 and SVHN.

Classification General Classification +3

Utilizing Every Image Object for Semi-supervised Phrase Grounding

no code implementations 5 Nov 2020 Haidong Zhu, Arka Sadhu, Zhaoheng Zheng, Ram Nevatia

The annotated language queries available during training are limited, which also limits the variations of language combinations that a model can see during training.

Phrase Grounding Referring Expression

SPAN: Spatial Pyramid Attention Network for Image Manipulation Localization

no code implementations 1 Sep 2020 Xuefeng Hu, Zhihan Zhang, Zhenye Jiang, Syomantak Chaudhuri, Zhenheng Yang, Ram Nevatia

We present a novel framework, Spatial Pyramid Attention Network (SPAN) for detection and localization of multiple types of image manipulations.

Position

Visually Grounded Continual Learning of Compositional Phrases

2 code implementations EMNLP 2020 Xisen Jin, Junyi Du, Arka Sadhu, Ram Nevatia, Xiang Ren

To study this human-like language acquisition ability, we present VisCOLL, a visually grounded language learning task, which simulates the continual acquisition of compositional phrases from streaming visual scenes.

Continual Learning Grounded language learning +1

CPARR: Category-based Proposal Analysis for Referring Relationships

no code implementations 17 Apr 2020 Chuanzi He, Haidong Zhu, Jiyang Gao, Kan Chen, Ram Nevatia

The task of referring relationships is to localize subject and object entities in an image satisfying a relationship query, which is given in the form of <subject, predicate, object>.

Object Relationship Detection +1

Video Object Grounding using Semantic Roles in Language Description

1 code implementation CVPR 2020 Arka Sadhu, Kan Chen, Ram Nevatia

We explore the task of Video Object Grounding (VOG), which grounds objects in videos referred to in natural language descriptions.

Object Position

Curriculum DeepSDF

1 code implementation ECCV 2020 Yueqi Duan, Haidong Zhu, He Wang, Li Yi, Ram Nevatia, Leonidas J. Guibas

When learning to sketch, beginners start with simple and flexible shapes, and then gradually strive for more complex and accurate ones in the subsequent training sessions.

3D Shape Representation Representation Learning

Pose-variant 3D Facial Attribute Generation

no code implementations 24 Jul 2019 Feng-Ju Chang, Xiang Yu, Ram Nevatia, Manmohan Chandraker

We address the challenging problem of generating facial attributes using a single image in an unconstrained pose.

3D Reconstruction Attribute +1

PIRC Net : Using Proposal Indexing, Relationships and Context for Phrase Grounding

no code implementations 7 Dec 2018 Rama Kovvuri, Ram Nevatia

Phrase Grounding aims to detect and localize objects in images that are referred to and are queried by natural language phrases.

Phrase Grounding Sentence +2

NOTE-RCNN: NOise Tolerant Ensemble RCNN for Semi-Supervised Object Detection

no code implementations ICCV 2019 Jiyang Gao, Jiang Wang, Shengyang Dai, Li-Jia Li, Ram Nevatia

Compared to standard Faster RCNN, it contains three highlights: an ensemble of two classification heads and a distillation head to avoid overfitting on noisy labels and improve the mining precision, masking the negative sample loss in box predictor to avoid the harm of false negative labels, and training box regression head only on seed annotations to eliminate the harm from inaccurate boundaries of mined bounding boxes.

Object object-detection +2

MAC: Mining Activity Concepts for Language-based Temporal Localization

3 code implementations 21 Nov 2018 Runzhou Ge, Jiyang Gao, Kan Chen, Ram Nevatia

Previous methods address the problem by considering features from video sliding windows and language queries and learning a subspace to encode their correlation, which ignore rich semantic cues about activities in videos and queries.

Language-Based Temporal Localization

Every Pixel Counts ++: Joint Learning of Geometry and Motion with 3D Holistic Understanding

1 code implementation 14 Oct 2018 Chenxu Luo, Zhenheng Yang, Peng Wang, Yang Wang, Wei Xu, Ram Nevatia, Alan Yuille

Performance on the five tasks of depth estimation, optical flow estimation, odometry, moving object segmentation and scene flow estimation shows that our approach outperforms other SoTA methods.

Depth Estimation Optical Flow Estimation +2

CTAP: Complementary Temporal Action Proposal Generation

1 code implementation ECCV 2018 Jiyang Gao, Kan Chen, Ram Nevatia

Temporal action proposal generation is an important task; akin to object proposals, temporal action proposals are intended to capture "clips" or temporal intervals in videos that are likely to contain an action.

Temporal Action Proposal Generation

Every Pixel Counts: Unsupervised Geometry Learning with Holistic 3D Motion Understanding

no code implementations 27 Jun 2018 Zhenheng Yang, Peng Wang, Yang Wang, Wei Xu, Ram Nevatia

The four types of information, i.e. 2D flow, camera pose, segment mask and depth maps, are integrated into a differentiable holistic 3D motion parser (HMP), where per-pixel 3D motion for rigid background and moving objects are recovered.

Depth And Camera Motion Optical Flow Estimation +1

Revisiting Temporal Modeling for Video-based Person ReID

8 code implementations 5 May 2018 Jiyang Gao, Ram Nevatia

Although many methods on temporal modeling have been proposed, it is hard to directly compare these methods, because the choice of feature extractor and loss function also have a large impact on the final performance.

Motion-Appearance Co-Memory Networks for Video Question Answering

no code implementations CVPR 2018 Jiyang Gao, Runzhou Ge, Kan Chen, Ram Nevatia

Specifically, there are three salient aspects: (1) a co-memory attention mechanism that utilizes cues from both motion and appearance to generate attention; (2) a temporal conv-deconv network to generate multi-level contextual facts; (3) a dynamic fact ensemble method to construct temporal representation dynamically for different questions.

Question Answering Video Question Answering +1

LEGO: Learning Edge with Geometry all at Once by Watching Videos

1 code implementation CVPR 2018 Zhenheng Yang, Peng Wang, Yang Wang, Wei Xu, Ram Nevatia

In our framework, the predicted depths, normals and edges are forced to be consistent all the time.

Knowledge Aided Consistency for Weakly Supervised Phrase Grounding

no code implementations CVPR 2018 Kan Chen, Jiyang Gao, Ram Nevatia

In this paper, we explore the consistency contained in both visual and language modalities, and leverage complementary external knowledge to facilitate weakly supervised grounding.

Phrase Grounding

ExpNet: Landmark-Free, Deep, 3D Facial Expressions

1 code implementation 2 Feb 2018 Feng-Ju Chang, Anh Tuan Tran, Tal Hassner, Iacopo Masi, Ram Nevatia, Gerard Medioni

Our ExpNet CNN is applied directly to the intensities of a face image and regresses a 29D vector of 3D expression coefficients.

Ranked #1 on 3D Facial Expression Recognition on 2017_test set (using extra training data)

3D Face Reconstruction 3D Facial Expression Recognition +2

Knowledge Concentration: Learning 100K Object Classifiers in a Single CNN

no code implementations 21 Nov 2017 Jiyang Gao, Zijian Guo, Zhen Li, Ram Nevatia

To address these challenges, we propose a Knowledge Concentration method, which effectively transfers the knowledge from dozens of specialists (multiple teacher networks) into one single model (one student network) to classify 100K object categories.

General Classification Image Classification +1

FacePoseNet: Making a Case for Landmark-Free Face Alignment

4 code implementations 24 Aug 2017 Feng-Ju Chang, Anh Tuan Tran, Tal Hassner, Iacopo Masi, Ram Nevatia, Gerard Medioni

Instead, we compare our FPN with existing methods by evaluating how they affect face recognition accuracy on the IJB-A and IJB-B benchmarks: using the same recognition pipeline, but varying the face alignment method.

Ranked #1 on Facial Landmark Detection on 300W (Mean Error Rate metric)

3D Face Alignment Face Alignment +4

Query-guided Regression Network with Context Policy for Phrase Grounding

no code implementations ICCV 2017 Kan Chen, Rama Kovvuri, Ram Nevatia

Given a textual description of an image, phrase grounding localizes objects in the image referred by query phrases in the description.

Phrase Grounding regression

Spatio-Temporal Action Detection with Cascade Proposal and Location Anticipation

no code implementations 31 Jul 2017 Zhenheng Yang, Jiyang Gao, Ram Nevatia

In this work, we address the problem of spatio-temporal action detection in temporally untrimmed videos.

Action Detection Region Proposal

RED: Reinforced Encoder-Decoder Networks for Action Anticipation

1 code implementation 16 Jul 2017 Jiyang Gao, Zhenheng Yang, Ram Nevatia

RED takes multiple history representations as input and learns to anticipate a sequence of future representations.

Action Anticipation

TALL: Temporal Activity Localization via Language Query

12 code implementations ICCV 2017 Jiyang Gao, Chen Sun, Zhenheng Yang, Ram Nevatia

For evaluation, we adopt the TaCoS dataset, and build a new dataset for this task on top of Charades by adding sentence temporal annotations, called Charades-STA.

Natural Language Queries regression +2

Cascaded Boundary Regression for Temporal Action Detection

no code implementations 2 May 2017 Jiyang Gao, Zhenheng Yang, Ram Nevatia

CBR uses temporal coordinate regression to refine the temporal boundaries of the sliding windows.

Ranked #6 on Temporal Action Localization on THUMOS’14 (mAP IOU@0.1 metric)

Action Detection regression

AMC: Attention guided Multi-modal Correlation Learning for Image Search

2 code implementations CVPR 2017 Kan Chen, Trung Bui, Fang Chen, Zhaowen Wang, Ram Nevatia

According to the intent of query, attention mechanism can be introduced to adaptively balance the importance of different modalities.

Image Retrieval

TURN TAP: Temporal Unit Regression Network for Temporal Action Proposals

1 code implementation ICCV 2017 Jiyang Gao, Zhenheng Yang, Chen Sun, Kan Chen, Ram Nevatia

Temporal Action Proposal (TAP) generation is an important problem, as fast and accurate extraction of semantically important (e.g. human actions) segments from untrimmed videos is an important step for large-scale video analysis.

regression Temporal Action Localization

A Multi-Scale Cascade Fully Convolutional Network Face Detector

no code implementations 12 Sep 2016 Zhenheng Yang, Ram Nevatia

The number of proposals is decreased after each level, and the areas of regions are decreased to more precisely fit the face.

Face Detection

ACD: Action Concept Discovery from Image-Sentence Corpora

no code implementations 16 Apr 2016 Jiyang Gao, Chen Sun, Ram Nevatia

It obtains candidate action concepts by extracting verb-object pairs from sentences and verifies their visualness with the associated images.

Action Classification Classification +2

Face Recognition Using Deep Multi-Pose Representations

no code implementations 23 Mar 2016 Wael Abd-Almageed, Yue Wu, Stephen Rawls, Shai Harel, Tal Hassner, Iacopo Masi, Jongmoo Choi, Jatuporn Toy Leksut, Jungyeon Kim, Prem Natarajan, Ram Nevatia, Gerard Medioni

In our representation, a face image is processed by several pose-specific deep convolutional neural network (CNN) models to generate multiple pose-specific features.

Face Recognition Face Verification +1

ABC-CNN: An Attention Based Convolutional Neural Network for Visual Question Answering

no code implementations 18 Nov 2015 Kan Chen, Jiang Wang, Liang-Chieh Chen, Haoyuan Gao, Wei Xu, Ram Nevatia

ABC-CNN determines an attention map for an image-question pair by convolving the image feature map with configurable convolutional kernels derived from the question's semantics.

Question Answering Visual Question Answering

Temporal Localization of Fine-Grained Actions in Videos by Domain Transfer from Web Images

1 code implementation 4 Apr 2015 Chen Sun, Sanketh Shetty, Rahul Sukthankar, Ram Nevatia

To solve this problem, we propose a simple yet effective method that takes weak video labels and noisy image labels as input, and generates localized action frames as output.

Action Recognition Temporal Action Localization +1

DISCOVER: Discovering Important Segments for Classification of Video Events and Recounting

no code implementations CVPR 2014 Chen Sun, Ram Nevatia

Our goal is to find the important segments and capture their information for event classification and recounting.

General Classification

Efficient Detector Adaptation for Object Detection in a Video

no code implementations CVPR 2013 Pramod Sharma, Ram Nevatia

In this work, we present a novel and efficient detector adaptation method which improves the performance of an offline trained classifier (baseline classifier) by adapting it to new test datasets.

Computational Efficiency Human Detection +3
