Search Results for author: Ram Nevatia

Found 58 papers, 26 papers with code

SPAN: Spatial Pyramid Attention Network for Image Manipulation Localization

1 code implementation • ECCV 2020 • Xuefeng Hu, Zhihan Zhang, Zhenye Jiang, Syomantak Chaudhuri, Zhenheng Yang, Ram Nevatia

Techniques for manipulating images are advancing rapidly; while these are helpful for many useful tasks, they also pose a threat to society with their ability to create believable misinformation.

Ranked #5 on Image Manipulation Localization on Columbia

Image Manipulation Image Manipulation Detection +3

Paper
Code

GaitSTR: Gait Recognition with Sequential Two-stream Refinement

no code implementations • 2 Apr 2024 • Wanrong Zheng, Haidong Zhu, Zhaoheng Zheng, Ram Nevatia

We demonstrate that with refined skeletons, the performance of the gait recognition model can achieve further improvement on public gait recognition datasets compared with state-of-the-art methods without extra annotations.

Gait Recognition

Paper
Add Code

Large Language Models are Good Prompt Learners for Low-Shot Image Classification

1 code implementation • 7 Dec 2023 • Zhaoheng Zheng, Jingmin Wei, Xuefeng Hu, Haidong Zhu, Ram Nevatia

Thus, we propose LLaMP, Large Language Models as Prompt learners, that produces adaptive prompts for the CLIP text encoder, establishing it as the connecting bridge.

Classification Few-Shot Image Classification +1

Paper
Code

CaesarNeRF: Calibrated Semantic Representation for Few-shot Generalizable Neural Rendering

1 code implementation • 27 Nov 2023 • Haidong Zhu, Tianyu Ding, Tianyi Chen, Ilya Zharkov, Ram Nevatia, Luming Liang

CaesarNeRF explicitly models pose differences of reference views to combine scene-level semantic representations, providing a calibrated holistic understanding.

Few-Shot Learning Neural Rendering

Paper
Code

ShARc: Shape and Appearance Recognition for Person Identification In-the-wild

no code implementations • 24 Oct 2023 • Haidong Zhu, Wanrong Zheng, Zhaoheng Zheng, Ram Nevatia

PSE encodes the body shape via binarized silhouettes, skeleton motions, and 3-D body shape, while AAE provides two levels of temporal appearance feature aggregation: attention-based feature aggregation and averaging aggregation.

Person Identification

Paper
Add Code

ReCLIP: Refine Contrastive Language Image Pre-Training with Source Free Domain Adaptation

1 code implementation • 4 Aug 2023 • Xuefeng Hu, Ke Zhang, Lu Xia, Albert Chen, Jiajia Luo, Yuyin Sun, Ken Wang, Nan Qiao, Xiao Zeng, Min Sun, Cheng-Hao Kuo, Ram Nevatia

Large-scale Pre-Training Vision-Language Model such as CLIP has demonstrated outstanding performance in zero-shot classification, e.g., achieving 76.3% top-1 accuracy on ImageNet without seeing any example, which leads to potential benefits to many tasks that have no labeled data.

Image Classification Language Modelling +2

Paper
Code

CAILA: Concept-Aware Intra-Layer Adapters for Compositional Zero-Shot Learning

1 code implementation • 26 May 2023 • Zhaoheng Zheng, Haidong Zhu, Ram Nevatia

In this paper, we study the problem of Compositional Zero-Shot Learning (CZSL), which is to recognize novel attribute-object combinations with pre-existing concepts.

Ranked #1 on Compositional Zero-Shot Learning on MIT-States, generalized split

Attribute Compositional Zero-Shot Learning

Paper
Code

CAT-NeRF: Constancy-Aware Tx$^2$Former for Dynamic Body Modeling

1 code implementation • 16 Apr 2023 • Haidong Zhu, Zhaoheng Zheng, Wanrong Zheng, Ram Nevatia

This paper addresses the problem of human rendering in the video with temporal appearance constancy.

Neural Rendering

Paper
Code

GaitRef: Gait Recognition with Refined Sequential Skeletons

1 code implementation • 16 Apr 2023 • Haidong Zhu, Wanrong Zheng, Zhaoheng Zheng, Ram Nevatia

Two common modalities used for representing the walking sequence of a person are silhouettes and joint skeletons.

Ranked #3 on Multiview Gait Recognition on CASIA-B

Multiview Gait Recognition

Paper
Code

Efficient Feature Distillation for Zero-shot Annotation Object Detection

2 code implementations • 21 Mar 2023 • Zhuoming Liu, Xuefeng Hu, Ram Nevatia

We propose a new setting for detecting unseen objects called Zero-shot Annotation object Detection (ZAD).

Object object-detection +1

Paper
Code

Gait Recognition Using 3-D Human Body Shape Inference

no code implementations • 18 Dec 2022 • Haidong Zhu, Zhaoheng Zheng, Ram Nevatia

Gait recognition, which identifies individuals based on their walking patterns, is an important biometric technique since it can be observed from a distance and does not require the subject's cooperation.

Gait Identification Gait Recognition

Paper
Add Code

PatchZero: Defending against Adversarial Patch Attacks by Detecting and Zeroing the Patch

no code implementations • 5 Jul 2022 • Ke Xu, Yao Xiao, Zhaoheng Zheng, Kaijie Cai, Ram Nevatia

Despite the diversity in attack patterns, adversarial patches tend to be highly textured and different in appearance from natural images.

Image Classification object-detection +3

Paper
Add Code

MixNorm: Test-Time Adaptation Through Online Normalization Estimation

no code implementations • 21 Oct 2021 • Xuefeng Hu, Gokhan Uzunbas, Sirius Chen, Rui Wang, Ashish Shah, Ram Nevatia, Ser-Nam Lim

We present a simple and effective way to estimate the batch-norm statistics during test time, to fast adapt a source model to target test samples.

Test-time Adaptation Unsupervised Domain Adaptation +1

Paper
Add Code

Testing-Time Adaptation through Online Normalization Estimation

no code implementations • 29 Sep 2021 • Xuefeng Hu, Mustafa Uzunbas, Bor-Chun Chen, Rui Wang, Ashish Shah, Ram Nevatia, Ser-Nam Lim

We present a simple and effective way to estimate the batch-norm statistics during test time, to fast adapt a source model to target test samples.

Test-time Adaptation Unsupervised Domain Adaptation +1

Paper
Add Code

SimMER: Simple Maximization of Entropy and Rank for Self-supervised Representation Learning

no code implementations • 29 Sep 2021 • Zhengyu Yang, Zijian Hu, Xuefeng Hu, Ram Nevatia

With both entropy and rank maximization, our method surpasses the state-of-the-art on CIFAR-10 and Mini-ImageNet under the standard linear evaluation protocol.

Contrastive Learning Representation Learning +1

Paper
Add Code

Improving Object Detection and Attribute Recognition by Feature Entanglement Reduction

no code implementations • 25 Aug 2021 • Zhaoheng Zheng, Arka Sadhu, Ram Nevatia

We explore object detection with two attributes: color and material.

Attribute Object +2

Paper
Add Code

Video Question Answering with Phrases via Semantic Roles

no code implementations • NAACL 2021 • Arka Sadhu, Kan Chen, Ram Nevatia

Video Question Answering (VidQA) evaluation metrics have been limited to a single-word answer or selecting a phrase from a fixed set of phrases.

Question Answering Video Question Answering

Paper
Add Code

Visual Semantic Role Labeling for Video Understanding

1 code implementation • CVPR 2021 • Arka Sadhu, Tanmay Gupta, Mark Yatskar, Ram Nevatia, Aniruddha Kembhavi

We propose a new framework for understanding and representing related salient events in a video using visual semantic role labeling.

Semantic Role Labeling Video Recognition +1

Paper
Code

SimPLE: Similar Pseudo Label Exploitation for Semi-Supervised Classification

1 code implementation • CVPR 2021 • Zijian Hu, Zhengyu Yang, Xuefeng Hu, Ram Nevatia

Combining the Pair Loss with the techniques developed by the MixMatch family, our proposed SimPLE algorithm shows significant performance gains over previous algorithms on CIFAR-100 and Mini-ImageNet, and is on par with the state-of-the-art methods on CIFAR-10 and SVHN.

Ranked #1 on Semi-Supervised Image Classification on Mini-ImageNet, 4000 Labels

Classification General Classification +3

Paper
Code

Utilizing Every Image Object for Semi-supervised Phrase Grounding

no code implementations • 5 Nov 2020 • Haidong Zhu, Arka Sadhu, Zhaoheng Zheng, Ram Nevatia

The annotated language queries available during training are limited, which also limits the variations of language combinations that a model can see during training.

Phrase Grounding Referring Expression

Paper
Add Code

SPAN: Spatial Pyramid Attention Network for Image Manipulation Localization

no code implementations • 1 Sep 2020 • Xuefeng Hu, Zhihan Zhang, Zhenye Jiang, Syomantak Chaudhuri, Zhenheng Yang, Ram Nevatia

We present a novel framework, Spatial Pyramid Attention Network (SPAN) for detection and localization of multiple types of image manipulations.

Position

Paper
Add Code

Visually Grounded Continual Learning of Compositional Phrases

2 code implementations • EMNLP 2020 • Xisen Jin, Junyi Du, Arka Sadhu, Ram Nevatia, Xiang Ren

To study this human-like language acquisition ability, we present VisCOLL, a visually grounded language learning task, which simulates the continual acquisition of compositional phrases from streaming visual scenes.

Continual Learning Grounded language learning +1

Paper
Code

CPARR: Category-based Proposal Analysis for Referring Relationships

no code implementations • 17 Apr 2020 • Chuanzi He, Haidong Zhu, Jiyang Gao, Kan Chen, Ram Nevatia

The task of referring relationships is to localize subject and object entities in an image satisfying a relationship query, which is given in the form of <subject, predicate, object>.

Object Relationship Detection +1

Paper
Add Code

Video Object Grounding using Semantic Roles in Language Description

1 code implementation • CVPR 2020 • Arka Sadhu, Kan Chen, Ram Nevatia

We explore the task of Video Object Grounding (VOG), which grounds objects in videos referred to in natural language descriptions.

Object Position

Paper
Code

Curriculum DeepSDF

1 code implementation • ECCV 2020 • Yueqi Duan, Haidong Zhu, He Wang, Li Yi, Ram Nevatia, Leonidas J. Guibas

When learning to sketch, beginners start with simple and flexible shapes, and then gradually strive for more complex and accurate ones in the subsequent training sessions.

3D Shape Representation Representation Learning

Paper
Code

Zero-Shot Grounding of Objects from Natural Language Queries

1 code implementation • ICCV 2019 • Arka Sadhu, Kan Chen, Ram Nevatia

A phrase grounding system localizes a particular object in an image referred to by a natural language query.

Natural Language Queries object-detection +2

Paper
Code

Pose-variant 3D Facial Attribute Generation

no code implementations • 24 Jul 2019 • Feng-Ju Chang, Xiang Yu, Ram Nevatia, Manmohan Chandraker

We address the challenging problem of generating facial attributes using a single image in an unconstrained pose.

3D Reconstruction Attribute +1

Paper
Add Code

Activity Driven Weakly Supervised Object Detection

no code implementations • CVPR 2019 • Zhenheng Yang, Dhruv Mahajan, Deepti Ghadiyaram, Ram Nevatia, Vignesh Ramanathan

Weakly supervised object detection aims at reducing the amount of supervision required to train detection models.

Ranked #1 on Weakly Supervised Object Detection on Charades

Action Classification Object +2

Paper
Add Code

PIRC Net: Using Proposal Indexing, Relationships and Context for Phrase Grounding

no code implementations • 7 Dec 2018 • Rama Kovvuri, Ram Nevatia

Phrase Grounding aims to detect and localize objects in images that are referred to and are queried by natural language phrases.

Phrase Grounding Sentence +2

Paper
Add Code

NOTE-RCNN: NOise Tolerant Ensemble RCNN for Semi-Supervised Object Detection

no code implementations • ICCV 2019 • Jiyang Gao, Jiang Wang, Shengyang Dai, Li-Jia Li, Ram Nevatia

Compared to standard Faster RCNN, it contains three highlights: an ensemble of two classification heads and a distillation head to avoid overfitting on noisy labels and improve the mining precision, masking the negative sample loss in box predictor to avoid the harm of false negative labels, and training box regression head only on seed annotations to eliminate the harm from inaccurate boundaries of mined bounding boxes.

Object object-detection +2

Paper
Add Code

MAC: Mining Activity Concepts for Language-based Temporal Localization

3 code implementations • 21 Nov 2018 • Runzhou Ge, Jiyang Gao, Kan Chen, Ram Nevatia

Previous methods address the problem by considering features from video sliding windows and language queries and learning a subspace to encode their correlation, which ignore rich semantic cues about activities in videos and queries.

Language-Based Temporal Localization

Paper
Code

Every Pixel Counts ++: Joint Learning of Geometry and Motion with 3D Holistic Understanding

1 code implementation • 14 Oct 2018 • Chenxu Luo, Zhenheng Yang, Peng Wang, Yang Wang, Wei Xu, Ram Nevatia, Alan Yuille

Performance on the five tasks of depth estimation, optical flow estimation, odometry, moving object segmentation and scene flow estimation shows that our approach outperforms other SoTA methods.

Ranked #1 on Scene Flow Estimation on KITTI 2015 Scene Flow Training

Depth Estimation Optical Flow Estimation +2

Paper
Code

CTAP: Complementary Temporal Action Proposal Generation

1 code implementation • ECCV 2018 • Jiyang Gao, Kan Chen, Ram Nevatia

Temporal action proposal generation is an important task; akin to object proposals, temporal action proposals are intended to capture "clips" or temporal intervals in videos that are likely to contain an action.

Ranked #10 on Temporal Action Proposal Generation on ActivityNet-1.3

Temporal Action Proposal Generation

Paper
Code

Every Pixel Counts: Unsupervised Geometry Learning with Holistic 3D Motion Understanding

no code implementations • 27 Jun 2018 • Zhenheng Yang, Peng Wang, Yang Wang, Wei Xu, Ram Nevatia

The four types of information, i.e., 2D flow, camera pose, segment mask and depth maps, are integrated into a differentiable holistic 3D motion parser (HMP), where per-pixel 3D motion for rigid background and moving objects is recovered.

Ranked #2 on Scene Flow Estimation on KITTI 2015 Scene Flow Training

Depth And Camera Motion Optical Flow Estimation +1

Paper
Add Code

Revisiting Temporal Modeling for Video-based Person ReID

8 code implementations • 5 May 2018 • Jiyang Gao, Ram Nevatia

Although many methods on temporal modeling have been proposed, it is hard to directly compare these methods, because the choice of feature extractor and loss function also have a large impact on the final performance.

Paper
Code

Motion-Appearance Co-Memory Networks for Video Question Answering

no code implementations • CVPR 2018 • Jiyang Gao, Runzhou Ge, Kan Chen, Ram Nevatia

Specifically, there are three salient aspects: (1) a co-memory attention mechanism that utilizes cues from both motion and appearance to generate attention; (2) a temporal conv-deconv network to generate multi-level contextual facts; (3) a dynamic fact ensemble method to construct temporal representation dynamically for different questions.

Ranked #28 on Visual Question Answering (VQA) on MSRVTT-QA

Question Answering Video Question Answering +1

Paper
Add Code

LEGO: Learning Edge with Geometry all at Once by Watching Videos

1 code implementation • CVPR 2018 • Zhenheng Yang, Peng Wang, Yang Wang, Wei Xu, Ram Nevatia

In our framework, the predicted depths, normals and edges are forced to be consistent all the time.

Paper
Code

Knowledge Aided Consistency for Weakly Supervised Phrase Grounding

no code implementations • CVPR 2018 • Kan Chen, Jiyang Gao, Ram Nevatia

In this paper, we explore the consistency contained in both visual and language modalities, and leverage complementary external knowledge to facilitate weakly supervised grounding.

Phrase Grounding

Paper
Add Code

ExpNet: Landmark-Free, Deep, 3D Facial Expressions

1 code implementation • 2 Feb 2018 • Feng-Ju Chang, Anh Tuan Tran, Tal Hassner, Iacopo Masi, Ram Nevatia, Gerard Medioni

Our ExpNet CNN is applied directly to the intensities of a face image and regresses a 29D vector of 3D expression coefficients.

Ranked #1 on 3D Facial Expression Recognition on 2017_test set (using extra training data)

3D Face Reconstruction 3D Facial Expression Recognition +2

Paper
Code

Knowledge Concentration: Learning 100K Object Classifiers in a Single CNN

no code implementations • 21 Nov 2017 • Jiyang Gao, Zijian Guo, Zhen Li, Ram Nevatia

To address these challenges, we propose a Knowledge Concentration method, which effectively transfers the knowledge from dozens of specialists (multiple teacher networks) into one single model (one student network) to classify 100K object categories.

General Classification Image Classification +1

Paper
Add Code

FacePoseNet: Making a Case for Landmark-Free Face Alignment

4 code implementations • 24 Aug 2017 • Feng-Ju Chang, Anh Tuan Tran, Tal Hassner, Iacopo Masi, Ram Nevatia, Gerard Medioni

Instead, we compare our FPN with existing methods by evaluating how they affect face recognition accuracy on the IJB-A and IJB-B benchmarks: using the same recognition pipeline, but varying the face alignment method.

Ranked #1 on Facial Landmark Detection on 300W (Mean Error Rate metric)

3D Face Alignment Face Alignment +4

Paper
Code

Query-guided Regression Network with Context Policy for Phrase Grounding

no code implementations • ICCV 2017 • Kan Chen, Rama Kovvuri, Ram Nevatia

Given a textual description of an image, phrase grounding localizes objects in the image referred to by query phrases in the description.

Phrase Grounding regression

Paper
Add Code

Spatio-Temporal Action Detection with Cascade Proposal and Location Anticipation

no code implementations • 31 Jul 2017 • Zhenheng Yang, Jiyang Gao, Ram Nevatia

In this work, we address the problem of spatio-temporal action detection in temporally untrimmed videos.

Action Detection Region Proposal

Paper
Add Code

RED: Reinforced Encoder-Decoder Networks for Action Anticipation

1 code implementation • 16 Jul 2017 • Jiyang Gao, Zhenheng Yang, Ram Nevatia

RED takes multiple history representations as input and learns to anticipate a sequence of future representations.

Ranked #5 on Action Anticipation on EPIC-KITCHENS-55 (Unseen test set (S2))

Action Anticipation

Paper
Code

TALL: Temporal Activity Localization via Language Query

12 code implementations • ICCV 2017 • Jiyang Gao, Chen Sun, Zhenheng Yang, Ram Nevatia

For evaluation, we adopt the TaCoS dataset, and build a new dataset for this task on top of Charades by adding sentence temporal annotations, called Charades-STA.

Natural Language Queries regression +2

Paper
Code

Cascaded Boundary Regression for Temporal Action Detection

no code implementations • 2 May 2017 • Jiyang Gao, Zhenheng Yang, Ram Nevatia

CBR uses temporal coordinate regression to refine the temporal boundaries of the sliding windows.

Ranked #6 on Temporal Action Localization on THUMOS’14 (mAP IOU@0.1 metric)

Action Detection regression

Paper
Add Code

AMC: Attention guided Multi-modal Correlation Learning for Image Search

2 code implementations • CVPR 2017 • Kan Chen, Trung Bui, Fang Chen, Zhaowen Wang, Ram Nevatia

According to the intent of query, attention mechanism can be introduced to adaptively balance the importance of different modalities.

Image Retrieval

Paper
Code

TURN TAP: Temporal Unit Regression Network for Temporal Action Proposals

1 code implementation • ICCV 2017 • Jiyang Gao, Zhenheng Yang, Chen Sun, Kan Chen, Ram Nevatia

Temporal Action Proposal (TAP) generation is an important problem, as fast and accurate extraction of semantically important (e.g., human actions) segments from untrimmed videos is an important step for large-scale video analysis.

Ranked #8 on Action Recognition on THUMOS’14

regression Temporal Action Localization

Paper
Code

A Multi-Scale Cascade Fully Convolutional Network Face Detector

no code implementations • 12 Sep 2016 • Zhenheng Yang, Ram Nevatia

The number of proposals is decreased after each level, and the areas of regions are decreased to more precisely fit the face.

Face Detection

Paper
Add Code

Learning Action Concept Trees and Semantic Alignment Networks from Image-Description Data

no code implementations • 8 Sep 2016 • Jiyang Gao, Ram Nevatia

Besides, the action categories in such datasets are pre-defined and vocabularies are fixed.

Action Classification Classification +2

Paper
Add Code

ACD: Action Concept Discovery from Image-Sentence Corpora

no code implementations • 16 Apr 2016 • Jiyang Gao, Chen Sun, Ram Nevatia

It obtains candidate action concepts by extracting verb-object pairs from sentences and verifies their visualness with the associated images.

Action Classification Classification +2

Paper
Add Code

Face Recognition Using Deep Multi-Pose Representations

no code implementations • 23 Mar 2016 • Wael Abd-Almageed, Yue Wu, Stephen Rawls, Shai Harel, Tal Hassner, Iacopo Masi, Jongmoo Choi, Jatuporn Toy Leksut, Jungyeon Kim, Prem Natarajan, Ram Nevatia, Gerard Medioni

In our representation, a face image is processed by several pose-specific deep convolutional neural network (CNN) models to generate multiple pose-specific features.

Ranked #14 on Face Verification on IJB-A

Face Recognition Face Verification +1

Paper
Add Code

ABC-CNN: An Attention Based Convolutional Neural Network for Visual Question Answering

no code implementations • 18 Nov 2015 • Kan Chen, Jiang Wang, Liang-Chieh Chen, Haoyuan Gao, Wei Xu, Ram Nevatia

ABC-CNN determines an attention map for an image-question pair by convolving the image feature map with configurable convolutional kernels derived from the question's semantics.

Question Answering Visual Question Answering

Paper
Add Code

ProNet: Learning to Propose Object-specific Boxes for Cascaded Neural Networks

no code implementations • CVPR 2016 • Chen Sun, Manohar Paluri, Ronan Collobert, Ram Nevatia, Lubomir Bourdev

This paper aims to classify and locate objects accurately and efficiently, without using bounding box annotations.

Ranked #5 on Weakly Supervised Object Detection on MS COCO

Classification General Classification +2

Paper
Add Code

Automatic Concept Discovery from Parallel Text and Visual Corpora

no code implementations • ICCV 2015 • Chen Sun, Chuang Gan, Ram Nevatia

Humans connect language and vision to perceive the world.

Retrieval Sentence

Paper
Add Code

Temporal Localization of Fine-Grained Actions in Videos by Domain Transfer from Web Images

1 code implementation • 4 Apr 2015 • Chen Sun, Sanketh Shetty, Rahul Sukthankar, Ram Nevatia

To solve this problem, we propose a simple yet effective method that takes weak video labels and noisy image labels as input, and generates localized action frames as output.

Action Recognition Temporal Action Localization +1

Paper
Code

DISCOVER: Discovering Important Segments for Classification of Video Events and Recounting

no code implementations • CVPR 2014 • Chen Sun, Ram Nevatia

Our goal is to find the important segments and capture their information for event classification and recounting.

General Classification

Paper
Add Code

Efficient Detector Adaptation for Object Detection in a Video

no code implementations • CVPR 2013 • Pramod Sharma, Ram Nevatia

In this work, we present a novel and efficient detector adaptation method which improves the performance of an offline trained classifier (baseline classifier) by adapting it to new test datasets.

Computational Efficiency Human Detection +3

Paper
Add Code
