Search Results for author: Sijie Zhu

Found 22 papers, 13 papers with code

Edit3K: Universal Representation Learning for Video Editing Components

no code implementations 24 Mar 2024 Xin Gu, Libo Zhang, Fan Chen, Longyin Wen, YuFei Wang, Tiejian Luo, Sijie Zhu

Each video in our dataset is rendered by various image/video materials with a single editing component, which supports atomic visual understanding of different editing components.

Representation Learning Retrieval +1

TopNet: Transformer-based Object Placement Network for Image Compositing

no code implementations CVPR 2023 Sijie Zhu, Zhe Lin, Scott Cohen, Jason Kuen, Zhifei Zhang, Chen Chen

Given a background image and a segmented object, the goal is to train a model to predict plausible placements (location and scale) of the object for compositing.

Object

$R^{2}$Former: Unified $R$etrieval and $R$eranking Transformer for Place Recognition

no code implementations 6 Apr 2023 Sijie Zhu, Linjie Yang, Chen Chen, Mubarak Shah, Xiaohui Shen, Heng Wang

Visual Place Recognition (VPR) estimates the location of query images by matching them with images in a reference database.

Feature Correlation Retrieval +1

GALA: Toward Geometry-and-Lighting-Aware Object Search for Compositing

no code implementations 31 Mar 2022 Sijie Zhu, Zhe Lin, Scott Cohen, Jason Kuen, Zhifei Zhang, Chen Chen

To move a step further, this paper proposes GALA (Geometry-and-Lighting-Aware), a generic foreground object search method with discriminative modeling on geometry and lighting compatibility for open-world image compositing.

Object

MutualNet: Adaptive ConvNet via Mutual Learning from Different Model Configurations

1 code implementation 14 May 2021 Taojiannan Yang, Sijie Zhu, Matias Mendieta, Pu Wang, Ravikumar Balakrishnan, Minwoo Lee, Tao Han, Mubarak Shah, Chen Chen

MutualNet is a general training methodology that can be applied to various network structures (e.g., 2D networks: MobileNets, ResNet; 3D networks: SlowFast, X3D) and various tasks (e.g., image classification, object detection, segmentation, and action recognition), and is demonstrated to achieve consistent improvements on a variety of datasets.

Action Recognition Image Classification +2

3D Human Pose Estimation with Spatial and Temporal Transformers

3 code implementations ICCV 2021 Ce Zheng, Sijie Zhu, Matias Mendieta, Taojiannan Yang, Chen Chen, Zhengming Ding

Transformer architectures have become the model of choice in natural language processing and are now being introduced into computer vision tasks such as image classification, object detection, and semantic segmentation.

Image Classification Monocular 3D Human Pose Estimation +3

Consistency-based Active Learning for Object Detection

1 code implementation 18 Mar 2021 Weiping Yu, Sijie Zhu, Taojiannan Yang, Chen Chen

Unlike most recent works that focused on applying active learning for image classification, we propose an effective Consistency-based Active Learning method for object Detection (CALD), which fully explores the consistency between original and augmented data.

Active Learning Classification +5

A3D: Adaptive 3D Networks for Video Action Recognition

no code implementations 24 Nov 2020 Sijie Zhu, Taojiannan Yang, Matias Mendieta, Chen Chen

Even under the same computational constraints, the performance of our adaptive networks can be significantly boosted over the baseline counterparts by the mutual training along three dimensions.

Action Recognition Temporal Action Localization

VIGOR: Cross-View Image Geo-localization beyond One-to-one Retrieval

1 code implementation CVPR 2021 Sijie Zhu, Taojiannan Yang, Chen Chen

In this paper, we redefine this problem with a more realistic assumption that the query image can be arbitrary in the area of interest and the reference images are captured before the queries emerge.

Image-Based Localization Image Retrieval

Efficient Deep Learning of Non-local Features for Hyperspectral Image Classification

1 code implementation 2 Aug 2020 Yu Shen, Sijie Zhu, Chen Chen, Qian Du, Liang Xiao, Jianyu Chen, Delu Pan

Therefore, to incorporate the long-range contextual information, a deep fully convolutional network (FCN) with an efficient non-local module, named ENL-FCN, is proposed for HSI classification.

General Classification Hyperspectral Image Classification

GradAug: A New Regularization Method for Deep Neural Networks

1 code implementation NeurIPS 2020 Taojiannan Yang, Sijie Zhu, Chen Chen

The key idea is to use randomly transformed training samples to regularize a set of sub-networks, which are generated by sampling the width of the original network, during training.

Instance Segmentation object-detection +2

Revisiting Street-to-Aerial View Image Geo-localization and Orientation Estimation

no code implementations 23 May 2020 Sijie Zhu, Taojiannan Yang, Chen Chen

Street-to-aerial image geo-localization, which matches a query street-view image to the GPS-tagged aerial images in a reference set, has attracted increasing attention recently.

Metric Learning

Density Map Guided Object Detection in Aerial Images

1 code implementation 12 Apr 2020 Changlin Li, Taojiannan Yang, Sijie Zhu, Chen Chen, Shanyue Guan

Specifically, we propose a Density-Map guided object detection Network (DMNet), which is inspired by the observation that the object density map of an image shows how objects are distributed in terms of the pixel intensity of the map.

Image Cropping Object +3

Video Anomaly Detection for Smart Surveillance

no code implementations 1 Apr 2020 Sijie Zhu, Chen Chen, Waqas Sultani

Temporal localization (i.e., indicating the start and end frames of the anomaly event in a video) is referred to as frame-level detection.

Anomaly Detection Temporal Localization +1

Visual Explanation for Deep Metric Learning

1 code implementation 27 Sep 2019 Sijie Zhu, Taojiannan Yang, Chen Chen

This work explores the visual explanation for deep metric learning and its applications.

Metric Learning Retrieval

MutualNet: Adaptive ConvNet via Mutual Learning from Network Width and Resolution

2 code implementations ECCV 2020 Taojiannan Yang, Sijie Zhu, Chen Chen, Shen Yan, Mi Zhang, Andrew Willis

We propose the width-resolution mutual learning method (MutualNet) to train a network that is executable at dynamic resource constraints to achieve adaptive accuracy-efficiency trade-offs at runtime.

Instance Segmentation object-detection +3
