Search Results for author: Siyu Zhu

Found 47 papers, 18 papers with code

Freditor: High-Fidelity and Transferable NeRF Editing by Frequency Decomposition

no code implementations • 3 Apr 2024 • Yisheng He, Weihao Yuan, Siyu Zhu, Zilong Dong, Liefeng Bo, QiXing Huang

This paper enables high-fidelity, transferable NeRF editing by frequency decomposition.

STAG4D: Spatial-Temporal Anchored Generative 4D Gaussians

no code implementations • 22 Mar 2024 • Yifei Zeng, Yanqin Jiang, Siyu Zhu, Yuanxun Lu, Youtian Lin, Hao Zhu, Weiming Hu, Xun Cao, Yao Yao

Recent progress in pre-trained diffusion models and 3D generation has spurred interest in 4D content creation.

3D Generation

Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance

1 code implementation • 21 Mar 2024 • Shenhao Zhu, Junming Leo Chen, Zuozhuo Dai, Yinghui Xu, Xun Cao, Yao Yao, Hao Zhu, Siyu Zhu

In this study, we introduce a methodology for human image animation by leveraging a 3D human parametric model within a latent diffusion framework to enhance shape alignment and motion guidance in current human generative techniques.

Animated GIF Generation Image Animation +1

3,091

OV9D: Open-Vocabulary Category-Level 9D Object Pose and Size Estimation

no code implementations • 19 Mar 2024 • Junhao Cai, Yisheng He, Weihao Yuan, Siyu Zhu, Zilong Dong, Liefeng Bo, Qifeng Chen

Derived from OmniObject3D, OO3D-9D is the largest and most diverse dataset in the field of category-level object pose and size estimation.

Object

VideoMV: Consistent Multi-View Generation Based on Large Video Generative Model

no code implementations • 18 Mar 2024 • Qi Zuo, Xiaodong Gu, Lingteng Qiu, Yuan Dong, Zhengyi Zhao, Weihao Yuan, Rui Peng, Siyu Zhu, Zilong Dong, Liefeng Bo, QiXing Huang

Images from video generative models are more suitable for multi-view generation because the underlying network architecture that generates them employs a temporal module to enforce frame consistency.

Denoising

LiRank: Industrial Large Scale Ranking Models at LinkedIn

no code implementations • 10 Feb 2024 • Fedor Borisyuk, Mingzhou Zhou, Qingquan Song, Siyu Zhu, Birjodh Tiwana, Ganesh Parameswaran, Siddharth Dangi, Lars Hertel, Qiang Xiao, Xiaochen Hou, Yunbo Ouyang, Aman Gupta, Sheallika Singh, Dan Liu, Hailing Cheng, Lei Le, Jonathan Hung, Sathiya Keerthi, Ruoyan Wang, Fengyu Zhang, Mohit Kothari, Chen Zhu, Daqi Sun, Yun Dai, Xun Luan, Sirou Zhu, Zhiwei Wang, Neil Daftary, Qianqi Shen, Chengming Jiang, Haichao Wei, Maneesh Varshney, Amol Ghoting, Souvik Ghosh

We present LiRank, a large-scale ranking framework at LinkedIn that brings to production state-of-the-art modeling architectures and optimization methods.

Click-Through Rate Prediction Quantization

Gaussian-Flow: 4D Reconstruction with Dynamic 3D Gaussian Particle

no code implementations • 6 Dec 2023 • Youtian Lin, Zuozhuo Dai, Siyu Zhu, Yao Yao

Moreover, the explicit deformation modeling for discretized Gaussian points ensures ultra-fast training and rendering of a 4D scene, which is comparable to the original 3DGS designed for static 3D reconstruction.

3D Reconstruction 4D reconstruction +1

AnimateAnything: Fine-Grained Open Domain Image Animation with Motion Guidance

1 code implementation • 21 Nov 2023 • Zuozhuo Dai, Zhenghao Zhang, Yao Yao, Bingxue Qiu, Siyu Zhu, Long Qin, Weizhi Wang

Image animation is a key task in computer vision which aims to generate dynamic visual content from a static image.

Image Animation Image to Video Generation

544

Improving Adversarial Transferability by Stable Diffusion

no code implementations • 18 Nov 2023 • Jiayang Liu, Siyu Zhu, Siyuan Liang, Jie Zhang, Han Fang, Weiming Zhang, Ee-Chien Chang

Various techniques have emerged to enhance the transferability of adversarial attacks for the black-box scenario.

QuantEase: Optimization-based Quantization for Language Models

no code implementations • 5 Sep 2023 • Kayhan Behdin, Ayan Acharya, Aman Gupta, Qingquan Song, Siyu Zhu, Sathiya Keerthi, Rahul Mazumder

Particularly noteworthy is our outlier-aware algorithm's capability to achieve near or sub-3-bit quantization of LLMs with an acceptable drop in accuracy, obviating the need for non-uniform quantization or grouping techniques, improving upon methods such as SpQR by up to two times in terms of perplexity.

Quantization

Fine-grained Text-Video Retrieval with Frozen Image Encoders

no code implementations • 14 Jul 2023 • Zuozhuo Dai, Fangtao Shao, Qingkun Su, Zilong Dong, Siyu Zhu

In the second stage, we propose a novel decoupled video text cross attention module to capture fine-grained multimodal information in spatial and temporal dimensions.

Retrieval Video Retrieval

UVOSAM: A Mask-free Paradigm for Unsupervised Video Object Segmentation via Segment Anything Model

no code implementations • 22 May 2023 • Zhenghao Zhang, Zhichao Wei, Shengfan Zhang, Zuozhuo Dai, Siyu Zhu

Unsupervised video object segmentation has made significant progress in recent years, but the manual annotation of video mask datasets is expensive and limits the diversity of available datasets.

Image Segmentation Object +5

3D Former: Monocular Scene Reconstruction with 3D SDF Transformers

1 code implementation • 31 Jan 2023 • Weihao Yuan, Xiaodong Gu, Heng Li, Zilong Dong, Siyu Zhu

In this work, we propose an SDF transformer network, which replaces the role of 3D CNN for better 3D feature aggregation.

Towards Robust Video Instance Segmentation with Temporal-Aware Transformer

no code implementations • 20 Jan 2023 • Zhenghao Zhang, Fangtao Shao, Zuozhuo Dai, Siyu Zhu

In this paper, we observe that temporal information is important as well, and we propose TAFormer to aggregate spatio-temporal features in both the transformer encoder and decoder.

Instance Segmentation Semantic Segmentation +1

Linguistic Query-Guided Mask Generation for Referring Image Segmentation

no code implementations • 16 Jan 2023 • Zhichao Wei, Xiaohao Chen, Mingqiang Chen, Siyu Zhu

Referring image segmentation aims to segment the image region of interest according to the given language expression, which is a typical multi-modal task.

Contrastive Learning Image Segmentation +2

RenderNet: Visual Relocalization Using Virtual Viewpoints in Large-Scale Indoor Environments

no code implementations • 26 Jul 2022 • Jiahui Zhang, Shitao Tang, Kejie Qiu, Rui Huang, Chuan Fang, Le Cui, Zilong Dong, Siyu Zhu, Ping Tan

Visual relocalization has been a widely discussed problem in 3D vision: given a pre-constructed 3D visual map, the 6 DoF (Degrees-of-Freedom) pose of a query image is estimated.

Image Retrieval Retrieval +1

RCP: Recurrent Closest Point for Scene Flow Estimation on 3D Point Clouds

no code implementations • 23 May 2022 • Xiaodong Gu, Chengzhou Tang, Weihao Yuan, Zuozhuo Dai, Siyu Zhu, Ping Tan

In the experiments, we evaluate the proposed method on both the 3D scene flow estimation and the point cloud registration task.

Motion Estimation Point Cloud Registration +1

NeW CRFs: Neural Window Fully-connected CRFs for Monocular Depth Estimation

1 code implementation • CVPR 2022 • Weihao Yuan, Xiaodong Gu, Zuozhuo Dai, Siyu Zhu, Ping Tan

While recent works design increasingly complicated and powerful networks to directly regress the depth map, we take the path of CRFs optimization.

Ranked #1 on Depth Prediction on Matterport3D

Depth Prediction Monocular Depth Estimation

350

QuadTree Attention for Vision Transformers

1 code implementation • ICLR 2022 • Shitao Tang, Jiahui Zhang, Siyu Zhu, Ping Tan

Transformers have been successful in many vision tasks, thanks to their capability of capturing long-range dependency.

Object Detection +2

322

Neural Window Fully-Connected CRFs for Monocular Depth Estimation

no code implementations • CVPR 2022 • Weihao Yuan, Xiaodong Gu, Zuozhuo Dai, Siyu Zhu, Ping Tan

Estimating the accurate depth from a single image is challenging since it is inherently ambiguous and ill-posed.

Monocular Depth Estimation

RCP: Recurrent Closest Point for Point Cloud

1 code implementation • CVPR 2022 • Xiaodong Gu, Chengzhou Tang, Weihao Yuan, Zuozhuo Dai, Siyu Zhu, Ping Tan

In the experiments, we evaluate the proposed method on both the 3D scene flow estimation and the point cloud registration task.

Motion Estimation Point Cloud Registration +1

GB-CosFace: Rethinking Softmax-based Face Recognition from the Perspective of Open Set Classification

no code implementations • 22 Nov 2021 • Lizhe Liu, Mingqiang Chen, Xiaohao Chen, Siyu Zhu, Ping Tan

Our GB-CosFace introduces an adaptive global boundary to determine whether two face samples belong to the same identity so that the optimization objective is aligned with the testing process from the perspective of open set classification.

Classification Face Recognition +2

FloorPlanCAD: A Large-Scale CAD Drawing Dataset for Panoptic Symbol Spotting

no code implementations • ICCV 2021 • Zhiwen Fan, Lingjie Zhu, Honghua Li, Xiaohao Chen, Siyu Zhu, Ping Tan

The proposed CNN-GCN method achieved state-of-the-art (SOTA) performance on the task of semantic symbol spotting, and helps us build a baseline network for the panoptic symbol spotting task.

Vector Graphics

CondLaneNet: a Top-to-down Lane Detection Framework Based on Conditional Convolution

3 code implementations • ICCV 2021 • Lizhe Liu, Xiaohao Chen, Siyu Zhu, Ping Tan

Modern deep-learning-based lane detection methods are successful in most scenarios but struggle with lane lines that have complex topologies.

Ranked #8 on Lane Detection on CurveLanes (using extra training data)

Lane Detection

530

Stereo Matching by Self-supervision of Multiscopic Vision

no code implementations • 9 Apr 2021 • Weihao Yuan, Yazhan Zhang, Bingkun Wu, Siyu Zhu, Ping Tan, Michael Yu Wang, Qifeng Chen

Self-supervised learning for depth estimation possesses several advantages over supervised learning.

Depth Estimation Self-Supervised Learning +1

Learning Camera Localization via Dense Scene Matching

1 code implementation • CVPR 2021 • Shitao Tang, Chengzhou Tang, Rui Huang, Siyu Zhu, Ping Tan

We present a new method for scene agnostic camera localization using dense scene matching (DSM), where a cost volume is constructed between a query image and a scene.

Camera Localization

AR Mapping: Accurate and Efficient Mapping for Augmented Reality

no code implementations • 27 Mar 2021 • Rui Huang, Chuan Fang, Kejie Qiu, Le Cui, Zilong Dong, Siyu Zhu, Ping Tan

Secondly, we propose an AR mapping pipeline which takes the input from the scanning device and produces accurate AR Maps.

DRO: Deep Recurrent Optimizer for Video to Depth

1 code implementation • 24 Mar 2021 • Xiaodong Gu, Weihao Yuan, Zuozhuo Dai, Siyu Zhu, Chengzhou Tang, Zilong Dong, Ping Tan

There is increasing interest in studying the video-to-depth (V2D) problem with machine learning techniques.

Cluster Contrast for Unsupervised Person Re-Identification

3 code implementations • 22 Mar 2021 • Zuozhuo Dai, Guangyuan Wang, Weihao Yuan, Xiaoli Liu, Siyu Zhu, Ping Tan

Thus, our method can solve the problem of cluster inconsistency and be applicable to larger data sets.

Ranked #1 on Unsupervised Person Re-Identification on PersonX

Clustering Unsupervised Domain Adaptation +2

209

UniFuse: Unidirectional Fusion for 360° Panorama Depth Estimation

1 code implementation • 6 Feb 2021 • Hualie Jiang, Zhe Sheng, Siyu Zhu, Zilong Dong, Rui Huang

Besides, we also designed a more effective fusion module for our fusion scheme.

Ranked #1 on Depth Estimation on Matterport3D

Depth Estimation

MeshMVS: Multi-View Stereo Guided Mesh Reconstruction

no code implementations • 17 Oct 2020 • Rakesh Shrestha, Zhiwen Fan, Qingkun Su, Zuozhuo Dai, Siyu Zhu, Ping Tan

Deep learning based 3D shape generation methods generally utilize latent features extracted from color images to encode the semantics of objects and guide the shape generation process.

3D Shape Generation

Self-Supervised Human Depth Estimation from Monocular Videos

1 code implementation • CVPR 2020 • Feitong Tan, Hao Zhu, Zhaopeng Cui, Siyu Zhu, Marc Pollefeys, Ping Tan

Previous methods on estimating detailed human depth often require supervised training with 'ground truth' depth data.

Depth Estimation Self-Supervised Learning

End-to-End Learning Local Multi-view Descriptors for 3D Point Clouds

1 code implementation • CVPR 2020 • Lei Li, Siyu Zhu, Hongbo Fu, Ping Tan, Chiew-Lan Tai

In this work, we propose an end-to-end framework to learn local multi-view descriptors for 3D point clouds.

Ranked #5 on Point Cloud Registration on 3DMatch Benchmark

Point Cloud Registration

Cascade Cost Volume for High-Resolution Multi-View Stereo and Stereo Matching

4 code implementations • CVPR 2020 • Xiaodong Gu, Zhiwen Fan, Zuozhuo Dai, Siyu Zhu, Feitong Tan, Ping Tan

The deep multi-view stereo (MVS) and stereo matching approaches generally construct 3D cost volumes to regularize and regress the output depth or disparity.

Ranked #12 on Point Clouds on Tanks and Temples

3D Reconstruction Point Clouds +1

667

A Neural Network for Detailed Human Depth Estimation from a Single Image

1 code implementation • ICCV 2019 • Sicong Tang, Feitong Tan, Kelvin Cheng, Zhaoyang Li, Siyu Zhu, Ping Tan

To achieve this goal, we separate the depth map into a smooth base shape and a residual detail shape and design a network with two branches to regress them respectively.

Depth Estimation

Matchable Image Retrieval by Learning from Surface Reconstruction

1 code implementation • 26 Nov 2018 • Tianwei Shen, Zixin Luo, Lei Zhou, Runze Zhang, Siyu Zhu, Tian Fang, Long Quan

Convolutional Neural Networks (CNNs) have achieved superior performance on object image retrieval, while Bag-of-Words (BoW) models with handcrafted local features still dominate the retrieval of overlapping images in 3D reconstruction.

3D Reconstruction Image Retrieval +2

Batch DropBlock Network for Person Re-identification and Beyond

5 code implementations • ICCV 2019 • Zuozhuo Dai, Mingqiang Chen, Xiaodong Gu, Siyu Zhu, Ping Tan

In this paper, we propose the Batch DropBlock (BDB) Network which is a two branch network composed of a conventional ResNet-50 as the global branch and a feature dropping branch.

Ranked #8 on Person Re-Identification on Market-1501-C

Image Retrieval Metric Learning +1

326

GeoDesc: Learning Local Descriptors by Integrating Geometry Constraints

1 code implementation • ECCV 2018 • Zixin Luo, Tianwei Shen, Lei Zhou, Siyu Zhu, Runze Zhang, Yao Yao, Tian Fang, Long Quan

Learned local descriptors based on Convolutional Neural Networks (CNNs) have achieved significant improvements on patch-based benchmarks, but have not demonstrated strong generalization ability on recent benchmarks of image-based 3D reconstruction.

3D Reconstruction

191

Learning and Matching Multi-View Descriptors for Registration of Point Clouds

no code implementations • ECCV 2018 • Lei Zhou, Siyu Zhu, Zixin Luo, Tianwei Shen, Runze Zhang, Mingmin Zhen, Tian Fang, Long Quan

Critical to the registration of point clouds is the establishment of a set of accurate correspondences between points in 3D space.

Very Large-Scale Global SfM by Distributed Motion Averaging

no code implementations • CVPR 2018 • Siyu Zhu, Runze Zhang, Lei Zhou, Tianwei Shen, Tian Fang, Ping Tan, Long Quan

This work proposes a divide-and-conquer framework to solve very large global SfM at the scale of millions of images.

Progressive Large Scale-Invariant Image Matching in Scale Space

no code implementations • ICCV 2017 • Lei Zhou, Siyu Zhu, Tianwei Shen, Jinglu Wang, Tian Fang, Long Quan

In this paper, we propose a scale-invariant image matching approach to tackling the very large scale variation of views.

Image Retrieval Retrieval

Distributed Very Large Scale Bundle Adjustment by Global Camera Consensus

no code implementations • ICCV 2017 • Runze Zhang, Siyu Zhu, Tian Fang, Long Quan

In this paper, we propose a distributed approach to coping with this global bundle adjustment for very large scale Structure-from-Motion computation.

Distributed Computing Distributed Optimization

Parallel Structure from Motion from Local Increment to Global Averaging

no code implementations • 28 Feb 2017 • Siyu Zhu, Tianwei Shen, Lei Zhou, Runze Zhang, Jinglu Wang, Tian Fang, Long Quan

In this paper, we tackle the accurate and consistent Structure from Motion (SfM) problem, in particular camera registration, far exceeding the memory of a single computer in parallel.

Clustering

A Text Detection System for Natural Scenes With Convolutional Feature Learning and Cascaded Classification

no code implementations • CVPR 2016 • Siyu Zhu, Richard Zanibbi

We propose a system that finds text in natural scenes using a variety of cues.

General Classification Text Detection

Joint Camera Clustering and Surface Segmentation for Large-Scale Multi-View Stereo

no code implementations • ICCV 2015 • Runze Zhang, Shiwei Li, Tian Fang, Siyu Zhu, Long Quan

To solve this problem, we propose a joint optimization in a hierarchical framework to obtain the final surface segments and corresponding optimal camera clusters.

Clustering Segmentation

Detecting Figures and Part Labels in Patents: Competition-Based Development of Image Processing Algorithms

no code implementations • 24 Oct 2014 • Christoph Riedl, Richard Zanibbi, Marti A. Hearst, Siyu Zhu, Michael Menietti, Jason Crusan, Ivan Metelsky, Karim R. Lakhani

We report the findings of a month-long online competition in which participants developed algorithms for augmenting the digital version of patent documents published by the United States Patent and Trademark Office (USPTO).

Local Readjustment for High-Resolution 3D Reconstruction

no code implementations • CVPR 2014 • Siyu Zhu, Tian Fang, Jianxiong Xiao, Long Quan

To this end, we propose a segment-based approach to readjust the camera poses locally and improve the reconstruction for fine geometry details.

3D Reconstruction
