Search Results for author: Bin Zhao

Found 52 papers, 17 papers with code

CrossMatch: Enhance Semi-Supervised Medical Image Segmentation with Perturbation Strategies and Knowledge Distillation

1 code implementation • 1 May 2024 • Bin Zhao, Chunshi Wang, Shuxue Ding

Semi-supervised learning for medical image segmentation presents a unique challenge of efficiently using limited labeled data while leveraging abundant unlabeled data.

Ranked #1 on Semi-supervised Medical Image Segmentation on ACDC 5% labeled data

Image Segmentation Self-Knowledge Distillation +2


Pessimistic Value Iteration for Multi-Task Data Sharing in Offline Reinforcement Learning

1 code implementation • 30 Apr 2024 • Chenjia Bai, Lingxiao Wang, Jianye Hao, Zhuoran Yang, Bin Zhao, Zhen Wang, Xuelong Li

We further provide theoretical analysis, which shows that the optimality gap of our method is only related to the expected data coverage of the shared dataset, thus resolving the distribution shift issue in data sharing.

Offline RL Reinforcement Learning (RL) +1


S3-SLAM: Sparse Tri-plane Encoding for Neural Implicit SLAM

no code implementations • 28 Apr 2024 • Zhiyao Zhang, Yunzhou Zhang, Yanmin Wu, Bin Zhao, Xingshuo Wang, Rui Tian

With the emergence of Neural Radiance Fields (NeRF), neural implicit representations have gained widespread applications across various domains, including simultaneous localization and mapping.

Simultaneous Localization and Mapping


Any2Point: Empowering Any-modality Large Models for Efficient 3D Understanding

7 code implementations • 11 Apr 2024 • Yiwen Tang, Jiaming Liu, Dong Wang, Zhigang Wang, Shanghang Zhang, Bin Zhao, Xuelong Li

The adapter incorporates prior spatial knowledge from the source modality to guide the local feature aggregation of 3D tokens, compelling the semantic adaption of any-modality transformers.


HPL-ESS: Hybrid Pseudo-Labeling for Unsupervised Event-based Semantic Segmentation

no code implementations • 25 Mar 2024 • Linglin Jing, Yiming Ding, Yunpeng Gao, Zhigang Wang, Xu Yan, Dong Wang, Gerald Schaefer, Hui Fang, Bin Zhao, Xuelong Li

In this paper, we propose a novel hybrid pseudo-labeling framework for unsupervised event-based semantic segmentation, HPL-ESS, to alleviate the influence of noisy pseudo labels.

Image Reconstruction Segmentation +2


Large-Scale Actionless Video Pre-Training via Discrete Diffusion for Efficient Policy Learning

no code implementations • 22 Feb 2024 • Haoran He, Chenjia Bai, Ling Pan, Weinan Zhang, Bin Zhao, Xuelong Li

In the fine-tuning stage, we harness the imagined future videos to guide low-level action learning trained on a limited set of robot data.


Motion-Aware Video Frame Interpolation

no code implementations • 5 Feb 2024 • Pengfei Han, Fuhua Zhang, Bin Zhao, Xuelong Li

Subsequently, a cross-scale motion structure is presented to estimate and refine intermediate flow maps by the extracted features.

Optical Flow Estimation Video Frame Interpolation


Vehicle Perception from Satellite

1 code implementation • 1 Feb 2024 • Bin Zhao, Pengfei Han, Xuelong Li

Satellites are capable of capturing high-resolution videos.

Density Estimation object-detection +1


Calibration-free quantitative phase imaging in multi-core fiber endoscopes using end-to-end deep learning

no code implementations • 12 Dec 2023 • Jiawei Sun, Bin Zhao, Dong Wang, Zhigang Wang, Jie Zhang, Nektarios Koukourakis, Juergen W. Czarske, Xuelong Li

Quantitative phase imaging (QPI) through multi-core fibers (MCFs) has been an emerging in vivo label-free endoscopic imaging modality with minimal invasiveness.

Retrieval


X4D-SceneFormer: Enhanced Scene Understanding on 4D Point Cloud Videos through Cross-modal Knowledge Transfer

no code implementations • 12 Dec 2023 • Linglin Jing, Ying Xue, Xu Yan, Chaoda Zheng, Dong Wang, Ruimao Zhang, Zhigang Wang, Hui Fang, Bin Zhao, Zhen Li

The field of 4D point cloud understanding is rapidly developing with the goal of analyzing dynamic 3D point cloud sequences.

Action Recognition Action Segmentation +6


GS-SLAM: Dense Visual SLAM with 3D Gaussian Splatting

no code implementations • 20 Nov 2023 • Chi Yan, Delin Qu, Dan Xu, Bin Zhao, Zhigang Wang, Dong Wang, Xuelong Li

This strategy is essential to extend 3D Gaussian representation to reconstruct the whole scene rather than synthesize a static object in existing methods.

Pose Tracking Simultaneous Localization and Mapping


Implicit Event-RGBD Neural SLAM

no code implementations • 18 Nov 2023 • Delin Qu, Chi Yan, Dong Wang, Jie Yin, Dan Xu, Bin Zhao, Xuelong Li

To address these challenges, we propose EN-SLAM, the first event-RGBD implicit neural SLAM framework, which effectively leverages the high rate and high dynamic range advantages of event data for tracking and mapping.


Cross-modal Generative Model for Visual-Guided Binaural Stereo Generation

no code implementations • 13 Nov 2023 • Zhaojian Li, Bin Zhao, Yuan Yuan

To this end, a metric to measure the spatial perception of audio is proposed for the first time.

Attribute Audio Generation


Kinematic-aware Prompting for Generalizable Articulated Object Manipulation with LLMs

2 code implementations • 6 Nov 2023 • Wenke Xia, Dong Wang, Xincheng Pang, Zhigang Wang, Bin Zhao, Di Hu, Xuelong Li

Generalizable articulated object manipulation is essential for home-assistant robots.

Imitation Learning In-Context Learning +1


Point-PEFT: Parameter-Efficient Fine-Tuning for 3D Pre-trained Models

7 code implementations • 4 Oct 2023 • Yiwen Tang, Ray Zhang, Zoey Guo, Dong Wang, Zhigang Wang, Bin Zhao, Xuelong Li

To this end, we introduce Point-PEFT, a novel framework for adapting point cloud pre-trained models with minimal learnable parameters.


Disentangled Contrastive Image Translation for Nighttime Surveillance

no code implementations • 11 Jul 2023 • Guanzhou Lan, Bin Zhao, Xuelong Li

Targeting the surveillance scenes, we develop a disentangled representation, which is an auxiliary pretext task that separates surveillance scenes into the foreground and background with contrastive learning.

Contrastive Learning Translation


Diffusion Model is an Effective Planner and Data Synthesizer for Multi-Task Reinforcement Learning

1 code implementation • NeurIPS 2023 • Haoran He, Chenjia Bai, Kang Xu, Zhuoran Yang, Weinan Zhang, Dong Wang, Bin Zhao, Xuelong Li

Specifically, we propose Multi-Task Diffusion Model (MTDiff), a diffusion-based method that incorporates Transformer backbones and prompt learning for generative planning and data synthesis in multi-task offline settings.

Reinforcement Learning (RL)


On the Value of Myopic Behavior in Policy Reuse

no code implementations • 28 May 2023 • Kang Xu, Chenjia Bai, Shuang Qiu, Haoran He, Bin Zhao, Zhen Wang, Wei Li, Xuelong Li

Leveraging learned strategies in unfamiliar scenarios is fundamental to human intelligence.


Behavior Contrastive Learning for Unsupervised Skill Discovery

1 code implementation • 8 May 2023 • Rushuai Yang, Chenjia Bai, Hongyi Guo, Siyuan Li, Bin Zhao, Zhen Wang, Peng Liu, Xuelong Li

Under mild assumptions, our objective maximizes the MI between different behaviors based on the same skill, which serves as an upper bound of the previous MI objective.

Continuous Control Contrastive Learning


One-Shot High-Fidelity Talking-Head Synthesis with Deformable Neural Radiance Field

no code implementations • CVPR 2023 • Weichuang Li, Longhao Zhang, Dong Wang, Bin Zhao, Zhigang Wang, Mulin Chen, Bang Zhang, Zhongjian Wang, Liefeng Bo, Xuelong Li

Talking head generation aims to generate faces that maintain the identity information of the source image and imitate the motion of the driving image.

Neural Rendering Novel View Synthesis +1


Not All Features Matter: Enhancing Few-shot CLIP with Adaptive Prior Refinement

1 code implementation • ICCV 2023 • Xiangyang Zhu, Renrui Zhang, Bowei He, Aojun Zhou, Dong Wang, Bin Zhao, Peng Gao

The popularity of Contrastive Language-Image Pre-training (CLIP) has propelled its application to diverse downstream vision tasks.

Computational Efficiency Few-Shot Learning


Towards Nonlinear-Motion-Aware and Occlusion-Robust Rolling Shutter Correction

1 code implementation • ICCV 2023 • Delin Qu, Yizhen Lao, Zhigang Wang, Dong Wang, Bin Zhao, Xuelong Li

This paper addresses the problem of rolling shutter correction in complex nonlinear and dynamic scenes with extreme occlusion.

Rolling Shutter Correction


ViewRefer: Grasp the Multi-view Knowledge for 3D Visual Grounding with GPT and Prototype Guidance

7 code implementations • 29 Mar 2023 • Zoey Guo, Yiwen Tang, Ray Zhang, Dong Wang, Zhigang Wang, Bin Zhao, Xuelong Li

In this paper, we propose ViewRefer, a multi-view framework for 3D visual grounding exploring how to grasp the view knowledge from both text and 3D modalities.

Visual Grounding


Propagate And Calibrate: Real-time Passive Non-line-of-sight Tracking

no code implementations • CVPR 2023 • Yihao Wang, Zhigang Wang, Bin Zhao, Dong Wang, Mulin Chen, Xuelong Li

In contrast, we propose a purely passive method to track a person walking in an invisible room by only observing a relay wall, which is more in line with real application scenarios, e.g., security.


Fully Self-Supervised Depth Estimation from Defocus Clue

1 code implementation • CVPR 2023 • Haozhe Si, Bin Zhao, Dong Wang, Yunpeng Gao, Mulin Chen, Zhigang Wang, Xuelong Li

We show that our framework circumvents the needs for the depth and AIF image ground-truth, and receives superior predictions, thus closing the gap between the theoretical success of DFD works and their applications in the real world.

Depth Estimation


ViewRefer: Grasp the Multi-view Knowledge for 3D Visual Grounding

no code implementations • ICCV 2023 • Zoey Guo, Yiwen Tang, Ray Zhang, Dong Wang, Zhigang Wang, Bin Zhao, Xuelong Li

In this paper, we propose ViewRefer, a multi-view framework for 3D visual grounding exploring how to grasp the view knowledge from both text and 3D modalities.

Visual Grounding


Low-Light Hyperspectral Image Enhancement

1 code implementation • 5 Aug 2022 • Xuelong Li, Guanlin Li, Bin Zhao

The illumination enhancement branch is adopted to enlighten the low-frequency component with reduced resolution.

Image Enhancement


RCLane: Relay Chain Prediction for Lane Detection

no code implementations • 19 Jul 2022 • Shenghua Xu, Xinyue Cai, Bin Zhao, Li Zhang, Hang Xu, Yanwei Fu, Xiangyang Xue

This is because most existing lane detection methods treat lane detection as either a dense prediction or a detection task; few of them consider the unique topologies (Y-shape, Fork-shape, nearly horizontal lane) of the lane markers, which leads to sub-optimal solutions.

Lane Detection


Point-M2AE: Multi-scale Masked Autoencoders for Hierarchical Point Cloud Pre-training

3 code implementations • 28 May 2022 • Renrui Zhang, Ziyu Guo, Rongyao Fang, Bin Zhao, Dong Wang, Yu Qiao, Hongsheng Li, Peng Gao

By fine-tuning on downstream tasks, Point-M2AE achieves 86.43% accuracy on ScanObjectNN, +3.36% to the second-best, and largely benefits the few-shot classification, part segmentation and 3D object detection with the hierarchical pre-training scheme.

Ranked #4 on 3D Point Cloud Linear Classification on ModelNet40 (using extra training data)

3D Object Detection 3D Point Cloud Linear Classification +6


Community detection in censored hypergraph

no code implementations • 4 Nov 2021 • Mingao Yuan, Bin Zhao, Xiaofeng Zhao

In practice, a network may have censored (or missing) values and it is shown that censored values have a non-negligible effect on the structural properties of a network.

Community Detection


Hierarchical Multimodal Transformer to Summarize Videos

no code implementations • 22 Sep 2021 • Bin Zhao, Maoguo Gong, Xuelong Li

To integrate the two kinds of information, they are encoded in a two-stream scheme, and a multimodal fusion mechanism is developed based on the hierarchical transformer.

Machine Translation Translation +2


Audio-Visual Collaborative Representation Learning for Dynamic Saliency Prediction

no code implementations • 17 Sep 2021 • Hailong Ning, Bin Zhao, Zhanxuan Hu, Lang He, Ercheng Pei

Motivated by this, an audio-visual collaborative representation learning method is proposed for the DSP task, which explores the audio modality to better predict the dynamic saliency map by assisting vision modality.

Representation Learning Saliency Prediction +1


Video Crowd Localization with Multi-focus Gaussian Neighborhood Attention and a Large-Scale Benchmark

1 code implementation • 19 Jul 2021 • Haopeng Li, Lingbo Liu, Kunlin Yang, Shinan Liu, Junyu Gao, Bin Zhao, Rui Zhang, Jun Hou

Video crowd localization is a crucial yet challenging task, which aims to estimate exact locations of human heads in the given crowded videos.


PSRR-MaxpoolNMS: Pyramid Shifted MaxpoolNMS with Relationship Recovery

no code implementations • CVPR 2021 • Tianyi Zhang, Jie Lin, Peng Hu, Bin Zhao, Mohamed M. Sabry Aly

Unlike convolutions which are inherently parallel, the de-facto standard for NMS, namely GreedyNMS, cannot be easily parallelized and thus could be the performance bottleneck in convolutional object detection pipelines.

Object Detection


EA-Net: Edge-Aware Network for Flow-based Video Frame Interpolation

no code implementations • 17 May 2021 • Bin Zhao, Xuelong Li

Specifically, in the flow estimation stage, three edge-aware mechanisms are developed to emphasize the frame edges in estimating flow maps, so that the edge-maps are taken as the auxiliary information to provide more guidance to boost the flow accuracy.

Video Frame Interpolation


AudioVisual Video Summarization

no code implementations • 17 May 2021 • Bin Zhao, Maoguo Gong, Xuelong Li

Motivated by this, we propose to jointly exploit the audio and visual information for the video summarization task, and develop an AudioVisual Recurrent Network (AVRN) to achieve this.

Video Summarization


Reconstructive Sequence-Graph Network for Video Summarization

no code implementations • 10 May 2021 • Bin Zhao, Haopeng Li, Xiaoqiang Lu, Xuelong Li

Then, the videos are summarized by exploiting both the local and global dependencies among shots.

Video Summarization


Weather GAN: Multi-Domain Weather Translation Using Generative Adversarial Networks

no code implementations • 9 Mar 2021 • Xuelong Li, Kai Kou, Bin Zhao

To this end, the generator of Weather GAN is composed of an initial translation module, an attention module and a weather-cue segmentation module.

Style Transfer Translation


Generating Masks from Boxes by Mining Spatio-Temporal Consistencies in Videos

1 code implementation • ICCV 2021 • Bin Zhao, Goutam Bhat, Martin Danelljan, Luc van Gool, Radu Timofte

This effectively limits the performance and generalization capabilities of existing video segmentation methods.

Object Segmentation +4


Automated Segmentation of Brain Gray Matter Nuclei on Quantitative Susceptibility Mapping Using Deep Convolutional Neural Network

no code implementations • 3 Aug 2020 • Chao Chai, Pengchong Qiao, Bin Zhao, Huiying Wang, Guohua Liu, Hong Wu, E Mark Haacke, Wen Shen, Chen Cao, Xinchen Ye, Zhiyang Liu, Shuang Xia

Abnormal iron accumulation in the brain subcortical nuclei has been reported to be correlated to various neurodegenerative diseases, which can be measured through the magnetic susceptibility from the quantitative susceptibility mapping (QSM).


Automatic acute ischemic stroke lesion segmentation using semi-supervised learning

no code implementations • 10 Aug 2019 • Bin Zhao, Shuxue Ding, Hong Wu, Guohua Liu, Chen Cao, Song Jin, Zhiyang Liu

By using a large number of weakly labeled subjects and a small number of fully labeled subjects, our proposed method is able to accurately detect and segment the AIS lesions.

Clustering Ischemic Stroke Lesion Segmentation +1


Travel Time Estimation without Road Networks: An Urban Morphological Layout Representation Approach

no code implementations • 8 Jul 2019 • Wuwei Lan, Yanyan Xu, Bin Zhao

Travel time estimation is a crucial task for not only personal travel scheduling but also city planning.

Scheduling Travel Time Estimation


C^3 Framework: An Open-source PyTorch Code for Crowd Counting

3 code implementations • 5 Jul 2019 • Junyu Gao, Wei Lin, Bin Zhao, Dong Wang, Chenyu Gao, Jun Wen

This technical report attempts to provide efficient and solid kits addressed on the field of crowd counting, which is denoted as Crowd Counting Code Framework (C^3F).

Crowd Counting


Hierarchical Recurrent Neural Network for Video Summarization

no code implementations • 28 Apr 2019 • Bin Zhao, Xuelong Li, Xiaoqiang Lu

Compared to traditional RNNs, H-RNN is more suitable for video summarization, since it can exploit long temporal dependencies among frames while significantly reducing the number of computation operations.

Video Captioning Video Summarization


A General Framework for Edited Video and Raw Video Summarization

no code implementations • 24 Apr 2019 • Xuelong Li, Bin Zhao, Xiaoqiang Lu

Besides, the property-weights are learned for edited videos and raw videos, respectively.

Video Summarization


A CNN-RNN Architecture for Multi-Label Weather Recognition

no code implementations • 24 Apr 2019 • Bin Zhao, Xuelong Li, Xiaoqiang Lu, Zhigang Wang

To address this problem, we make the first attempt to view weather recognition as a multi-label classification task, i.e., assigning an image more than one label according to the displayed weather conditions.

General Classification Multi-Label Classification


Dataflow-based Joint Quantization of Weights and Activations for Deep Neural Networks

no code implementations • 4 Jan 2019 • Xue Geng, Jie Fu, Bin Zhao, Jie Lin, Mohamed M. Sabry Aly, Christopher Pal, Vijay Chandrasekhar

This paper addresses a challenging problem - how to reduce energy consumption without incurring performance drop when deploying deep neural networks (DNNs) at the inference stage.

Quantization


HSA-RNN: Hierarchical Structure-Adaptive RNN for Video Summarization

no code implementations • CVPR 2018 • Bin Zhao, Xuelong Li, Xiaoqiang Lu

Although video summarization has achieved great success in recent years, few approaches have realized the influence of video structure on the summarization results.

Segmentation Video Summarization


Quasi Real-Time Summarization for Consumer Videos

no code implementations • CVPR 2014 • Bin Zhao, Eric P. Xing

With the widespread availability of video cameras, we are facing an ever-growing enormous collection of unedited and unstructured video data.


Hierarchical Feature Hashing for Fast Dimensionality Reduction

no code implementations • CVPR 2014 • Bin Zhao, Eric P. Xing

Curse of dimensionality is a practical and challenging problem in image categorization, especially in cases with a large number of classes.

Classification Dimensionality Reduction +5


Sparse Output Coding for Large-Scale Visual Recognition

no code implementations • CVPR 2013 • Bin Zhao, Eric P. Xing

Many vision tasks require a multi-class classifier to discriminate multiple categories, on the order of hundreds or thousands.

Classification General Classification +3


Large-Scale Category Structure Aware Image Categorization

no code implementations • NeurIPS 2011 • Bin Zhao, Fei Li, Eric P. Xing

With the emergence of structured large-scale datasets such as ImageNet, rich information about the conceptual relationships between images, such as a tree hierarchy among various image categories, becomes available.

Image Categorization

