Search Results for author: Hao Li

Found 329 papers, 134 papers with code

PRISM: A Promptable and Robust Interactive Segmentation Model with Visual Prompts

1 code implementation • 23 Apr 2024 • Hao Li, Han Liu, Dewei Hu, Jiacheng Wang, Ipek Oguz

(3) Corrective learning.

FiLo: Zero-Shot Anomaly Detection by Fine-Grained Description and High-Quality Localization

no code implementations • 21 Apr 2024 • Zhaopeng Gu, Bingke Zhu, Guibo Zhu, Yingying Chen, Hao Li, Ming Tang, Jinqiao Wang

Zero-shot anomaly detection (ZSAD) methods entail detecting anomalies directly without access to any known normal or abnormal samples within the target item categories.

Arena: A Patch-of-Interest ViT Inference Acceleration System for Edge-Assisted Video Analytics

no code implementations • 14 Apr 2024 • Haosong Peng, Wei Feng, Hao Li, Yufeng Zhan, Qihua Zhou, Yuanqing Xia

In this paper, we find visual foundation models like Vision Transformer (ViT) also have a dedicated acceleration mechanism for video analytics.

Edge-computing

Fuxi-DA: A Generalized Deep Learning Data Assimilation Framework for Assimilating Satellite Observations

no code implementations • 12 Apr 2024 • Xiaoze Xu, Xiuyu Sun, Wei Han, Xiaohui Zhong, Lei Chen, Hao Li

Data assimilation (DA), as an indispensable component within contemporary Numerical Weather Prediction (NWP) systems, plays a crucial role in generating the analysis that significantly impacts forecast performance.

Weather Forecasting

360°REA: Towards A Reusable Experience Accumulation with 360° Assessment for Multi-Agent System

no code implementations • 8 Apr 2024 • Shen Gao, Hao Li, Zhengliang Shi, Chengrui Huang, Quan Tu, Zhiliang Tian, Minlie Huang, Shuo Shang

The framework employs a novel 360° performance assessment method for multi-perspective performance evaluation with fine-grained assessment.

Language Modelling Large Language Model

Multi-level Graph Subspace Contrastive Learning for Hyperspectral Image Clustering

no code implementations • 8 Apr 2024 • Jingxin Wang, Renxiang Guan, Kainan Gao, Zihao Li, Hao Li, Xianju Li, Chang Tang

Multi-level graph subspace contrastive learning: multi-level contrastive learning was conducted to obtain local-global joint graph representations, to improve the consistency of the positive samples between views, and to obtain more robust graph embeddings.

Clustering Contrastive Learning +1

Collaborative Feedback Discriminative Propagation for Video Super-Resolution

1 code implementation • 6 Apr 2024 • Hao Li, Xiang Chen, Jiangxin Dong, Jinhui Tang, Jinshan Pan

However, inaccurate alignment usually leads to aligned features with significant artifacts, which will be accumulated during propagation and thus affect video restoration.

Video Reconstruction Video Restoration +1

On the Scalability of Diffusion-based Text-to-Image Generation

no code implementations • 3 Apr 2024 • Hao Li, Yang Zou, Ying Wang, Orchid Majumder, Yusheng Xie, R. Manmatha, Ashwin Swaminathan, Zhuowen Tu, Stefano Ermon, Stefano Soatto

On the data scaling side, we show the quality and diversity of the training set matters more than simply dataset size.

Denoising Text-to-Image Generation

NeRFCodec: Neural Feature Compression Meets Neural Radiance Fields for Memory-Efficient Scene Representation

no code implementations • 2 Apr 2024 • Sicheng Li, Hao Li, Yiyi Liao, Lu Yu

The emergence of Neural Radiance Fields (NeRF) has greatly impacted 3D scene modeling and novel-view synthesis.

Feature Compression Novel View Synthesis +1

The state-of-the-art in Cardiac MRI Reconstruction: Results of the CMRxRecon Challenge in MICCAI 2023

no code implementations • 1 Apr 2024 • Jun Lyu, Chen Qin, Shuo Wang, Fanwen Wang, Yan Li, Zi Wang, Kunyuan Guo, Cheng Ouyang, Michael Tänzer, Meng Liu, Longyu Sun, Mengting Sun, Qin Li, Zhang Shi, Sha Hua, Hao Li, Zhensen Chen, Zhenlin Zhang, Bingyu Xin, Dimitris N. Metaxas, George Yiasemis, Jonas Teuwen, Liping Zhang, Weitian Chen, Yidong Zhao, Qian Tao, Yanwei Pang, Xiaohan Liu, Artem Razumov, Dmitry V. Dylov, Quan Dou, Kang Yan, Yuyang Xue, Yuning Du, Julia Dietlmeier, Carles Garcia-Cabrera, Ziad Al-Haj Hemidi, Nora Vogt, Ziqiang Xu, Yajing Zhang, Ying-Hua Chu, Weibo Chen, Wenjia Bai, Xiahai Zhuang, Jing Qin, Lianmin Wu, Guang Yang, Xiaobo Qu, He Wang, Chengyan Wang

To address this issue, we organized the Cardiac MRI Reconstruction Challenge (CMRxRecon) in 2023, in collaboration with the 26th International Conference on MICCAI.

MRI Reconstruction

From Two-Stream to One-Stream: Efficient RGB-T Tracking via Mutual Prompt Learning and Knowledge Distillation

no code implementations • 25 Mar 2024 • Yang Luo, Xiqing Guo, Hao Li

Due to the complementary nature of visible light and thermal infrared modalities, object tracking based on the fusion of visible light images and thermal images (referred to as RGB-T tracking) has received increasing attention from researchers in recent years.

Knowledge Distillation Object Tracking +1

TDT-KWS: Fast And Accurate Keyword Spotting Using Token-and-duration Transducer

no code implementations • 20 Mar 2024 • Yu Xi, Hao Li, Baochen Yang, Haoyu Li, Hainan Xu, Kai Yu

Designing an efficient keyword spotting (KWS) system that delivers exceptional performance on resource-constrained edge devices has long been a subject of significant attention.

Keyword Spotting

GGRt: Towards Pose-free Generalizable 3D Gaussian Splatting in Real-time

no code implementations • 15 Mar 2024 • Hao Li, Yuanyuan Gao, Chenming Wu, Dingwen Zhang, Yalun Dai, Chen Zhao, Haocheng Feng, Errui Ding, Jingdong Wang, Junwei Han

Specifically, we design a novel joint learning framework that consists of an Iterative Pose Optimization Network (IPO-Net) and a Generalizable 3D-Gaussians (G-3DG) model.

Generalizable Novel View Synthesis Novel View Synthesis

LAN: Learning Adaptive Neighbors for Real-Time Insider Threat Detection

1 code implementation • 14 Mar 2024 • Xiangrui Cai, Yang Wang, Sihan Xu, Hao Li, Ying Zhang, Zheli Liu, Xiaojie Yuan

Moreover, LAN can also be applied to post-hoc ITD, surpassing 8 competitive baselines by at least 7.70% and 4.03% in AUC on two datasets.

Anomaly Detection Graph structure learning

LTGC: Long-tail Recognition via Leveraging LLMs-driven Generated Content

no code implementations • 9 Mar 2024 • QiHao Zhao, Yalun Dai, Hao Li, Wei Hu, Fan Zhang, Jun Liu

Long-tail recognition is challenging because it requires the model to learn good representations from tail categories and address imbalances across all categories.

Towards Effective Usage of Human-Centric Priors in Diffusion Models for Text-based Human Image Generation

no code implementations • 8 Mar 2024 • Junyan Wang, Zhenhong Sun, Zhiyu Tan, Xuanbai Chen, Weihua Chen, Hao Li, Cheng Zhang, Yang Song

Vanilla text-to-image diffusion models struggle with generating accurate human images, commonly resulting in imperfect anatomies such as unnatural postures or disproportionate limbs. Existing methods address this issue mostly by fine-tuning the model with extra images or adding additional controls -- human-centric priors such as pose or depth maps -- during the image generation phase.

Image Generation

Generalizable Whole Slide Image Classification with Fine-Grained Visual-Semantic Interaction

1 code implementation • 29 Feb 2024 • Hao Li, Ying Chen, Yifei Chen, Wenxian Yang, Bowen Ding, Yuchen Han, Liansheng Wang, Rongshan Yu

It is designed to enhance the model's generalizability by leveraging the interaction between localized visual patterns and fine-grained pathological semantics.

Image Classification Language Modelling +3

Improving LLM-based Machine Translation with Systematic Self-Correction

1 code implementation • 26 Feb 2024 • Zhaopeng Feng, Yan Zhang, Hao Li, Wenqiang Liu, Jun Lang, Yang Feng, Jian Wu, Zuozhu Liu

Large Language Models (LLMs) have achieved impressive results in Machine Translation (MT).

Machine Translation Translation

From Noise to Clarity: Unraveling the Adversarial Suffix of Large Language Model Attacks via Translation of Text Embeddings

no code implementations • 25 Feb 2024 • Hao Wang, Hao Li, Minlie Huang, Lei Sha

The safety defense methods of large language models (LLMs) remain limited because dangerous prompts are manually curated for just a few known attack types, which fails to keep pace with emerging varieties.

Language Modelling Large Language Model

Gradual Residuals Alignment: A Dual-Stream Framework for GAN Inversion and Image Attribute Editing

no code implementations • 22 Feb 2024 • Hao Li, Mengqi Huang, Lei Zhang, Bo Hu, Yi Liu, Zhendong Mao

GAN-based image attribute editing firstly leverages GAN Inversion to project real images into the latent space of GAN and then manipulates corresponding latent codes.

Attribute

CIF-Bench: A Chinese Instruction-Following Benchmark for Evaluating the Generalizability of Large Language Models

no code implementations • 20 Feb 2024 • Yizhi Li, Ge Zhang, Xingwei Qu, Jiali Li, Zhaoqun Li, Zekun Wang, Hao Li, Ruibin Yuan, Yinghao Ma, Kai Zhang, Wangchunshu Zhou, Yiming Liang, Lei Zhang, Lei Ma, Jiajun Zhang, Zuowen Li, Stephen W. Huang, Chenghua Lin, Wenhu Chen, Jie Fu

The advancement of large language models (LLMs) has enhanced the ability to generalize across a wide range of unseen natural language processing (NLP) tasks through instruction-following.

Instruction Following

Phrase Grounding-based Style Transfer for Single-Domain Generalized Object Detection

no code implementations • 2 Feb 2024 • Hao Li, Wei Wang, Cong Wang, Zhigang Luo, Xinwang Liu, Kenli Li, Xiaochun Cao

Single-domain generalized object detection aims to enhance a model's generalizability to multiple unseen target domains using only data from a single source domain during training.

object-detection Object Detection +2

MOSformer: Momentum encoder-based inter-slice fusion transformer for medical image segmentation

no code implementations • 22 Jan 2024 • De-Xing Huang, Xiao-Hu Zhou, Xiao-Liang Xie, Shi-Qi Liu, Zhen-Qiu Feng, Mei-Jiang Gui, Hao Li, Tian-Yu Xiang, Xiu-Ling Liu, Zeng-Guang Hou

2.5D-based segmentation models bridge computational efficiency of 2D-based models and spatial perception capabilities of 3D-based models.

Computational Efficiency Image Segmentation +3

Instance Brownian Bridge as Texts for Open-vocabulary Video Instance Segmentation

1 code implementation • 18 Jan 2024 • Zesen Cheng, Kehan Li, Hao Li, Peng Jin, Chang Liu, Xiawu Zheng, Rongrong Ji, Jie Chen

To mold instance queries to follow Brownian bridge and accomplish alignment with class texts, we design Bridge-Text Alignment (BTA) to learn discriminative bridge-level representations of instances via contrastive objectives.

Instance Segmentation Semantic Segmentation +1

Keeping Deep Learning Models in Check: A History-Based Approach to Mitigate Overfitting

1 code implementation • 18 Jan 2024 • Hao Li, Gopi Krishnan Rajbahadur, Dayi Lin, Cor-Paul Bezemer, Zhen Ming Jiang

This classifier is then used to detect if a trained model is overfit.

Hierarchical Fashion Design with Multi-stage Diffusion Models

no code implementations • 15 Jan 2024 • Zhifeng Xie, Hao Li, Huiming Ding, Mengtian Li, Ying Cao

Cross-modal fashion synthesis and editing offer intelligent support to fashion designers by enabling the automatic generation and local modification of design drafts. While current diffusion models demonstrate commendable stability and controllability in image synthesis, they still face significant challenges in generating fashion designs from abstract design elements and in fine-grained editing. Abstract sensory expressions, e.g., office, business, and party, form the high-level design concepts, while measurable aspects like sleeve length, collar type, and pant length are considered the low-level attributes of clothing. Controlling and editing fashion images using lengthy text descriptions is difficult. In this paper, we propose HieraFashDiff, a novel fashion design method using a shared multi-stage diffusion model encompassing high-level design concepts and low-level clothing attributes in a hierarchical structure. Specifically, we categorize the input text into different levels and feed them to the diffusion model at different time steps according to the criteria of professional clothing designers. HieraFashDiff allows designers to incrementally add low-level attributes after high-level prompts for interactive editing. In addition, we design a differentiable loss function in the sampling process with a mask to keep non-edit areas. Comprehensive experiments performed on our newly constructed hierarchical fashion dataset demonstrate that our proposed method outperforms other state-of-the-art competitors.

Fashion Synthesis Image Generation

Contrastive Learning With Audio Discrimination For Customizable Keyword Spotting In Continuous Speech

no code implementations • 12 Jan 2024 • Yu Xi, Baochen Yang, Hao Li, Jiaqi Guo, Kai Yu

Furthermore, experiments on the continuous speech dataset LibriSpeech demonstrate that, by incorporating audio discrimination, CLAD achieves significant performance gain over CL without audio discrimination.

Contrastive Learning Keyword Spotting +1

Risk-anticipatory autonomous driving strategies considering vehicles' weights, based on hierarchical deep reinforcement learning

no code implementations • 27 Dec 2023 • Di Chen, Hao Li, Zhicheng Jin, Huizhao Tu

Autonomous vehicles (AVs) have the potential to prevent accidents caused by drivers' error and reduce road traffic risks.

Autonomous Driving Decision Making

Coordinated Planning of Offshore Charging Stations and Electrified Ships: A Case Study on Shanghai-Busan Maritime Route

no code implementations • 25 Dec 2023 • Hao Li, Hanqi Tao, Wentao Huang, Hongcai Zhang, Ran Li

Despite the success of electric vehicles on land, electrification of maritime ships is challenged by the dilemma of range anxiety and cargo-carrying capacity.

Token-Level Contrastive Learning with Modality-Aware Prompting for Multimodal Intent Recognition

1 code implementation • 22 Dec 2023 • Qianrui Zhou, Hua Xu, Hao Li, Hanlei Zhang, Xiaohan Zhang, Yifan Wang, Kai Gao

To establish an optimal multimodal semantic environment for text modality, we develop a modality-aware prompting module (MAP), which effectively aligns and fuses features from text, video and audio modalities with similarity-based modality alignment and cross-modality attention mechanism.

Ranked #2 on Multimodal Intent Recognition on MIntRec

Contrastive Learning Multimodal Intent Recognition

FuXi-S2S: An accurate machine learning model for global subseasonal forecasts

no code implementations • 15 Dec 2023 • Lei Chen, Xiaohui Zhong, Jie Wu, Deliang Chen, Shangping Xie, Qingchen Chao, Chensen Lin, Zixin Hu, Bo Lu, Hao Li, Yuan Qi

Skillful subseasonal forecasts beyond 2 weeks are crucial for a wide range of applications across various sectors of society.

Weather Forecasting

Auto MC-Reward: Automated Dense Reward Design with Large Language Models for Minecraft

no code implementations • 14 Dec 2023 • Hao Li, Xue Yang, Zhaokai Wang, Xizhou Zhu, Jie Zhou, Yu Qiao, Xiaogang Wang, Hongsheng Li, Lewei Lu, Jifeng Dai

Many reinforcement learning environments (e.g., Minecraft) provide only sparse rewards that indicate task completion or failure with binary values.

reinforcement-learning

Diffusion-based Blind Text Image Super-Resolution

no code implementations • 13 Dec 2023 • Yuzhe Zhang, Jiawei Zhang, Hao Li, Zhouxia Wang, Luwei Hou, Dongqing Zou, Liheng Bian

Since text prior is important to guarantee the correctness of the restored text structure according to existing arts, we also propose a Text Diffusion Model (TDM) for text recognition which can guide IDM to generate text images with correct structures.

Image Generation Image Super-Resolution

Negative Pre-aware for Noisy Cross-modal Matching

1 code implementation • 10 Dec 2023 • Xu Zhang, Hao Li, Mang Ye

Since clean samples are more easily distinguished by GMM with increasing noise, the memory bank can still maintain high quality at a high noise ratio.

Image-text matching Image-to-Text Retrieval +2

The performance of multiple language models in identifying offensive language on social media

no code implementations • 10 Dec 2023 • Hao Li, Brandon Bennett

Text classification is an important topic in the field of natural language processing.

Information Retrieval Retrieval +2

VOODOO 3D: Volumetric Portrait Disentanglement for One-Shot 3D Head Reenactment

no code implementations • 7 Dec 2023 • Phong Tran, Egor Zakharov, Long-Nhat Ho, Anh Tuan Tran, Liwen Hu, Hao Li

We present a 3D-aware one-shot head reenactment method based on a fully volumetric neural disentanglement framework for source appearance and driver expressions.

Disentanglement Self-Supervised Learning

MICRO: Model-Based Offline Reinforcement Learning with a Conservative Bellman Operator

1 code implementation • 7 Dec 2023 • Xiao-Yin Liu, Xiao-Hu Zhou, Guotao Li, Hao Li, Mei-Jiang Gui, Tian-Yu Xiang, De-Xing Huang, Zeng-Guang Hou

This method trades off performance and robustness via introducing the robust Bellman operator into the algorithm.

Offline RL reinforcement-learning +1

FreestyleRet: Retrieving Images from Style-Diversified Queries

1 code implementation • 5 Dec 2023 • Hao Li, Curise Jia, Peng Jin, Zesen Cheng, Kehan Li, Jialu Sui, Chang Liu, Li Yuan

In this paper, we propose the Style-Diversified Query-Based Image Retrieval task, which enables retrieval based on various query styles.

Image Retrieval Retrieval

Sketch Input Method Editor: A Comprehensive Dataset and Methodology for Systematic Input Recognition

1 code implementation • 30 Nov 2023 • Guangming Zhu, Siyuan Wang, Qing Cheng, Kelong Wu, Hao Li, Liang Zhang

With the recent surge in the use of touchscreen devices, free-hand sketching has emerged as a promising modality for human-computer interaction.

Class Incremental Learning Domain Adaptation +2

Novel OCT mosaicking pipeline with Feature- and Pixel-based registration

no code implementations • 21 Nov 2023 • Jiacheng Wang, Hao Li, Dewei Hu, Yuankai K. Tao, Ipek Oguz

High-resolution Optical Coherence Tomography (OCT) images are crucial for ophthalmology studies but are limited by their relatively narrow field of view (FoV).

Computational Efficiency

GP-NeRF: Generalized Perception NeRF for Context-Aware 3D Scene Understanding

no code implementations • 20 Nov 2023 • Hao Li, Dingwen Zhang, Yalun Dai, Nian Liu, Lechao Cheng, Jingfeng Li, Jingdong Wang, Junwei Han

Applying NeRF to downstream perception tasks for scene understanding and representation is becoming increasingly popular.

Instance Segmentation Scene Understanding +2

Taiyi: A Bilingual Fine-Tuned Large Language Model for Diverse Biomedical Tasks

1 code implementation • 20 Nov 2023 • Ling Luo, Jinzhong Ning, Yingwen Zhao, Zhijun Wang, Zeyuan Ding, Peng Chen, Weiru Fu, Qinyu Han, Guangtao Xu, Yunzhi Qiu, Dinghao Pan, Jiru Li, Hao Li, Wenduo Feng, Senbo Tu, Yuqi Liu, Zhihao Yang, Jian Wang, Yuanyuan Sun, Hongfei Lin

The case study involving additional biomedical NLP tasks further shows Taiyi's considerable potential for bilingual biomedical multi-tasking.

Language Modelling Large Language Model +6

Assessing Test-time Variability for Interactive 3D Medical Image Segmentation with Diverse Point Prompts

1 code implementation • 13 Nov 2023 • Hao Li, Han Liu, Dewei Hu, Jiacheng Wang, Ipek Oguz

In this paper, we assess the test-time variability for interactive medical image segmentation with diverse point prompts.

Image Segmentation Interactive Segmentation +4

SpectralGPT: Spectral Remote Sensing Foundation Model

no code implementations • 13 Nov 2023 • Danfeng Hong, Bing Zhang, Xuyang Li, YuXuan Li, Chenyu Li, Jing Yao, Naoto Yokoya, Hao Li, Pedram Ghamisi, Xiuping Jia, Antonio Plaza, Paolo Gamba, Jon Atli Benediktsson, Jocelyn Chanussot

The foundation model has recently garnered significant attention due to its potential to revolutionize the field of visual representation learning in a self-supervised manner.

Change Detection Representation Learning +3

InfMLLM: A Unified Framework for Visual-Language Tasks

2 code implementations • 12 Nov 2023 • Qiang Zhou, Zhibin Wang, Wei Chu, Yinghui Xu, Hao Li, Yuan Qi

Our experiments demonstrate that preserving the positional information of visual embeddings through the pool-adapter is particularly beneficial for tasks like visual grounding.

Ranked #62 on Visual Question Answering on MM-Vet

Image Captioning Instruction Following +3

Machine Learning Parameterization of the Multi-scale Kain-Fritsch (MSKF) Convection Scheme

no code implementations • 7 Nov 2023 • Xiaohui Zhong, Xing Yu, Hao Li

The Weather Research and Forecast (WRF) model is used to generate training and testing data over South China at a horizontal resolution of 5 km.

Promise: Prompt-driven 3D Medical Image Segmentation Using Pretrained Image Foundation Models

1 code implementation • 30 Oct 2023 • Hao Li, Han Liu, Dewei Hu, Jiacheng Wang, Ipek Oguz

To address prevalent issues in medical imaging, such as data acquisition challenges and label availability, transfer learning from natural to medical image domains serves as a viable strategy to produce reliable segmentation results.

Image Segmentation Medical Image Segmentation +4

CROP: Conservative Reward for Model-based Offline Policy Optimization

1 code implementation • 26 Oct 2023 • Hao Li, Xiao-Hu Zhou, Xiao-Liang Xie, Shi-Qi Liu, Zhen-Qiu Feng, Xiao-Yin Liu, Mei-Jiang Gui, Tian-Yu Xiang, De-Xing Huang, Bo-Xian Yao, Zeng-Guang Hou

Offline reinforcement learning (RL) aims to optimize policy using collected data without online interactions.

D4RL Offline RL +1

FuXi-Extreme: Improving extreme rainfall and wind forecasts with diffusion model

no code implementations • 25 Oct 2023 • Xiaohui Zhong, Lei Chen, Jun Liu, Chensen Lin, Yuan Qi, Hao Li

State-of-the-art ML-based weather forecast models, such as FuXi, have demonstrated superior statistical forecast performance in comparison to the high-resolution forecasts (HRES) of the European Centre for Medium-Range Weather Forecasts (ECMWF).

Denoising Weather Forecasting

Facial Data Minimization: Shallow Model as Your Privacy Filter

no code implementations • 24 Oct 2023 • Yuwen Pu, Jiahao Chen, JiaYu Pan, Hao Li, Diqun Yan, Xuhong Zhang, Shouling Ji

Face recognition service has been used in many fields and brings much convenience to people.

Attribute Face Recognition +2

Unpaired MRI Super Resolution with Contrastive Learning

no code implementations • 24 Oct 2023 • Hao Li, Quanwei Liu, Jianan Liu, Xiling Liu, Yanni Dong, Tao Huang, Zhihan Lv

To this end, we propose an unpaired MRI SR approach that employs contrastive learning to enhance SR performance with limited HR training data.

Contrastive Learning Image Super-Resolution

On Generative Agents in Recommendation

1 code implementation • 16 Oct 2023 • An Zhang, Leheng Sheng, Yuxin Chen, Hao Li, Yang Deng, Xiang Wang, Tat-Seng Chua

Recommender systems are the cornerstone of today's information dissemination, yet a disconnect between offline metrics and online performance greatly hinders their development.

Collaborative Filtering Movie Recommendation +1

Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval

1 code implementation • NeurIPS 2023 • Hao Li, Jingkuan Song, Lianli Gao, Xiaosu Zhu, Heng Tao Shen

In this paper, we propose a novel Prototype-based Aleatoric Uncertainty Quantification (PAU) framework to provide trustworthy predictions by quantifying the uncertainty arising from the inherent data ambiguity.

Ranked #16 on Video Retrieval on MSVD

Image-text matching Image-to-Text Retrieval +6

High-Fidelity Speech Synthesis with Minimal Supervision: All Using Diffusion Models

no code implementations • 27 Sep 2023 • Chunyu Qiang, Hao Li, Yixin Tian, Yi Zhao, Ying Zhang, Longbiao Wang, Jianwu Dang

To address these issues, we propose a minimally-supervised high-fidelity speech synthesis method, where all modules are constructed based on the diffusion models.

Speech Synthesis Voice Cloning

NDC-Scene: Boost Monocular 3D Semantic Scene Completion in Normalized Device Coordinates Space

1 code implementation • ICCV 2023 • Jiawei Yao, Chuming Li, Keqiang Sun, Yingjie Cai, Hao Li, Wanli Ouyang, Hongsheng Li

Monocular 3D Semantic Scene Completion (SSC) has garnered significant attention in recent years due to its potential to predict complex semantics and geometry shapes from a single image, requiring no 3D inputs.

Ranked #1 on 3D Semantic Scene Completion from a single RGB image on NYUv2

3D Semantic Scene Completion from a single 2D image 3D Semantic Scene Completion from a single RGB image

Cross-City Matters: A Multimodal Remote Sensing Benchmark Dataset for Cross-City Semantic Segmentation using High-Resolution Domain Adaptation Networks

no code implementations • 26 Sep 2023 • Danfeng Hong, Bing Zhang, Hao Li, YuXuan Li, Jing Yao, Chenyu Li, Martin Werner, Jocelyn Chanussot, Alexander Zipf, Xiao Xiang Zhu

Artificial intelligence (AI) approaches nowadays have gained remarkable success in single-modality-dominated remote sensing (RS) applications, especially with an emphasis on individual urban environments (e.g., single cities or regions).

Domain Adaptation Segmentation +1

CMRxRecon: An open cardiac MRI dataset for the competition of accelerated image reconstruction

1 code implementation • 19 Sep 2023 • Chengyan Wang, Jun Lyu, Shuo Wang, Chen Qin, Kunyuan Guo, Xinyu Zhang, Xiaotong Yu, Yan Li, Fanwen Wang, Jianhua Jin, Zhang Shi, Ziqiang Xu, Yapeng Tian, Sha Hua, Zhensen Chen, Meng Liu, Mengting Sun, Xutong Kuang, Kang Wang, Haoran Wang, Hao Li, Yinghua Chu, Guang Yang, Wenjia Bai, Xiahai Zhuang, He Wang, Jing Qin, Xiaobo Qu

However, a limitation of CMR is its slow imaging speed, which causes patient discomfort and introduces artifacts in the images.

Image Reconstruction

DOMAIN: MilDly COnservative Model-BAsed OfflINe Reinforcement Learning

no code implementations • 16 Sep 2023 • Xiao-Yin Liu, Xiao-Hu Zhou, Xiao-Liang Xie, Shi-Qi Liu, Zhen-Qiu Feng, Hao Li, Mei-Jiang Gui, Tian-Yu Xiang, De-Xing Huang, Zeng-Guang Hou

However, uncertainty estimation is unreliable and leads to poor performance in certain scenarios, and the previous methods ignore differences between the model data, which brings great conservatism.

D4RL Model-based Reinforcement Learning +3

Implicit Neural Representation for MRI Parallel Imaging Reconstruction

no code implementations • 12 Sep 2023 • Hao Li, Yusheng Zhou, Jianan Liu, Xiling Liu, Tao Huang, Zhihan Lv, Weidong Cai

Our approach represents reconstructed fully-sampled images as functions of voxel coordinates and prior feature vectors from undersampled images, addressing the generalization challenges of INR.

MRI Reconstruction

Research on Damage Analysis of Key Parts of UAV Flight Control System

no code implementations • 7 Sep 2023 • Tianshun Li, Huaimin Chen, Ben Xiao, Hao Li, Shiyu Hao, Di Hai, Xuetong Wang

A set of hardware in the loop simulation methods based on the UAV model is proposed to create fault data, which is used to judge the parts where faults happen.

Learning Speech Representation From Contrastive Token-Acoustic Pretraining

no code implementations • 1 Sep 2023 • Chunyu Qiang, Hao Li, Yixin Tian, Ruibo Fu, Tao Wang, Longbiao Wang, Jianwu Dang

However, existing contrastive learning methods in the audio field focus on extracting global descriptive information for downstream audio classification tasks, making them unsuitable for TTS, VC, and ASR tasks.

Audio Classification Automatic Speech Recognition +5

Towards Privacy-Supporting Fall Detection via Deep Unsupervised RGB2Depth Adaptation

1 code implementation • 23 Aug 2023 • Hejun Xiao, Kunyu Peng, Xiangsheng Huang, Alina Roitberg, Hao Li, Zhaohui Wang, Rainer Stiefelhagen

In this paper, we introduce a privacy-supporting solution that makes the RGB-trained model applicable in depth domain and utilizes depth data at test time for fall detection.

Domain Adaptation

False Negative/Positive Control for SAM on Noisy Medical Images

1 code implementation • 20 Aug 2023 • Xing Yao, Han Liu, Dewei Hu, Daiwei Lu, Ange Lou, Hao Li, Ruining Deng, Gabriel Arenas, Baris Oguz, Nadav Schwartz, Brett C Byram, Ipek Oguz

The method couples multi-box prompt augmentation and an aleatoric uncertainty-based false-negative (FN) and false-positive (FP) correction (FNPC) strategy.

Image Segmentation Medical Image Segmentation +2

MonoNeRD: NeRF-like Representations for Monocular 3D Object Detection

1 code implementation • ICCV 2023 • Junkai Xu, Liang Peng, Haoran Cheng, Hao Li, Wei Qian, Ke Li, Wenxiao Wang, Deng Cai

To the best of our knowledge, this work is the first to introduce volume rendering for M3D, and demonstrates the potential of implicit reconstruction for image-based 3D perception.

Monocular 3D Object Detection Object +1

IOB: Integrating Optimization Transfer and Behavior Transfer for Multi-Policy Reuse

no code implementations • 14 Aug 2023 • Siyuan Li, Hao Li, Jin Zhang, Zhen Wang, Peng Liu, Chongjie Zhang

Humans have the ability to reuse previously learned policies to solve new tasks quickly, and reinforcement learning (RL) agents can do the same by transferring knowledge from source policies to a related target task.

Continual Learning Reinforcement Learning (RL)

CATS v2: Hybrid encoders for robust medical segmentation

2 code implementations • 11 Aug 2023 • Hao Li, Han Liu, Dewei Hu, Xing Yao, Jiacheng Wang, Ipek Oguz

We fuse the information from the convolutional encoder and the transformer at the skip connections of different resolutions to form the final segmentation.

Domain Adaptation Image Segmentation +3

The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World

1 code implementation • 3 Aug 2023 • Weiyun Wang, Min Shi, Qingyun Li, Wenhai Wang, Zhenhang Huang, Linjie Xing, Zhe Chen, Hao Li, Xizhou Zhu, Zhiguo Cao, Yushi Chen, Tong Lu, Jifeng Dai, Yu Qiao

We present the All-Seeing (AS) project: a large-scale data and model for recognizing and understanding everything in the open world.

Question Answering Retrieval +1

Reinforcement Learning-based Non-Autoregressive Solver for Traveling Salesman Problems

1 code implementation • 1 Aug 2023 • Yubin Xiao, Di Wang, Boyang Li, Huanhuan Chen, Wei Pang, Xuan Wu, Hao Li, Dong Xu, Yanchun Liang, You Zhou

The Traveling Salesman Problem (TSP) is a well-known combinatorial optimization problem with broad real-world applications.

Combinatorial Optimization reinforcement-learning +2

XMem++: Production-level Video Segmentation From Few Annotated Frames

1 code implementation • ICCV 2023 • Maksym Bekuzarov, Ariana Bermudez, Joon-Young Lee, Hao Li

Despite advancements in user-guided video segmentation, extracting complex objects consistently for highly complex scenes is still a labor-intensive task, especially for production.

Segmentation Semantic Segmentation +3

131

Paper
Code

Minimally-Supervised Speech Synthesis with Conditional Diffusion Model and Language Model: A Comparative Study of Semantic Coding

no code implementations • 28 Jul 2023 • Chunyu Qiang, Hao Li, Hao Ni, He Qu, Ruibo Fu, Tao Wang, Longbiao Wang, Jianwu Dang

However, existing methods suffer from three problems: the high dimensionality and waveform distortion of discrete speech representations, the prosodic averaging problem caused by the duration prediction model in non-autoregressive frameworks, and the information redundancy and dimension explosion problems of existing semantic encoding methods.

Language Modelling Speech Synthesis

Paper
Add Code

COLosSAL: A Benchmark for Cold-start Active Learning for 3D Medical Image Segmentation

1 code implementation • 22 Jul 2023 • Han Liu, Hao Li, Xing Yao, Yubo Fan, Dewei Hu, Benoit Dawant, Vishwesh Nath, Zhoubing Xu, Ipek Oguz

Cold-start AL is highly relevant in many practical scenarios but has been under-explored, especially for 3D medical segmentation tasks requiring substantial annotation effort.

Active Learning Image Segmentation +3

Paper
Code

PULSAR at MEDIQA-Sum 2023: Large Language Models Augmented by Synthetic Dialogue Convert Patient Dialogues to Medical Records

1 code implementation • 5 Jul 2023 • Viktor Schlegel, Hao Li, Yuping Wu, Anand Subramanian, Thanh-Tung Nguyen, Abhinav Ramesh Kashyap, Daniel Beck, Xiaojun Zeng, Riza Theresa Batista-Navarro, Stefan Winkler, Goran Nenadic

This paper describes PULSAR, our system submission at the ImageClef 2023 MediQA-Sum task on summarising patient-doctor dialogues into clinical records.

Data Augmentation Language Modelling

Paper
Code

Semi-supervised Learning from Street-View Images and OpenStreetMap for Automatic Building Height Estimation

1 code implementation • 5 Jul 2023 • Hao Li, Zhendong Yuan, Gabriel Dax, Gefei Kong, Hongchao Fan, Alexander Zipf, Martin Werner

In this work, we propose a semi-supervised learning (SSL) method of automatically estimating building height from Mapillary SVI and OSM data to generate low-cost and open-source 3D city modeling in LoD1.

object-detection Object Detection +1

Paper
Code

VesselMorph: Domain-Generalized Retinal Vessel Segmentation via Shape-Aware Representation

no code implementations • 1 Jul 2023 • Dewei Hu, Hao Li, Han Liu, Xing Yao, Jiacheng Wang, Ipek Oguz

We map the intensity image and the tensor field to a latent space for feature extraction.

Retinal Vessel Segmentation

Paper
Add Code

FuXi: A cascade machine learning forecasting system for 15-day global weather forecast

2 code implementations • 22 Jun 2023 • Lei Chen, Xiaohui Zhong, Feng Zhang, Yuan Cheng, Yinghui Xu, Yuan Qi, Hao Li

Over the past few years, due to the rapid development of machine learning (ML) models for weather forecasting, state-of-the-art ML models have shown superior performance compared to the European Centre for Medium-Range Weather Forecasts (ECMWF)'s high-resolution forecast (HRES) in 10-day forecasts at a spatial resolution of 0.25 degree.

Weather Forecasting

Paper
Code

WiCo: Win-win Cooperation of Bottom-up and Top-down Referring Image Segmentation

no code implementations • 19 Jun 2023 • Zesen Cheng, Peng Jin, Hao Li, Kehan Li, Siheng Li, Xiangyang Ji, Chang Liu, Jie Chen

Bottom-up methods are mainly perturbed by Inferior Positive (IP) errors due to the lack of prior object information.

Image Segmentation Referring Expression Segmentation +1

Paper
Add Code

ADDP: Learning General Representations for Image Recognition and Generation with Alternating Denoising Diffusion Process

1 code implementation • 8 Jun 2023 • Changyao Tian, Chenxin Tao, Jifeng Dai, Hao Li, Ziheng Li, Lewei Lu, Xiaogang Wang, Hongsheng Li, Gao Huang, Xizhou Zhu

In each denoising step, our method first decodes pixels from previous VQ tokens, then generates new VQ tokens from the decoded pixels.

Denoising Representation Learning

Paper
Code

M$^3$Fair: Mitigating Bias in Healthcare Data through Multi-Level and Multi-Sensitive-Attribute Reweighting Method

1 code implementation • 7 Jun 2023 • Yinghao Zhu, Jingkun An, Enshen Zhou, Lu An, Junyi Gao, Hao Li, Haoran Feng, Bo Hou, Wen Tang, Chengwei Pan, Liantao Ma

In healthcare AI, these attributes can play a significant role in determining the quality of care that individuals receive.

Attribute Fairness

Paper
Code

PULSAR: Pre-training with Extracted Healthcare Terms for Summarising Patients' Problems and Data Augmentation with Black-box Large Language Models

1 code implementation • 5 Jun 2023 • Hao Li, Yuping Wu, Viktor Schlegel, Riza Batista-Navarro, Thanh-Tung Nguyen, Abhinav Ramesh Kashyap, Xiaojun Zeng, Daniel Beck, Stefan Winkler, Goran Nenadic

Medical progress notes play a crucial role in documenting a patient's hospital journey, including his or her condition, treatment plan, and any updates for healthcare providers.

Data Augmentation

Paper
Code

Sonicverse: A Multisensory Simulation Platform for Embodied Household Agents that See and Hear

1 code implementation • 1 Jun 2023 • Ruohan Gao, Hao Li, Gokul Dharan, Zhuzhu Wang, Chengshu Li, Fei Xia, Silvio Savarese, Li Fei-Fei, Jiajun Wu

We introduce Sonicverse, a multisensory simulation platform with integrated audio-visual simulation for training household agents that can both see and hear.

Multi-Task Learning Visual Navigation

Paper
Code

The ObjectFolder Benchmark: Multisensory Learning with Neural and Real Objects

no code implementations • CVPR 2023 • Ruohan Gao, Yiming Dou, Hao Li, Tanmay Agarwal, Jeannette Bohg, Yunzhu Li, Li Fei-Fei, Jiajun Wu

We introduce the ObjectFolder Benchmark, a benchmark suite of 10 tasks for multisensory object-centric learning, centered around object recognition, reconstruction, and manipulation with sight, sound, and touch.

Benchmarking Object +1

Paper
Add Code

Do You Hear The People Sing? Key Point Analysis via Iterative Clustering and Abstractive Summarisation

no code implementations • 25 May 2023 • Hao Li, Viktor Schlegel, Riza Batista-Navarro, Goran Nenadic

Furthermore, evaluating key points is crucial in ensuring that the automatically generated summaries are useful.

Sentence

Paper
Add Code

OVO: Open-Vocabulary Occupancy

1 code implementation • 25 May 2023 • Zhiyu Tan, ZiChao Dong, Cheng Zhang, Weikun Zhang, Hang Ji, Hao Li

Semantic occupancy prediction aims to infer dense geometry and semantics of surroundings for an autonomous agent to operate safely in the 3D environment.

Knowledge Distillation

Paper
Code

Text-Video Retrieval with Disentangled Conceptualization and Set-to-Set Alignment

4 code implementations • 20 May 2023 • Peng Jin, Hao Li, Zesen Cheng, Jinfa Huang, Zhennan Wang, Li Yuan, Chang Liu, Jie Chen

In this paper, we propose the Disentangled Conceptualization and Set-to-set Alignment (DiCoSA) to simulate the conceptualizing and reasoning process of human beings.

Retrieval Video Retrieval

Paper
Code

TG-VQA: Ternary Game of Video Question Answering

no code implementations • 17 May 2023 • Hao Li, Peng Jin, Zesen Cheng, Songyang Zhang, Kai Chen, Zhennan Wang, Chang Liu, Jie Chen

Video question answering aims at answering a question about the video content by reasoning the alignment semantics within them.

Contrastive Learning Question Answering +2

Paper
Add Code

Correcting for Interference in Experiments: A Case Study at Douyin

no code implementations • 4 May 2023 • Vivek F. Farias, Hao Li, Tianyi Peng, Xinyuyang Ren, Huawei Zhang, Andrew Zheng

We formalize the problem of inference in such experiments as one of policy evaluation.

Uncertainty Quantification

Paper
Add Code

COSST: Multi-organ Segmentation with Partially Labeled Datasets Using Comprehensive Supervisions and Self-training

no code implementations • 27 Apr 2023 • Han Liu, Zhoubing Xu, Riqiang Gao, Hao Li, Jianing Wang, Guillaume Chabin, Ipek Oguz, Sasa Grbic

We revisit the problem from a perspective of partial label supervision signals and identify two signals derived from ground truth and one from pseudo labels.

Computed Tomography (CT) Medical Image Segmentation +5

Paper
Add Code

Benchmarking the Physical-world Adversarial Robustness of Vehicle Detection

no code implementations • 11 Apr 2023 • Tianyuan Zhang, Yisong Xiao, Xiaoya Zhang, Hao Li, Lu Wang

Thus, virtual simulation experiments can provide a solution to this challenge.

Adversarial Attack Adversarial Robustness +1

Paper
Add Code

CryoFormer: Continuous Heterogeneous Cryo-EM Reconstruction using Transformer-based Neural Representations

no code implementations • 28 Mar 2023 • Xinhang Liu, Yan Zeng, Yifan Qin, Hao Li, Jiakai Zhang, Lan Xu, Jingyi Yu

Cryo-electron microscopy (cryo-EM) allows for the high-resolution reconstruction of 3D structures of proteins and other biomolecules.

Paper
Add Code

EPro-PnP: Generalized End-to-End Probabilistic Perspective-n-Points for Monocular Object Pose Estimation

2 code implementations • 22 Mar 2023 • Hansheng Chen, Wei Tian, Pichao Wang, Fan Wang, Lu Xiong, Hao Li

In this paper, we propose the EPro-PnP, a probabilistic PnP layer for general end-to-end pose estimation, which outputs a distribution of pose with differentiable probability density on the SE(3) manifold.

Ranked #4 on 6D Pose Estimation using RGB on LineMOD

3D Object Detection 6D Pose Estimation using RGB +1

1,049

Paper
Code

Learning A Sparse Transformer Network for Effective Image Deraining

1 code implementation • CVPR 2023 • Xiang Chen, Hao Li, Mingqiang Li, Jinshan Pan

To overcome this problem, we propose an effective DeRaining network, Sparse Transformer (DRSformer) that can adaptively keep the most useful self-attention values for feature aggregation so that the aggregated features better facilitate high-quality image reconstruction.

Image Reconstruction Image Restoration +1

219

Paper
Code

Video Action Recognition with Attentive Semantic Units

no code implementations • ICCV 2023 • Yifei Chen, Dapeng Chen, Ruijin Liu, Hao Li, Wei Peng

Supervised by the semantics of action labels, recent works adapt the visual branch of VLMs to learn video representations.

Action Recognition Temporal Action Localization +1

Paper
Add Code

DiffusionRet: Generative Text-Video Retrieval with Diffusion Model

4 code implementations • ICCV 2023 • Peng Jin, Hao Li, Zesen Cheng, Kehan Li, Xiangyang Ji, Chang Liu, Li Yuan, Jie Chen

Existing text-video retrieval solutions are, in essence, discriminant models focused on maximizing the conditional likelihood, i.e., p(candidates|query).

Ranked #15 on Video Retrieval on MSVD

Retrieval Video Retrieval

Paper
Code

SSL^2: Self-Supervised Learning meets Semi-Supervised Learning: Multiple Sclerosis Segmentation in 7T-MRI from large-scale 3T-MRI

no code implementations • 9 Mar 2023 • Jiacheng Wang, Hao Li, Han Liu, Dewei Hu, Daiwei Lu, Keejin Yoon, Kelsey Barter, Francesca Bagnato, Ipek Oguz

A potential solution is to leverage the information available in large public datasets in conjunction with a target dataset which only has limited labeled data.

Lesion Segmentation Segmentation +1

Paper
Add Code

TwERC: High Performance Ensembled Candidate Generation for Ads Recommendation at Twitter

no code implementations • 27 Feb 2023 • Vanessa Cai, Pradeep Prabakar, Manuel Serrano Rebuelta, Lucas Rosen, Federico Monti, Katarzyna Janocha, Tomo Lazovich, Jeetu Raj, Yedendra Shrinivasan, Hao Li, Thomas Markovich

We focus on the candidate generation phase of a large-scale ads recommendation problem in this paper, and present a machine learning first heterogeneous re-architecture of this stage which we term TwERC.

Recommendation Systems Vocal Bursts Intensity Prediction

Paper
Add Code

An Adaptive Plug-and-Play Network for Few-Shot Learning

no code implementations • 18 Feb 2023 • Hao Li, Li Li, Yunmeng Huang, Ning Li, Yongtao Zhang

Few-shot learning (FSL) requires a model to classify new samples after learning from only a few samples.

Few-Shot Learning

Paper
Add Code

Boosting Low-Data Instance Segmentation by Unsupervised Pre-training with Saliency Prompt

no code implementations • CVPR 2023 • Hao Li, Dingwen Zhang, Nian Liu, Lechao Cheng, Yalun Dai, Chao Zhang, Xinggang Wang, Junwei Han

Inspired by the recent success of the Prompting technique, we introduce a new pre-training method that boosts QEIS models by giving Saliency Prompt for queries/kernels.

Instance Segmentation Semantic Segmentation +1

Paper
Add Code

UNAEN: Unsupervised Abnormality Extraction Network for MRI Motion Artifact Reduction

no code implementations • 4 Jan 2023 • Yusheng Zhou, Hao Li, Jianan Liu, Zhengmin Kong, Tao Huang, Euijoon Ahn, Zhihan Lv, Jinman Kim, David Dagan Feng

Our results substantiate the potential of UNAEN as a promising solution applicable in real-world clinical environments, with the capability to enhance diagnostic accuracy and facilitate image-guided therapies.

Paper
Add Code

OccluMix: Towards De-Occlusion Virtual Try-on by Semantically-Guided Mixup

2 code implementations • 3 Jan 2023 • Zhijing Yang, Junyang Chen, Yukai Shi, Hao Li, Tianshui Chen, Liang Lin

Image Virtual try-on aims at replacing the cloth on a personal image with a garment image (in-shop clothes), which has attracted increasing attention from the multimedia and computer vision communities.

Semantic Parsing Virtual Try-on

Paper
Code

Guided Recommendation for Model Fine-Tuning

no code implementations • CVPR 2023 • Hao Li, Charless Fowlkes, Hao Yang, Onkar Dabeer, Zhuowen Tu, Stefano Soatto

With thousands of historical training jobs, a recommendation system can be learned to predict the model selection score given the features of the dataset and the model as input.

Model Selection Transfer Learning

Paper
Add Code

StyleGene: Crossover and Mutation of Region-Level Facial Genes for Kinship Face Synthesis

1 code implementation • CVPR 2023 • Hao Li, Xianxu Hou, Zepeng Huang, Linlin Shen

As cycle-like losses are designed to measure the L_2 distances between the output of Gene Decoder and image encoder, and that between the output of LGE and IGE, only face images are required to train our framework, i.e., no paired kinship face data is required.

Kinship face generation Kinship Verification

Paper
Code

Clusterformer: Cluster-based Transformer for 3D Object Detection in Point Clouds

no code implementations • ICCV 2023 • Yu Pei, Xian Zhao, Hao Li, Jingyuan Ma, Jingwei Zhang, ShiLiang Pu

Attributed to the unstructured and sparse nature of point clouds, the transformer shows greater potential in point clouds data processing.

3D Object Detection Object +1

Paper
Add Code

Biomedical image analysis competitions: The state of current participation practice

no code implementations • 16 Dec 2022 • Matthias Eisenmann, Annika Reinke, Vivienn Weru, Minu Dietlinde Tizabi, Fabian Isensee, Tim J. Adler, Patrick Godau, Veronika Cheplygina, Michal Kozubek, Sharib Ali, Anubha Gupta, Jan Kybic, Alison Noble, Carlos Ortiz de Solórzano, Samiksha Pachade, Caroline Petitjean, Daniel Sage, Donglai Wei, Elizabeth Wilden, Deepak Alapatt, Vincent Andrearczyk, Ujjwal Baid, Spyridon Bakas, Niranjan Balu, Sophia Bano, Vivek Singh Bawa, Jorge Bernal, Sebastian Bodenstedt, Alessandro Casella, Jinwook Choi, Olivier Commowick, Marie Daum, Adrien Depeursinge, Reuben Dorent, Jan Egger, Hannah Eichhorn, Sandy Engelhardt, Melanie Ganz, Gabriel Girard, Lasse Hansen, Mattias Heinrich, Nicholas Heller, Alessa Hering, Arnaud Huaulmé, Hyunjeong Kim, Bennett Landman, Hongwei Bran Li, Jianning Li, Jun Ma, Anne Martel, Carlos Martín-Isla, Bjoern Menze, Chinedu Innocent Nwoye, Valentin Oreiller, Nicolas Padoy, Sarthak Pati, Kelly Payette, Carole Sudre, Kimberlin Van Wijnen, Armine Vardazaryan, Tom Vercauteren, Martin Wagner, Chuanbo Wang, Moi Hoon Yap, Zeyun Yu, Chun Yuan, Maximilian Zenk, Aneeq Zia, David Zimmerer, Rina Bao, Chanyeol Choi, Andrew Cohen, Oleh Dzyubachyk, Adrian Galdran, Tianyuan Gan, Tianqi Guo, Pradyumna Gupta, Mahmood Haithami, Edward Ho, Ikbeom Jang, Zhili Li, Zhengbo Luo, Filip Lux, Sokratis Makrogiannis, Dominik Müller, Young-tack Oh, Subeen Pang, Constantin Pape, Gorkem Polat, Charlotte Rosalie Reed, Kanghyun Ryu, Tim Scherr, Vajira Thambawita, Haoyu Wang, Xinliang Wang, Kele Xu, Hung Yeh, Doyeob Yeo, Yixuan Yuan, Yan Zeng, Xin Zhao, Julian Abbing, Jannes Adam, Nagesh Adluru, Niklas Agethen, Salman Ahmed, Yasmina Al Khalil, Mireia Alenyà, Esa Alhoniemi, Chengyang An, Talha Anwar, Tewodros Weldebirhan Arega, Netanell Avisdris, Dogu Baran Aydogan, Yingbin Bai, Maria Baldeon Calisto, Berke Doga Basaran, Marcel Beetz, Cheng Bian, Hao Bian, Kevin Blansit, Louise Bloch, Robert Bohnsack, Sara Bosticardo, Jack Breen, Mikael Brudfors, Raphael 
Brüngel, Mariano Cabezas, Alberto Cacciola, Zhiwei Chen, Yucong Chen, Daniel Tianming Chen, Minjeong Cho, Min-Kook Choi, Chuantao Xie Chuantao Xie, Dana Cobzas, Julien Cohen-Adad, Jorge Corral Acero, Sujit Kumar Das, Marcela de Oliveira, Hanqiu Deng, Guiming Dong, Lars Doorenbos, Cory Efird, Sergio Escalera, Di Fan, Mehdi Fatan Serj, Alexandre Fenneteau, Lucas Fidon, Patryk Filipiak, René Finzel, Nuno R. Freitas, Christoph M. Friedrich, Mitchell Fulton, Finn Gaida, Francesco Galati, Christoforos Galazis, Chang Hee Gan, Zheyao Gao, Shengbo Gao, Matej Gazda, Beerend Gerats, Neil Getty, Adam Gibicar, Ryan Gifford, Sajan Gohil, Maria Grammatikopoulou, Daniel Grzech, Orhun Güley, Timo Günnemann, Chunxu Guo, Sylvain Guy, Heonjin Ha, Luyi Han, Il Song Han, Ali Hatamizadeh, Tian He, Jimin Heo, Sebastian Hitziger, SeulGi Hong, Seungbum Hong, Rian Huang, Ziyan Huang, Markus Huellebrand, Stephan Huschauer, Mustaffa Hussain, Tomoo Inubushi, Ece Isik Polat, Mojtaba Jafaritadi, SeongHun Jeong, Bailiang Jian, Yuanhong Jiang, Zhifan Jiang, Yueming Jin, Smriti Joshi, Abdolrahim Kadkhodamohammadi, Reda Abdellah Kamraoui, Inha Kang, Junghwa Kang, Davood Karimi, April Khademi, Muhammad Irfan Khan, Suleiman A. Khan, Rishab Khantwal, Kwang-Ju Kim, Timothy Kline, Satoshi Kondo, Elina Kontio, Adrian Krenzer, Artem Kroviakov, Hugo Kuijf, Satyadwyoom Kumar, Francesco La Rosa, Abhi Lad, Doohee Lee, Minho Lee, Chiara Lena, Hao Li, Ling Li, Xingyu Li, Fuyuan Liao, Kuanlun Liao, Arlindo Limede Oliveira, Chaonan Lin, Shan Lin, Akis Linardos, Marius George Linguraru, Han Liu, Tao Liu, Di Liu, Yanling Liu, João Lourenço-Silva, Jingpei Lu, Jiangshan Lu, Imanol Luengo, Christina B. 
Lund, Huan Minh Luu, Yi Lv, Uzay Macar, Leon Maechler, Sina Mansour L., Kenji Marshall, Moona Mazher, Richard McKinley, Alfonso Medela, Felix Meissen, Mingyuan Meng, Dylan Miller, Seyed Hossein Mirjahanmardi, Arnab Mishra, Samir Mitha, Hassan Mohy-ud-Din, Tony Chi Wing Mok, Gowtham Krishnan Murugesan, Enamundram Naga Karthik, Sahil Nalawade, Jakub Nalepa, Mohamed Naser, Ramin Nateghi, Hammad Naveed, Quang-Minh Nguyen, Cuong Nguyen Quoc, Brennan Nichyporuk, Bruno Oliveira, David Owen, Jimut Bahan Pal, Junwen Pan, Wentao Pan, Winnie Pang, Bogyu Park, Vivek Pawar, Kamlesh Pawar, Michael Peven, Lena Philipp, Tomasz Pieciak, Szymon Plotka, Marcel Plutat, Fattaneh Pourakpour, Domen Preložnik, Kumaradevan Punithakumar, Abdul Qayyum, Sandro Queirós, Arman Rahmim, Salar Razavi, Jintao Ren, Mina Rezaei, Jonathan Adam Rico, ZunHyan Rieu, Markus Rink, Johannes Roth, Yusely Ruiz-Gonzalez, Numan Saeed, Anindo Saha, Mostafa Salem, Ricardo Sanchez-Matilla, Kurt Schilling, Wei Shao, Zhiqiang Shen, Ruize Shi, Pengcheng Shi, Daniel Sobotka, Théodore Soulier, Bella Specktor Fadida, Danail Stoyanov, Timothy Sum Hon Mun, Xiaowu Sun, Rong Tao, Franz Thaler, Antoine Théberge, Felix Thielke, Helena Torres, Kareem A. Wahid, Jiacheng Wang, Yifei Wang, Wei Wang, Xiong Wang, Jianhui Wen, Ning Wen, Marek Wodzinski, Ye Wu, Fangfang Xia, Tianqi Xiang, Chen Xiaofei, Lizhan Xu, Tingting Xue, Yuxuan Yang, Lin Yang, Kai Yao, Huifeng Yao, Amirsaeed Yazdani, Michael Yip, Hwanseung Yoo, Fereshteh Yousefirizi, Shunkai Yu, Lei Yu, Jonathan Zamora, Ramy Ashraf Zeineldin, Dewen Zeng, Jianpeng Zhang, Bokai Zhang, Jiapeng Zhang, Fan Zhang, Huahong Zhang, Zhongchen Zhao, Zixuan Zhao, Jiachen Zhao, Can Zhao, Qingshuo Zheng, Yuheng Zhi, Ziqi Zhou, Baosheng Zou, Klaus Maier-Hein, Paul F. Jäger, Annette Kopp-Schneider, Lena Maier-Hein

Of these, 84% were based on standard architectures.

Benchmarking

Paper
Add Code

SteerNeRF: Accelerating NeRF Rendering via Smooth Viewpoint Trajectory

no code implementations • CVPR 2023 • Sicheng Li, Hao Li, Yue Wang, Yiyi Liao, Lu Yu

Neural Radiance Fields (NeRF) have demonstrated superior novel view synthesis performance but are slow at rendering.

Novel View Synthesis

Paper
Add Code

See, Hear, and Feel: Smart Sensory Fusion for Robotic Manipulation

no code implementations • 7 Dec 2022 • Hao Li, Yizhi Zhang, Junzhe Zhu, Shaoxiong Wang, Michelle A Lee, Huazhe Xu, Edward Adelson, Li Fei-Fei, Ruohan Gao, Jiajun Wu

Humans use all of their senses to accomplish different tasks in everyday activities.

Decision Making

Paper
Add Code

A Differentiable Semantic Metric Approximation in Probabilistic Embedding for Cross-Modal Retrieval

2 code implementations • NeurIPS 2022 • Hao Li, Jingkuan Song, Lianli Gao, Pengpeng Zeng, Haonan Zhang, Gongfu Li

To verify the effectiveness of our approach, extensive experiments are conducted on MS-COCO, CUB Captions, and Flickr30K, which are commonly used in cross-modal retrieval.

Image-text matching Image-to-Text Retrieval +1

Paper
Code

Entropy-Driven Mixed-Precision Quantization for Deep Network Design

1 code implementation • Conference on Neural Information Processing Systems 2022 • Zhenhong Sun, Ce Ge, Junyan Wang, Ming Lin, Hesen Chen, Hao Li, Xiuyu Sun

Deploying deep convolutional neural networks on Internet-of-Things (IoT) devices is challenging due to the limited computational resources, such as limited SRAM memory and Flash storage.

Face Detection Hardware Aware Neural Architecture Search +3

344

Paper
Code

Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks

2 code implementations • CVPR 2023 • Hao Li, Jinguo Zhu, Xiaohu Jiang, Xizhou Zhu, Hongsheng Li, Chun Yuan, Xiaohua Wang, Yu Qiao, Xiaogang Wang, Wenhai Wang, Jifeng Dai

In this paper, we propose Uni-Perceiver v2, which is the first generalist model capable of handling major large-scale vision and vision-language tasks with competitive performance.

Language Modelling Multi-Task Learning

2,303

Paper
Code

Bayesian Layer Graph Convolutioanl Network for Hyperspetral Image Classification

no code implementations • 14 Nov 2022 • Mingyang Zhang, Ziqi Di, Maoguo Gong, Yue Wu, Hao Li, Xiangming Jiang

In recent years, research on hyperspectral image (HSI) classification has made continuous progress by introducing deep network models, and graph convolutional network (GCN) based models have recently shown impressive performance.

Classification Generative Adversarial Network +1

Paper
Add Code

Detecting Line Segments in Motion-blurred Images with Events

1 code implementation • 14 Nov 2022 • Huai Yu, Hao Li, Wen Yang, Lei Yu, Gui-Song Xia

To robustly detect line segments over motion blurs, we propose to leverage the complementary information of images and events.

3D Reconstruction Line Segment Detection +1

Paper
Code

VTC-LFC: Vision Transformer Compression with Low-Frequency Components

1 code implementation • NIPS 2022 • Zhenyu Wang, Hao Luo, Pichao Wang, Feng Ding, Fan Wang, Hao Li

Although Vision transformers (ViTs) have recently dominated many vision tasks, deploying ViT models on resource-limited devices remains a challenging problem.

Paper
Code

Towards Consistency and Complementarity: A Multiview Graph Information Bottleneck Approach

1 code implementation • 11 Oct 2022 • Xiaolong Fan, Maoguo Gong, Yue Wu, Mingyang Zhang, Hao Li, Xiangming Jiang

In this paper, we propose a novel Multiview Variational Graph Information Bottleneck (MVGIB) principle to maximize the agreement for common representations and the disagreement for view-specific representations.

Paper
Code

Toward 3D Spatial Reasoning for Human-like Text-based Visual Question Answering

no code implementations • 21 Sep 2022 • Hao Li, Jinfa Huang, Peng Jin, Guoli Song, Qi Wu, Jie Chen

Under this setting, these 2D spatial reasoning approaches cannot distinguish the fine-grain spatial relations between visual objects and scene texts on the same image plane, thereby impairing the interpretability and performance of TextVQA models.

Image Captioning Optical Character Recognition (OCR) +2

Paper
Add Code

MimCo: Masked Image Modeling Pre-training with Contrastive Teacher

no code implementations • 7 Sep 2022 • Qiang Zhou, Chaohui Yu, Hao Luo, Zhibin Wang, Hao Li

Specifically, MimCo takes a pre-trained contrastive learning model as the teacher model and is pre-trained with two types of learning targets: patch-level and image-level reconstruction losses.

Contrastive Learning Self-Supervised Learning

Paper
Add Code

Cats: Complementary CNN and Transformer Encoders for Segmentation

no code implementations • 24 Aug 2022 • Hao Li, Dewei Hu, Han Liu, Jiacheng Wang, Ipek Oguz

We fuse the information from the convolutional encoder and the transformer, and pass it to the decoder to obtain the results.

3D Medical Imaging Segmentation Image Segmentation +1

Paper
Add Code

Region-Based Evidential Deep Learning to Quantify Uncertainty and Improve Robustness of Brain Tumor Segmentation

no code implementations • 11 Aug 2022 • Hao Li, Yang Nan, Javier Del Ser, Guang Yang

Despite recent advances in the accuracy of brain tumor segmentation, the results still suffer from low reliability and robustness.

Brain Tumor Segmentation Image Classification +2

Paper
Add Code

SBPF: Sensitiveness Based Pruning Framework For Convolutional Neural Network On Image Classification

no code implementations • 9 Aug 2022 • Yiheng Lu, Maoguo Gong, Wei Zhao, Kaiyuan Feng, Hao Li

Therefore, we propose a sensitiveness based method to evaluate the importance of each layer from the perspective of inference accuracy by adding extra damage for the original model.

Image Classification

Paper
Add Code

Semantic Data Augmentation based Distance Metric Learning for Domain Generalization

no code implementations • 2 Aug 2022 • Mengzhu Wang, Jianlong Yuan, Qi Qian, Zhibin Wang, Hao Li

Further, we provide an in-depth analysis of the mechanism and rationale behind our approach, which gives us a better understanding of why leveraging logits in lieu of features can help domain generalization.

Data Augmentation Domain Generalization +1

Paper
Add Code

DnSwin: Toward Real-World Denoising via Continuous Wavelet Sliding-Transformer

1 code implementation • 28 Jul 2022 • Hao Li, Zhijing Yang, Xiaobin Hong, Ziying Zhao, Junyang Chen, Yukai Shi, Jinshan Pan

Real-world image denoising is a practical image restoration problem that aims to obtain clean images from in-the-wild noisy inputs.

Image Denoising Image Restoration

Paper
Code

Criteria Comparative Learning for Real-scene Image Super-Resolution

2 code implementations • 26 Jul 2022 • Yukai Shi, Hao Li, Sen Zhang, Zhijing Yang, Xiao Wang

Inspired by the observation that the contrastive relationship could also exist between the criteria, in this work, we propose a novel training paradigm for RealSR, named Criteria Comparative Learning (Cria-CL), by developing contrastive losses defined on criteria instead of image patches.

Contrastive Learning Image Super-Resolution +1

Paper
Code

Large-Kernel Attention for 3D Medical Image Segmentation

no code implementations • 19 Jul 2022 • Hao Li, Yang Nan, Javier Del Ser, Guang Yang

The performance improvement due to the proposed LK attention module was also statistically validated.

Computed Tomography (CT) Image Segmentation +4

Paper
Add Code

Cross Vision-RF Gait Re-identification with Low-cost RGB-D Cameras and mmWave Radars

no code implementations • 16 Jul 2022 • Dongjiang Cao, Ruofeng Liu, Hao Li, Shuai Wang, Wenchao Jiang, Chris Xiaoxuan Lu

Human identification is a key requirement for many applications in everyday life, such as personalized services, automatic surveillance, continuous authentication, and contact tracing during pandemics.

Metric Learning Person Re-Identification

Paper
Add Code

Dynamic Gradient Reactivation for Backward Compatible Person Re-identification

no code implementations • 12 Jul 2022 • Xiao Pan, Hao Luo, Weihua Chen, Fan Wang, Hao Li, Wei Jiang, Jianming Zhang, Jianyang Gu, Peike Li

To address this issue, we propose the Ranking-based Backward Compatible Learning (RBCL), which directly optimizes the ranking metric between new features and old features.

Person Re-Identification Retrieval

Paper
Add Code

Human Treelike Tubular Structure Segmentation: A Comprehensive Review and Future Perspectives

no code implementations • 12 Jul 2022 • Hao Li, Zeyu Tang, Yang Nan, Guang Yang

Various structures in human physiology follow a treelike morphology, which often expresses complexity at very fine scales.

Computed Tomography (CT)

Paper
Add Code

DLME: Deep Local-flatness Manifold Embedding

2 code implementations • 7 Jul 2022 • Zelin Zang, Siyuan Li, Di Wu, Ge Wang, Lei Shang, Baigui Sun, Hao Li, Stan Z. Li

To overcome the underconstrained embedding problem, we design a loss and theoretically demonstrate that it leads to a more suitable embedding based on the local flatness.

Ranked #2 on Image Classification on ImageNet-100

Contrastive Learning Data Augmentation +1

568

Paper
Code

Location reference recognition from texts: A survey and comparison

no code implementations • 4 Jul 2022 • Xuke Hu, Zhiyong Zhou, Hao Li, Yingjie Hu, Fuqiang Gu, Jens Kersten, Hongchao Fan, Friederike Klan

Further, a comprehensive review and comparison of existing approaches for location reference recognition, which is the first and a core step of geoparsing, is lacking.

Computational Efficiency Information Retrieval +2

Paper
Add Code

CGAR: Critic Guided Action Redistribution in Reinforcement Leaning

1 code implementation • 23 Jun 2022 • Tairan Huang, Xu Li, Hao Li, Mingming Sun, Ping Li

As discussed in this paper, under the settings of off-policy actor-critic algorithms, we demonstrate that the critic can yield expected discounted rewards greater than or at least equal to those of the actor.

Reinforcement Learning (RL)

Paper
Code

Real-World Image Super-Resolution by Exclusionary Dual-Learning

1 code implementation • 6 Jun 2022 • Hao Li, Jinghui Qin, Zhijing Yang, Pengxu Wei, Jinshan Pan, Liang Lin, Yukai Shi

Real-world image super-resolution is a practical image restoration problem that aims to obtain high-quality images from in-the-wild input, and it has recently received considerable attention for its tremendous application potential.

Image Restoration Image Super-Resolution

Paper
Code

Point-Teaching: Weakly Semi-Supervised Object Detection with Point Annotations

no code implementations • 1 Jun 2022 • Yongtao Ge, Qiang Zhou, Xinlong Wang, Zhibin Wang, Hao Li, Chunhua Shen

Point annotations are considerably more time-efficient than bounding box annotations.

Data Augmentation Multiple Instance Learning +4

Paper
Add Code

Point RCNN: An Angle-Free Framework for Rotated Object Detection

no code implementations • 28 May 2022 • Qiang Zhou, Chaohui Yu, Zhibin Wang, Hao Li

To tackle this problem, we propose a purely angle-free framework for rotated object detection, called Point RCNN, which mainly consists of PointRPN and PointReg.

Object object-detection +1

Paper
Add Code

SwinVRNN: A Data-Driven Ensemble Forecasting Model via Learned Distribution Perturbation

no code implementations • 26 May 2022 • Yuan Hu, Lei Chen, Zhibin Wang, Hao Li

We also compare four categories of perturbation methods for ensemble forecasting, i.e., fixed distribution perturbation, learned distribution perturbation, MC dropout, and multi-model ensemble.

Weather Forecasting

Paper
Add Code

An Empirical Study on Distribution Shift Robustness From the Perspective of Pre-Training and Data Augmentation

no code implementations • 25 May 2022 • Ziquan Liu, Yi Xu, Yuanhong Xu, Qi Qian, Hao Li, Rong Jin, Xiangyang Ji, Antoni B. Chan

With our empirical result obtained from 1,330 models, we provide the following main observations: 1) ERM combined with data augmentation can achieve state-of-the-art performance if we choose a proper pre-trained model respecting the data property; 2) specialized algorithms further improve the robustness on top of ERM when handling a specific type of distribution shift, e.g., GroupDRO for spurious correlation and CORAL for large-scale out-of-distribution data; 3) comparing different pre-training modes, architectures and data sizes, we provide novel observations about pre-training on distribution shift, which sheds light on designing or selecting pre-training strategy for different kinds of distribution shifts.

Data Augmentation

Paper
Add Code

Unsupervised Representation Learning for 3D MRI Super Resolution with Degradation Adaptation

no code implementations • 13 May 2022 • Jianan Liu, Hao Li, Tao Huang, Euijoon Ahn, Kang Han, Adeel Razi, Wei Xiang, Jinman Kim, David Dagan Feng

However, the difference in degradation representations between synthetic and authentic LR images suppresses the quality of SR images reconstructed from authentic LR images.

Image Registration Representation Learning +1

Paper
Add Code

NTIRE 2022 Challenge on Efficient Super-Resolution: Methods and Results

2 code implementations • 11 May 2022 • Yawei Li, Kai Zhang, Radu Timofte, Luc van Gool, Fangyuan Kong, Mingxi Li, Songwei Liu, Zongcai Du, Ding Liu, Chenhui Zhou, Jingyi Chen, Qingrui Han, Zheyuan Li, Yingqi Liu, Xiangyu Chen, Haoming Cai, Yu Qiao, Chao Dong, Long Sun, Jinshan Pan, Yi Zhu, Zhikai Zong, Xiaoxiao Liu, Zheng Hui, Tao Yang, Peiran Ren, Xuansong Xie, Xian-Sheng Hua, Yanbo Wang, Xiaozhong Ji, Chuming Lin, Donghao Luo, Ying Tai, Chengjie Wang, Zhizhong Zhang, Yuan Xie, Shen Cheng, Ziwei Luo, Lei Yu, Zhihong Wen, Qi Wu, Youwei Li, Haoqiang Fan, Jian Sun, Shuaicheng Liu, Yuanfei Huang, Meiguang Jin, Hua Huang, Jing Liu, Xinjian Zhang, Yan Wang, Lingshun Long, Gen Li, Yuanfan Zhang, Zuowei Cao, Lei Sun, Panaetov Alexander, Yucong Wang, Minjie Cai, Li Wang, Lu Tian, Zheyuan Wang, Hongbing Ma, Jie Liu, Chao Chen, Yidong Cai, Jie Tang, Gangshan Wu, Weiran Wang, Shirui Huang, Honglei Lu, Huan Liu, Keyan Wang, Jun Chen, Shi Chen, Yuchun Miao, Zimo Huang, Lefei Zhang, Mustafa Ayazoğlu, Wei Xiong, Chengyi Xiong, Fei Wang, Hao Li, Ruimian Wen, Zhijing Yang, Wenbin Zou, Weixin Zheng, Tian Ye, Yuncheng Zhang, Xiangzhen Kong, Aditya Arora, Syed Waqas Zamir, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Dandan Gao, Dengwen Zhou, Qian Ning, Jingzhu Tang, Han Huang, YuFei Wang, Zhangheng Peng, Haobo Li, Wenxue Guan, Shenghua Gong, Xin Li, Jun Liu, Wanjun Wang, Dengwen Zhou, Kun Zeng, Hanjiang Lin, Xinyu Chen, Jinsheng Fang

The aim was to design a network for single image super-resolution that achieved improvement of efficiency measured according to several metrics including runtime, parameters, FLOPs, activations, and memory consumption while at least maintaining the PSNR of 29.00 dB on DIV2K validation set.

Image Super-Resolution

117

Paper
Code

Joint learning of object graph and relation graph for visual question answering

no code implementations • 9 May 2022 • Hao Li, Xu Li, Belhal Karimi, Jie Chen, Mingming Sun

Modeling visual question answering (VQA) through scene graphs can significantly improve the reasoning accuracy and interpretability.

Attribute Question Answering +2

Paper
Add Code

Multi-view Point Cloud Registration based on Evolutionary Multitasking with Bi-Channel Knowledge Sharing Mechanism

no code implementations • 6 May 2022 • Yue Wu, Yibo Liu, Maoguo Gong, Peiran Gong, Hao Li, Zedong Tang, Qiguang Miao, Wenping Ma

The modeling of multi-view point cloud registration as multi-task optimization are twofold.

3D Reconstruction Point Cloud Registration

Paper
Add Code

Learning to Listen: Modeling Non-Deterministic Dyadic Facial Motion

no code implementations • CVPR 2022 • Evonne Ng, Hanbyul Joo, Liwen Hu, Hao Li, Trevor Darrell, Angjoo Kanazawa, Shiry Ginosar

We present a framework for modeling interactional communication in dyadic conversations: given multimodal inputs of a speaker, we autoregressively output multiple possibilities of corresponding listener motion.

Paper
Add Code

Task Adaptive Parameter Sharing for Multi-Task Learning

1 code implementation • CVPR 2022 • Matthew Wallingford, Hao Li, Alessandro Achille, Avinash Ravichandran, Charless Fowlkes, Rahul Bhotika, Stefano Soatto

TAPS solves a joint optimization problem which determines which layers to share with the base model and the value of the task-specific weights.

Multi-Task Learning

Paper
Code

EPro-PnP: Generalized End-to-End Probabilistic Perspective-n-Points for Monocular Object Pose Estimation

1 code implementation • CVPR 2022 • Hansheng Chen, Pichao Wang, Fan Wang, Wei Tian, Lu Xiong, Hao Li

The 2D-3D coordinates and corresponding weights are treated as intermediate variables learned by minimizing the KL divergence between the predicted and target pose distribution.

Ranked #6 on 6D Pose Estimation using RGB on LineMOD

3D Object Detection 6D Pose Estimation using RGB +1

1,049

Paper
Code

PMAL: Open Set Recognition via Robust Prototype Mining

no code implementations • 16 Mar 2022 • Jing Lu, Yunxu Xu, Hao Li, Zhanzhan Cheng, Yi Niu

Accordingly, the embedding space can be better optimized to discriminate among the predefined classes and between knowns and unknowns.

Open Set Learning

Paper
Add Code

ModDrop++: A Dynamic Filter Network with Intra-subject Co-training for Multiple Sclerosis Lesion Segmentation with Missing Modalities

1 code implementation • 7 Mar 2022 • Han Liu, Yubo Fan, Hao Li, Jiacheng Wang, Dewei Hu, Can Cui, Ho Hin Lee, Huahong Zhang, Ipek Oguz

Previously, a training strategy termed Modality Dropout (ModDrop) has been applied to MS lesion segmentation to achieve state-of-the-art performance with missing modalities.

Lesion Segmentation

Paper
Code

On Representation Learning with Feedback

1 code implementation • 15 Feb 2022 • Hao Li

This note complements the author's recent paper "Robust representation learning with feedback for single image deraining" by providing heuristically theoretical explanations on the mechanism of representation learning with feedback, namely an essential merit of the works presented in this recent article.

Representation Learning Single Image Deraining

Paper
Code

GiraffeDet: A Heavy-Neck Paradigm for Object Detection

2 code implementations • ICLR 2022 • Yiqi Jiang, Zhiyu Tan, Junyan Wang, Xiuyu Sun, Ming Lin, Hao Li

This heavy-backbone design paradigm is mostly due to the historical legacy when transferring image recognition models to object detection rather than an end-to-end optimized design for object detection.

Object object-detection +1

Paper
Code

Image-to-Video Re-Identification via Mutual Discriminative Knowledge Transfer

no code implementations • 21 Jan 2022 • Pichao Wang, Fan Wang, Hao Li

During the KD process, the TCL loss transfers the local structure, exploits the higher order information, and mitigates the misalignment of the heterogeneous output of teacher and student networks.

Knowledge Distillation Transfer Learning

Paper
Add Code

Studying Popular Open Source Machine Learning Libraries and Their Cross-Ecosystem Bindings

1 code implementation • 18 Jan 2022 • Hao Li, Cor-Paul Bezemer

Our study shows that the vast majority of the studied bindings cover only a small portion of the source library releases, and the delay for receiving support for a source library release is large.

BIG-bench Machine Learning

Paper
Code

CrossMoDA 2021 challenge: Benchmark of Cross-Modality Domain Adaptation techniques for Vestibular Schwannoma and Cochlea Segmentation

3 code implementations • 8 Jan 2022 • Reuben Dorent, Aaron Kujawa, Marina Ivory, Spyridon Bakas, Nicola Rieke, Samuel Joutard, Ben Glocker, Jorge Cardoso, Marc Modat, Kayhan Batmanghelich, Arseniy Belkov, Maria Baldeon Calisto, Jae Won Choi, Benoit M. Dawant, Hexin Dong, Sergio Escalera, Yubo Fan, Lasse Hansen, Mattias P. Heinrich, Smriti Joshi, Victoriya Kashtanova, Hyeon Gyu Kim, Satoshi Kondo, Christian N. Kruse, Susana K. Lai-Yuen, Hao Li, Han Liu, Buntheng Ly, Ipek Oguz, Hyungseob Shin, Boris Shirokikh, Zixian Su, Guotai Wang, Jianghao Wu, Yanwu Xu, Kai Yao, Li Zhang, Sebastien Ourselin, Jonathan Shapey, Tom Vercauteren

The aim was to automatically perform unilateral VS and bilateral cochlea segmentation on hrT2 as provided in the testing set (N=137).

Brain Segmentation Domain Adaptation +4

108

Paper
Code

Graph Neural Networks for Double-Strand DNA Breaks Prediction

no code implementations • 4 Jan 2022 • Xu Wang, Huan Zhao, WeiWei Tu, Hao Li, Yu Sun, Xiaochen Bo

Double-strand DNA breaks (DSBs) are a form of DNA damage that can cause abnormal chromosomal rearrangements.

Paper
Add Code

ELSA: Enhanced Local Self-Attention for Vision Transformer

1 code implementation • 23 Dec 2021 • Jingkai Zhou, Pichao Wang, Fan Wang, Qiong Liu, Hao Li, Rong Jin

Self-attention is powerful in modeling long-range dependencies, but it is weak in local finer-level feature learning.

Ranked #46 on Semantic Segmentation on ADE20K val

Image Classification Instance Segmentation +2

113

Paper
Code

Watch Those Words: Video Falsification Detection Using Word-Conditioned Facial Motion

1 code implementation • 21 Dec 2021 • Shruti Agarwal, Liwen Hu, Evonne Ng, Trevor Darrell, Hao Li, Anna Rohrbach

In today's era of digital misinformation, we are increasingly faced with new threats posed by video falsification techniques.

Misinformation

Paper
Code

Decoupling and Recoupling Spatiotemporal Representation for RGB-D-based Motion Recognition

1 code implementation • CVPR 2022 • Benjia Zhou, Pichao Wang, Jun Wan, Yanyan Liang, Fan Wang, Du Zhang, Zhen Lei, Hao Li, Rong Jin

Decoupling spatiotemporal representation refers to decomposing the spatial and temporal features into dimension-independent factors.

Ranked #1 on Hand Gesture Recognition on NVGesture

Hand Gesture Recognition

Paper
Code

On the Dilution of Precision for Time Difference of Arrival with Station Deployment

no code implementations • 10 Dec 2021 • Fengyun Zhang, Hao Li, Yulong Ding, Shuang-Hua Yang, Li Yang

The paper aims to reveal the relationship between the performance of moving object tracking algorithms and the tracking anchors (station) deployment.

Object Tracking TAG

Paper
Add Code

Design and Implementation of Real-Time Localization System (RTLS) based on UWB and TDoA Algorithm

no code implementations • 9 Dec 2021 • Fengyun Zhang, Li Yang, Yuhuan Liu, Yulong Ding, Shuang-Hua Yang, Hao Li

The challenges of indoor localization include inadequate localization accuracy, unreasonable anchor deployment in complex scenarios, lack of stability, and high cost.

Indoor Localization

Paper
Add Code

TransZero: Attribute-guided Transformer for Zero-Shot Learning

1 code implementation • 3 Dec 2021 • Shiming Chen, Ziming Hong, Yang Liu, Guo-Sen Xie, Baigui Sun, Hao Li, Qinmu Peng, Ke Lu, Xinge You

Although some attention-based models have attempted to learn such region features in a single image, the transferability and discriminative attribute localization of visual features are typically neglected.

Attribute Zero-Shot Learning

Paper
Code

TransFGU: A Top-down Approach to Fine-Grained Unsupervised Semantic Segmentation

1 code implementation • 2 Dec 2021 • Zhaoyuan Yin, Pichao Wang, Fan Wang, Xianzhe Xu, Hanling Zhang, Hao Li, Rong Jin

Unsupervised semantic segmentation aims to obtain high-level semantic representation on low-level visual features without manual annotations.

Ranked #2 on Unsupervised Semantic Segmentation on COCO-Stuff-171 (using extra training data)

Segmentation Self-Supervised Learning +1

Paper
Code

Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot Tasks

1 code implementation • CVPR 2022 • Xizhou Zhu, Jinguo Zhu, Hao Li, Xiaoshi Wu, Xiaogang Wang, Hongsheng Li, Xiaohua Wang, Jifeng Dai

The model is pre-trained on several uni-modal and multi-modal tasks, and evaluated on a variety of downstream tasks, including novel tasks that did not appear in the pre-training stage.

250

Paper
Code

3D High-Quality Magnetic Resonance Image Restoration in Clinics Using Deep Learning

no code implementations • 28 Nov 2021 • Hao Li, Jianan Liu

We also analyzed several down-sampling strategies based on the acceleration factor, including multiple combinations of in-plane and through-plane down-sampling, and developed a controllable and quantifiable motion artifact generation method.

Image Restoration Super-Resolution

Paper
Add Code

MAE-DET: Revisiting Maximum Entropy Principle in Zero-Shot NAS for Efficient Object Detection

1 code implementation • 26 Nov 2021 • Zhenhong Sun, Ming Lin, Xiuyu Sun, Zhiyu Tan, Hao Li, Rong Jin

Recent research attempts to reduce this cost by optimizing the backbone architecture with the help of Neural Architecture Search (NAS).

Ranked #88 on Object Detection on COCO minival

Neural Architecture Search Object +2

344

Paper
Code

Improved Fine-Tuning by Better Leveraging Pre-Training Data

no code implementations • 24 Nov 2021 • Ziquan Liu, Yi Xu, Yuanhong Xu, Qi Qian, Hao Li, Xiangyang Ji, Antoni Chan, Rong Jin

The generalization result of using pre-training data shows that the excess risk bound on a target task can be improved when the appropriate pre-training data is included in fine-tuning.

Image Classification Learning Theory

Paper
Add Code

Get Better 1 Pixel PCK: Ladder Scales Correspondence Flow Networks for Remote Sensing Image Matching in Higher Resolution

no code implementations • ICCV 2021 • Weitao Chen, Zhibin Wang, Hao Li

Percentage of image size is often used as the threshold of PCK.

Paper
Add Code

Self-Supervised Pre-Training for Transformer-Based Person Re-Identification

2 code implementations • 23 Nov 2021 • Hao Luo, Pichao Wang, Yi Xu, Feng Ding, Yanxin Zhou, Fan Wang, Hao Li, Rong Jin

We first investigate self-supervised learning (SSL) methods with Vision Transformer (ViT) pretrained on unlabelled person images (the LUPerson dataset), and empirically find it significantly surpasses ImageNet supervised pre-training models on ReID tasks.

Ranked #1 on Unsupervised Person Re-Identification on Market-1501 (using extra training data)

Self-Supervised Learning Unsupervised Domain Adaptation +1

217

Paper
Code

Topologically Consistent Multi-View Face Inference Using Volumetric Sampling

no code implementations • ICCV 2021 • Tianye Li, Shichen Liu, Timo Bolkart, Jiayi Liu, Hao Li, Yajie Zhao

We propose ToFu, Topologically consistent Face from multi-view, a geometry inference framework that can produce topologically consistent meshes across facial identities and expressions using a volumetric representation instead of an explicit underlying 3DMM.

3D Reconstruction

Paper
Add Code

HSVA: Hierarchical Semantic-Visual Adaptation for Zero-Shot Learning

2 code implementations • NeurIPS 2021 • Shiming Chen, Guo-Sen Xie, Yang Liu, Qinmu Peng, Baigui Sun, Hao Li, Xinge You, Ling Shao

Specifically, HSVA aligns the semantic and visual domains by adopting a hierarchical two-step adaptation, i.e., structure adaptation and distribution adaptation.

Transfer Learning Zero-Shot Learning

Paper
Code

NAS-Bench-Zero: A Large Scale Dataset for Understanding Zero-Shot Neural Architecture Search

no code implementations • 29 Sep 2021 • Hanlin Chen, Ming Lin, Xiuyu Sun, Hao Li

Based on these new discoveries, we propose i) a novel hybrid zero-shot proxy which outperforms existing ones by a large margin and is transferable among popular search spaces; ii) a new index for better measuring the true performance of ZS-NAS proxies in constrained NAS.

Benchmarking Neural Architecture Search

Paper
Add Code

Unsupervised Domain Adaptation By Optimal Transportation Of Clusters Between Domains

no code implementations • 29 Sep 2021 • Yang Liu, Zhipeng Zhou, Lei Shang, Baigui Sun, Hao Li, Rong Jin

Unsupervised domain adaptation (UDA) aims to transfer the knowledge from a labeled source domain to an unlabeled target domain.

Attribute Clustering +2

Paper
Add Code

Text-based Person Search in Full Images via Semantic-Driven Proposal Generation

1 code implementation • 27 Sep 2021 • Shizhou Zhang, De Cheng, Wenlong Luo, Yinghui Xing, Duo Long, Hao Li, Kai Niu, Guoqiang Liang, Yanning Zhang

Finding target persons in full scene images with a query of text description has important practical applications in intelligent video surveillance. However, different from the real-world scenarios where the bounding boxes are not available, existing text-based person retrieval methods mainly focus on the cross modal matching between the query text descriptions and the gallery of cropped pedestrian images.

Person Search Retrieval +3

Paper
Code

Unsupervised Cross-Modality Domain Adaptation for Segmenting Vestibular Schwannoma and Cochlea with Data Augmentation and Model Ensemble

no code implementations • 24 Sep 2021 • Hao Li, Dewei Hu, Qibang Zhu, Kathleen E. Larson, Huahong Zhang, Ipek Oguz

To overcome this problem, domain adaptation is an effective way to leverage information from source domain to obtain accurate segmentations without requiring manual labels in target domain.

Data Augmentation Domain Adaptation +2

Paper
Add Code

Interpolation variable rate image compression

1 code implementation • 20 Sep 2021 • Zhenhong Sun, Zhiyu Tan, Xiuyu Sun, Fangyi Zhang, Yichen Qian, Dongyang Li, Hao Li

Compression standards have been used to reduce the cost of image storage and transmission for decades.

Image Compression MS-SSIM +1

Paper
Code

DisUnknown: Distilling Unknown Factors for Disentanglement Learning

1 code implementation • ICCV 2021 • Sitao Xiang, Yuming Gu, Pengda Xiang, Menglei Chai, Hao Li, Yajie Zhao, Mingming He

In this paper, we adopt a general setting where all factors that are hard to label or identify are encapsulated as a single unknown factor.

Disentanglement

Paper
Code

CDTrans: Cross-domain Transformer for Unsupervised Domain Adaptation

2 code implementations • ICLR 2022 • Tongkun Xu, Weihua Chen, Pichao Wang, Fan Wang, Hao Li, Rong Jin

Along with the pseudo labels, a weight-sharing triple-branch transformer framework is proposed to apply self-attention and cross-attention for source/target feature learning and source-target domain alignment, respectively.

Ranked #3 on Domain Adaptation on Office-31

Unsupervised Domain Adaptation

309

Paper
Code

Scaled ReLU Matters for Training Vision Transformers

no code implementations • 8 Sep 2021 • Pichao Wang, Xue Wang, Hao Luo, Jingkai Zhou, Zhipeng Zhou, Fan Wang, Hao Li, Rong Jin

In this paper, we further investigate this problem and extend the above conclusion: only early convolutions do not help for stable training, but the scaled ReLU operation in the convolutional stem (conv-stem) matters.

Paper
Add Code

Dash: Semi-Supervised Learning with Dynamic Thresholding

no code implementations • 1 Sep 2021 • Yi Xu, Lei Shang, Jinxing Ye, Qi Qian, Yu-Feng Li, Baigui Sun, Hao Li, Rong Jin

In this work we develop a simple yet powerful framework, whose key idea is to select a subset of training examples from the unlabeled data when performing existing SSL methods so that only the unlabeled examples with pseudo labels related to the labeled data will be used to train models.

Ranked #1 on Semi-Supervised Image Classification on CIFAR-10, 250 Labels

Semi-Supervised Image Classification

Paper
Add Code

Digging into Uncertainty in Self-supervised Multi-view Stereo

1 code implementation • ICCV 2021 • Hongbin Xu, Zhipeng Zhou, Yali Wang, Wenxiong Kang, Baigui Sun, Hao Li, Yu Qiao

Specifically, the limitations can be categorized into two types: ambiguous supervision in the foreground and invalid supervision in the background.

Image Reconstruction Self-Supervised Learning

Paper
Code

Exploring the Quality of GAN Generated Images for Person Re-Identification

no code implementations • 23 Aug 2021 • Yiqi Jiang, Weihua Chen, Xiuyu Sun, Xiaoyu Shi, Fan Wang, Hao Li

Recently, GAN-based methods have demonstrated strong effectiveness in generating augmentation data for person re-identification (ReID), on account of their ability to bridge the gap between domains and enrich the data variety in feature space.

Person Re-Identification Unsupervised Domain Adaptation

Paper
Add Code

Fine-Grained AutoAugmentation for Multi-Label Classification

no code implementations • 12 Jul 2021 • Ya Wang, Hesen Chen, Fangyi Zhang, Yaohua Wang, Xiuyu Sun, Ming Lin, Hao Li

Data augmentation is a commonly used approach to improving the generalization of deep learning models.

Classification Data Augmentation +3

Paper
Add Code

A Cloud-Edge-Terminal Collaborative System for Temperature Measurement in COVID-19 Prevention

no code implementations • 11 Jul 2021 • Zheyi Ma, Hao Li, Wen Fang, Qingwen Liu, Bin Zhou, Zhiyong Bu

Then, a mobile detection model based on a multi-task cascaded convolutional network (MTCNN) is proposed to realize face alignment and mask detection on the RGB images.

Face Alignment

Paper
Add Code

LIFE: A Generalizable Autodidactic Pipeline for 3D OCT-A Vessel Segmentation

no code implementations • 9 Jul 2021 • Dewei Hu, Can Cui, Hao Li, Kathleen E. Larson, Yuankai K. Tao, Ipek Oguz

We then construct the local intensity fusion encoder (LIFE) to map a given OCT-A volume and its LIF counterpart to a shared latent space.

Retinal Vessel Segmentation Segmentation

Paper
Add Code

Graph Convolution for Re-ranking in Person Re-identification

1 code implementation • 5 Jul 2021 • Yuqi Zhang, Qian Qi, Chong Liu, Weihua Chen, Fan Wang, Hao Li, Rong Jin

In this work, we propose a graph-based re-ranking method to improve learned features while still keeping Euclidean distance as the similarity metric.

Person Re-Identification Re-Ranking +1

Paper
Code

Normalized Avatar Synthesis Using StyleGAN and Perceptual Refinement

no code implementations • CVPR 2021 • Huiwen Luo, Koki Nagano, Han-Wei Kung, Mclean Goldwhite, Qingguo Xu, Zejian Wang, Lingyu Wei, Liwen Hu, Hao Li

Cutting-edge 3D face reconstruction methods use non-linear morphable face models combined with GAN-based decoders to capture the likeness and details of a person but fail to produce neutral head models with unshaded albedo textures, which are critical for creating relightable and animation-friendly avatars for integration in virtual environments.

3D Face Reconstruction Face Model

Paper
Add Code

SKFAC: Training Neural Networks With Faster Kronecker-Factored Approximate Curvature

1 code implementation • CVPR 2021 • Zedong Tang, Fenlong Jiang, Maoguo Gong, Hao Li, Yue Wu, Fan Yu, Zidong Wang, Min Wang

For the fully connected layers, by utilizing the low-rank property of Kronecker factors of Fisher information matrix, our method only requires inverting a small matrix to approximate the curvature with desirable accuracy.

Dimensionality Reduction

Paper
Code

Task-Generic Hierarchical Human Motion Prior using VAEs

no code implementations • 7 Jun 2021 • Jiaman Li, Ruben Villegas, Duygu Ceylan, Jimei Yang, Zhengfei Kuang, Hao Li, Yajie Zhao

We demonstrate the effectiveness of our hierarchical motion variational autoencoder in a variety of tasks including video-based human pose estimation, motion completion from partial observations, and motion synthesis from sparse key-frames.

Ranked #4 on Motion Synthesis on LaFAN1

Motion Synthesis Pose Estimation

Paper
Add Code

SKFAC: Training Neural Networks with Faster Kronecker-Factored Approximate Curvature

1 code implementation • Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2021 • Zedong Tang, Fenlong Jiang, Maoguo Gong, Hao Li, Yue Wu, Fan Yu, Zidong Wang, Min Wang

Dimensionality Reduction

Paper
Code

KVT: k-NN Attention for Boosting Vision Transformers

1 code implementation • 28 May 2021 • Pichao Wang, Xue Wang, Fan Wang, Ming Lin, Shuning Chang, Hao Li, Rong Jin

A key component in vision transformers is the fully-connected self-attention which is more powerful than CNNs in modelling long range dependencies.

Paper
Code

Unsupervised Visual Representation Learning by Online Constrained K-Means

1 code implementation • CVPR 2022 • Qi Qian, Yuanhong Xu, Juhua Hu, Hao Li, Rong Jin

Clustering is to assign each instance a pseudo label that will be used to learn representations in discrimination.

Ranked #5 on Unsupervised Image Classification on CIFAR-10

Clustering Contrastive Learning +6

Paper
Code

An Efficient Training Approach for Very Large Scale Face Recognition

1 code implementation • CVPR 2022 • Kai Wang, Shuo Wang, Panpan Zhang, Zhipeng Zhou, Zheng Zhu, Xiaobo Wang, Xiaojiang Peng, Baigui Sun, Hao Li, Yang You

This method adopts Dynamic Class Pool (DCP) for storing and updating the identities features dynamically, which could be regarded as a substitute for the FC layer.

Ranked #1 on Face Verification on IJB-C (training dataset metric)

Face Recognition Face Verification

Paper
Code

An Empirical Study of Vehicle Re-Identification on the AI City Challenge

1 code implementation • 20 May 2021 • Hao Luo, Weihua Chen, Xianzhe Xu, Jianyang Gu, Yuqi Zhang, Chong Liu, Yiqi Jiang, Shuting He, Fan Wang, Hao Li

We mainly focus on four points in this challenge, i.e., training data, unsupervised domain-adaptive (UDA) training, post-processing, and model ensembling.

Re-Ranking Retrieval +1

115

Paper
Code

Importance Weighted Adversarial Discriminative Transfer for Anomaly Detection

1 code implementation • 14 May 2021 • Cangning Fan, Fangyi Zhang, Peng Liu, Xiuyu Sun, Hao Li, Ting Xiao, Wei Zhao, Xianglong Tang

In this way, an obvious gap can be produced between the distributions of normal and abnormal data in the target domain, therefore enabling the anomaly detection in the domain.

Anomaly Detection valid

Paper
Code

City-Scale Multi-Camera Vehicle Tracking Guided by Crossroad Zones

1 code implementation • 14 May 2021 • Chong Liu, Yuqi Zhang, Hao Luo, Jiasheng Tang, Weihua Chen, Xianzhe Xu, Fan Wang, Hao Li, Yi-Dong Shen

Multi-Target Multi-Camera Tracking has a wide range of applications and is the basis for many advanced inferences and predictions.

Clustering Vehicle Re-Identification

120

Paper
Code

Maximizing Mutual Information Across Feature and Topology Views for Learning Graph Representations

1 code implementation • 14 May 2021 • Xiaolong Fan, Maoguo Gong, Yue Wu, Hao Li

Specifically, we first utilize a multi-view representation learning module to better capture both local and global information content across feature and topology views on graphs.

Graph Representation Learning

Paper
Code

Why Does Multi-Epoch Training Help?

no code implementations • 13 May 2021 • Yi Xu, Qi Qian, Hao Li, Rong Jin

Stochastic gradient descent (SGD) has become the most attractive optimization method in training large-scale deep neural networks due to its simplicity, low computational cost in each updating step, and good performance.

Paper
Add Code

Spatially Self-Paced Convolutional Networks for Change Detection in Heterogeneous Images

no code implementations • IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 2021 • Hao Li, Maoguo Gong, Mingyang Zhang, Yue Wu

Change detection in heterogeneous remote sensing images is a challenging problem because it is hard to make a direct comparison in the original observation spaces, and most methods rely on a set of manually labeled samples.

Change Detection

Paper
Add Code

Learning to Cluster Faces via Transformer

no code implementations • 23 Apr 2021 • Jinxing Ye, Xiaojiang Peng, Baigui Sun, Kai Wang, Xiuyu Sun, Hao Li, Hanqing Wu

In this paper, we repurpose the well-known Transformer and introduce a Face Transformer for supervised face clustering.

Clustering Face Clustering +2

Paper
Add Code

A Simple Baseline for Semi-supervised Semantic Segmentation with Strong Data Augmentation

1 code implementation • ICCV 2021 • Jianlong Yuan, Yifan Liu, Chunhua Shen, Zhibin Wang, Hao Li

Previous works [3, 27] fail to employ strong augmentation in pseudo label learning efficiently, as the large distribution change caused by strong augmentation harms the batch normalisation statistics.

Ranked #12 on Semi-Supervised Semantic Segmentation on Cityscapes 25% labeled

Data Augmentation Image Classification +3

Paper
Code

Spatiotemporal Entropy Model is All You Need for Learned Video Compression

1 code implementation • 13 Apr 2021 • Zhenhong Sun, Zhiyu Tan, Xiuyu Sun, Fangyi Zhang, Dongyang Li, Yichen Qian, Hao Li

The framework of dominant learned video compression methods is usually composed of motion prediction modules as well as motion vector and residual image compression modules, suffering from its complex structure and error propagation problem.

Image Compression motion prediction +3

Paper
Code

A Theoretical Analysis of Learning with Noisily Labeled Data

no code implementations • 8 Apr 2021 • Yi Xu, Qi Qian, Hao Li, Rong Jin

Noisy labels are very common in deep supervised learning.

Paper
Add Code

Augmented Transformer with Adaptive Graph for Temporal Action Proposal Generation

no code implementations • 30 Mar 2021 • Shuning Chang, Pichao Wang, Fan Wang, Hao Li, Jiashi Feng

Temporal action proposal generation (TAPG) is a fundamental and challenging task in video understanding, especially in temporal action detection.

Action Detection Temporal Action Proposal Generation +1

Paper
Add Code

Guided Training: A Simple Method for Single-channel Speaker Separation

no code implementations • 26 Mar 2021 • Hao Li, Xueliang Zhang, Guanglai Gao

Another way is to use an anchor speech, a short speech of the target speaker, to model the speaker identity.

Speaker Separation Speech Separation

Paper
Add Code

AutoLoss-Zero: Searching Loss Functions from Scratch for Generic Tasks

no code implementations • CVPR 2022 • Hao Li, Tianwen Fu, Jifeng Dai, Hongsheng Li, Gao Huang, Xizhou Zhu

However, the automatic design of loss functions for generic tasks with various evaluation metrics remains under-investigated.

Paper
Add Code

Equivariant Point Network for 3D Point Cloud Analysis

1 code implementation • CVPR 2021 • Haiwei Chen, Shichen Liu, Weikai Chen, Hao Li

Features that are equivariant to a larger group of symmetries have been shown to be more discriminative and powerful in recent studies.

100

Paper
Code

PlenOctrees for Real-time Rendering of Neural Radiance Fields

5 code implementations • ICCV 2021 • Alex Yu, RuiLong Li, Matthew Tancik, Hao Li, Ren Ng, Angjoo Kanazawa

We introduce a method to render Neural Radiance Fields (NeRFs) in real time using PlenOctrees, an octree-based 3D representation which supports view-dependent effects.

Neural Rendering Novel View Synthesis

601

Paper
Code

Instant-Teaching: An End-to-End Semi-Supervised Object Detection Framework

1 code implementation • CVPR 2021 • Qiang Zhou, Chaohui Yu, Zhibin Wang, Qi Qian, Hao Li

To alleviate the confirmation bias problem and improve the quality of pseudo annotations, we further propose a co-rectify scheme based on Instant-Teaching, denoted as Instant-Teaching*.

Ranked #12 on Semi-Supervised Object Detection on COCO 100% labeled data (using extra training data)

Object object-detection +2

Paper
Code
