Search Results for author: Xin Li

Found 359 papers, 153 papers with code

Sparse-to-Dense Depth Completion Revisited: Sampling Strategy and Graph Construction

no code implementations ECCV 2020 Xin Xiong, Haipeng Xiong, Ke Xian, Chen Zhao, Zhiguo Cao, Xin Li

Depth completion is a widely studied problem of predicting a dense depth map from a sparse set of measurements and a single RGB image.

Depth Completion graph construction

DDGCN: A Dynamic Directed Graph Convolutional Network for Action Recognition

1 code implementation ECCV 2020 Matthew Korban, Xin Li

We propose a Dynamic Directed Graph Convolutional Network (DDGCN) to model spatial and temporal features of human actions from their skeletal representations.

Action Recognition

Aspect-based Sentiment Analysis in Question Answering Forums

1 code implementation Findings (EMNLP) 2021 Wenxuan Zhang, Yang Deng, Xin Li, Lidong Bing, Wai Lam

This motivates us to investigate the task of ABSA on QA forums (ABSA-QA), aiming to jointly detect the discussed aspects and their sentiment polarities for a given QA pair.

Aspect-Based Sentiment Analysis Aspect-Based Sentiment Analysis (ABSA) +2

RTracker: Recoverable Tracking via PN Tree Structured Memory

1 code implementation 28 Mar 2024 Yuqing Huang, Xin Li, Zikun Zhou, YaoWei Wang, Zhenyu He, Ming-Hsuan Yang

Upon the PN tree memory, we develop corresponding walking rules for determining the state of the target and define a set of control flows to unite the tracker and the detector in different tracking scenarios.

Skull-to-Face: Anatomy-Guided 3D Facial Reconstruction and Editing

no code implementations 24 Mar 2024 Yongqing Liang, Congyi Zhang, Junli Zhao, Wenping Wang, Xin Li

Existing methods for automated facial reconstruction yield inaccurate results, suffering from the non-determinative nature of the problem that a skull with a sparse set of tissue depth cannot fully determine the skinned face.

3D Face Reconstruction Anatomy

TARN-VIST: Topic Aware Reinforcement Network for Visual Storytelling

no code implementations 18 Mar 2024 Weiran Chen, Xin Li, Jiaqi Su, Guiqian Zhu, Ying Li, Yi Ji, Chunping Liu

As a cross-modal task, visual storytelling aims to generate a story for an ordered image sequence automatically.

Image Captioning Visual Storytelling

COMQ: A Backpropagation-Free Algorithm for Post-Training Quantization

no code implementations 11 Mar 2024 Aozhong Zhang, Zi Yang, Naigang Wang, Yingyong Qin, Jack Xin, Xin Li, Penghang Yin

Within a fixed layer, COMQ treats all the scaling factor(s) and bit-codes as the variables of the reconstruction error.

Quantization

Is Vanilla MLP in Neural Radiance Field Enough for Few-shot View Synthesis?

no code implementations 10 Mar 2024 Hanxin Zhu, Tianyu He, Xin Li, Bingchen Li, Zhibo Chen

Neural Radiance Field (NeRF) has achieved superior performance for novel view synthesis by modeling the scene with a Multi-Layer Perceptron (MLP) and a volume rendering procedure. However, when fewer known views are given (i.e., few-shot view synthesis), the model is prone to overfitting the given views.

Novel View Synthesis

A Streamlined Approach to Multimodal Few-Shot Class Incremental Learning for Fine-Grained Datasets

2 code implementations 10 Mar 2024 Thang Doan, Sima Behpour, Xin Li, Wenbin He, Liang Gou, Liu Ren

Few-shot Class-Incremental Learning (FSCIL) poses the challenge of retaining prior knowledge while learning from limited new data streams, all without overfitting.

Few-Shot Class-Incremental Learning Incremental Learning

SeD: Semantic-Aware Discriminator for Image Super-Resolution

1 code implementation 29 Feb 2024 Bingchen Li, Xin Li, Hanxin Zhu, Yeying Jin, Ruoyu Feng, Zhizheng Zhang, Zhibo Chen

In particular, one discriminator is utilized to enable the SR network to learn the distribution of real-world high-quality images in an adversarial training manner.

Image Super-Resolution

Enhancing Visual Document Understanding with Contrastive Learning in Large Visual-Language Models

no code implementations 29 Feb 2024 Xin Li, Yunfei Wu, Xinghua Jiang, Zhihao Guo, Mingming Gong, Haoyu Cao, Yinsong Liu, Deqiang Jiang, Xing Sun

This suggests that contrastive learning between the visual holistic representations and the multimodal fine-grained features of document objects can help the vision encoder acquire more effective visual cues, thereby enhancing the comprehension of text-rich documents in LVLMs.

Contrastive Learning document understanding

Sunshine to Rainstorm: Cross-Weather Knowledge Distillation for Robust 3D Object Detection

no code implementations 28 Feb 2024 Xun Huang, Hai Wu, Xin Li, Xiaoliang Fan, Chenglu Wen, Cheng Wang

LiDAR-based 3D object detection models have traditionally struggled under rainy conditions due to the degraded and noisy scanning signals.

Knowledge Distillation object-detection +1

Neural Radiance Fields in Medical Imaging: Challenges and Next Steps

no code implementations 26 Feb 2024 Xin Wang, Shu Hu, Heng Fan, Hongtu Zhu, Xin Li

Neural Radiance Fields (NeRF), as a pioneering technique in computer vision, offer great potential to revolutionize medical imaging by synthesizing three-dimensional representations from the projected two-dimensional image data.

On Organizational Principles of Neural Systems

no code implementations 22 Feb 2024 Xin Li

Inspired by classical embodied cognition and the emerging multimodal interaction, we study the organizational principles of neural systems at three levels (device/implementation, circuit/algorithm, and system/computational) in this survey paper.

scInterpreter: Training Large Language Models to Interpret scRNA-seq Data for Cell Type Annotation

no code implementations 18 Feb 2024 Cong Li, Meng Xiao, Pengfei Wang, Guihai Feng, Xin Li, Yuanchun Zhou

Despite the inherent limitations of existing Large Language Models in directly reading and interpreting single-cell omics data, they demonstrate significant potential and flexibility as the Foundation Model.

Language Modelling Large Language Model

KVQ: Kwai Video Quality Assessment for Short-form Videos

no code implementations 11 Feb 2024 Yiting Lu, Xin Li, Yajing Pei, Kun Yuan, Qizhi Xie, Yunpeng Qu, Ming Sun, Chao Zhou, Zhibo Chen

Short-form UGC video platforms, such as Kwai and TikTok, have become an emerging and irreplaceable mainstream media form, thriving on user-friendly engagement and kaleidoscopic creation.

Video Quality Assessment Visual Question Answering (VQA)

Evading Deep Learning-Based Malware Detectors via Obfuscation: A Deep Reinforcement Learning Approach

no code implementations 4 Feb 2024 Brian Etter, James Lee Hu, Mohammedreza Ebrahimi, Weifeng Li, Xin Li, Hsinchun Chen

Adversarial Malware Generation (AMG), the generation of adversarial malware variants to strengthen Deep Learning (DL)-based malware detectors, has emerged as a crucial tool in the development of proactive cyberdefense.

Malware Detection reinforcement-learning +1

Spectrum-guided Feature Enhancement Network for Event Person Re-Identification

no code implementations 2 Feb 2024 Hongchen Tan, Yi Zhang, Xiuping Liu, BaoCai Yin, Nan Ma, Xin Li, Huchuan Lu

This network consists of two innovative components: the Multi-grain Spectrum Attention Mechanism (MSAM) and the Consecutive Patch Dropout Module (CPDM).

Person Re-Identification

Detecting Multimedia Generated by Large AI Models: A Survey

1 code implementation 22 Jan 2024 Li Lin, Neeraj Gupta, Yue Zhang, Hainan Ren, Chun-Hao Liu, Feng Ding, Xin Wang, Xin Li, Luisa Verdoliva, Shu Hu

The rapid advancement of Large AI Models (LAIMs), particularly diffusion models and large language models, has marked a new era where AI-generated multimedia is increasingly integrated into various aspects of daily life.

Disentangled Clothed Avatar Generation from Text Descriptions

no code implementations 8 Dec 2023 Jionghao Wang, YuAn Liu, Zhiyang Dou, Zhengming Yu, Yongqing Liang, Xin Li, Wenping Wang, Rong Xie, Li Song

In this paper, we introduced a novel text-to-avatar generation method that separately generates the human body and the clothes and allows high-quality animation on the generated avatar.

Virtual Try-on

Cross-BERT for Point Cloud Pretraining

no code implementations 8 Dec 2023 Xin Li, Peng Li, Zeyong Wei, Zhe Zhu, Mingqiang Wei, Junhui Hou, Liangliang Nan, Jing Qin, Haoran Xie, Fu Lee Wang

By performing cross-modal interaction, Cross-BERT can smoothly reconstruct the masked tokens during pretraining, leading to notable performance enhancements for downstream tasks.

Self-Supervised Learning

SeaLLMs -- Large Language Models for Southeast Asia

1 code implementation 1 Dec 2023 Xuan-Phi Nguyen, Wenxuan Zhang, Xin Li, Mahani Aljunied, Qingyu Tan, Liying Cheng, Guanzheng Chen, Yue Deng, Sen Yang, Chaoqun Liu, Hang Zhang, Lidong Bing

Despite the remarkable achievements of large language models (LLMs) in various tasks, there remains a linguistic bias that favors high-resource languages, such as English, often at the expense of low-resource and regional languages.

Instruction Following

Brainformer: Modeling MRI Brain Functions to Machine Vision

no code implementations 30 Nov 2023 Xuan-Bac Nguyen, Xin Li, Samee U. Khan, Khoa Luu

In this work, we first present a simple yet effective Brainformer approach, a novel Transformer-based framework, to analyze the patterns of fMRI in the human perception system from the machine learning perspective.

Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding

2 code implementations 28 Nov 2023 Sicong Leng, Hang Zhang, Guanzheng Chen, Xin Li, Shijian Lu, Chunyan Miao, Lidong Bing

Large Vision-Language Models (LVLMs) have advanced considerably, intertwining visual recognition and language understanding to generate content that is not only coherent but also contextually attuned.

Hallucination Object

Beyond Sole Strength: Customized Ensembles for Generalized Vision-Language Models

1 code implementation 28 Nov 2023 Zhihe Lu, Jiawang Bai, Xin Li, Zeyu Xiao, Xinchao Wang

However, performance advancements are limited when relying solely on intricate algorithmic designs for a single model, even one exhibiting strong performance, e.g., CLIP-ViT-B/16.

Prompt Engineering

Insect-Foundation: A Foundation Model and Large-scale 1M Dataset for Visual Insect Understanding

no code implementations 26 Nov 2023 Hoang-Quan Nguyen, Thanh-Dat Truong, Xuan Bac Nguyen, Ashley Dowling, Xin Li, Khoa Luu

In precision agriculture, the detection and recognition of insects play an essential role in the ability of crops to grow healthy and produce a high-quality yield.

Self-Supervised Learning

Neuro-Symbolic Integration Brings Causal and Reliable Reasoning Proofs

1 code implementation 16 Nov 2023 Sen Yang, Xin Li, Leyang Cui, Lidong Bing, Wai Lam

Though prompting LLMs with various reasoning structures produces reasoning proofs along with answers, these proofs are not ensured to be causal and reliable due to the inherent defects of LLMs.

GSM8K

On the Road with GPT-4V(ision): Early Explorations of Visual-Language Model on Autonomous Driving

1 code implementation 9 Nov 2023 Licheng Wen, Xuemeng Yang, Daocheng Fu, XiaoFeng Wang, Pinlong Cai, Xin Li, Tao Ma, Yingxuan Li, Linran Xu, Dengke Shang, Zheng Zhu, Shaoyan Sun, Yeqi Bai, Xinyu Cai, Min Dou, Shuanglu Hu, Botian Shi, Yu Qiao

This has been a significant bottleneck, particularly in the development of common sense reasoning and nuanced scene understanding necessary for safe and reliable autonomous driving.

Autonomous Driving Common Sense Reasoning +4

CLEX: Continuous Length Extrapolation for Large Language Models

1 code implementation 25 Oct 2023 Guanzheng Chen, Xin Li, Zaiqiao Meng, Shangsong Liang, Lidong Bing

We generalise the PE scaling approaches to model the continuous dynamics by ordinary differential equations over the length scaling factor, thereby overcoming the constraints of current PE scaling methods designed for specific lengths.

Position

Good Better Best: Self-Motivated Imitation Learning for Noisy Demonstrations

no code implementations 24 Oct 2023 Ye Yuan, Xin Li, Yong Heng, Leiji Zhang, Mingzhong Wang

Imitation Learning (IL) aims to discover a policy by minimizing the discrepancy between the agent's behavior and expert demonstrations.

Imitation Learning

Once Upon a $\textit{Time}$ in $\textit{Graph}$: Relative-Time Pretraining for Complex Temporal Reasoning

1 code implementation 23 Oct 2023 Sen Yang, Xin Li, Lidong Bing, Wai Lam

However, the knowledge-time association is usually insufficient for the downstream tasks that require reasoning over temporal dependencies between knowledge.

Question Answering

Diagnosis-oriented Medical Image Compression with Efficient Transfer Learning

no code implementations 20 Oct 2023 Guangqi Xie, Xin Li, Xiaohan Pan, Zhibo Chen

Remote medical diagnosis has emerged as a critical and indispensable technique in practical medical systems, where medical data are required to be efficiently compressed and transmitted for diagnosis by either professional doctors or intelligent diagnosis devices.

Coronary Artery Segmentation Image Compression +2

Demystifying the Myths and Legends of Nonconvex Convergence of SGD

no code implementations 19 Oct 2023 Aritra Dutta, El Houcine Bergou, Soumia Boucherouite, Nicklas Werge, Melih Kandemir, Xin Li

Additionally, our analyses allow us to measure the density of the $\epsilon$-stationary points in the final iterates of SGD, and we recover the classical $O(\frac{1}{\sqrt{T}})$ asymptotic rate under various existing assumptions on the objective function and the bounds on the stochastic gradient.

FreqAlign: Excavating Perception-oriented Transferability for Blind Image Quality Assessment from A Frequency Perspective

no code implementations 29 Sep 2023 Xin Li, Yiting Lu, Zhibo Chen

Based on this, we propose to improve the perception-oriented transferability of BIQA by performing feature frequency decomposition and selecting the frequency components that contain the most transferable perception knowledge for alignment.

Blind Image Quality Assessment Unsupervised Domain Adaptation

DiLu: A Knowledge-Driven Approach to Autonomous Driving with Large Language Models

2 code implementations 28 Sep 2023 Licheng Wen, Daocheng Fu, Xin Li, Xinyu Cai, Tao Ma, Pinlong Cai, Min Dou, Botian Shi, Liang He, Yu Qiao

Recent advancements in autonomous driving have relied on data-driven approaches, which are widely adopted but face challenges including dataset bias, overfitting, and uninterpretability.

Autonomous Driving Common Sense Reasoning +1

GAFlow: Incorporating Gaussian Attention into Optical Flow

1 code implementation ICCV 2023 Ao Luo, Fan Yang, Xin Li, Lang Nie, Chunyu Lin, Haoqiang Fan, Shuaicheng Liu

Moreover, for reliable motion analysis, we provide a new Gaussian-Guided Attention Module (GGAM) which not only inherits properties from Gaussian distribution to instinctively revolve around the neighbor fields of each point but also is empowered to put the emphasis on contextually related regions during matching.

Optical Flow Estimation Representation Learning

3D Multiple Object Tracking on Autonomous Driving: A Literature Review

no code implementations 27 Sep 2023 Peng Zhang, Xin Li, Liang He, Xin Lin

This paper undertakes a comprehensive examination, assessment, and synthesis of the research landscape in this domain, remaining attuned to the latest developments in 3D MOT while suggesting prospective avenues for future investigation.

3D Multi-Object Tracking Autonomous Driving +1

GraphAdapter: Tuning Vision-Language Models With Dual Knowledge Graph

1 code implementation NeurIPS 2023 Xin Li, Dongze Lian, Zhihe Lu, Jiawang Bai, Zhibo Chen, Xinchao Wang

To mitigate that, we propose an effective adapter-style tuning strategy, dubbed GraphAdapter, which performs the textual adapter by explicitly modeling the dual-modality structure knowledge (i.e., the correlation of different semantics/classes in textual and visual modalities) with a dual knowledge graph.

Transfer Learning

Image-to-Image Translation with Deep Reinforcement Learning

1 code implementation 24 Sep 2023 Xin Wang, Ziwei Luo, Jing Hu, Chengming Feng, Shu Hu, Bin Zhu, Xi Wu, Xin Li, Siwei Lyu

The key feature in the RL-I2IT framework is to decompose a monolithic learning process into small steps, using a lightweight model to progressively transform a source image into a target image.

Auxiliary Learning Decision Making +3

VideoGen: A Reference-Guided Latent Diffusion Approach for High Definition Text-to-Video Generation

no code implementations 1 Sep 2023 Xin Li, Wenqing Chu, Ye Wu, Weihang Yuan, Fanglong Liu, Qi Zhang, Fu Li, Haocheng Feng, Errui Ding, Jingdong Wang

In this paper, we present VideoGen, a text-to-video generation approach, which can generate a high-definition video with high frame fidelity and strong temporal consistency using reference-guided latent diffusion.

Text-to-Image Generation Text-to-Video Generation +1

A Note on Randomized Kaczmarz Algorithm for Solving Doubly-Noisy Linear Systems

no code implementations 31 Aug 2023 El Houcine Bergou, Soumia Boucherouite, Aritra Dutta, Xin Li, Anna Ma

In this paper, we analyze the convergence of RK for noisy linear systems when the coefficient matrix, $A$, is corrupted with both additive and multiplicative noise, along with the noisy vector, $b$.

Bi-Modality Medical Image Synthesis Using Semi-Supervised Sequential Generative Adversarial Networks

no code implementations 27 Aug 2023 Xin Yang, Yi Lin, Zhiwei Wang, Xin Li, Kwang-Ting Cheng

A method for measuring the synthesis complexity is proposed to automatically determine the synthesis order in our sequential GAN.

Generative Adversarial Network Image Generation

MixNet: Toward Accurate Detection of Challenging Scene Text in the Wild

1 code implementation 23 Aug 2023 Yu-Xiang Zeng, Jun-Wei Hsieh, Xin Li, Ming-Ching Chang

Detecting small scene text instances in the wild is particularly challenging, where the influence of irregular positions and nonideal lighting often leads to detection errors.

Scene Text Detection Text Detection

CiteTracker: Correlating Image and Text for Visual Tracking

1 code implementation ICCV 2023 Xin Li, Yuqing Huang, Zhenyu He, YaoWei Wang, Huchuan Lu, Ming-Hsuan Yang

Existing visual tracking methods typically take an image patch as the reference of the target to perform tracking.

Attribute Descriptive +2

Diffusion Models for Image Restoration and Enhancement -- A Comprehensive Survey

1 code implementation 18 Aug 2023 Xin Li, Yulin Ren, Xin Jin, Cuiling Lan, Xingrui Wang, Wenjun Zeng, Xinchao Wang, Zhibo Chen

Image restoration (IR) has been an indispensable and challenging task in the low-level vision field, which strives to improve the subjective quality of images distorted by various forms of degradation.

Deblurring Image Restoration +2

The Algonauts Project 2023 Challenge: UARK-UAlbany Team Solution

1 code implementation 1 Aug 2023 Xuan-Bac Nguyen, Xudong Liu, Xin Li, Khoa Luu

The goal is to predict brain responses across the entire visual brain, as it is the region where the most reliable responses to images have been observed.

Adaptive Control of Resource Flow to Optimize Construction Work and Cash Flow via Online Deep Reinforcement Learning

no code implementations 20 Jul 2023 Can Jiang, Xin Li, Jia-Rui Lin, Ming Liu, Zhiliang Ma

Therefore, this paper introduces a model and method to adaptively control resource flows to optimize the work and cash flows of construction projects.

Management

Drive Like a Human: Rethinking Autonomous Driving with Large Language Models

1 code implementation 14 Jul 2023 Daocheng Fu, Xin Li, Licheng Wen, Min Dou, Pinlong Cai, Botian Shi, Yu Qiao

In this paper, we explore the potential of using a large language model (LLM) to understand the driving environment in a human-like manner and analyze its ability to reason, interpret, and memorize when facing complex scenarios.

Autonomous Driving Common Sense Reasoning +3

Unveiling the Potential of Knowledge-Prompted ChatGPT for Enhancing Drug Trafficking Detection on Social Media

no code implementations 7 Jul 2023 Chuanbo Hu, Bin Liu, Xin Li, Yanfang Ye

By integrating prior knowledge and the proposed prompts, ChatGPT can effectively identify and label drug trafficking activities on social networks, even in the presence of deceptive language and euphemisms used by drug dealers to evade detection.

Marketing

TCEIP: Text Condition Embedded Regression Network for Dental Implant Position Prediction

no code implementations 26 Jun 2023 Xinquan Yang, Jinheng Xie, Xuguang Li, Xuechen Li, Xin Li, Linlin Shen, Yongqiang Deng

Although deep neural networks have been proposed to assist dentists in planning the location of dental implants, most of them target simple cases where only one tooth is missing.

Position Position regression +1

Dual-view Correlation Hybrid Attention Network for Robust Holistic Mammogram Classification

1 code implementation 19 Jun 2023 Zhiwei Wang, Junlin Xian, Kangyi Liu, Xin Li, Qiang Li, Xin Yang

Mammogram images are important for breast cancer screening and are typically obtained in a dual-view form, i.e., cranio-caudal (CC) and mediolateral oblique (MLO), to provide complementary information.

Clinical Knowledge

Securing Visually-Aware Recommender Systems: An Adversarial Image Reconstruction and Detection Framework

no code implementations 11 Jun 2023 Minglei Yin, Bin Liu, Neil Zhenqiang Gong, Xin Li

Our proposed method can simultaneously (1) secure VARS from adversarial attacks characterized by local perturbations by image reconstruction based on global vision transformers; and (2) accurately detect adversarial examples using a novel contrastive learning approach.

Contrastive Learning Image Reconstruction +1

Learning Probabilistic Coordinate Fields for Robust Correspondences

no code implementations 7 Jun 2023 Weiyue Zhao, Hao Lu, Xinyi Ye, Zhiguo Cao, Xin Li

We introduce Probabilistic Coordinate Fields (PCFs), a novel geometric-invariant coordinate representation for image correspondence problems.

Image Registration Pose Estimation

Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding

1 code implementation 5 Jun 2023 Hang Zhang, Xin Li, Lidong Bing

We present Video-LLaMA, a multi-modal framework that empowers Large Language Models (LLMs) with the capability of understanding both visual and auditory content in the video.

Language Modelling Text Generation +7

A2B: Anchor to Barycentric Coordinate for Robust Correspondence

no code implementations 5 Jun 2023 Weiyue Zhao, Hao Lu, Zhiguo Cao, Xin Li

This approach offers a new perspective to alleviate the problem of repeated patterns and emphasizes the importance of choosing coordinate representations for feature correspondences.

nnMobileNet: Rethinking CNN for Retinopathy Research

2 code implementations 2 Jun 2023 Wenhui Zhu, Peijie Qiu, Xin Li, Natasha Lepore, Oana M. Dumitrascu, Yalin Wang

Over the past few decades, convolutional neural networks (CNNs) have been at the forefront of the detection and tracking of various retinal diseases (RD).

Diabetic Retinopathy Grading

AQE: Argument Quadruplet Extraction via a Quad-Tagging Augmented Generative Approach

1 code implementation 31 May 2023 Jia Guo, Liying Cheng, Wenxuan Zhang, Stanley Kok, Xin Li, Lidong Bing

In this work, we propose, for the first time, a challenging argument quadruplet extraction task (AQE), which can provide an all-in-one extraction of four argumentative components, i.e., claims, evidence, evidence types, and stances.

Argument Mining Stance Classification +1

Self-aware and Cross-sample Prototypical Learning for Semi-supervised Medical Image Segmentation

no code implementations 25 May 2023 Zhenxi Zhang, Ran Ran, Chunna Tian, Heng Zhou, Xin Li, Fan Yang, Zhicheng Jiao

To address these issues, we propose a self-aware and cross-sample prototypical learning method (SCP-Net) to enhance the diversity of prediction in consistency learning by utilizing a broader range of semantic information derived from multiple inputs.

Image Segmentation Semantic Segmentation +1

Cross-supervised Dual Classifiers for Semi-supervised Medical Image Segmentation

no code implementations 25 May 2023 Zhenxi Zhang, Ran Ran, Chunna Tian, Heng Zhou, Fan Yang, Xin Li, Zhicheng Jiao

This paper proposes a cross-supervised learning framework based on dual classifiers (DC-Net), including an evidential classifier and a vanilla classifier.

Image Segmentation Segmentation +2

mPMR: A Multilingual Pre-trained Machine Reader at Scale

1 code implementation 23 May 2023 Weiwen Xu, Xin Li, Wai Lam, Lidong Bing

mPMR aims to guide multilingual pre-trained language models (mPLMs) to perform natural language understanding (NLU) including both sequence classification and span extraction in multiple languages.

Classification Machine Reading Comprehension +3

Improving Self-training for Cross-lingual Named Entity Recognition with Contrastive and Prototype Learning

1 code implementation 23 May 2023 Ran Zhou, Xin Li, Lidong Bing, Erik Cambria, Chunyan Miao

In cross-lingual named entity recognition (NER), self-training is commonly used to bridge the linguistic gap by training on pseudo-labeled target-language data.

Cross-Lingual NER named-entity-recognition +4

Two-Stream Regression Network for Dental Implant Position Prediction

no code implementations 17 May 2023 Xinquan Yang, Xuguang Li, Xuechen Li, WenTing Chen, Linlin Shen, Xin Li, Yongqiang Deng

In this paper, we develop a two-stream implant position regression framework (TSIPR), which consists of an implant region detector (IRD) and a multi-scale patch embedding regression network (MSPENet), to address this issue.

Position Position regression +1

GeoGLUE: A GeoGraphic Language Understanding Evaluation Benchmark

no code implementations 11 May 2023 Dongyang Li, Ruixue Ding, Qiang Zhang, Zheng Li, Boli Chen, Pengjun Xie, Yao Xu, Xin Li, Ning Guo, Fei Huang, Xiaofeng He

With the fast pace of development in geographic applications, automatable and intelligent models must be designed to handle the large volume of information.

Entity Alignment Natural Language Understanding

UPDExplainer: an Interpretable Transformer-based Framework for Urban Physical Disorder Detection Using Street View Imagery

no code implementations 4 May 2023 Chuanbo Hu, Shan Jia, Fan Zhang, Changjiang Xiao, Mindi Ruan, Jacob Thrasher, Xin Li

Experimental results on the re-annotated Place Pulse 2.0 dataset demonstrate promising detection performance of the proposed method, with an accuracy of 79.9%.

Semantic Segmentation

MEDIC: A Multimodal Empathy Dataset in Counseling

no code implementations 4 May 2023 Zhou'an Zhu, Xin Li, Jicai Pan, Yufei Xiao, Yanan Chang, Feiyi Zheng, Shangfei Wang

We also propose three labels (i.e., expression of experience, emotional reaction, and cognitive reaction) to describe the degree of empathy between counselors and their clients.

SoGAR: Self-supervised Spatiotemporal Attention-based Social Group Activity Recognition

no code implementations 27 Apr 2023 Naga VS Raviteja Chappa, Pha Nguyen, Alexander H Nelson, Han-Seok Seo, Xin Li, Page Daniel Dobbs, Khoa Luu

This paper introduces a novel approach to Social Group Activity Recognition (SoGAR) using a self-supervised Transformer network that can effectively utilize unlabeled video data.

Group Activity Recognition

Micron-BERT: BERT-based Facial Micro-Expression Recognition

1 code implementation CVPR 2023 Xuan-Bac Nguyen, Chi Nhan Duong, Xin Li, Susan Gauch, Han-Seok Seo, Khoa Luu

By incorporating these components into an end-to-end deep network, the proposed $\mu$-BERT significantly outperforms all previous work in various micro-expression tasks.

Micro Expression Recognition Micro-Expression Recognition +1

Locate Then Generate: Bridging Vision and Language with Bounding Box for Scene-Text VQA

no code implementations 4 Apr 2023 Yongxin Zhu, Zhen Liu, Yukang Liang, Xin Li, Hao Liu, Changcun Bao, Linli Xu

Unlike conventional STVQA models, which take the linguistic semantics and visual semantics in scene text as two separate features, in this paper we propose a paradigm of "Locate Then Generate" (LTG), which explicitly unifies these two semantics with the spatial bounding box as a bridge connecting them.

Answer Generation Language Modelling +3

MobileInst: Video Instance Segmentation on the Mobile

no code implementations 30 Mar 2023 Renhong Zhang, Tianheng Cheng, Shusheng Yang, Haoyi Jiang, Shuai Zhang, Jiancheng Lyu, Xin Li, Xiaowen Ying, Dashan Gao, Wenyu Liu, Xinggang Wang

To address those issues, we present MobileInst, a lightweight and mobile-friendly framework for video instance segmentation on mobile devices.

Instance Segmentation Segmentation +2

Grab What You Need: Rethinking Complex Table Structure Recognition with Flexible Components Deliberation

no code implementations 16 Mar 2023 Hao Liu, Xin Li, Mingming Gong, Bing Liu, Yunfei Wu, Deqiang Jiang, Yinsong Liu, Xing Sun

Recently, the Table Structure Recognition (TSR) task, which aims to convert table structure into machine-readable formats, has received increasing interest in the community.

Learning Distortion Invariant Representation for Image Restoration from A Causality Perspective

2 code implementations CVPR 2023 Xin Li, Bingchen Li, Xin Jin, Cuiling Lan, Zhibo Chen

In this paper, we are the first to propose a novel training strategy for image restoration from the causality perspective, to improve the generalization ability of DNNs for unknown degradations.

counterfactual Image Restoration +2

SCPNet: Semantic Scene Completion on Point Cloud

1 code implementation CVPR 2023 Zhaoyang Xia, Youquan Liu, Xin Li, Xinge Zhu, Yuexin Ma, Yikang Li, Yuenan Hou, Yu Qiao

We propose a simple yet effective label rectification strategy, which uses off-the-shelf panoptic segmentation labels to remove the traces of dynamic objects in completion labels, greatly improving the performance of deep models especially for those moving objects.

3D Semantic Scene Completion Knowledge Distillation +3

Toward a Geometric Theory of Manifold Untangling

no code implementations 7 Mar 2023 Xin Li, Shuo Wang

It has been hypothesized that the ventral stream processing for object recognition is based on a mechanism called cortically local subspace untangling.

Object Object Recognition

Toward NeuroDM: Where Computational Neuroscience Meets Data Mining

no code implementations 7 Mar 2023 Xin Li, Bin Liu, Shuo Wang

At the intersection of computational neuroscience (CN) and data mining (DM), we advocate a holistic view toward their rich connections.

SPARTAN: Self-supervised Spatiotemporal Transformers Approach to Group Activity Recognition

1 code implementation 6 Mar 2023 Naga VS Raviteja Chappa, Pha Nguyen, Alexander H Nelson, Han-Seok Seo, Xin Li, Page Daniel Dobbs, Khoa Luu

In this paper, we propose a new, simple, and effective Self-supervised Spatio-temporal Transformers (SPARTAN) approach to Group Activity Recognition (GAR) using unlabeled video data.

Group Activity Recognition

MorphGANFormer: Transformer-based Face Morphing and De-Morphing

no code implementations 18 Feb 2023 Na Zhang, Xudong Liu, Xin Li, Guo-Jun Qi

Semantic face image manipulation has received increasing attention in recent years.

Image Manipulation

Analysis of Biomass Sustainability Indicators from a Machine Learning Perspective

no code implementations 2 Feb 2023 Syeda Nyma Ferdous, Xin Li, Kamalakanta Sahoo, Richard Bergman

This study proposes a robust model for biomass sustainability prediction by analyzing sustainability indicators using machine learning models.

Ensemble Learning Management +1

PointSmile: Point Self-supervised Learning via Curriculum Mutual Information

no code implementations 30 Jan 2023 Xin Li, Mingqiang Wei, Songcan Chen

From the perspective of how-and-what-to-learn, PointSmile is designed to imitate human curriculum learning, i.e., starting with an easy curriculum and gradually increasing the difficulty of that curriculum.

Data Augmentation Self-Supervised Learning

Negative Flux Aggregation to Estimate Feature Attributions

1 code implementation 17 Jan 2023 Xin Li, Deng Pan, Chengyin Li, Yao Qiang, Dongxiao Zhu

There are increasing demands for understanding deep neural networks' (DNNs) behavior spurred by growing security and/or transparency concerns.

Multi-Constraint Molecular Generation using Sparsely Labelled Training Data for Localized High-Concentration Electrolyte Diluent Screening

no code implementations 12 Jan 2023 Jonathan P. Mailoa, Xin Li, Jiezhong Qiu, Shengyu Zhang

Recently, machine learning methods have been used to propose molecules with desired properties, which is especially useful for exploring large chemical spaces efficiently.

MGeo: Multi-Modal Geographic Pre-Training Method

1 code implementation 11 Jan 2023 Ruixue Ding, Boli Chen, Pengjun Xie, Fei Huang, Xin Li, Qiang Zhang, Yao Xu

Single-modal pre-trained models (PTMs) can barely make use of the important geographic context (GC) and therefore have limited performance.

Language Modelling

CoIn: Contrastive Instance Feature Mining for Outdoor 3D Object Detection with Very Limited Annotations

1 code implementation ICCV 2023 Qiming Xia, Jinhao Deng, Chenglu Wen, Hai Wu, Shaoshuai Shi, Xin Li, Cheng Wang

Combining CoIn with an iterative training strategy, we propose a CoIn++ pipeline, which requires only 2% annotations in the KITTI dataset to achieve performance comparable to the fully supervised methods.

3D Object Detection Contrastive Learning +2

Vector Quantization With Self-Attention for Quality-Independent Representation Learning

no code implementations CVPR 2023 Zhou Yang, Weisheng Dong, Xin Li, Mengluan Huang, Yulin Sun, Guangming Shi

During training, we enforce the quantization of features from clean and corrupted images in the same discrete embedding space so that an invariant quality-independent feature representation can be learned to improve the recognition robustness of low-quality images.

Data Augmentation Image Restoration +2

Joint Beamforming Design for Dual-Functional MIMO Radar and Communication Systems Guaranteeing Physical Layer Security

no code implementations 1 Jan 2023 Fuwang Dong, Wei Wang, Xin Li, Fan Liu, Sheng Chen, Lajos Hanzo

The dual-functional radar and communication (DFRC) technique constitutes a promising next-generation wireless solution, due to its benefits in terms of power consumption, physical hardware, and spectrum exploitation.

Low-Light Image Enhancement with Multi-Stage Residue Quantization and Brightness-Aware Attention

1 code implementation ICCV 2023 Yunlong Liu, Tao Huang, Weisheng Dong, Fangfang Wu, Xin Li, Guangming Shi

Deep learning-based low-light image enhancement (LLIE) methods focus on learning a mapping function between low-light images and normal-light images, and they outperform conventional LLIE methods.

Low-Light Image Enhancement Quantization

Self-Supervised Non-Uniform Kernel Estimation With Flow-Based Motion Prior for Blind Image Deblurring

no code implementations CVPR 2023 Zhenxuan Fang, Fangfang Wu, Weisheng Dong, Xin Li, Jinjian Wu, Guangming Shi

To address these issues, we propose to represent the field of motion blur kernels in a latent space by normalizing flows, and design CNNs to predict the latent codes instead of motion kernels.

Blind Image Deblurring Image Deblurring

WL-Align: Weisfeiler-Lehman Relabeling for Aligning Users across Networks via Regularized Representation Learning

1 code implementation 29 Dec 2022 Li Liu, Penggang Chen, Xin Li, William K. Cheung, Youmin Zhang, Qun Liu, Guoyin Wang

Aligning users across networks using graph representation learning has been found effective where the alignment is accomplished in a low-dimensional embedding space.

Graph Representation Learning

Coarse-to-Fine Contrastive Learning on Graphs

no code implementations 13 Dec 2022 Peiyao Zhao, Yuangang Pan, Xin Li, Xu Chen, Ivor W. Tsang, Lejian Liao

Inspired by the impressive success of contrastive learning (CL), a variety of graph augmentation strategies have been employed to learn node representations in a self-supervised manner.

Contrastive Learning Learning-To-Rank

From Cloze to Comprehension: Retrofitting Pre-trained Masked Language Model to Pre-trained Machine Reader

1 code implementation 9 Dec 2022 Weiwen Xu, Xin Li, Wenxuan Zhang, Meng Zhou, Wai Lam, Luo Si, Lidong Bing

We present Pre-trained Machine Reader (PMR), a novel method for retrofitting pre-trained masked language models (MLMs) to pre-trained machine reading comprehension (MRC) models without acquiring labeled data.

Classification Extractive Question-Answering +6

Semantic-aware Message Broadcasting for Efficient Unsupervised Domain Adaptation

1 code implementation 6 Dec 2022 Xin Li, Cuiling Lan, Guoqiang Wei, Zhibo Chen

In this way, our message broadcasting encourages the group tokens to learn more informative and diverse information for effective domain alignment.

Pseudo Label Unsupervised Domain Adaptation

AdaCM: Adaptive ColorMLP for Real-Time Universal Photo-realistic Style Transfer

no code implementations 3 Dec 2022 Tianwei Lin, Honglin Lin, Fu Li, Dongliang He, Wenhao Wu, Meiling Wang, Xin Li, Yong Liu

Then, in AdaCM, we adopt a CNN encoder to adaptively predict all parameters for the ColorMLP conditioned on each input content and style image pair.

Style Transfer

Learning Compact Features via In-Training Representation Alignment

no code implementations 23 Nov 2022 Xin Li, Xiangrui Li, Deng Pan, Yao Qiang, Dongxiao Zhu

Deep neural networks (DNNs) for supervised learning can be viewed as a pipeline of the feature extractor (i.e., last hidden layer) and a linear classifier (i.e., output layer) that are trained jointly with stochastic gradient descent (SGD) on the loss function (e.g., cross-entropy).

Representation Learning

Transformation-Equivariant 3D Object Detection for Autonomous Driving

no code implementations 22 Nov 2022 Hai Wu, Chenglu Wen, Wei Li, Xin Li, Ruigang Yang, Cheng Wang

However, it is difficult to apply such networks to 3D object detection in autonomous driving due to their large computation cost and slow reasoning speed.

3D Object Detection Autonomous Driving +3

ConNER: Consistency Training for Cross-lingual Named Entity Recognition

1 code implementation 17 Nov 2022 Ran Zhou, Xin Li, Lidong Bing, Erik Cambria, Luo Si, Chunyan Miao

We propose ConNER as a novel consistency training framework for cross-lingual NER, which comprises: (1) translation-based consistency training on unlabeled target-language data, and (2) dropout-based consistency training on labeled source-language data.

Cross-Lingual NER Knowledge Distillation +3

SMILEtrack: SiMIlarity LEarning for Occlusion-Aware Multiple Object Tracking

2 code implementations 16 Nov 2022 Yu-Hsiang Wang, Jun-Wei Hsieh, Ping-Yang Chen, Ming-Ching Chang, Hung Hin So, Xin Li

Second, we develop a Similarity Matching Cascade (SMC) module with a novel GATE function for robust object matching across consecutive video frames, further enhancing MOT performance.

 Ranked #1 on Multi-Object Tracking on MOT20 (using extra training data)

Multi-Object Tracking Multiple Object Tracking +3

Towards Robust Low-Resource Fine-Tuning with Multi-View Compressed Representations

1 code implementation 16 Nov 2022 Linlin Liu, Xingxuan Li, Megh Thakkar, Xin Li, Shafiq Joty, Luo Si, Lidong Bing

Due to the huge number of parameters, fine-tuning of pretrained language models (PLMs) is prone to overfitting in low-resource scenarios.

Batch-based Model Registration for Fast 3D Sherd Reconstruction

no code implementations ICCV 2023 Jiepeng Wang, Congyi Zhang, Peng Wang, Xin Li, Peter J. Cobb, Christian Theobalt, Wenping Wang

In this work, we aim to develop a portable, high-throughput, and accurate reconstruction system for efficient digitization of fragments excavated in archaeological sites.

3D Reconstruction

RRSR:Reciprocal Reference-based Image Super-Resolution with Progressive Feature Alignment and Selection

no code implementations 8 Nov 2022 Lin Zhang, Xin Li, Dongliang He, Fu Li, Yili Wang, Zhaoxiang Zhang

While previous state-of-the-art RefSR methods mainly focus on improving the efficacy and robustness of reference feature transfer, it is generally overlooked that a well-reconstructed SR image should enable better SR reconstruction for its similar LR images when it is used as their reference.

feature selection Image Super-Resolution

Behavior Prior Representation learning for Offline Reinforcement Learning

1 code implementation 2 Nov 2022 Hongyu Zang, Xin Li, Jie Yu, Chen Liu, Riashat Islam, Remi Tachet des Combes, Romain Laroche

Our method, Behavior Prior Representation (BPR), learns state representations with an easy-to-integrate objective based on behavior cloning of the dataset: we first learn a state representation by mimicking actions from the dataset, and then train a policy on top of the fixed representation, using any off-the-shelf Offline RL algorithm.

Offline RL reinforcement-learning +2

Discrete Factorial Representations as an Abstraction for Goal Conditioned Reinforcement Learning

no code implementations 1 Nov 2022 Riashat Islam, Hongyu Zang, Anirudh Goyal, Alex Lamb, Kenji Kawaguchi, Xin Li, Romain Laroche, Yoshua Bengio, Remi Tachet des Combes

Goal-conditioned reinforcement learning (RL) is a promising direction for training agents that are capable of solving multiple tasks and reaching a diverse set of objectives.

reinforcement-learning Reinforcement Learning (RL)

Agent-Controller Representations: Principled Offline RL with Rich Exogenous Information

1 code implementation 31 Oct 2022 Riashat Islam, Manan Tomar, Alex Lamb, Yonathan Efroni, Hongyu Zang, Aniket Didolkar, Dipendra Misra, Xin Li, Harm van Seijen, Remi Tachet des Combes, John Langford

We find that contemporary representation learning techniques can fail on datasets where the noise is a complex and time-dependent process, which is prevalent in practical applications.

Offline RL Reinforcement Learning (RL) +1

Fusion-based Few-Shot Morphing Attack Detection and Fingerprinting

1 code implementation 27 Oct 2022 Na Zhang, Shan Jia, Siwei Lyu, Xin Li

Our technical contributions include: 1) We propose a fusion-based few-shot learning (FSL) method to learn discriminative features that can generalize to unseen morphing attack types from predefined presentation attacks; 2) The proposed FSL based on the fusion of the PRNU model and Noiseprint network is extended from binary MAD to multiclass morphing attack fingerprinting (MAF).

Face Recognition Few-Shot Learning

Multi-view Representation Learning from Malware to Defend Against Adversarial Variants

no code implementations 25 Oct 2022 James Lee Hu, MohammadReza Ebrahimi, Weifeng Li, Xin Li, Hsinchun Chen

This provides an opportunity for the defenders (i.e., malware detectors) to detect the adversarial variants by utilizing more than one view of a malware file (e.g., source code view in addition to the binary view).

Adversarial Robustness MULTI-VIEW LEARNING +1

Development of a Hybrid Simulation and Experiment Test Platform for Dynamic Positioning Vessels

no code implementations 23 Oct 2022 Changjun Hu, Quan Shi, Xin Li, Xiaoxian Guo

The test platform can test the performance of the dynamic positioning (DP) system and determine the operational time window.

Joint Rigid Motion Correction and Sparse-View CT via Self-Calibrating Neural Field

no code implementations 23 Oct 2022 Qing Wu, Xin Li, Hongjiang Wei, Jingyi Yu, Yuyao Zhang

NeRF-based SVCT methods represent the desired CT image as a continuous function of spatial coordinates and train a Multi-Layer Perceptron (MLP) to learn the function by minimizing loss on the SV sinogram.

Deep Learning-Based Channel Estimation for Double-RIS Aided Massive MIMO System

no code implementations 22 Oct 2022 Mengbing Liu, Xin Li, Boyu Ning, Chongwen Huang, Sumei Sun, Chau Yuen

Reconfigurable Intelligent Surface (RIS) is considered an energy-efficient solution for future wireless communication networks due to its fast and low-cost configuration.

Homogeneous Multi-modal Feature Fusion and Interaction for 3D Object Detection

no code implementations 18 Oct 2022 Xin Li, Botian Shi, Yuenan Hou, Xingjiao Wu, Tianlong Ma, Yikang Li, Liang He

To address these problems, we construct the homogeneous structure between the point cloud and images to avoid projective information loss by transforming the camera features into the LiDAR 3D space.

3D Object Detection Autonomous Driving +1

Retrofitting Multilingual Sentence Embeddings with Abstract Meaning Representation

1 code implementation 18 Oct 2022 Deng Cai, Xin Li, Jackie Chun-Sing Ho, Lidong Bing, Wai Lam

Unlike most prior work that only evaluates the ability to measure semantic similarity, we present a thorough evaluation of existing multilingual sentence embeddings and our improved versions, which include a collection of five transfer tasks in different downstream applications.

Semantic Similarity Semantic Textual Similarity +2

PeerDA: Data Augmentation via Modeling Peer Relation for Span Identification Tasks

1 code implementation 17 Oct 2022 Weiwen Xu, Xin Li, Yang Deng, Wai Lam, Lidong Bing

Specifically, a novel Peer Data Augmentation (PeerDA) approach is proposed which employs span pairs with the PR relation as the augmentation data for training.

Data Augmentation Relation

Cutting-Splicing data augmentation: A novel technology for medical image segmentation

no code implementations 17 Oct 2022 Lianting Hu, Huiying Liang, Jiajie Tang, Xin Li, Li Huang, Long Lu

Background: Medical images are more difficult to acquire and annotate than natural images, which results in data augmentation technologies often being used in medical image segmentation tasks.

Data Augmentation Image Segmentation +4

Toward an Over-parameterized Direct-Fit Model of Visual Perception

no code implementations 7 Oct 2022 Xin Li

In this paper, we revisit the problem of computational modeling of simple and complex cells for an over-parameterized and direct-fit model of visual perception.

How Image Generation Helps Visible-to-Infrared Person Re-Identification?

no code implementations 4 Oct 2022 Honghu Pan, Yongyong Chen, Yunqi He, Xin Li, Zhenyu He

To this end, we propose Flow2Flow, a unified framework that could jointly achieve training sample expansion and cross-modality image generation for V2I person ReID.

Image Generation Person Re-Identification

Uncertainty Aware Multitask Pyramid Vision Transformer For UAV-Based Object Re-Identification

no code implementations 19 Sep 2022 Syeda Nyma Ferdous, Xin Li, Siwei Lyu

Learning a robust and discriminative feature representation is a crucial challenge for object ReID.

Object

Saliency Guided Adversarial Training for Learning Generalizable Features with Applications to Medical Imaging Classification System

no code implementations 9 Sep 2022 Xin Li, Yao Qiang, Chengyin Li, Sijia Liu, Dongxiao Zhu

We hypothesize that adversarial training can eliminate shortcut features whereas saliency guided training can filter out non-relevant features; both are nuisance features accounting for the performance degradation on OOD test sets.

Predicting the clinical citation count of biomedical papers using multilayer perceptron neural network

no code implementations 7 Sep 2022 Xin Li, Xuli Tang, Qikai Cheng

We extracted ninety-one paper features from three dimensions as the input of the model, including twenty-one features in the paper dimension, thirty-five in the reference dimension, and thirty-five in the citing paper dimension.

Translation

Learned Lossless JPEG Transcoding via Joint Lossy and Residual Compression

no code implementations 24 Aug 2022 Xiaoshuai Fan, Xin Li, Zhibo Chen

Our proposed transcoding architecture shows significant superiority in the compression of JPEG images thanks to the collaboration of learned lossy transform coding and residual entropy coding.

Image Compression

Hierarchical Reinforcement Learning Based Video Semantic Coding for Segmentation

no code implementations 24 Aug 2022 Guangqi Xie, Xin Li, Shiqi Lin, Li Zhang, Kai Zhang, Yue Li, Zhibo Chen

In this paper, we take a step toward video semantic compression and propose Hierarchical Reinforcement Learning based task-driven Video Semantic Coding, named HRLVSC.

Hierarchical Reinforcement Learning reinforcement-learning +3

HST: Hierarchical Swin Transformer for Compressed Image Super-resolution

3 code implementations 21 Aug 2022 Bingchen Li, Xin Li, Yiting Lu, Sen Liu, Ruoyu Feng, Zhibo Chen

Compressed Image Super-resolution has attracted great attention in recent years, where images are degraded with compression artifacts and low-resolution artifacts.

Compressed Image Super-resolution Image Super-Resolution

StyleAM: Perception-Oriented Unsupervised Domain Adaption for Non-reference Image Quality Assessment

no code implementations 29 Jul 2022 Yiting Lu, Xin Li, Jianzhao Liu, Zhibo Chen

Specifically, we find a more compact and reliable space, i.e., the feature style space, for perception-oriented UDA, based on an interesting observation that the feature style (i.e., the mean and variance) of the deep layer in DNNs is exactly associated with the quality score in NR-IQA.

Image Quality Assessment NR-IQA +1

Point Cloud Attacks in Graph Spectral Domain: When 3D Geometry Meets Graph Signal Processing

no code implementations 27 Jul 2022 Daizong Liu, Wei Hu, Xin Li

Instead, we propose point cloud attacks from a new perspective -- the graph spectral domain attack, aiming to perturb graph transform coefficients in the spectral domain, which corresponds to varying certain geometric structures.

Stroke-Based Autoencoders: Self-Supervised Learners for Efficient Zero-Shot Chinese Character Recognition

no code implementations 17 Jul 2022 Zongze Chen, Wenxia Yang, Xin Li

Following its canonical writing order, we first represent a Chinese character as a series of stroke images with a fixed writing order, and then our SAE model is trained to reconstruct this stroke image sequence.

Word Embeddings Zero-Shot Learning

Source-free Unsupervised Domain Adaptation for Blind Image Quality Assessment

no code implementations 17 Jul 2022 Jianzhao Liu, Xin Li, Shukun An, Zhibo Chen

Thanks to the development of unsupervised domain adaptation (UDA), some works attempt to transfer the knowledge from a label-sufficient source domain to a label-free target domain under domain shift with UDA.

Blind Image Quality Assessment Unsupervised Domain Adaptation

Neural Color Operators for Sequential Image Retouching

2 code implementations 17 Jul 2022 Yili Wang, Xin Li, Kun Xu, Dongliang He, Qi Zhang, Fu Li, Errui Ding

The neural color operator mimics the behavior of traditional color operators and learns pixelwise color transformation while its strength is controlled by a scalar.

Image Enhancement Image Retouching

RTN: Reinforced Transformer Network for Coronary CT Angiography Vessel-level Image Quality Assessment

no code implementations 13 Jul 2022 Yiting Lu, Jun Fu, Xin Li, Wei Zhou, Sen Liu, Xinxin Zhang, Congfu Jia, Ying Liu, Zhibo Chen

Therefore, we propose a Progressive Reinforcement learning based Instance Discarding module (termed PRID) to progressively remove quality-irrelevant/negative instances for CCTA VIQA.

Image Quality Assessment Multiple Instance Learning

Scaling Autoregressive Models for Content-Rich Text-to-Image Generation

2 code implementations 22 Jun 2022 Jiahui Yu, Yuanzhong Xu, Jing Yu Koh, Thang Luong, Gunjan Baid, ZiRui Wang, Vijay Vasudevan, Alexander Ku, Yinfei Yang, Burcu Karagol Ayan, Ben Hutchinson, Wei Han, Zarana Parekh, Xin Li, Han Zhang, Jason Baldridge, Yonghui Wu

We present the Pathways Autoregressive Text-to-Image (Parti) model, which generates high-fidelity photorealistic images and supports content-rich synthesis involving complex compositions and world knowledge.

Machine Translation Text-to-Image Generation +1

Formation Tracking for a Multi-Auv System Based on an Adaptive Sliding Mode Method in the Water Flow Environment

no code implementations 9 Jun 2022 Xin Li, Daqi Zhu, Bing Sun, Qi Chen, Wenyang Gan, Zhigang Li

Finally, a robust sliding mode controller with a continuous model predictive control strategy for the multi-AUV system is developed to achieve leader-follower formation tracking in the presence of bounded flow disturbances, and simulations are implemented to confirm the effectiveness of the proposed method.

Model Predictive Control

Accurate Scoliosis Vertebral Landmark Localization on X-ray Images via Shape-constrained Multi-stage Cascaded CNNs

no code implementations 5 Jun 2022 Zhiwei Wang, Jinxin Lv, Yunqiao Yang, Yuanhuai Liang, Yi Lin, Qiang Li, Xin Li, Xin Yang

Vertebral landmark localization is a crucial step for various spine-related clinical applications, which requires detecting the corner points of 17 vertebrae.

A Saliency-Guided Street View Image Inpainting Framework for Efficient Last-Meters Wayfinding

1 code implementation 14 May 2022 Chuanbo Hu, Shan Jia, Fan Zhang, Xin Li

However, due to the large diversity of geographic context and acquisition conditions, the captured SVI always contains various distracting objects (e.g., pedestrians and vehicles), which will distract human visual attention from efficiently finding the destination in the last few meters.

Image Inpainting object-detection +2

NTIRE 2022 Challenge on Efficient Super-Resolution: Methods and Results

2 code implementations 11 May 2022 Yawei Li, Kai Zhang, Radu Timofte, Luc van Gool, Fangyuan Kong, Mingxi Li, Songwei Liu, Zongcai Du, Ding Liu, Chenhui Zhou, Jingyi Chen, Qingrui Han, Zheyuan Li, Yingqi Liu, Xiangyu Chen, Haoming Cai, Yu Qiao, Chao Dong, Long Sun, Jinshan Pan, Yi Zhu, Zhikai Zong, Xiaoxiao Liu, Zheng Hui, Tao Yang, Peiran Ren, Xuansong Xie, Xian-Sheng Hua, Yanbo Wang, Xiaozhong Ji, Chuming Lin, Donghao Luo, Ying Tai, Chengjie Wang, Zhizhong Zhang, Yuan Xie, Shen Cheng, Ziwei Luo, Lei Yu, Zhihong Wen, Qi Wu, Youwei Li, Haoqiang Fan, Jian Sun, Shuaicheng Liu, Yuanfei Huang, Meiguang Jin, Hua Huang, Jing Liu, Xinjian Zhang, Yan Wang, Lingshun Long, Gen Li, Yuanfan Zhang, Zuowei Cao, Lei Sun, Panaetov Alexander, Yucong Wang, Minjie Cai, Li Wang, Lu Tian, Zheyuan Wang, Hongbing Ma, Jie Liu, Chao Chen, Yidong Cai, Jie Tang, Gangshan Wu, Weiran Wang, Shirui Huang, Honglei Lu, Huan Liu, Keyan Wang, Jun Chen, Shi Chen, Yuchun Miao, Zimo Huang, Lefei Zhang, Mustafa Ayazoğlu, Wei Xiong, Chengyi Xiong, Fei Wang, Hao Li, Ruimian Wen, Zhijing Yang, Wenbin Zou, Weixin Zheng, Tian Ye, Yuncheng Zhang, Xiangzhen Kong, Aditya Arora, Syed Waqas Zamir, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Dandan Gao, Dengwen Zhou, Qian Ning, Jingzhu Tang, Han Huang, YuFei Wang, Zhangheng Peng, Haobo Li, Wenxue Guan, Shenghua Gong, Xin Li, Jun Liu, Wanjun Wang, Dengwen Zhou, Kun Zeng, Hanjiang Lin, Xinyu Chen, Jinsheng Fang

The aim was to design a network for single image super-resolution that achieved improvement of efficiency measured according to several metrics including runtime, parameters, FLOPs, activations, and memory consumption while at least maintaining the PSNR of 29.00 dB on the DIV2K validation set.

Image Super-Resolution

SwinIQA: Learned Swin Distance for Compressed Image Quality Assessment

1 code implementation 9 May 2022 Jianzhao Liu, Xin Li, Yanding Peng, Tao Yu, Zhibo Chen

In this paper, we design a full-reference image quality assessment metric SwinIQA to measure the perceptual quality of compressed images in a learned Swin distance space.

Compressed Image Quality Assessment Image Compression +1

Relational Representation Learning in Visually-Rich Documents

no code implementations 5 May 2022 Xin Li, Yan Zheng, Yiqing Hu, Haoyu Cao, Yunfei Wu, Deqiang Jiang, Yinsong Liu, Bo Ren

To deal with the unpredictable definition of relations, we propose a novel contrastive learning task named Relational Consistency Modeling (RCM), which harnesses the fact that existing relations should be consistent in differently augmented positive views.

Contrastive Learning Key Information Extraction +3

Global Mapping of Gene/Protein Interactions in PubMed Abstracts: A Framework and an Experiment with P53 Interactions

no code implementations 22 Apr 2022 Xin Li, Hsinchun Chen, Zan Huang, Hua Su, Jesse D. Martinez

In this paper, we propose a comprehensive framework for constructing and analyzing large-scale gene functional networks based on the gene/protein interactions extracted from biomedical literature repositories using text mining tools.

Gene Function Prediction with Gene Interaction Networks: A Context Graph Kernel Approach

no code implementations 22 Apr 2022 Xin Li, Hsinchun Chen, Jiexun Li, Zhu Zhang

Predicting gene functions is a challenge for biologists in the post-genomic era.

The Devil is in the Frequency: Geminated Gestalt Autoencoder for Self-Supervised Visual Pre-Training

no code implementations 18 Apr 2022 Hao Liu, Xinghua Jiang, Xin Li, Antai Guo, Deqiang Jiang, Bo Ren

The self-supervised Masked Image Modeling (MIM) schema, which follows a "mask-and-reconstruct" pipeline of recovering contents from a masked image, has recently captured increasing interest in the multimedia community, owing to its excellent ability to learn visual representations from unlabeled data.

DR-GAN: Distribution Regularization for Text-to-Image Generation

1 code implementation 17 Apr 2022 Hongchen Tan, Xiuping Liu, BaoCai Yin, Xin Li

This paper presents a new Text-to-Image generation model, named Distribution Regularization Generative Adversarial Network (DR-GAN), to generate images from text descriptions via improved distribution learning.

Generative Adversarial Network Text-to-Image Generation

Context-aware Visual Tracking with Joint Meta-updating

no code implementations 4 Apr 2022 Qiuhong Shen, Xin Li, Fanyang Meng, Yongsheng Liang

These deep trackers usually either do not perform online updates or update only a single sub-branch of the tracking model, so they cannot adapt to the appearance variation of objects.

Meta-Learning Visual Object Tracking +1

Unsupervised Learning of Accurate Siamese Tracking

1 code implementation CVPR 2022 Qiuhong Shen, Lei Qiao, Jinyang Guo, Peixia Li, Xin Li, Bo Li, Weitao Feng, Weihao Gan, Wei Wu, Wanli Ouyang

As unlimited self-supervision signals can be obtained by tracking a video along a cycle in time, we investigate evolving a Siamese tracker by tracking videos forward-backward.

Visual Object Tracking

Aggregate effects of advertising decisions: a complex systems look at search engine advertising via an experimental study

no code implementations 4 Mar 2022 Yanwu Yang, Xin Li, Bernard J. Jansen, Daniel Zeng

Originality: This is one of the first research works to explore collective group decisions and resulting phenomena in the complex context of search engine advertising via developing and validating a simulation framework that supports assessments of various advertising strategies and estimations of the impact of mechanisms on the search market.

A Survey on Aspect-Based Sentiment Analysis: Tasks, Methods, and Challenges

1 code implementation 2 Mar 2022 Wenxuan Zhang, Xin Li, Yang Deng, Lidong Bing, Wai Lam

More specifically, we provide a new taxonomy for ABSA which organizes existing studies from the axes of concerned sentiment elements, with an emphasis on recent advances of compound ABSA tasks.

Aspect-Based Sentiment Analysis Aspect-Based Sentiment Analysis (ABSA)

Towards Bidirectional Arbitrary Image Rescaling: Joint Optimization and Cycle Idempotence

no code implementations CVPR 2022 Zhihong Pan, Baopu Li, Dongliang He, Mingde Yao, Wenhao Wu, Tianwei Lin, Xin Li, Errui Ding

Deep learning based single image super-resolution models have been widely studied and superb results are achieved in upscaling low-resolution images with a fixed scale factor and downscaling degradation kernel.

Image Super-Resolution

Model Attribution of Face-swap Deepfake Videos

1 code implementation 25 Feb 2022 Shan Jia, Xin Li, Siwei Lyu

Then we take Deepfakes model attribution as a multiclass classification task and propose a spatial and temporal attention based method to explore the differences among Deepfakes in the new dataset.

Attribute Face Swapping

Low-Rank Phase Retrieval with Structured Tensor Models

no code implementations 15 Feb 2022 Soo Min Kwon, Xin Li, Anand D. Sarwate

We study the low-rank phase retrieval problem, where the objective is to recover a sequence of signals (typically images) given the magnitude of linear measurements of those signals.

Retrieval

Learning Optical Flow with Adaptive Graph Reasoning

1 code implementation 8 Feb 2022 Ao Luo, Fan Yang, Kunming Luo, Xin Li, Haoqiang Fan, Shuaicheng Liu

Our key idea is to decouple the context reasoning from the matching procedure, and exploit scene information to effectively assist motion estimation by learning to reason over the adaptive graph.

Motion Estimation Optical Flow Estimation +1

Multi-modal Sensor Fusion for Auto Driving Perception: A Survey

no code implementations 6 Feb 2022 Keli Huang, Botian Shi, Xiang Li, Xin Li, Siyuan Huang, Yikang Li

Multi-modal fusion is a fundamental task for the perception of an autonomous driving system, which has recently intrigued many researchers.

Autonomous Driving object-detection +3

A multi-domain virtual network embedding algorithm with delay prediction

no code implementations 3 Feb 2022 Peiying Zhang, Xue Pang, Yongjing Ni, Haipeng Yao, Xin Li

Virtual network embedding (VNE) is a crucial part of network virtualization (NV), which aims to map the virtual networks (VNs) to a shared substrate network (SN).

Network Embedding

Machine learning prediction for mean motion resonance behaviour -- The planar case

no code implementations 18 Jan 2022 Xin Li, Jian Li, Zhihong Jeff Xia, Nikolaos Georgakarakos

Most recently, machine learning has been used to study the dynamics of integrable Hamiltonian systems and the chaotic 3-body problem.

BIG-bench Machine Learning Numerical Integration

A Survey on Applications of Digital Human Avatars toward Virtual Co-presence

no code implementations 11 Jan 2022 Matthew Korban, Xin Li

This paper investigates different approaches to build and use digital human avatars toward interactive Virtual Co-presence (VCP) environments.

Multi-Object Tracking Meets Moving UAV

no code implementations CVPR 2022 Shuai Liu, Xin Li, Huchuan Lu, You He

Multi-object tracking in unmanned aerial vehicle (UAV) videos is an important vision task and can be applied in a wide range of applications.

Multi-Object Tracking Object

SimSR: Simple Distance-based State Representation for Deep Reinforcement Learning

2 code implementations 31 Dec 2021 Hongyu Zang, Xin Li, Mingzhong Wang

This work explores how to learn robust and generalizable state representation from image-based observations with deep reinforcement learning methods.

reinforcement-learning Reinforcement Learning (RL)

Robust Depth Completion with Uncertainty-Driven Loss Functions

no code implementations 15 Dec 2021 Yufan Zhu, Weisheng Dong, Leida Li, Jinjian Wu, Xin Li, Guangming Shi

In this work, we introduce uncertainty-driven loss functions to improve the robustness of depth completion and handle the uncertainty in depth completion.

Depth Completion

An Informative Tracking Benchmark

1 code implementation 13 Dec 2021 Xin Li, Qiao Liu, Wenjie Pei, Qiuhong Shen, YaoWei Wang, Huchuan Lu, Ming-Hsuan Yang

Along with the rapid progress of visual tracking, existing benchmarks become less informative due to redundancy of samples and weak discrimination between current trackers, making evaluations on all datasets extremely time-consuming.

Visual Tracking

Uncertainty-Driven Loss for Single Image Super-Resolution

no code implementations NeurIPS 2021 Qian Ning, Weisheng Dong, Xin Li, Jinjian Wu, Guangming Shi

Specifically, we introduce variance estimation characterizing the uncertainty on a pixel-by-pixel basis into SISR solutions so the targeted pixels in a high-resolution image (mean) and their corresponding uncertainty (variance) can be learned simultaneously.

Image Super-Resolution

Interactive Model with Structural Loss for Language-based Abductive Reasoning

no code implementations 1 Dec 2021 Linhao Li, Ming Xu, Yongfeng Dong, Xin Li, Ao Wang

Therefore, we propose to group instead of ranking the hypotheses and design a structural loss called "joint softmax focal loss" in this paper.

Language Modelling Natural Language Inference

Document Layout Analysis with Aesthetic-Guided Image Augmentation

no code implementations 27 Nov 2021 Tianlong Ma, Xingjiao Wu, Xin Li, Xiangcheng Du, Zhao Zhou, Liang Xue, Cheng Jin

To measure the proposed image layer modeling method, we propose a manually-labeled non-Manhattan layout fine-grained segmentation dataset named FPD.

Document Layout Analysis document understanding +2

Neural Collaborative Graph Machines for Table Structure Recognition

no code implementations CVPR 2022 Hao Liu, Xin Li, Bing Liu, Deqiang Jiang, Yinsong Liu, Bo Ren

We also show that the proposed NCGM can modulate collaborative pattern of different modalities conditioned on the context of intra-modality cues, which is vital for diversified table cases.

Table Recognition

Confounder Identification-free Causal Visual Feature Learning

no code implementations 26 Nov 2021 Xin Li, Zhizheng Zhang, Guoqiang Wei, Cuiling Lan, Wenjun Zeng, Xin Jin, Zhibo Chen

In this paper, we propose a novel Confounder Identification-free Causal Visual Feature Learning (CICF) method, which obviates the need for identifying confounders.

Domain Generalization Meta-Learning

Simple Contrastive Representation Adversarial Learning for NLP Tasks

no code implementations 26 Nov 2021 Deshui Miao, JiaQi Zhang, WenBo Xie, Jian Song, Xin Li, Lijuan Jia, Ning Guo

In this paper, adversarial training is performed to generate challenging and harder learning adversarial examples over the embedding space of NLP as learning pairs.

Contrastive Learning Natural Language Understanding +4

NomMer: Nominate Synergistic Context in Vision Transformer for Visual Recognition

1 code implementation CVPR 2022 Hao Liu, Xinghua Jiang, Xin Li, Zhimin Bao, Deqiang Jiang, Bo Ren

For the sake of trade-off between efficiency and performance, a group of works merely perform SA operation within local patches, whereas the global contextual information is abandoned, which would be indispensable for visual recognition tasks.

object-detection Object Detection +1

A Close Look at Few-shot Real Image Super-resolution from the Distortion Relation Perspective

no code implementations 25 Nov 2021 Xin Li, Xin Jin, Jun Fu, Xiaoyuan Yu, Bei Tong, Zhibo Chen

Under this brand-new scenario, we propose Distortion Relation guided Transfer Learning (DRTL) for the few-shot RealSR by transferring the rich restoration knowledge from auxiliary distortions (i.e., synthetic distortions) to the target RealSR under the guidance of distortion relation.

Image Restoration Image Super-Resolution +4

Enhancing Multilingual Language Model with Massive Multilingual Knowledge Triples

1 code implementation 22 Nov 2021 Linlin Liu, Xin Li, Ruidan He, Lidong Bing, Shafiq Joty, Luo Si

In this work, we explore methods to make better use of the multilingual annotation and language agnostic property of KG triples, and present novel knowledge based multilingual language models (KMLMs) trained directly on the knowledge triples.

Knowledge Graphs Language Modelling +9

Internationalizing AI: Evolution and Impact of Distance Factors

no code implementations 10 Nov 2021 Xuli Tang, Xin Li, Feicheng Ma

A framework including 13 indicators to quantify the distance factors between countries from 5 perspectives (i.e., geographic distance, economic distance, cultural distance, academic distance, and industrial distance) is proposed.

Descriptive

Deep Models with Fusion Strategies for MVP Point Cloud Registration

1 code implementation 18 Oct 2021 Lifa Zhu, Changwei Lin, Dongrui Liu, Xin Li, Francisco Gómez-Fernández

The main goal of point cloud registration in Multi-View Partial (MVP) Challenge 2021 is to estimate a rigid transformation to align a point cloud pair.

Point Cloud Registration

Probabilistic prediction of the heave motions of a semi-submersible by a deep learning problem model

1 code implementation 9 Oct 2021 Xiaoxian Guo, Xiantao Zhang, Xinliang Tian, Wenyue Lu, Xin Li

In this study, we extend a deep learning (DL) model, which could predict the heave and surge motions of a floating semi-submersible 20 to 50 seconds ahead with good accuracy, to quantify its uncertainty of the predictive time series with the help of the dropout technique.

Motion Compensation motion prediction +2

Vector-quantized Image Modeling with Improved VQGAN

5 code implementations ICLR 2022 Jiahui Yu, Xin Li, Jing Yu Koh, Han Zhang, Ruoming Pang, James Qin, Alexander Ku, Yuanzhong Xu, Jason Baldridge, Yonghui Wu

Motivated by this success, we explore a Vector-quantized Image Modeling (VIM) approach that involves pretraining a Transformer to predict rasterized image tokens autoregressively.

Image Generation Representation Learning +1

Aspect Sentiment Quad Prediction as Paraphrase Generation

1 code implementation EMNLP 2021 Wenxuan Zhang, Yang Deng, Xin Li, Yifei Yuan, Lidong Bing, Wai Lam

Aspect-based sentiment analysis (ABSA) has been extensively studied in recent years, which typically involves four fundamental sentiment elements, including the aspect category, aspect term, opinion term, and sentiment polarity.

Aspect-Based Sentiment Analysis Aspect-Based Sentiment Analysis (ABSA) +2
