Search Results for author: Bin Xiao

Found 64 papers, 35 papers with code

Real-Time Image Segmentation via Hybrid Convolutional-Transformer Architecture Search

1 code implementation 15 Mar 2024 Hongyuan Yu, Cheng Wan, Mengchen Liu, Dongdong Chen, Bin Xiao, Xiyang Dai

Manually replacing convolution layers with multi-head self-attention is non-trivial due to the costly overhead in memory to maintain high resolution.

Autonomous Driving Image Segmentation +2

Rethinking Detection Based Table Structure Recognition for Visually Rich Document Images

no code implementations 1 Dec 2023 Bin Xiao, Murat Simsek, Burak Kantarci, Ala Abu Alkheir

However, existing detection-based models usually cannot perform as well as other types of solutions regarding cell-level TSR metrics, such as TEDS, and the underlying reasons limiting the performance of these models on the TSR task are also not well-explored.

Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks

no code implementations 10 Nov 2023 Bin Xiao, Haiping Wu, Weijian Xu, Xiyang Dai, Houdong Hu, Yumao Lu, Michael Zeng, Ce Liu, Lu Yuan

We introduce Florence-2, a novel vision foundation model with a unified, prompt-based representation for a variety of computer vision and vision-language tasks.

Multi-Task Learning object-detection +1

Detecting Generated Images by Real Images Only

no code implementations 2 Nov 2023 Xiuli Bi, Bo Liu, Fan Yang, Bin Xiao, Weisheng Li, Gao Huang, Pamela C. Cosman

This paper approaches the generated image detection problem from a new perspective: start from real images.

VFedMH: Vertical Federated Learning for Training Multiple Heterogeneous Models

no code implementations 20 Oct 2023 Shuo Wang, Keke Gai, Jing Yu, Liehuang Zhu, Kim-Kwang Raymond Choo, Bin Xiao

Then the passive party, who owns only features of the sample, injects the blinding factor into the local embedding and sends it to the active party.

Vertical Federated Learning

A Survey of Robustness and Safety of 2D and 3D Deep Learning Models Against Adversarial Attacks

no code implementations 1 Oct 2023 YanJie Li, Bin Xie, Songtao Guo, Yuanyuan Yang, Bin Xiao

Many papers have emerged to investigate the robustness and safety of deep learning models against adversarial attacks.

TinyCLIP: CLIP Distillation via Affinity Mimicking and Weight Inheritance

1 code implementation ICCV 2023 Kan Wu, Houwen Peng, Zhenghong Zhou, Bin Xiao, Mengchen Liu, Lu Yuan, Hong Xuan, Michael Valenzuela, Xi Chen, Xinggang Wang, Hongyang Chao, Han Hu

In this paper, we propose a novel cross-modal distillation method, called TinyCLIP, for large-scale language-image pre-trained models.

Generating Transferable and Stealthy Adversarial Patch via Attention-guided Adversarial Inpainting

no code implementations 10 Aug 2023 YanJie Li, Mingxing Duan, Xuelong Dai, Bin Xiao

In the first stage, we extract multi-scale style embeddings by a pyramid-like network and identity embeddings by a pretrained FR model and propose a novel Attention-guided Adaptive Instance Normalization layer (AAIN) to merge them via background-patch cross-attention maps.

Face Recognition

AdvDiff: Generating Unrestricted Adversarial Examples using Diffusion Models

1 code implementation 24 Jul 2023 Xuelong Dai, Kaisheng Liang, Bin Xiao

Unrestricted adversarial attacks present a serious threat to deep learning models and adversarial defense techniques.

Adversarial Defense

Fast Continual Multi-View Clustering with Incomplete Views

no code implementations 4 Jun 2023 Xinhang Wan, Bin Xiao, Xinwang Liu, Jiyuan Liu, Weixuan Liang, En Zhu

Such an incomplete continual data problem (ICDP) in MVC is tough to solve since incomplete information with continual data increases the difficulty of extracting consistent and complementary knowledge among views.

Clustering

Table Detection for Visually Rich Document Images

1 code implementation 30 May 2023 Bin Xiao, Murat Simsek, Burak Kantarci, Ala Abu Alkheir

Table Detection (TD) is a fundamental task to enable visually rich document understanding, which requires the model to extract information without information loss.

document understanding object-detection +2

i-Code V2: An Autoregressive Generation Framework over Vision, Language, and Speech Data

no code implementations 21 May 2023 ZiYi Yang, Mahmoud Khademi, Yichong Xu, Reid Pryzant, Yuwei Fang, Chenguang Zhu, Dongdong Chen, Yao Qian, Mei Gao, Yi-Ling Chen, Robert Gmyr, Naoyuki Kanda, Noel Codella, Bin Xiao, Yu Shi, Lu Yuan, Takuya Yoshioka, Michael Zeng, Xuedong Huang

The convergence of text, visual, and audio data is a key step towards human-like artificial intelligence; however, the current Vision-Language-Speech landscape is dominated by encoder-only models, which lack generative abilities.

Revisiting Table Detection Datasets for Visually Rich Documents

no code implementations 4 May 2023 Bin Xiao, Murat Simsek, Burak Kantarci, Ala Abu Alkheir

Moreover, to enrich the data sources, we propose a new ICT-TD dataset using the PDF files of Information and Communication Technologies (ICT) commodities, a different domain containing unique samples that hardly appear in open datasets.

document understanding object-detection +2

StyLess: Boosting the Transferability of Adversarial Examples

1 code implementation CVPR 2023 Kaisheng Liang, Bin Xiao

Our method can prevent adversarial examples from using non-robust style features and help generate transferable perturbations.

SARF: Aliasing Relation Assisted Self-Supervised Learning for Few-shot Relation Reasoning

no code implementations 20 Apr 2023 Lingyuan Meng, Ke Liang, Bin Xiao, Sihang Zhou, Yue Liu, Meng Liu, Xihong Yang, Xinwang Liu

Moreover, most of the existing methods ignore leveraging the beneficial information from aliasing relations (AR), i.e., data-rich relations with similar contextual semantics to the target data-poor relation.

Knowledge Graphs Relation +1

DAA: A Delta Age AdaIN operation for age estimation via binary code transformer

no code implementations CVPR 2023 Ping Chen, Xingpeng Zhang, Ye Li, Ju Tao, Bin Xiao, Bing Wang, Zongjie Jiang

Inspired by transfer learning, we design the Delta Age AdaIN (DAA) operation to obtain the feature difference for each age; it derives the style map of each age from learned values representing the mean and standard deviation.

Age Estimation Transfer Learning

Revisiting Initializing Then Refining: An Incomplete and Missing Graph Imputation Network

no code implementations 15 Feb 2023 Wenxuan Tu, Bin Xiao, Xinwang Liu, Sihang Zhou, Zhiping Cai, Jieren Cheng

With the development of various applications, such as social networks and knowledge graphs, graph data has become ubiquitous in the real world.

Attribute Imputation +1

MCF: Mutual Correction Framework for Semi-Supervised Medical Image Segmentation

1 code implementation CVPR 2023 Yongchao Wang, Bin Xiao, Xiuli Bi, Weisheng Li, Xinbo Gao

Inspired by the plain contrast idea, MCF introduces two different subnets to explore and utilize the discrepancies between subnets to correct cognitive bias of the model.

Image Segmentation Pseudo Label +3

DLBD: A Self-Supervised Direct-Learned Binary Descriptor

1 code implementation CVPR 2023 Bin Xiao, Yang Hu, Bo Liu, Xiuli Bi, Weisheng Li, Xinbo Gao

Since their binarization processes are not a component of the network, the learning-based binary descriptor cannot fully utilize the advances of deep learning.

Binarization Image Retrieval +1

Efficient Information Sharing in ICT Supply Chain Social Network via Table Structure Recognition

no code implementations 3 Nov 2022 Bin Xiao, Yakup Akkaya, Murat Simsek, Burak Kantarci, Ala Abu Alkheir

Table Structure Recognition (TSR) aims to represent tables with complex structures in a machine-interpretable format so that the tabular data can be processed automatically.

Management object-detection +1

Semantic Cross Attention for Few-shot Learning

1 code implementation 12 Oct 2022 Bin Xiao, Chien-Liang Liu, Wen-Hoar Hsaio

Our proposed model uses word-embedding representations as semantic features to help train the embedding network, and a semantic cross-attention module to bridge the semantic features into the typical visual modality.

Few-Shot Learning Image Classification +1

Handling big tabular data of ICT supply chains: a multi-task, machine-interpretable approach

no code implementations 11 Aug 2022 Bin Xiao, Murat Simsek, Burak Kantarci, Ala Abu Alkheir

To transform the tabular data in electronic documents into a machine-interpretable format and provide layout and semantic information for information extraction and interpretation, we define a Table Structure Recognition (TSR) task and a Table Cell Type Classification (CTC) task.

Attribute

Learning Visual Representation from Modality-Shared Contrastive Language-Image Pre-training

1 code implementation 26 Jul 2022 Haoxuan You, Luowei Zhou, Bin Xiao, Noel Codella, Yu Cheng, Ruochen Xu, Shih-Fu Chang, Lu Yuan

Large-scale multi-modal contrastive pre-training has demonstrated great utility to learn transferable features for a range of downstream tasks by mapping multiple modalities into a shared embedding space.

TinyViT: Fast Pretraining Distillation for Small Vision Transformers

2 code implementations 21 Jul 2022 Kan Wu, Jinnian Zhang, Houwen Peng, Mengchen Liu, Bin Xiao, Jianlong Fu, Lu Yuan

It achieves a top-1 accuracy of 84.8% on ImageNet-1k with only 21M parameters, being comparable to Swin-B pretrained on ImageNet-21k while using 4.2 times fewer parameters.

Image Classification Knowledge Distillation

Image Synthesis with Disentangled Attributes for Chest X-Ray Nodule Augmentation and Detection

no code implementations 19 Jul 2022 Zhenrong Shen, Xi Ouyang, Bin Xiao, Jie-Zhi Cheng, Qian Wang, Dinggang Shen

Moreover, we propose to synthesize nodule CXR images by controlling the disentangled nodule attributes for data augmentation, in order to better compensate for the nodules that are easily missed in the detection task.

Attribute Data Augmentation +2

Multimodal Adaptive Distillation for Leveraging Unimodal Encoders for Vision-Language Tasks

no code implementations 22 Apr 2022 Zhecan Wang, Noel Codella, Yen-Chun Chen, Luowei Zhou, Xiyang Dai, Bin Xiao, Jianwei Yang, Haoxuan You, Kai-Wei Chang, Shih-Fu Chang, Lu Yuan

Experiments demonstrate that MAD leads to consistent gains in the low-shot, domain-shifted, and fully-supervised conditions on VCR, SNLI-VE, and VQA, achieving SOTA performance on VCR compared to other single models pretrained with image-text data.

Question Answering Visual Commonsense Reasoning +2

DaViT: Dual Attention Vision Transformers

3 code implementations 7 Apr 2022 Mingyu Ding, Bin Xiao, Noel Codella, Ping Luo, Jingdong Wang, Lu Yuan

We show that these two self-attentions complement each other: (i) since each channel token contains an abstract representation of the entire image, the channel attention naturally captures global interactions and representations by taking all spatial positions into account when computing attention scores between channels; (ii) the spatial attention refines the local representations by performing fine-grained interactions across spatial locations, which in turn helps the global information modeling in channel attention.

Computational Efficiency Image Classification +4

Unified Contrastive Learning in Image-Text-Label Space

1 code implementation CVPR 2022 Jianwei Yang, Chunyuan Li, Pengchuan Zhang, Bin Xiao, Ce Liu, Lu Yuan, Jianfeng Gao

Particularly, it attains gains of up to 9.2% and 14.5% on average on zero-shot recognition benchmarks over the language-image contrastive learning and supervised learning methods, respectively.

Contrastive Learning Image Classification +2

Table Structure Recognition with Conditional Attention

no code implementations 8 Mar 2022 Bin Xiao, Murat Simsek, Burak Kantarci, Ala Abu Alkheir

The Table Structure Recognition (TSR) problem aims to recognize the structure of a table and transform unstructured tables into a structured, machine-readable format so that the tabular data can be further analysed by downstream tasks, such as semantic modeling and information retrieval.

Information Retrieval Retrieval

CLIP-TD: CLIP Targeted Distillation for Vision-Language Tasks

no code implementations 15 Jan 2022 Zhecan Wang, Noel Codella, Yen-Chun Chen, Luowei Zhou, Jianwei Yang, Xiyang Dai, Bin Xiao, Haoxuan You, Shih-Fu Chang, Lu Yuan

Experiments demonstrate that our proposed CLIP-TD leads to exceptional gains in the low-shot (up to 51.9%) and domain-shifted (up to 71.3%) conditions of VCR, while simultaneously improving performance under standard fully-supervised conditions (up to 2%), achieving state-of-the-art performance on VCR compared to other single models that are pretrained with image-text data only.

Question Answering Visual Commonsense Reasoning +2

Focal Attention for Long-Range Interactions in Vision Transformers

1 code implementation NeurIPS 2021 Jianwei Yang, Chunyuan Li, Pengchuan Zhang, Xiyang Dai, Bin Xiao, Lu Yuan, Jianfeng Gao

With focal attention, we propose a new variant of Vision Transformer models, called Focal Transformers, which achieve superior performance over the state-of-the-art (SoTA) Vision Transformers on a range of public image classification and object detection benchmarks.

Image Classification object-detection +2

Florence: A New Foundation Model for Computer Vision

1 code implementation 22 Nov 2021 Lu Yuan, Dongdong Chen, Yi-Ling Chen, Noel Codella, Xiyang Dai, Jianfeng Gao, Houdong Hu, Xuedong Huang, Boxin Li, Chunyuan Li, Ce Liu, Mengchen Liu, Zicheng Liu, Yumao Lu, Yu Shi, Lijuan Wang, JianFeng Wang, Bin Xiao, Zhen Xiao, Jianwei Yang, Michael Zeng, Luowei Zhou, Pengchuan Zhang

Computer vision foundation models, which are trained on diverse, large-scale datasets and can be adapted to a wide range of downstream tasks, are critical for this mission to solve real-world computer vision applications.

Action Classification Action Recognition In Videos +12

Generating Unrestricted 3D Adversarial Point Clouds

1 code implementation 17 Nov 2021 Xuelong Dai, YanJie Li, Hua Dai, Bin Xiao

The unrestricted adversarial attack loss is incorporated in the special adversarial training of GAN, which enables the generator to generate the adversarial examples to spoof the target network.

Adversarial Attack Generative Adversarial Network

MA-CLIP: Towards Modality-Agnostic Contrastive Language-Image Pre-training

no code implementations 29 Sep 2021 Haoxuan You, Luowei Zhou, Bin Xiao, Noel C Codella, Yu Cheng, Ruochen Xu, Shih-Fu Chang, Lu Yuan

Large-scale multimodal contrastive pretraining has demonstrated great utility to support high performance in a range of downstream tasks by mapping multiple modalities into a shared embedding space.

Focal Self-attention for Local-Global Interactions in Vision Transformers

3 code implementations 1 Jul 2021 Jianwei Yang, Chunyuan Li, Pengchuan Zhang, Xiyang Dai, Bin Xiao, Lu Yuan, Jianfeng Gao

With focal self-attention, we propose a new variant of Vision Transformer models, called Focal Transformer, which achieves superior performance over the state-of-the-art vision Transformers on a range of public image classification and object detection benchmarks.

Image Classification Instance Segmentation +3

Long-term Cross Adversarial Training: A Robust Meta-learning Method for Few-shot Classification Tasks

1 code implementation ICML Workshop AML 2021 Fan Liu, Shuyu Zhao, Xuelong Dai, Bin Xiao

Although adversarial training (AT) methods such as Adversarial Query (AQ) can improve the adversarial robustness of meta-learning models, AT remains computationally expensive.

Adversarial Robustness Classification +1

Bottom-Up Human Pose Estimation Via Disentangled Keypoint Regression

2 code implementations CVPR 2021 Zigang Geng, Ke Sun, Bin Xiao, Zhaoxiang Zhang, Jingdong Wang

Our motivation is that regressing keypoint positions accurately needs to learn representations that focus on the keypoint regions.

Keypoint Detection

CvT: Introducing Convolutions to Vision Transformers

14 code implementations ICCV 2021 Haiping Wu, Bin Xiao, Noel Codella, Mengchen Liu, Xiyang Dai, Lu Yuan, Lei Zhang

We present in this paper a new architecture, named Convolutional vision Transformer (CvT), that improves Vision Transformer (ViT) in performance and efficiency by introducing convolutions into ViT to yield the best of both designs.

Ranked #3 on Image Classification on Flowers-102 (using extra training data)

Image Classification

Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding

3 code implementations ICCV 2021 Pengchuan Zhang, Xiyang Dai, Jianwei Yang, Bin Xiao, Lu Yuan, Lei Zhang, Jianfeng Gao

This paper presents a new Vision Transformer (ViT) architecture, Multi-Scale Vision Longformer, which significantly enhances the ViT of Dosovitskiy et al. (2020) for encoding high-resolution images using two techniques.

Image Classification Instance Segmentation +2

Consistent Instance Classification for Unsupervised Representation Learning

no code implementations 1 Jan 2021 Depu Meng, Zigang Geng, Zhirong Wu, Bin Xiao, Houqiang Li, Jingdong Wang

The proposed consistent instance classification (ConIC) approach simultaneously optimizes the classification loss and an additional consistency loss explicitly penalizing the feature dissimilarity between the augmented views from the same instance.

Classification General Classification +1

Reality Transform Adversarial Generators for Image Splicing Forgery Detection and Localization

no code implementations ICCV 2021 Xiuli Bi, Zhipeng Zhang, Bin Xiao

For detecting the tampered regions, a forgery localization generator GM is proposed based on a multi-decoder-single-task strategy.

Style Transfer

DTMNet: A Discrete Tchebichef Moments-Based Deep Neural Network for Multi-Focus Image Fusion

no code implementations ICCV 2021 Bin Xiao, Haifeng Wu, Xiuli Bi

The proposed DTMNet is an end-to-end deep neural network with only one convolutional layer and three fully connected layers.

Computational Efficiency

Color-related Local Binary Pattern: A Learned Local Descriptor for Color Image Recognition

no code implementations 11 Dec 2020 Bin Xiao, Tao Geng, Xiuli Bi, Weisheng Li

In this paper, a color-related local binary pattern (cLBP), which learns the dominant patterns from the decoded LBP, is proposed for color image recognition.

D-Unet: A Dual-encoder U-Net for Image Splicing Forgery Detection and Localization

no code implementations 3 Dec 2020 Bo Liu, Ranglei Wu, Xiuli Bi, Bin Xiao, Weisheng Li, Guoyin Wang, Xinbo Gao

The unfixed encoder autonomously learns the image fingerprints that differentiate between the tampered and non-tampered regions, whereas the fixed encoder intentionally provides the direction information that assists the learning and detection of the network.

Binary Classification

Proxy Network for Few Shot Learning

1 code implementation 9 Sep 2020 Bin Xiao, Chien-Liang Liu, Wen-Hoar Hsaio

We conclude that the success of metric-learning based approaches lies in the data embedding, the representative of each class, and the distance metric.

Few-Shot Learning Metric Learning

3D Human Pose Estimation via Explicit Compositional Depth Maps

no code implementations AAAI 2020 Haiping Wu, Bin Xiao

In this work, we tackle the problem of estimating 3D human pose in camera space from a monocular image.

3D Human Pose Estimation

RRU-Net: The Ringed Residual U-Net for Image Splicing Forgery Detection

1 code implementation CVPR Workshop 2019 Xiuli Bi, Yang Wei, Bin Xiao, Weisheng Li

The core idea of the RRU-Net is to strengthen the way the CNN learns, which is inspired by the recall and consolidation mechanisms of the human brain and implemented by the propagation and feedback of the residual in the CNN.

Attribute

Deep High-Resolution Representation Learning for Visual Recognition

42 code implementations 20 Aug 2019 Jingdong Wang, Ke Sun, Tianheng Cheng, Borui Jiang, Chaorui Deng, Yang Zhao, Dong Liu, Yadong Mu, Mingkui Tan, Xinggang Wang, Wenyu Liu, Bin Xiao

High-resolution representations are essential for position-sensitive vision problems, such as human pose estimation, semantic segmentation, and object detection.

Ranked #1 on Object Detection on COCO test-dev (Hardware Burden metric)

Dichotomous Image Segmentation Face Alignment +7

Deep High-Resolution Representation Learning for Human Pose Estimation

39 code implementations CVPR 2019 Ke Sun, Bin Xiao, Dong Liu, Jingdong Wang

We start from a high-resolution subnetwork as the first stage, gradually add high-to-low resolution subnetworks one by one to form more stages, and connect the multi-resolution subnetworks in parallel.

2D Human Pose Estimation Instance Segmentation +6

Interleaved Group Convolutions

no code implementations ICCV 2017 Ting Zhang, Guo-Jun Qi, Bin Xiao, Jingdong Wang

The main point lies in a novel building block, a pair of two successive interleaved group convolutions: primary group convolution and secondary group convolution.

Interleaved Group Convolutions for Deep Neural Networks

2 code implementations 10 Jul 2017 Ting Zhang, Guo-Jun Qi, Bin Xiao, Jingdong Wang

The main point lies in a novel building block, a pair of two successive interleaved group convolutions: primary group convolution and secondary group convolution.
