Search Results for author: Shiliang Zhang

Found 85 papers, 40 papers with code

3D-Speaker-Toolkit: An Open Source Toolkit for Multi-modal Speaker Verification and Diarization

1 code implementation • 29 Mar 2024 • Yafeng Chen, Siqi Zheng, Hui Wang, Luyao Cheng, Tinglong Zhu, Changhe Song, Rongjie Huang, Ziyang Ma, Qian Chen, Shiliang Zhang, Xihao Li

This paper introduces 3D-Speaker-Toolkit, an open source toolkit for multi-modal speaker verification and diarization.

Self-Supervised Learning speaker-diarization +3

711

Decoupled Contrastive Learning for Long-Tailed Recognition

1 code implementation • 10 Mar 2024 • Shiyu Xuan, Shiliang Zhang

In long-tailed recognition, where the number of samples per class is imbalanced, treating two types of positive samples equally leads to biased optimization of the intra-category distance.

Contrastive Learning Representation Learning

An Embarrassingly Simple Approach for LLM with Strong ASR Capacity

no code implementations • 13 Feb 2024 • Ziyang Ma, Guanrou Yang, Yifan Yang, Zhifu Gao, JiaMing Wang, Zhihao Du, Fan Yu, Qian Chen, Siqi Zheng, Shiliang Zhang, Xie Chen

We found that delicate designs are not necessary, while an embarrassingly simple composition of an off-the-shelf speech encoder, an LLM, and a linear projector as the only trainable component is competent for the ASR task.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

A Bionic Data-driven Approach for Long-distance Underwater Navigation with Anomaly Resistance

no code implementations • 6 Feb 2024 • Songnan Yang, Xiaohui Zhang, Shiliang Zhang, Xuehui Ma, Wenqi Bai, Yushuai Li, TingWen Huang

We integrate the developed mechanism with the TA-LSTM, and calibrate the predicted heading angles to gain resistance against geomagnetic anomalies.

emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation

2 code implementations • 23 Dec 2023 • Ziyang Ma, Zhisheng Zheng, Jiaxin Ye, Jinchao Li, Zhifu Gao, Shiliang Zhang, Xie Chen

To the best of our knowledge, emotion2vec is the first universal representation model in various emotion-related tasks, filling a gap in the field.

Self-Supervised Learning Sentiment Analysis +1

3,331

Privacy-preserving transactive energy systems: Key topics and open research challenges

no code implementations • 17 Dec 2023 • Daniel Gerbi Duguma, Juliana Zhang, Meysam Aboutalebi, Shiliang Zhang, Catherine Banet, Cato Bjørkli, Chinmayi Baramashetru, Frank Eliassen, HUI ZHANG, Jonathan Muringani, Josef Noll, Knut Inge Fostervold, Lars Böcker, Lee Andrew Bygrave, Matin Bagherpour, Maunya Doroudi Moghadam, Olaf Owe, Poushali Sengupta, Roman Vitenberg, Sabita Maharjan, Thiago Garrett, Yushuai Li, Zhengyu Shan

This manuscript aims to formalize and conclude the discussions initiated during the PriTEM workshop held on 22-23 March 2023.

energy trading Management +1

Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models

2 code implementations • 14 Nov 2023 • Yunfei Chu, Jin Xu, Xiaohuan Zhou, Qian Yang, Shiliang Zhang, Zhijie Yan, Chang Zhou, Jingren Zhou

Recently, instruction-following audio-language models have received broad attention for audio interaction with humans.

Ranked #1 on Acoustic Scene Classification on TUT Acoustic Scenes 2017 (using extra training data)

Acoustic Scene Classification Audio captioning +4

3,331

Loss Masking Is Not Needed in Decoder-only Transformer for Discrete-token-based ASR

1 code implementation • 8 Nov 2023 • Qian Chen, Wen Wang, Qinglin Zhang, Siqi Zheng, Shiliang Zhang, Chong Deng, Yukun Ma, Hai Yu, Jiaqing Liu, Chong Zhang

We find that applying the conventional cross-entropy loss on input speech tokens does not consistently improve the ASR performance over the Loss Masking approach.

LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT

1 code implementation • 7 Oct 2023 • JiaMing Wang, Zhihao Du, Qian Chen, Yunfei Chu, Zhifu Gao, Zerui Li, Kai Hu, Xiaohuan Zhou, Jin Xu, Ziyang Ma, Wen Wang, Siqi Zheng, Chang Zhou, Zhijie Yan, Shiliang Zhang

In this paper, we propose LauraGPT, a unified GPT model for audio recognition, understanding, and generation.

Audio captioning Automatic Speech Recognition +11

276

Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs

2 code implementations • 1 Oct 2023 • Shiyu Xuan, Qingpei Guo, Ming Yang, Shiliang Zhang

Specifically, we present a new method for constructing the instruction tuning dataset at a low cost by leveraging annotations in existing datasets.

Referring Expression

Exploring RWKV for Memory Efficient and Low Latency Streaming ASR

no code implementations • 26 Sep 2023 • Keyu An, Shiliang Zhang

Recently, self-attention-based transformers and conformers have been introduced as alternatives to RNNs for ASR acoustic modeling.

Chunking

Leveraging Speech PTM, Text LLM, and Emotional TTS for Speech Emotion Recognition

no code implementations • 19 Sep 2023 • Ziyang Ma, Wen Wu, Zhisheng Zheng, Yiwei Guo, Qian Chen, Shiliang Zhang, Xie Chen

In this paper, we explored how to boost speech emotion recognition (SER) with the state-of-the-art speech pre-trained model (PTM), data2vec, text generation technique, GPT-4, and speech synthesis technique, Azure TTS.

Data Augmentation Language Modelling +5

Improving Speaker Diarization using Semantic Information: Joint Pairwise Constraints Propagation

no code implementations • 19 Sep 2023 • Luyao Cheng, Siqi Zheng, Qinglin Zhang, Hui Wang, Yafeng Chen, Qian Chen, Shiliang Zhang

Speaker diarization has gained considerable attention within the speech processing research community.

speaker-diarization Speaker Diarization +1

Incorporating Class-based Language Model for Named Entity Recognition in Factorized Neural Transducer

no code implementations • 14 Sep 2023 • Peng Wang, Yifan Yang, Zheng Liang, Tian Tan, Shiliang Zhang, Xie Chen

In spite of the excellent strides made by end-to-end (E2E) models in speech recognition in recent years, named entity recognition is still challenging but critical for semantic understanding.

Language Modelling named-entity-recognition +3

FunCodec: A Fundamental, Reproducible and Integrable Open-source Toolkit for Neural Speech Codec

1 code implementation • 14 Sep 2023 • Zhihao Du, Shiliang Zhang, Kai Hu, Siqi Zheng

We also demonstrate that the pre-trained models are suitable for downstream tasks, including automatic speech recognition and personalized text-to-speech synthesis.

Automatic Speech Recognition speech-recognition +3

276

MixBCT: Towards Self-Adapting Backward-Compatible Training

1 code implementation • 14 Aug 2023 • Yu Liang, Shiliang Zhang, YaoWei Wang, Sheng Xiao, Kenli Li, Xiaoyu Wang

As a solution, backward-compatible training can be employed to avoid the necessity of updating old retrieval datasets.

Face Recognition Image Retrieval +1

Adaptive robust tracking control with active learning for linear systems with ellipsoidal bounded uncertainties

no code implementations • 7 Aug 2023 • Xuehui Ma, Shiliang Zhang, Yushuai Li, Fucai Qian, TingWen Huang

This paper is concerned with the robust tracking control of linear uncertain systems, whose unknown system parameters and disturbances are bounded within ellipsoidal sets.

Active Learning

SeACo-Paraformer: A Non-Autoregressive ASR System with Flexible and Effective Hotword Customization Ability

2 code implementations • 7 Aug 2023 • Xian Shi, Yexin Yang, Zerui Li, Yanni Chen, Zhifu Gao, Shiliang Zhang

It combines the accuracy of AED-based models, the efficiency of NAR models, and an explicit customization capacity with superior performance.

3,331

Self-Distillation Network with Ensemble Prototypes: Learning Robust Speaker Representations without Supervision

1 code implementation • 5 Aug 2023 • Yafeng Chen, Siqi Zheng, Hui Wang, Luyao Cheng, Qian Chen, Shiliang Zhang

It assigns representation of augmented views of utterances to the same prototypes as the representation of the original view, thereby enabling effective knowledge transfer between the views.

Representation Learning Speaker Verification +1

711

BA-SOT: Boundary-Aware Serialized Output Training for Multi-Talker ASR

no code implementations • 23 May 2023 • Yuhao Liang, Fan Yu, Yangze Li, Pengcheng Guo, Shiliang Zhang, Qian Chen, Lei Xie

The recently proposed serialized output training (SOT) simplifies multi-talker automatic speech recognition (ASR) by generating speaker transcriptions separated by a special token.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

CASA-ASR: Context-Aware Speaker-Attributed ASR

no code implementations • 21 May 2023 • Mohan Shi, Zhihao Du, Qian Chen, Fan Yu, Yangze Li, Shiliang Zhang, Jie Zhang, Li-Rong Dai

In addition, a two-pass decoding strategy is proposed to fully leverage the contextual modeling ability, resulting in better recognition performance.

Automatic Speech Recognition speech-recognition +1

Semantic VAD: Low-Latency Voice Activity Detection for Speech Interaction

no code implementations • 21 May 2023 • Mohan Shi, Yuchun Shu, Lingyun Zuo, Qian Chen, Shiliang Zhang, Jie Zhang, Li-Rong Dai

For speech interaction, voice activity detection (VAD) is often used as a front-end.

Action Detection Activity Detection +4

BAT: Boundary aware transducer for memory-efficient and low-latency ASR

1 code implementation • 19 May 2023 • Keyu An, Xian Shi, Shiliang Zhang

Recently, the recurrent neural network transducer (RNN-T) has gained increasing popularity due to its natural streaming capability as well as its superior performance.

Ranked #9 on Speech Recognition on AISHELL-1

Automatic Speech Recognition Automatic Speech Recognition (ASR)

3,331

FunASR: A Fundamental End-to-End Speech Recognition Toolkit

1 code implementation • 18 May 2023 • Zhifu Gao, Zerui Li, JiaMing Wang, Haoneng Luo, Xian Shi, Mengzhe Chen, Yabin Li, Lingyun Zuo, Zhihao Du, Zhangyu Xiao, Shiliang Zhang

FunASR offers models trained on large-scale industrial corpora and the ability to deploy them in applications.

Ranked #1 on Speech Recognition on WenetSpeech (using extra training data)

Action Detection Activity Detection +2

3,331

Accurate and Reliable Confidence Estimation Based on Non-Autoregressive End-to-End Speech Recognition System

no code implementations • 18 May 2023 • Xian Shi, Haoneng Luo, Zhifu Gao, Shiliang Zhang, Zhijie Yan

Estimating confidence scores for recognition results is a classic task in the ASR field and is of vital importance for various downstream tasks and training strategies.

speech-recognition Speech Recognition

TOLD: A Novel Two-Stage Overlap-Aware Framework for Speaker Diarization

1 code implementation • 8 Mar 2023 • JiaMing Wang, Zhihao Du, Shiliang Zhang

Recently, end-to-end neural diarization (EEND) was introduced and has achieved promising results in speaker-overlapped scenarios.

Ranked #1 on Speaker Diarization on CALLHOME

speaker-diarization Speaker Diarization +1

3,331

Achieving Timestamp Prediction While Recognizing with Non-Autoregressive End-to-End ASR Model

1 code implementation • 29 Jan 2023 • Xian Shi, Yanni Chen, Shiliang Zhang, Zhijie Yan

Conventional ASR systems use frame-level phoneme posteriors to conduct force-alignment (FA) and provide timestamps, while end-to-end ASR systems, especially AED-based ones, lack this ability.

3,331

Evolved Part Masking for Self-Supervised Learning

no code implementations • CVPR 2023 • Zhanzhou Feng, Shiliang Zhang

The accuracy of partitioned parts is on par with the capability of the pre-trained model, leading to evolved mask patterns at different training stages.

Image Classification Object +4

3D Human Mesh Recovery with Sequentially Global Rotation Estimation

1 code implementation • ICCV 2023 • Dongkai Wang, Shiliang Zhang

This pipeline needs to transform each relative rotation matrix into a global rotation matrix to articulate the canonical mesh, and suffers from accumulated errors along the kinematics chain.

Human Mesh Recovery

MMSpeech: Multi-modal Multi-task Encoder-Decoder Pre-training for Speech Recognition

1 code implementation • 29 Nov 2022 • Xiaohuan Zhou, JiaMing Wang, Zeyu Cui, Shiliang Zhang, Zhijie Yan, Jingren Zhou, Chang Zhou

Therefore, we propose to introduce the phoneme modality into pre-training, which can help capture modality-invariant information between Mandarin speech and text.

Ranked #2 on Speech Recognition on AISHELL-1

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

2,324

Deep Active Learning for Computer Vision: Past and Future

no code implementations • 27 Nov 2022 • Rinyoichi Takezoe, Xu Liu, Shunan Mao, Marco Tianyu Chen, Zhanpeng Feng, Shiliang Zhang, Xiaoyu Wang

As an important data selection scheme, active learning emerges as an essential component when iterating an Artificial Intelligence (AI) model.

Active Learning

ParCNetV2: Oversized Kernel with Enhanced Attention

1 code implementation • ICCV 2023 • Ruihan Xu, Haokui Zhang, Wenze Hu, Shiliang Zhang, Xiaoyu Wang

Specifically, we propose a new convolutional neural network, ParCNetV2, that extends position-aware circular convolution (ParCNet) with oversized convolutions and bifurcate gate units to enhance attention.

A Comparative Study on Multichannel Speaker-Attributed Automatic Speech Recognition in Multi-party Meetings

no code implementations • 1 Nov 2022 • Mohan Shi, Jie Zhang, Zhihao Du, Fan Yu, Qian Chen, Shiliang Zhang, Li-Rong Dai

Speaker-attributed automatic speech recognition (SA-ASR) in multi-party meeting scenarios is one of the most valuable and challenging ASR tasks.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

ALBench: A Framework for Evaluating Active Learning in Object Detection

1 code implementation • 27 Jul 2022 • Zhanpeng Feng, Shiliang Zhang, Rinyoichi Takezoe, Wenze Hu, Manmohan Chandraker, Li-Jia Li, Vijay K. Narayanan, Xiaoyu Wang

To facilitate research in this field, this paper contributes an active learning benchmark framework named ALBench for evaluating active learning in object detection.

Active Learning Image Classification +4

556

Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition

2 code implementations • 16 Jun 2022 • Zhifu Gao, Shiliang Zhang, Ian McLoughlin, Zhijie Yan

However, due to an independence assumption within the output tokens, performance of single-step NAR is inferior to that of AR models, especially with a large-scale corpus.

Language Modelling speech-recognition +1

6,067

A Comparative Study on Speaker-attributed Automatic Speech Recognition in Multi-party Meetings

no code implementations • 31 Mar 2022 • Fan Yu, Zhihao Du, Shiliang Zhang, Yuxiao Lin, Lei Xie

Therefore, we propose the second approach, WD-SOT, to address alignment errors by introducing a word-level diarization model, which can get rid of such timestamp alignment dependency.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Speaker Embedding-aware Neural Diarization: an Efficient Framework for Overlapping Speech Diarization in Meeting Scenarios

1 code implementation • 18 Mar 2022 • Zhihao Du, Shiliang Zhang, Siqi Zheng, Zhijie Yan

Through this formulation, we propose the speaker embedding-aware neural diarization (SEND) framework, where a speech encoder, a speaker encoder, two similarity scorers, and a post-processing network are jointly optimized to predict the encoded labels according to the similarities between speech features and speaker embeddings.

Ranked #1 on Speaker Diarization on AliMeeting

Action Detection Activity Detection +2

3,331

Extended vehicle energy dataset (eVED): an enhanced large-scale dataset for deep learning on vehicle trip energy consumption

2 code implementations • 16 Mar 2022 • Shiliang Zhang, Dyako Fatih, Fahmi Abdulqadir, Tobias Schwarz, Xuehui Ma

Compared with its original version, the extended VED (eVED) dataset is enhanced with accurate vehicle trip GPS coordinates, serving as a basis to associate the VED trip records with external information, e.g., road speed limits and intersections, from accessible map services to accumulate attributes that are essential in analyzing vehicle energy consumption.

ProsoSpeech: Enhancing Prosody With Quantized Vector Pre-training in Text-to-Speech

no code implementations • 16 Feb 2022 • Yi Ren, Ming Lei, Zhiying Huang, Shiliang Zhang, Qian Chen, Zhijie Yan, Zhou Zhao

Specifically, we first introduce a word-level prosody encoder, which quantizes the low-frequency band of the speech and compresses prosody attributes in the latent prosody vector (LPV).

Contextualize differential privacy in image database: a lightweight image differential privacy approach based on principle component analysis inverse

no code implementations • 16 Feb 2022 • Shiliang Zhang, Xuehui Ma, Hui Cao, Tengyuan Zhao, Yajie Yu, Zhuzhu Wang

To this end, we design a lightweight approach dedicated to privatizing an image database as a whole and preserving the statistical semantics of the image database to an adjustable level, while making individual images' contribution to such statistics indistinguishable.

Attribute

Contextual Instance Decoupling for Robust Multi-Person Pose Estimation

1 code implementation • CVPR 2022 • Dongkai Wang, Shiliang Zhang

Instead of relying on person bounding boxes to spatially differentiate persons, CID decouples persons in an image into multiple instance-aware feature maps.

Multi-Person Pose Estimation

Robust Pose Estimation in Crowded Scenes with Direct Pose-Level Inference

1 code implementation • NeurIPS 2021 • Dongkai Wang, Shiliang Zhang, Gang Hua

Instead of inferring individual keypoints, the Pose-level Inference Network (PINet) directly infers the complete pose cues for a person from his/her visible body parts.

Multi-Person Pose Estimation

Speaker Embedding-aware Neural Diarization for Flexible Number of Speakers with Textual Information

2 code implementations • 28 Nov 2021 • Zhihao Du, Shiliang Zhang, Siqi Zheng, Weilong Huang, Ming Lei

In this paper, we reformulate this task as a single-label prediction problem by encoding the multi-speaker labels with power set.

Action Detection Activity Detection +2

3,331

An Energy Consumption Model for Electrical Vehicle Networks via Extended Federated-learning

no code implementations • 13 Nov 2021 • Shiliang Zhang

The two components collaborate to enhance learning robustness against data heterogeneities in networks.

Anomaly Detection Federated Learning

MDERank: A Masked Document Embedding Rank Approach for Unsupervised Keyphrase Extraction

1 code implementation • Findings (ACL) 2022 • Linhan Zhang, Qian Chen, Wen Wang, Chong Deng, Shiliang Zhang, Bing Li, Wei Wang, Xin Cao

In this work, we propose a novel unsupervised embedding-based KPE approach, Masked Document Embedding Rank (MDERank), to address this problem by leveraging a mask strategy and ranking candidates by the similarity between embeddings of the source document and the masked document.

Contrastive Learning Document Embedding +4

BeamTransformer: Microphone Array-based Overlapping Speech Detection

no code implementations • 9 Sep 2021 • Siqi Zheng, Shiliang Zhang, Weilong Huang, Qian Chen, Hongbin Suo, Ming Lei, Jinwei Feng, Zhijie Yan

We propose BeamTransformer, an efficient architecture to leverage beamformer's edge in spatial filtering and transformer's capability in context sequence modeling.

Enhancing Social Relation Inference with Concise Interaction Graph and Discriminative Scene Representation

no code implementations • 30 Jul 2021 • Xiaotian Yu, Hanling Yi, Yi Yu, Ling Xing, Shiliang Zhang, Xiaoyu Wang

There has been a recent surge of research interest in attacking the problem of social relation inference based on images.

Contrastive Learning domain classification +2

MFGNet: Dynamic Modality-Aware Filter Generation for RGB-T Tracking

2 code implementations • 22 Jul 2021 • Xiao Wang, Xiujun Shu, Shiliang Zhang, Bo Jiang, YaoWei Wang, Yonghong Tian, Feng Wu

The visible and thermal filters will be used to conduct a dynamic convolutional operation on their corresponding input feature maps respectively.

Ranked #21 on Rgb-T Tracking on RGBT234

Rgb-T Tracking

Large-Scale Spatio-Temporal Person Re-identification: Algorithms and Benchmark

2 code implementations • 31 May 2021 • Xiujun Shu, Xiao Wang, Xianghao Zang, Shiliang Zhang, Yuanqi Chen, Ge Li, Qi Tian

We also verified that models pre-trained on LaST can generalize well on existing datasets with short-term and cloth-changing scenarios.

Person Re-Identification

Graph Consistency Based Mean-Teaching for Unsupervised Domain Adaptive Person Re-Identification

1 code implementation • 11 May 2021 • Xiaobin Liu, Shiliang Zhang

Specifically, given unlabeled training images, we apply teacher networks to extract corresponding features and further construct a teacher graph for each teacher network to describe the similarity relationships among training images.

Contrastive Learning Domain Adaptive Person Re-Identification +2

AAformer: Auto-Aligned Transformer for Person Re-Identification

no code implementations • 2 Apr 2021 • Kuan Zhu, Haiyun Guo, Shiliang Zhang, YaoWei Wang, Gaopan Huang, Honglin Qiao, Jing Liu, Jinqiao Wang, Ming Tang

In this paper, we introduce an alignment scheme in Transformer architecture for the first time and propose the Auto-Aligned Transformer (AAformer) to automatically locate both the human parts and non-human ones at patch-level.

Human Parsing Image Classification +3

Intra-Inter Camera Similarity for Unsupervised Person Re-Identification

1 code implementation • CVPR 2021 • Shiyu Xuan, Shiliang Zhang

The second stage considers the classification scores of each sample on different cameras as a new feature vector.

Ranked #1 on Person Re-Identification on SYSU-30k (using extra training data)

Pseudo Label Transfer Learning +1

Viewpoint and Scale Consistency Reinforcement for UAV Vehicle Re-Identification

1 code implementation • IJCV 2021 • Shangzhi Teng, Shiliang Zhang, Qingming Huang, Nicu Sebe

Moreover, our method also achieves competitive performance compared with recent works on existing vehicle ReID datasets including VehicleID, VeRi-776 and VERI-Wild.

Vehicle Re-Identification

Domain Adaptive Person Re-Identification via Coupling Optimization

1 code implementation • 6 Nov 2020 • Xiaobin Liu, Shiliang Zhang

Extensive experiments on three large-scale datasets, i.e., Market-1501, DukeMTMC-reID, and MSMT17, show that our coupling optimization outperforms state-of-the-art methods by a large margin.

Ranked #1 on Unsupervised Person Re-Identification on DukeMTMC-reID->MSMT17

Domain Adaptive Person Re-Identification Transfer Learning +1

Joint Visual and Temporal Consistency for Unsupervised Domain Adaptive Person Re-Identification

no code implementations • ECCV 2020 • Jianing Li, Shiliang Zhang

This paper tackles this challenge through jointly enforcing visual and temporal consistency in the combination of a local one-hot classification and a global multi-class classification.

Classification Domain Adaptive Person Re-Identification +3

Distillation Guided Residual Learning for Binary Convolutional Neural Networks

1 code implementation • 10 Jul 2020 • Jianming Ye, Shiliang Zhang, Jingdong Wang

We observe that this performance gap leads to substantial residuals between intermediate feature maps of BCNN and FCNN.

Streaming Chunk-Aware Multihead Attention for Online End-to-End Speech Recognition

1 code implementation • 21 May 2020 • Shiliang Zhang, Zhifu Gao, Haoneng Luo, Ming Lei, Jie Gao, Zhijie Yan, Lei Xie

Recently, streaming end-to-end automatic speech recognition (E2E-ASR) has gained more and more attention.

Sound Audio and Speech Processing

3,331

Simplified Self-Attention for Transformer-based End-to-End Speech Recognition

no code implementations • 21 May 2020 • Haoneng Luo, Shiliang Zhang, Ming Lei, Lei Xie

Transformer models have been introduced into end-to-end speech recognition with state-of-the-art performance on various tasks owing to their superiority in modeling long-term dependencies.

speech-recognition Speech Recognition

Unsupervised Person Re-identification via Multi-label Classification

no code implementations • CVPR 2020 • Dongkai Wang, Shiliang Zhang

Our label prediction and MMCL work iteratively and substantially boost the ReID performance.

Ranked #6 on Unsupervised Domain Adaptation on Duke to MSMT

Classification General Classification +4

Robust Partial Matching for Person Search in the Wild

no code implementations • CVPR 2020 • Yingji Zhong, Xiaoyu Wang, Shiliang Zhang

This paper also contributes a Large-Scale dataset for Person Search in the wild (LSPS), which is by far the largest and the most challenging dataset for person search.

Human Detection Person Search +1

Neural Zero-Inflated Quality Estimation Model For Automatic Speech Recognition System

no code implementations • 3 Oct 2019 • Kai Fan, Jiayi Wang, Bo Li, Shiliang Zhang, Boxing Chen, Niyu Ge, Zhijie Yan

The performance of automatic speech recognition (ASR) systems is usually evaluated by the word error rate (WER) metric when manually transcribed data are provided, which, however, are expensive to obtain in real scenarios.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

Global-Local Temporal Representations For Video Person Re-Identification

no code implementations • ICCV 2019 • Jianing Li, Jingdong Wang, Qi Tian, Wen Gao, Shiliang Zhang

The long-term relations are captured by a temporal self-attention model to alleviate the occlusions and noises in video sequences.

Metric Learning Re-Ranking +1

Resolution-invariant Person Re-Identification

1 code implementation • 24 Jun 2019 • Shunan Mao, Shiliang Zhang, Ming Yang

RIFE adopts two feature extraction streams weighted by a dual-attention block to learn features for low and high resolution images, respectively.

Person Re-Identification Super-Resolution

Automatic Spelling Correction with Transformer for CTC-based End-to-End Speech Recognition

no code implementations • 27 Mar 2019 • Shiliang Zhang, Ming Lei, Zhijie Yan

Results on a 20,000-hour Mandarin speech recognition task show that the proposed spelling correction model can achieve a CER of 3.41%, which results in 22.9% and 53.2% relative improvement compared to the baseline CTC-based systems decoded with and without a language model, respectively.

Language Modelling Machine Translation +4

Bi-Directional Cascade Network for Perceptual Edge Detection

2 code implementations • CVPR 2019 • Jianzhong He, Shiliang Zhang, Ming Yang, Yanhu Shan, Tiejun Huang

Exploiting multi-scale representations is critical to improve edge detection for objects at different scales.

Ranked #2 on Edge Detection on BRIND

Edge Detection

338

Multi-scale 3D Convolution Network for Video Based Person Re-Identification

no code implementations • 19 Nov 2018 • Jianing Li, Shiliang Zhang, Tiejun Huang

A temporal stream in this network is constructed by inserting several Multi-scale 3D (M3D) convolution layers into a 2D CNN network.

Video-Based Person Re-Identification

RAM: A Region-Aware Deep Model for Vehicle Re-Identification

no code implementations • 25 Jun 2018 • Xiaobin Liu, Shiliang Zhang, Qingming Huang, Wen Gao

Specifically, in addition to extracting global features, RAM also extracts features from a series of local regions.

Vehicle Re-Identification

Deep-FSMN for Large Vocabulary Continuous Speech Recognition

1 code implementation • 4 Mar 2018 • Shiliang Zhang, Ming Lei, Zhijie Yan, Li-Rong Dai

In a 20,000-hour Mandarin recognition task, the LFR-trained DFSMN can achieve more than 20% relative improvement compared to the LFR-trained BLSTM.

Language Modelling speech-recognition +1

Deep Feed-forward Sequential Memory Networks for Speech Synthesis

no code implementations • 26 Feb 2018 • Mengxiao Bi, Heng Lu, Shiliang Zhang, Ming Lei, Zhijie Yan

The Bidirectional LSTM (BLSTM) RNN based speech synthesis system is among the best parametric Text-to-Speech (TTS) systems in terms of the naturalness of generated speech, especially the naturalness in prosody.

speech-recognition Speech Recognition +1

LVreID: Person Re-Identification with Long Sequence Videos

no code implementations • 20 Dec 2017 • Jianing Li, Shiliang Zhang, Jingdong Wang, Wen Gao, Qi Tian

This paper mainly establishes a large-scale Long sequence Video database for person re-IDentification (LVreID).

Person Re-Identification

Person Transfer GAN to Bridge Domain Gap for Person Re-Identification

25 code implementations • CVPR 2018 • Longhui Wei, Shiliang Zhang, Wen Gao, Qi Tian

Although the performance of person Re-Identification (ReID) has been significantly boosted, many challenging issues in real scenarios have not been fully investigated, e.g., the complex scenes and lighting variations, viewpoint and pose changes, and the large number of identities in a camera network.

Ranked #11 on Unsupervised Person Re-Identification on DukeMTMC-reID (Rank-10 metric)

Generative Adversarial Network Person Re-Identification +1

463

Pose-driven Deep Convolutional Model for Person Re-identification

no code implementations • ICCV 2017 • Chi Su, Jianing Li, Shiliang Zhang, Junliang Xing, Wen Gao, Qi Tian

Our deep architecture explicitly leverages the human part cues to alleviate the pose variations and learn robust feature representations from both the global image and different local parts.

Ranked #105 on Person Re-Identification on Market-1501

Person Re-Identification

E$^2$BoWs: An End-to-End Bag-of-Words Model via Deep Convolutional Neural Network

no code implementations • 18 Sep 2017 • Xiaobin Liu, Shiliang Zhang, Tiejun Huang, Qi Tian

To conquer these issues, we propose an End-to-End BoWs (E$^2$BoWs) model based on Deep Convolutional Neural Network (DCNN).

Image Retrieval Quantization +1

GLAD: Global-Local-Alignment Descriptor for Pedestrian Retrieval

no code implementations • 13 Sep 2017 • Longhui Wei, Shiliang Zhang, Hantao Yao, Wen Gao, Qi Tian

To solve these problems, this work proposes a Global-Local-Alignment Descriptor (GLAD) and an efficient indexing and retrieval framework, respectively.

Ranked #93 on Person Re-Identification on Market-1501

Person Re-Identification Representation Learning +1

One-Shot Fine-Grained Instance Retrieval

no code implementations • 4 Jul 2017 • Hantao Yao, Shiliang Zhang, Yongdong Zhang, Jintao Li, Qi Tian

Aiming to conquer this issue, we propose a retrieval task named One-Shot Fine-Grained Instance Retrieval (OSFGIR).

Fine-Grained Visual Categorization Image Retrieval +1

Deep Representation Learning with Part Loss for Person Re-Identification

no code implementations • 4 Jul 2017 • Hantao Yao, Shiliang Zhang, Yongdong Zhang, Jintao Li, Qi Tian

The representation learning risk is evaluated by the proposed part loss, which automatically generates several parts for an image, and computes the person classification loss on each part separately.

Ranked #97 on Person Re-Identification on Market-1501

Classification General Classification +2

DR2-Net: Deep Residual Reconstruction Network for Image Compressive Sensing

1 code implementation • 19 Feb 2017 • Hantao Yao, Feng Dai, Dongming Zhang, Yike Ma, Shiliang Zhang, Yongdong Zhang, Qi Tian

Accordingly, DR$^{2}$-Net consists of two components, i.e., a linear mapping network and a residual network.

Compressive Sensing Image Reconstruction

Neural Networks Models for Entity Discovery and Linking

no code implementations • 11 Nov 2016 • Dan Liu, Wei. Lin, Shiliang Zhang, Si Wei, Hui Jiang

This paper describes the USTC_NELSLIP systems submitted to the Trilingual Entity Detection and Linking (EDL) track in 2016 TAC Knowledge Base Population (KBP) contests.

Clustering Entity Linking +1

Deep Attributes Driven Multi-Camera Person Re-identification

no code implementations • 11 May 2016 • Chi Su, Shiliang Zhang, Junliang Xing, Wen Gao, Qi Tian

We also propose a semi-supervised attribute learning framework that progressively boosts the accuracy of attributes using only a limited amount of labeled data.

Attribute Metric Learning +1

Feedforward Sequential Memory Networks: A New Structure to Learn Long-term Dependency

no code implementations • 28 Dec 2015 • Shiliang Zhang, Cong Liu, Hui Jiang, Si Wei, Li-Rong Dai, Yu Hu

In this paper, we propose a novel neural network structure, namely feedforward sequential memory networks (FSMN), to model long-term dependency in time series without using recurrent feedback.

Language Modelling speech-recognition +3

Multi-Task Learning With Low Rank Attribute Embedding for Person Re-Identification

no code implementations • ICCV 2015 • Chi Su, Fan Yang, Shiliang Zhang, Qi Tian, Larry S. Davis, Wen Gao

Since attributes are generally correlated, we introduce a low rank attribute embedding into the MTL formulation to embed original binary attributes to a continuous attribute space, where incorrect and incomplete attributes are rectified and recovered to better describe people.

Attribute Multi-Task Learning +1

Feedforward Sequential Memory Neural Networks without Recurrent Feedback

no code implementations • 9 Oct 2015 • ShiLiang Zhang, Hui Jiang, Si Wei, Li-Rong Dai

We introduce a new structure for memory neural networks, called feedforward sequential memory networks (FSMN), which can learn long-term dependency without using recurrent feedback.

Language Modelling

The Fixed-Size Ordinally-Forgetting Encoding Method for Neural Network Language Models

no code implementations • IJCNLP 2015 • ShiLiang Zhang, Hui Jiang, MingBin Xu, JunFeng Hou, Li-Rong Dai

Information Retrieval Language Modelling +2

A Fixed-Size Encoding Method for Variable-Length Sequences with its Application to Neural Network Language Models

1 code implementation • 6 May 2015 • Shiliang Zhang, Hui Jiang, MingBin Xu, JunFeng Hou, Li-Rong Dai

In this paper, we propose the new fixed-size ordinally-forgetting encoding (FOFE) method, which can almost uniquely encode any variable-length sequence of words into a fixed-size representation.

Hybrid Orthogonal Projection and Estimation (HOPE): A New Framework to Probe and Learn Neural Networks

no code implementations • 3 Feb 2015 • Shiliang Zhang, Hui Jiang

As a result, the HOPE framework can be used as a novel tool to probe why and how NNs work, more importantly, to learn NNs in either supervised or unsupervised ways.

Ranked #23 on Image Classification on MNIST

Image Classification speech-recognition +1
