Search Results for author: Jian Xue

Found 18 papers, 5 papers with code

Leveraging Timestamp Information for Serialized Joint Streaming Recognition and Translation

no code implementations · 23 Oct 2023 · Sara Papi, Peidong Wang, Junkun Chen, Jian Xue, Naoyuki Kanda, Jinyu Li, Yashesh Gaur

The growing need for instant spoken language transcription and translation is driven by increased global communication and cross-lingual interactions.

Automatic Speech Recognition · Automatic Speech Recognition (ASR) · +2

DiariST: Streaming Speech Translation with Speaker Diarization

1 code implementation · 14 Sep 2023 · Mu Yang, Naoyuki Kanda, Xiaofei Wang, Junkun Chen, Peidong Wang, Jian Xue, Jinyu Li, Takuya Yoshioka

End-to-end speech translation (ST) for conversation recordings involves several under-explored challenges such as speaker diarization (SD) without accurate word time stamps and handling of overlapping speech in a streaming fashion.

speaker-diarization · Speaker Diarization · +3

FoodSAM: Any Food Segmentation

1 code implementation · 11 Aug 2023 · Xing Lan, Jiayi Lyu, Hanyu Jiang, Kun Dong, Zehai Niu, Yi Zhang, Jian Xue

Remarkably, this pioneering framework stands as the first-ever work to achieve instance, panoptic, and promptable segmentation on food images.

Ranked #1 on Semantic Segmentation on FoodSeg103 (using extra training data)

Image Segmentation · Instance Segmentation · +2

Pre-training End-to-end ASR Models with Augmented Speech Samples Queried by Text

no code implementations · 30 Jul 2023 · Eric Sun, Jinyu Li, Jian Xue, Yifan Gong

When mixing 20,000 hours of augmented speech data generated by our method with 12,500 hours of original transcribed speech data for Italian Transformer transducer model pre-training, we achieve an 8.7% relative word error rate reduction.

Automatic Speech Recognition · Data Augmentation · +2
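To make the headline number concrete, the sketch below computes a relative word error rate reduction; the 10.0% baseline WER is a hypothetical value for illustration, not a figure from the paper.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative reduction = (baseline - new) / baseline."""
    return (baseline_wer - new_wer) / baseline_wer

# Hypothetical baseline for illustration only: an 8.7% relative reduction
# from a 10.0% baseline WER lands at roughly 9.13% absolute.
baseline = 10.0
new = baseline * (1 - 0.087)
print(f"new WER: {new:.2f}%")                                              # 9.13%
print(f"relative reduction: {relative_wer_reduction(baseline, new):.1%}")  # 8.7%
```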

Token-Level Serialized Output Training for Joint Streaming ASR and ST Leveraging Textual Alignments

no code implementations · 7 Jul 2023 · Sara Papi, Peidong Wang, Junkun Chen, Jian Xue, Jinyu Li, Yashesh Gaur

In real-world applications, users often require both translations and transcriptions of speech to enhance their comprehension, particularly in streaming scenarios where incremental generation is necessary.

Automatic Speech Recognition · Automatic Speech Recognition (ASR) · +1

Building High-accuracy Multilingual ASR with Gated Language Experts and Curriculum Training

no code implementations · 1 Mar 2023 · Eric Sun, Jinyu Li, Yuxuan Hu, Yimeng Zhu, Long Zhou, Jian Xue, Peidong Wang, Linquan Liu, Shujie Liu, Edward Lin, Yifan Gong

We propose gated language experts and curriculum training to enhance multilingual transformer transducer models without requiring language identification (LID) input from users during inference.

Language Identification
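The snippet below is a minimal sketch of the general "gated language experts" idea (one feed-forward expert per language, mixed by a learned soft gate so no LID input is needed at inference); it is an assumption-laden illustration, not the architecture from the paper.

```python
import torch
import torch.nn as nn


class GatedLanguageExperts(nn.Module):
    """Sketch only: per-language feed-forward experts combined by a
    per-frame softmax gate predicted from the hidden representation."""

    def __init__(self, dim: int, num_languages: int, hidden: int = 1024):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim))
            for _ in range(num_languages)
        )
        self.gate = nn.Linear(dim, num_languages)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, dim)
        weights = torch.softmax(self.gate(x), dim=-1)                    # (B, T, L)
        expert_out = torch.stack([e(x) for e in self.experts], dim=-1)   # (B, T, dim, L)
        return (expert_out * weights.unsqueeze(2)).sum(dim=-1)           # (B, T, dim)


layer = GatedLanguageExperts(dim=256, num_languages=4)
out = layer(torch.randn(2, 50, 256))
print(out.shape)  # torch.Size([2, 50, 256])
```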

Markerless Body Motion Capturing for 3D Character Animation based on Multi-view Cameras

no code implementations · 12 Dec 2022 · Jinbao Wang, Ke Lu, Jian Xue

This paper proposes a novel application system for the generation of three-dimensional (3D) character animation driven by markerless human body motion capturing.

A Weakly-Supervised Streaming Multilingual Speech Model with Truly Zero-Shot Capability

no code implementations · 4 Nov 2022 · Jian Xue, Peidong Wang, Jinyu Li, Eric Sun

In this paper, we introduce our work of building a Streaming Multilingual Speech Model (SM2), which can transcribe or translate multiple spoken languages into texts of the target language.

Machine Translation · speech-recognition · +2

G-Rep: Gaussian Representation for Arbitrary-Oriented Object Detection

1 code implementation · 24 May 2022 · Liping Hou, Ke Lu, Xue Yang, Yuqiu Li, Jian Xue

To go further, in this paper, we propose a unified Gaussian representation called G-Rep to construct Gaussian distributions for OBB, QBB, and PointSet, which achieves a unified solution to various representations and problems.

Object · object-detection · +3
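The usual way to turn an oriented bounding box (OBB) into a 2D Gaussian is to place the mean at the box centre and build the covariance from the width, height, and rotation angle; the sketch below shows that common construction, with the caveat that G-Rep's exact parameterisation may differ.

```python
import numpy as np


def obb_to_gaussian(cx: float, cy: float, w: float, h: float, theta: float):
    """Convert an oriented box (centre, width, height, rotation in radians)
    into a 2D Gaussian: mean = centre, covariance = R diag(w^2/4, h^2/4) R^T.
    Illustrative only; not necessarily G-Rep's exact formulation."""
    mean = np.array([cx, cy])
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    cov = R @ np.diag([w ** 2 / 4.0, h ** 2 / 4.0]) @ R.T
    return mean, cov


mean, cov = obb_to_gaussian(50.0, 30.0, 40.0, 10.0, np.pi / 6)
print(mean)  # [50. 30.]
print(cov)   # 2x2 covariance encoding size and orientation
```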

FEAFA+: An Extended Well-Annotated Dataset for Facial Expression Analysis and 3D Facial Animation

no code implementations · 4 Nov 2021 · Wei Gan, Jian Xue, Ke Lu, Yanfu Yan, Pengcheng Gao, Jiayi Lyu

Extended FEAFA (FEAFA+) includes 150 video sequences from FEAFA and DISFA, with a total of 230,184 frames manually annotated with floating-point intensity values for 24 redefined AUs using the Expression Quantitative Tool.

On Addressing Practical Challenges for RNN-Transducer

no code implementations · 27 Apr 2021 · Rui Zhao, Jian Xue, Jinyu Li, Wenning Wei, Lei He, Yifan Gong

The first challenge is solved with a data splicing method that concatenates speech segments extracted from the source-domain data.

speech-recognition · Speech Recognition
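As a rough illustration of the data-splicing idea (not the authors' pipeline; the segment boundaries and sample rate below are made up), the sketch concatenates slices cut from source-domain waveforms into a single spliced utterance.

```python
import numpy as np


def splice_segments(waveforms, segments, sample_rate: int = 16000) -> np.ndarray:
    """Concatenate (start_sec, end_sec) slices taken from source-domain
    waveforms into one spliced training utterance. Purely illustrative."""
    pieces = []
    for wav, (start, end) in zip(waveforms, segments):
        s, e = int(start * sample_rate), int(end * sample_rate)
        pieces.append(wav[s:e])
    return np.concatenate(pieces)


# Hypothetical example: splice 1 s from one utterance and 0.5 s from another.
utt_a = np.random.randn(3 * 16000).astype(np.float32)
utt_b = np.random.randn(2 * 16000).astype(np.float32)
spliced = splice_segments([utt_a, utt_b], [(0.0, 1.0), (0.5, 1.0)])
print(spliced.shape)  # (24000,)
```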

HIH: Towards More Accurate Face Alignment via Heatmap in Heatmap

1 code implementation · 7 Apr 2021 · Xing Lan, Qinghao Hu, Qiang Chen, Jian Xue, Jian Cheng

In particular, our HIH reaches 4.08 NME (Normalized Mean Error) on WFLW and 3.21 on COFW, outperforming previous methods by a significant margin.

Face Alignment · regression
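For context, NME in face alignment is typically the mean per-landmark Euclidean error divided by a normalisation distance (inter-ocular distance on WFLW); the sketch below uses hypothetical landmarks to show that computation, and is not HIH's own evaluation code.

```python
import numpy as np


def nme(pred: np.ndarray, gt: np.ndarray, norm_dist: float) -> float:
    """pred, gt: (num_landmarks, 2) arrays; norm_dist: e.g. inter-ocular
    distance. Multiply the result by 100 to match benchmark-style numbers."""
    per_point = np.linalg.norm(pred - gt, axis=1)
    return per_point.mean() / norm_dist


# Hypothetical landmarks for illustration only.
gt = np.array([[10.0, 10.0], [20.0, 10.0], [15.0, 20.0]])
pred = gt + 0.5
inter_ocular = np.linalg.norm(gt[0] - gt[1])  # 10.0
print(nme(pred, gt, inter_ocular))            # ~0.0707
```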

FEAFA: A Well-Annotated Dataset for Facial Expression Analysis and 3D Facial Animation

no code implementations · 2 Apr 2019 · Yanfu Yan, Ke Lu, Jian Xue, Pengcheng Gao, Jiayi Lyu

To meet the need for videos labeled in great detail, we present a well-annotated dataset named FEAFA for Facial Expression Analysis and 3D Facial Animation.

3D Face Reconstruction · regression
