Search Results for author: Minwei Feng

Found 11 papers, 3 papers with code

CTC Blank Triggered Dynamic Layer-Skipping for Efficient CTC-based Speech Recognition

no code implementations • 4 Jan 2024 • JunFeng Hou, Peiyao Wang, Jincheng Zhang, Meng Yang, Minwei Feng, Jingcheng Yin

Deploying end-to-end speech recognition models with limited computing resources remains challenging, despite their impressive performance.

Knowledge Distillation speech-recognition +1

Fourier Transformer: Fast Long Range Modeling by Removing Sequence Redundancy with FFT Operator

1 code implementation • 24 May 2023 • Ziwei He, Meng Yang, Minwei Feng, Jingcheng Yin, Xinbing Wang, Jingwen Leng, Zhouhan Lin

Many researchers have focused on designing new forms of self-attention or introducing new parameters to overcome this limitation; however, a large portion of these approaches prevent the model from inheriting weights from large pretrained models.

Ranked #1 on Open-Domain Question Answering on ELI5

Abstractive Text Summarization Document Summarization +2

A Structured Self-attentive Sentence Embedding

52 code implementations • 9 Mar 2017 • Zhouhan Lin, Minwei Feng, Cicero Nogueira dos Santos, Mo Yu, Bing Xiang, Bo-Wen Zhou, Yoshua Bengio

This paper proposes a new model for extracting an interpretable sentence embedding by introducing self-attention.

General Classification Natural Language Inference +5

8,474

Local System Voting Feature for Machine Translation System Combination

no code implementations • WS 2015 • Markus Freitag, Jan-Thorsten Peter, Stephan Peitz, Minwei Feng, Hermann Ney

In this paper, we enhance the traditional confusion network system combination approach with an additional model trained by a neural network.

Machine Translation Sentence +1

GaDei: On Scale-up Training As A Service For Deep Learning

no code implementations • 18 Nov 2016 • Wei Zhang, Minwei Feng, Yunhui Zheng, Yufei Ren, Yandong Wang, Ji Liu, Peng Liu, Bing Xiang, Li Zhang, Bo-Wen Zhou, Fei Wang

By evaluating the NLC workloads, we show that only the conservative hyper-parameter setup (e.g., small mini-batch size and small learning rate) can guarantee acceptable model accuracy for a wide range of customers.