1 code implementation • 29 Jan 2024 • Fuzhao Xue, Zian Zheng, Yao Fu, Jinjie Ni, Zangwei Zheng, Wangchunshu Zhou, Yang You
To help the open-source community better understand Mixture-of-Experts (MoE) based large language models (LLMs), we train and release OpenMoE, a series of fully open-sourced and reproducible decoder-only MoE LLMs, ranging from 650M to 34B parameters and trained on over 1T tokens.
1 code implementation • NeurIPS 2023 • Zangwei Zheng, Xiaozhe Ren, Fuzhao Xue, Yang Luo, Xin Jiang, Yang You
By leveraging this information, we introduce an efficient sequence scheduling technique that groups queries with similar response lengths into micro-batches.
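The scheduling idea above can be sketched in a few lines: sort queries by (predicted) response length and chunk the sorted order into micro-batches, so each batch pads only to the longest response within it. This is a hypothetical illustration, not the paper's implementation; all names and numbers here are made up.

```python
# Length-aware sequence scheduling sketch: group queries with similar
# (predicted) response lengths into the same micro-batch to cut padding waste.

def schedule_micro_batches(predicted_lengths, batch_size):
    """Sort query indices by predicted response length, then chunk them."""
    order = sorted(range(len(predicted_lengths)),
                   key=lambda i: predicted_lengths[i])
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]

def padding_waste(lengths, batches):
    """Padding tokens: each batch is padded up to its longest response."""
    return sum(max(lengths[i] for i in b) * len(b) - sum(lengths[i] for i in b)
               for b in batches)

lengths = [120, 15, 110, 20, 18, 130]       # illustrative response lengths
naive = [[0, 1], [2, 3], [4, 5]]            # batches in arrival order
grouped = schedule_micro_batches(lengths, 2)  # batches in length-sorted order

assert padding_waste(lengths, grouped) < padding_waste(lengths, naive)
```

Grouping by length makes each micro-batch nearly uniform, so far fewer tokens are spent on padding than with arrival-order batching.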
1 code implementation • Tiny Papers @ ICLR 2023 • Xiao Liu, Jian Zhang, Heng Zhang, Fuzhao Xue, Yang You
We evaluate our model on various dialogue understanding tasks including dialogue relation extraction, dialogue emotion recognition, and dialogue act classification.
Ranked #1 on Dialog Relation Extraction on DialogRE
1 code implementation • 30 Jan 2023 • Fuzhao Xue, Valerii Likhosherstov, Anurag Arnab, Neil Houlsby, Mostafa Dehghani, Yang You
However, most standard neural networks have a fixed function type and computation budget regardless of the sample's nature or difficulty.
no code implementations • 21 May 2022 • Fuzhao Xue, Jianghai Chen, Aixin Sun, Xiaozhe Ren, Zangwei Zheng, Xiaoxin He, Yongming Chen, Xin Jiang, Yang You
In this paper, we revisit these conventional configurations.
Ranked #103 on Image Classification on ImageNet
1 code implementation • 13 Apr 2022 • Zangwei Zheng, Pengtai Xu, Xuan Zou, Da Tang, Zhen Li, Chenguang Xi, Peng Wu, Leqi Zou, Yijie Zhu, Ming Chen, Xiangzhuo Ding, Fuzhao Xue, Ziheng Qin, Youlong Cheng, Yang You
Our experiments show that previous scaling rules fail in the training of CTR prediction neural networks.
1 code implementation • CVPR 2022 • Wangbo Zhao, Kai Wang, Xiangxiang Chu, Fuzhao Xue, Xinchao Wang, Yang You
Text-based video segmentation aims to segment the target object in a video based on a describing sentence.
Ranked #10 on Referring Expression Segmentation on A2D Sentences
Optical Flow Estimation, Referring Expression Segmentation (+4)
no code implementations • 26 Jan 2022 • Fuzhao Xue, Xiaoxin He, Xiaozhe Ren, Yuxuan Lou, Yang You
Mixture-of-experts (MoE) is a powerful sparse architecture including multiple experts.
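A minimal sketch of what such a sparse MoE layer does: a learned gate scores the experts per token, and only the top-scoring expert actually runs. The weights and shapes below are illustrative assumptions, not any paper's released parameters.

```python
import numpy as np

# Minimal Mixture-of-Experts layer with top-1 routing (illustrative sketch).
rng = np.random.default_rng(0)
d, E = 4, 3                                   # hidden size, number of experts
W_gate = rng.standard_normal((d, E))          # router weights
experts = [rng.standard_normal((d, d)) for _ in range(E)]  # expert FFN weights

def moe_layer(x):
    logits = x @ W_gate                       # routing score per expert
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                      # softmax gate
    e = int(probs.argmax())                   # top-1: pick one expert
    return probs[e] * (x @ experts[e])        # only the chosen expert computes

y = moe_layer(rng.standard_normal(d))
assert y.shape == (d,)
```

The sparsity comes from the `argmax`: parameter count grows with the number of experts, but per-token compute stays roughly that of a single expert.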
no code implementations • 1 Nov 2021 • Xiaoxin He, Fuzhao Xue, Xiaozhe Ren, Yang You
Deep learning has achieved promising results on a wide spectrum of AI applications.
no code implementations • 5 Sep 2021 • Yuxuan Lou, Fuzhao Xue, Zangwei Zheng, Yang You
Mixture-of-Experts (MoE), a conditional computation architecture, achieves promising performance by scaling the local module (i.e., the feed-forward network) of the transformer.
no code implementations • 10 Aug 2021 • Andrew Koh, Fuzhao Xue, Eng Siong Chng
In this paper, we examine the use of Transfer Learning using Pretrained Audio Neural Networks (PANNs), and propose an architecture that is able to better leverage the acoustic features provided by PANNs for the Automated Audio Captioning Task.
1 code implementation • 25 Jul 2021 • Fuzhao Xue, Ziji Shi, Futao Wei, Yuxuan Lou, Yong Liu, Yang You
To achieve better performance with fewer trainable parameters, recent methods propose going shallower via parameter sharing or model compression along the depth dimension.
Ranked #663 on Image Classification on ImageNet
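Cross-layer parameter sharing, one of the "going shallower" strategies mentioned above, can be illustrated with a toy example: a single set of layer weights is reused at every depth, so the parameter count stays constant as depth grows. This is a generic ALBERT-style sketch under made-up shapes, not the paper's architecture.

```python
import numpy as np

# Toy cross-layer parameter sharing: one weight matrix reused at every layer.
rng = np.random.default_rng(0)
d, depth = 8, 6
W = rng.standard_normal((d, d)) / np.sqrt(d)   # the single shared layer

def shared_forward(x, n_layers):
    for _ in range(n_layers):                   # same W applied at each depth
        x = np.tanh(x @ W)
    return x

out = shared_forward(np.ones(d), depth)
params_shared = W.size                          # independent of depth
params_unshared = W.size * depth                # 6 distinct layers would cost 6x
assert params_unshared == params_shared * depth
```

The trade-off: shared weights cut parameters by a factor of `depth`, at the cost of less expressive per-layer transformations.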
no code implementations • 26 May 2021 • Shenggui Li, Fuzhao Xue, Chaitanya Baranwal, Yongbin Li, Yang You
That is, with sparse attention, our sequence parallelism enables us to train transformers with infinitely long sequences.
no code implementations • 10 May 2021 • Jinjie Ni, Tom Young, Vlad Pandelea, Fuzhao Xue, Erik Cambria
To the best of our knowledge, this survey is the most comprehensive and up-to-date one for deep learning based dialogue systems, extensively covering the popular techniques.
1 code implementation • 27 Dec 2020 • Fuzhao Xue, Aixin Sun, Hao Zhang, Jinjie Ni, Eng Siong Chng
Dialogue relation extraction (RE) aims to predict the relation type of two entities mentioned in a dialogue.
Ranked #9 on Dialog Relation Extraction on DialogRE
1 code implementation • 12 Dec 2020 • Fuzhao Xue, Aixin Sun, Hao Zhang, Eng Siong Chng
Recent advances on the RE task come from BERT-based sequence modeling and graph-based modeling of relationships among the tokens in the sequence.
Ranked #4 on Dialog Relation Extraction on DialogRE (F1c (v1) metric)
no code implementations • ICML 2020 • Hengguan Huang, Fuzhao Xue, Hao Wang, Ye Wang
Lying at the core of human intelligence, relational thinking initially relies on innumerable unconscious percepts pertaining to relations between new sensory signals and prior knowledge, which subsequently become a recognizable concept or object through coupling and transformation of these percepts.
Automatic Speech Recognition (ASR) (+1)