1 code implementation • 29 Jan 2024 • Fuzhao Xue, Zian Zheng, Yao Fu, Jinjie Ni, Zangwei Zheng, Wangchunshu Zhou, Yang You
To help the open-source community better understand Mixture-of-Experts (MoE) based large language models (LLMs), we train and release OpenMoE, a series of fully open-sourced and reproducible decoder-only MoE LLMs, ranging from 650M to 34B parameters and trained on over 1T tokens.
1 code implementation • NeurIPS 2023 • Zangwei Zheng, Xiaozhe Ren, Fuzhao Xue, Yang Luo, Xin Jiang, Yang You
By leveraging this information, we introduce an efficient sequence scheduling technique that groups queries with similar response lengths into micro-batches.
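The scheduling idea above can be sketched in a few lines: sort queries by (predicted) response length and chunk the sorted order into micro-batches, so each batch pads only to the longest response within it. This is a hypothetical illustration, not the paper's implementation; all names and numbers here are made up.

```python
# Length-aware sequence scheduling sketch: group queries with similar
# (predicted) response lengths into the same micro-batch to cut padding waste.

def schedule_micro_batches(predicted_lengths, batch_size):
    """Sort query indices by predicted response length, then chunk them."""
    order = sorted(range(len(predicted_lengths)),
                   key=lambda i: predicted_lengths[i])
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]

def padding_waste(lengths, batches):
    """Padding tokens: each batch is padded up to its longest response."""
    return sum(max(lengths[i] for i in b) * len(b) - sum(lengths[i] for i in b)
               for b in batches)

lengths = [120, 15, 110, 20, 18, 130]       # illustrative response lengths
naive = [[0, 1], [2, 3], [4, 5]]            # batches in arrival order
grouped = schedule_micro_batches(lengths, 2)  # batches in length-sorted order

assert padding_waste(lengths, grouped) < padding_waste(lengths, naive)
```

Grouping by length makes each micro-batch nearly uniform, so far fewer tokens are spent on padding than with arrival-order batching.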
1 code implementation • Tiny Papers @ ICLR 2023 • Xiao Liu, Jian Zhang, Heng Zhang, Fuzhao Xue, Yang You
We evaluate our model on various dialogue understanding tasks including dialogue relation extraction, dialogue emotion recognition, and dialogue act classification.
Ranked #1 on Dialog Relation Extraction on DialogRE
1 code implementation • 30 Jan 2023 • Fuzhao Xue, Valerii Likhosherstov, Anurag Arnab, Neil Houlsby, Mostafa Dehghani, Yang You
However, most standard neural networks have a fixed function type and computation budget regardless of the sample's nature or difficulty.
no code implementations • 21 May 2022 • Fuzhao Xue, Jianghai Chen, Aixin Sun, Xiaozhe Ren, Zangwei Zheng, Xiaoxin He, Yongming Chen, Xin Jiang, Yang You
In this paper, we revisit these conventional configurations.
Ranked #103 on Image Classification on ImageNet
1 code implementation • 13 Apr 2022 • Zangwei Zheng, Pengtai Xu, Xuan Zou, Da Tang, Zhen Li, Chenguang Xi, Peng Wu, Leqi Zou, Yijie Zhu, Ming Chen, Xiangzhuo Ding, Fuzhao Xue, Ziheng Qin, Youlong Cheng, Yang You
Our experiments show that previous scaling rules fail in the training of CTR prediction neural networks.
1 code implementation • CVPR 2022 • Wangbo Zhao, Kai Wang, Xiangxiang Chu, Fuzhao Xue, Xinchao Wang, Yang You
Text-based video segmentation aims to segment the target object in a video based on a describing sentence.
Ranked #10 on Referring Expression Segmentation on A2D Sentences
Optical Flow Estimation, Referring Expression Segmentation (+4)
no code implementations • 26 Jan 2022 • Fuzhao Xue, Xiaoxin He, Xiaozhe Ren, Yuxuan Lou, Yang You
Mixture-of-experts (MoE) is a powerful sparse architecture including multiple experts.
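A minimal sketch of what such a sparse MoE layer does: a learned gate scores the experts per token, and only the top-scoring expert actually runs. The weights and shapes below are illustrative assumptions, not any paper's released parameters.

```python
import numpy as np

# Minimal Mixture-of-Experts layer with top-1 routing (illustrative sketch).
rng = np.random.default_rng(0)
d, E = 4, 3                                   # hidden size, number of experts
W_gate = rng.standard_normal((d, E))          # router weights
experts = [rng.standard_normal((d, d)) for _ in range(E)]  # expert FFN weights

def moe_layer(x):
    logits = x @ W_gate                       # routing score per expert
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                      # softmax gate
    e = int(probs.argmax())                   # top-1: pick one expert
    return probs[e] * (x @ experts[e])        # only the chosen expert computes

y = moe_layer(rng.standard_normal(d))
assert y.shape == (d,)
```

The sparsity comes from the `argmax`: parameter count grows with the number of experts, but per-token compute stays roughly that of a single expert.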
no code implementations • 1 Nov 2021 • Xiaoxin He, Fuzhao Xue, Xiaozhe Ren, Yang You
Deep learning has achieved promising results on a wide spectrum of AI applications.
no code implementations • 5 Sep 2021 • Yuxuan Lou, Fuzhao Xue, Zangwei Zheng, Yang You
Mixture-of-Experts (MoE), a conditional computation architecture, achieves promising performance by scaling the local module (i.e., the feed-forward network) of the transformer.
no code implementations • 10 Aug 2021 • Andrew Koh, Fuzhao Xue, Eng Siong Chng
In this paper, we examine the use of Transfer Learning using Pretrained Audio Neural Networks (PANNs), and propose an architecture that is able to better leverage the acoustic features provided by PANNs for the Automated Audio Captioning Task.
1 code implementation • 25 Jul 2021 • Fuzhao Xue, Ziji Shi, Futao Wei, Yuxuan Lou, Yong Liu, Yang You
To achieve better performance with fewer trainable parameters, recent methods propose going shallower via parameter sharing or model compression along the depth dimension.
Ranked #663 on Image Classification on ImageNet
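Cross-layer parameter sharing, one of the "going shallower" strategies mentioned above, can be illustrated with a toy example: a single set of layer weights is reused at every depth, so the parameter count stays constant as depth grows. This is a generic ALBERT-style sketch under made-up shapes, not the paper's architecture.

```python
import numpy as np

# Toy cross-layer parameter sharing: one weight matrix reused at every layer.
rng = np.random.default_rng(0)
d, depth = 8, 6
W = rng.standard_normal((d, d)) / np.sqrt(d)   # the single shared layer

def shared_forward(x, n_layers):
    for _ in range(n_layers):                   # same W applied at each depth
        x = np.tanh(x @ W)
    return x

out = shared_forward(np.ones(d), depth)
params_shared = W.size                          # independent of depth
params_unshared = W.size * depth                # 6 distinct layers would cost 6x
assert params_unshared == params_shared * depth
```

The trade-off: shared weights cut parameters by a factor of `depth`, at the cost of less expressive per-layer transformations.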
no code implementations • 26 May 2021 • Shenggui Li, Fuzhao Xue, Chaitanya Baranwal, Yongbin Li, Yang You
That is, with sparse attention, our sequence parallelism enables us to train transformers with infinitely long sequences.
no code implementations • 10 May 2021 • Jinjie Ni, Tom Young, Vlad Pandelea, Fuzhao Xue, Erik Cambria
To the best of our knowledge, this survey is the most comprehensive and up-to-date one for deep learning based dialogue systems, extensively covering the popular techniques.
1 code implementation • 27 Dec 2020 • Fuzhao Xue, Aixin Sun, Hao Zhang, Jinjie Ni, Eng Siong Chng
Dialogue relation extraction (RE) aims to predict the relation type of two entities mentioned in a dialogue.
Ranked #9 on Dialog Relation Extraction on DialogRE
1 code implementation • 12 Dec 2020 • Fuzhao Xue, Aixin Sun, Hao Zhang, Eng Siong Chng
Recent advances on the RE task come from BERT-based sequence modeling and graph-based modeling of relationships among the tokens in the sequence.
Ranked #4 on Dialog Relation Extraction on DialogRE (F1c (v1) metric)
no code implementations • ICML 2020 • Hengguan Huang, Fuzhao Xue, Hao Wang, Ye Wang
Lying at the core of human intelligence, relational thinking initially relies on innumerable unconscious percepts pertaining to relations between new sensory signals and prior knowledge, which subsequently become a recognizable concept or object through coupling and transformation of these percepts.
Automatic Speech Recognition (ASR) (+1)