no code implementations • 3 Apr 2024 • Longfei Yun, Yonghao Zhuang, Yao Fu, Eric P Xing, Hao Zhang
As with dense models, training MoEs requires answering the same question: given a training budget, what is the optimal allocation between model size and number of training tokens?
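As a rough illustration of the allocation question, the sketch below minimizes a Chinchilla-style parametric loss under a fixed compute budget. The coefficients are roughly the published dense-model (Chinchilla) fit and are used only to illustrate the optimization; the MoE-specific fit is what the paper studies.

```python
import numpy as np

# Hypothetical parametric loss L(N, D) = E + A/N^alpha + B/D^beta.
# Coefficients are roughly the dense-model Chinchilla fit, for illustration only.
E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28

def loss(N, D):
    return E + A / N**alpha + B / D**beta

budget_flops = 1e21           # fixed training compute budget
Ns = np.logspace(8, 11, 400)  # candidate model sizes (parameters)
Ds = budget_flops / (6 * Ns)  # tokens implied by C ~= 6 * N * D

best_N, best_D = min(zip(Ns, Ds), key=lambda nd: loss(*nd))
print(f"optimal N ~ {best_N:.2e} params, D ~ {best_D:.2e} tokens")
```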
no code implementations • 13 Mar 2024 • Yao Fu, Dong-Ki Kim, Jaekyeom Kim, Sungryull Sohn, Lajanugen Logeswaran, Kyunghoon Bae, Honglak Lee
The primary limitation of large language models (LLMs) is their restricted understanding of the world.
2 code implementations • 15 Feb 2024 • Yao Fu, Rameswar Panda, Xinyao Niu, Xiang Yue, Hannaneh Hajishirzi, Yoon Kim, Hao Peng
We demonstrate that continual pretraining of the full model on 1B-5B tokens of such data is an effective and affordable strategy for scaling the context length of language models to 128K.
1 code implementation • 29 Jan 2024 • Fuzhao Xue, Zian Zheng, Yao Fu, Jinjie Ni, Zangwei Zheng, Wangchunshu Zhou, Yang You
To help the open-source community better understand Mixture-of-Experts (MoE) based large language models (LLMs), we train and release OpenMoE, a series of fully open-sourced and reproducible decoder-only MoE LLMs, ranging from 650M to 34B parameters and trained on up to 1T+ tokens.
no code implementations • 25 Jan 2024 • Yao Fu, Leyang Xue, Yeqi Huang, Andrei-Octavian Brabete, Dmitrii Ustiugov, Yuvraj Patel, Luo Mai
This paper presents ServerlessLLM, a locality-enhanced serverless inference system for Large Language Models (LLMs).
1 code implementation • 25 Jan 2024 • Leyang Xue, Yao Fu, Zhan Lu, Luo Mai, Mahesh Marina
This paper presents MoE-Infinity, a cost-efficient mixture-of-expert (MoE) serving system that realizes activation-aware expert offloading.
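As a toy sketch of what activation-aware offloading can look like, the cache below keeps the most frequently activated experts resident and evicts the least-activated one when capacity is exceeded. The class, policy, and names are illustrative assumptions, not MoE-Infinity's actual design or API.

```python
from collections import Counter, OrderedDict

class ExpertCache:
    """Toy activation-aware cache: keep the most-activated experts resident,
    evict the least-activated one when capacity is exceeded."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.counts = Counter()        # how often each expert was activated
        self.resident = OrderedDict()  # expert_id -> weights (e.g. on GPU)

    def fetch(self, expert_id, load_fn):
        self.counts[expert_id] += 1
        if expert_id not in self.resident:
            if len(self.resident) >= self.capacity:
                # evict the resident expert with the lowest activation count
                victim = min(self.resident, key=lambda e: self.counts[e])
                self.resident.pop(victim)  # offload back to host memory / SSD
            self.resident[expert_id] = load_fn(expert_id)
        return self.resident[expert_id]

cache = ExpertCache(capacity=2)
for eid in [0, 1, 0, 2, 0, 1]:
    cache.fetch(eid, load_fn=lambda e: f"weights-of-expert-{e}")
print(list(cache.resident))  # experts kept resident under this toy policy
```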
no code implementations • 19 Jan 2024 • Xuekai Zhu, Yao Fu, BoWen Zhou, Zhouhan Lin
We formalize the phase transition under the grokking configuration into the Data Efficiency Hypothesis and identify data insufficiency, sufficiency, and surplus regimes in language model training dynamics.
1 code implementation • 15 Oct 2023 • Tianxiao Shen, Hao Peng, Ruoqi Shen, Yao Fu, Zaid Harchaoui, Yejin Choi
Language models have become the backbone of today's AI systems.
1 code implementation • 11 Sep 2023 • Xiang Yue, Xingwei Qu, Ge Zhang, Yao Fu, Wenhao Huang, Huan Sun, Yu Su, Wenhu Chen
The MAmmoTH models are trained on MathInstruct, our meticulously curated instruction tuning dataset.
1 code implementation • 25 Aug 2023 • Yao Fu, Run Peng, Honglak Lee
Efficient exploration is a challenging topic in reinforcement learning, especially for sparse reward tasks.
1 code implementation • 26 May 2023 • Yao Fu, Litu Ou, Mingyu Chen, Yuhao Wan, Hao Peng, Tushar Khot
As large language models (LLMs) are continuously being developed, their evaluation becomes increasingly important yet challenging.
1 code implementation • 17 May 2023 • Yao Fu, Hao Peng, Tushar Khot, Mirella Lapata
We study whether multiple large language models (LLMs) can autonomously improve each other in a negotiation game by playing, reflecting, and criticizing.
1 code implementation • NeurIPS 2023 • Yuzhen Huang, Yuzhuo Bai, Zhihao Zhu, Junlei Zhang, Jinghan Zhang, Tangjun Su, Junteng Liu, Chuancheng Lv, Yikai Zhang, Jiayi Lei, Yao Fu, Maosong Sun, Junxian He
We present C-Eval, the first comprehensive Chinese evaluation suite designed to assess advanced knowledge and reasoning abilities of foundation models in a Chinese context.
2 code implementations • 30 Jan 2023 • Yao Fu, Hao Peng, Litu Ou, Ashish Sabharwal, Tushar Khot
By paying the price of decreased generic ability, we can clearly lift the scaling curve of models smaller than 10B towards specialized multi-step math reasoning ability.
1 code implementation • 13 Nov 2022 • Jie Ren, Xidong Feng, Bo Liu, Xuehai Pan, Yao Fu, Luo Mai, Yaodong Yang
TorchOpt further provides a high-performance distributed execution runtime.
1 code implementation • 28 Oct 2022 • Yuling Gu, Yao Fu, Valentina Pyatkin, Ian Magnusson, Bhavana Dalvi Mishra, Peter Clark
We hypothesize that to perform this task well, the reader needs to mentally elaborate the scene being described to identify a sensible meaning of the language.
1 code implementation • 5 Oct 2022 • Tushar Khot, Harsh Trivedi, Matthew Finlayson, Yao Fu, Kyle Richardson, Peter Clark, Ashish Sabharwal
On symbolic reasoning tasks, we can further decompose sub-tasks that are hard for LLMs into even simpler solvable sub-tasks.
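A toy illustration of the decomposition idea on a symbolic task ("concatenate the first letter of each word"): in the paper each sub-task can itself be handled by a prompted LLM, whereas here the sub-solvers are plain functions for illustration.

```python
def split_words(text: str) -> list[str]:
    return text.split()

def first_letter(word: str) -> str:
    return word[0]

def concatenate(letters: list[str]) -> str:
    return "".join(letters)

def solve(text: str) -> str:
    words = split_words(text)                    # sub-task 1: split
    letters = [first_letter(w) for w in words]   # sub-task 2: per-item extraction
    return concatenate(letters)                  # sub-task 3: recombine

print(solve("decomposed prompting handles long inputs"))  # -> "dphli"
```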
no code implementations • 3 Oct 2022 • Yao Fu, Hao Peng, Ashish Sabharwal, Peter Clark, Tushar Khot
In this work, we propose complexity-based prompting, a simple and effective example selection scheme for multi-step reasoning.
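A minimal sketch of the example-selection idea: keep the in-context exemplars whose rationales have the most reasoning steps. Counting steps as non-empty rationale lines is a heuristic assumption here, not necessarily the paper's exact counting rule.

```python
def num_steps(chain_of_thought: str) -> int:
    # Assumption: each non-empty line of the rationale is one reasoning step.
    return sum(1 for line in chain_of_thought.splitlines() if line.strip())

def select_complex_examples(pool, k):
    """pool: list of (question, chain_of_thought, answer) candidates.
    Keep the k exemplars whose rationales have the most steps."""
    return sorted(pool, key=lambda ex: num_steps(ex[1]), reverse=True)[:k]

pool = [
    ("Q1", "step 1\nstep 2", "A1"),
    ("Q2", "step 1\nstep 2\nstep 3\nstep 4", "A2"),
    ("Q3", "step 1", "A3"),
]
prompt_examples = select_complex_examples(pool, k=2)
print([q for q, _, _ in prompt_examples])  # -> ['Q2', 'Q1']
```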
1 code implementation • 3 Jun 2022 • Yao Fu, Mirella Lapata
With the induced network, we: (1).
no code implementations • NAACL 2022 • Lajanugen Logeswaran, Yao Fu, Moontae Lee, Honglak Lee
Pre-trained large language models have shown successful progress in many language understanding benchmarks.
1 code implementation • 28 Feb 2022 • Ratish Puduppully, Yao Fu, Mirella Lapata
We consider the task of data-to-text generation, which aims to create textual output from non-linguistic input.
no code implementations • 15 Dec 2021 • Min-Gang Zhou, Xiao-Yu Cao, Yu-Shuo Lu, Yang Wang, Yu Bao, Zhao-Ying Jia, Yao Fu, Hua-Lei Yin, Zeng-Bing Chen
The information transmitted by the proposed game also breaks the classical limit.
1 code implementation • 7 Dec 2021 • Yao Fu, John P. Cunningham, Mirella Lapata
Here, we propose a family of randomized dynamic programming (RDP) algorithms for scaling structured models to tens of thousands of latent states.
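A minimal sketch of the randomization idea on an HMM forward pass: at each step, sum over only a sampled subset of predecessor states and rescale, which gives an unbiased estimate of the likelihood. This illustrates the general idea with a uniform subsampling scheme; it is not the paper's exact estimator.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward_exact(init, trans, emit, obs):
    """Standard HMM forward algorithm; returns the sequence likelihood."""
    alpha = init * emit[:, obs[0]]
    for t in obs[1:]:
        alpha = (alpha @ trans) * emit[:, t]
    return alpha.sum()

def forward_randomized(init, trans, emit, obs, m):
    """Randomized variant: sum over m uniformly sampled predecessor states
    per step and rescale by K/m (an unbiased estimate of each transition sum)."""
    K = trans.shape[0]
    alpha = init * emit[:, obs[0]]
    for t in obs[1:]:
        idx = rng.choice(K, size=m, replace=False)
        alpha = (K / m) * (alpha[idx] @ trans[idx, :]) * emit[:, t]
    return alpha.sum()

K, V, T = 50, 10, 6
init = np.full(K, 1.0 / K)
trans = rng.dirichlet(np.ones(K), size=K)  # K x K row-stochastic transitions
emit = rng.dirichlet(np.ones(V), size=K)   # K x V row-stochastic emissions
obs = rng.integers(0, V, size=T)

print(forward_exact(init, trans, emit, obs))
print(np.mean([forward_randomized(init, trans, emit, obs, m=10) for _ in range(200)]))
```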
no code implementations • NeurIPS 2021 • Souvik Kundu, Qirui Sun, Yao Fu, Massoud Pedram, Peter Beerel
Knowledge distillation (KD) has recently been identified as a method that can unintentionally leak private information regarding the details of a teacher model to an unauthorized student.
no code implementations • 29 Sep 2021 • Yao Fu, Mirella Lapata
We use RDP to analyze the representation space of pretrained language models, discovering a large-scale latent network in a fully unsupervised way.
1 code implementation • 5 Jul 2021 • Yipeng Zhou, Xuezheng Liu, Yao Fu, Di Wu, Chao Li, Shui Yu
In this work, we study a crucial question that has been largely overlooked by existing work: what are the optimal numbers of queries and replies in FL with DP such that the final model accuracy is maximized?
1 code implementation • NAACL 2021 • Kun Liu, Yao Fu, Chuanqi Tan, Mosha Chen, Ningyu Zhang, Songfang Huang, Sheng Gao
This work studies NER under a noisy labeled setting with calibrated confidence estimation.
1 code implementation • ICLR 2021 • Boli Chen, Yao Fu, Guangwei Xu, Pengjun Xie, Chuanqi Tan, Mosha Chen, Liping Jing
We introduce a Poincaré probe, a structural probe that projects these embeddings into a Poincaré subspace with explicitly defined hierarchies.
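The probe itself is a learned projection; the hyperbolic geometry it relies on is the standard Poincaré-ball distance, sketched below. This is only the metric for illustration, not the probe's training code.

```python
import numpy as np

def poincare_distance(u, v, eps=1e-9):
    """Distance on the Poincare ball:
    d(u, v) = arcosh(1 + 2 * ||u - v||^2 / ((1 - ||u||^2) * (1 - ||v||^2)))."""
    sq = np.sum((u - v) ** 2)
    denom = (1 - np.sum(u ** 2)) * (1 - np.sum(v ** 2))
    return np.arccosh(1 + 2 * sq / max(denom, eps))

root = np.array([0.0, 0.0])   # points near the origin behave like roots
leaf = np.array([0.0, 0.95])  # points near the boundary behave like leaves
print(poincare_distance(root, leaf))
```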
1 code implementation • ICLR 2021 • Ning Ding, Xiaobin Wang, Yao Fu, Guangwei Xu, Rui Wang, Pengjun Xie, Ying Shen, Fei Huang, Hai-Tao Zheng, Rui Zhang
This approach allows us to learn meaningful, interpretable prototypes for the final classification.
no code implementations • 11 Jan 2021 • Yao Fu, Yipeng Zhou, Di Wu, Shui Yu, Yonggang Wen, Chao Li
Then, we theoretically derive: 1) the conditions for DP-based FedAvg to converge as the number of global iterations (GI) approaches infinity; 2) the method for setting the number of local iterations (LI) to minimize the negative influence of DP noise.
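A minimal sketch of a generic DP FedAvg aggregation round (clip client updates, average, add Gaussian noise). The clipping norm and noise scale below are illustrative placeholders, not the calibration analyzed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def dp_fedavg_round(global_w, client_updates, clip_norm, noise_std):
    """One aggregation round of a generic DP FedAvg: clip each client update
    to `clip_norm`, average, then add Gaussian noise to the aggregate."""
    clipped = []
    for delta in client_updates:
        norm = np.linalg.norm(delta)
        clipped.append(delta * min(1.0, clip_norm / max(norm, 1e-12)))
    avg = np.mean(clipped, axis=0)
    noise = rng.normal(0.0, noise_std, size=avg.shape)
    return global_w + avg + noise

w = np.zeros(4)
updates = [rng.normal(size=4) for _ in range(8)]  # pretend local-SGD deltas
w = dp_fedavg_round(w, updates, clip_norm=1.0, noise_std=0.1)
print(w)
```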
1 code implementation • 15 Dec 2020 • Yao Fu, Chuanqi Tan, Mosha Chen, Songfang Huang, Fei Huang
With the TreeCRF, we obtain a unified way to jointly model the observed and the latent nodes.
Ranked #11 on Nested Named Entity Recognition on ACE 2005
1 code implementation • NeurIPS 2020 • Yao Fu, Chuanqi Tan, Bin Bi, Mosha Chen, Yansong Feng, Alexander M. Rush
Learning to control the structure of sentences is a challenging problem in text generation.
2 code implementations • NeurIPS 2019 • Yao Fu, Yansong Feng, John P. Cunningham
Inspired by variational autoencoders with discrete latent structures, in this work, we propose a latent bag of words (BOW) model for paraphrase generation.
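A rough sketch of the latent bag-of-words idea: predict, for each source position, a distribution over target words, average these into a soft bag of words, and mix the corresponding embeddings into a planning vector for the decoder. Shapes, names, and the plain-numpy setup are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

V, d = 1000, 32                # toy vocabulary size and embedding dimension
emb = rng.normal(size=(V, d))  # target-side word embeddings

def softmax(x):
    z = np.exp(x - x.max(axis=-1, keepdims=True))
    return z / z.sum(axis=-1, keepdims=True)

def latent_bow_plan(src_hidden, W):
    """Average per-position word distributions into a soft bag of words,
    then mix the corresponding embeddings into one planning vector."""
    word_dists = softmax(src_hidden @ W)  # (src_len, V): per-position neighbors
    bow = word_dists.mean(axis=0)         # (V,): soft bag of words
    return bow @ emb                      # (d,): mixed embedding "plan"

src_hidden = rng.normal(size=(5, d))      # pretend encoder states
W = rng.normal(size=(d, V))
plan = latent_bow_plan(src_hidden, W)
print(plan.shape)  # (32,)
```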
1 code implementation • WS 2019 • Yao Fu, Hao Zhou, Jiaze Chen, Lei Li
We apply this framework to existing datasets and models and show that: (1) the pivot words are strong features for the classification of sentence attributes; (2) to change the attribute of a sentence, many datasets only require changing certain pivot words; (3) consequently, many transfer models only perform lexical-level modification, while leaving higher-level sentence structures unchanged.
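A minimal sketch of one way to surface pivot words: rank words by smoothed log-odds between two attribute classes, so that words strongly indicative of one attribute score highest. The scoring rule is an assumption for illustration, not the paper's exact method.

```python
import math
from collections import Counter

def pivot_words(pos_sents, neg_sents, k=3, alpha=1.0):
    """Rank words by smoothed log-odds between two attribute classes;
    high-magnitude words act as attribute 'pivots'."""
    pos = Counter(w for s in pos_sents for w in s.split())
    neg = Counter(w for s in neg_sents for w in s.split())
    vocab = set(pos) | set(neg)
    n_pos, n_neg = sum(pos.values()), sum(neg.values())
    scores = {
        w: math.log((pos[w] + alpha) / (n_pos + alpha * len(vocab)))
           - math.log((neg[w] + alpha) / (n_neg + alpha * len(vocab)))
        for w in vocab
    }
    return sorted(scores, key=lambda w: abs(scores[w]), reverse=True)[:k]

print(pivot_words(["the food was great", "great service"],
                  ["the food was awful", "awful service"]))
# 'great' and 'awful' rank highest as pivots for sentiment
```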
no code implementations • NAACL 2018 • Yao Fu, Yansong Feng
The memory-augmented encoder-decoder framework has achieved promising progress on natural language generation tasks.