no code implementations • 3 Apr 2024 • Longfei Yun, Yonghao Zhuang, Yao Fu, Eric P Xing, Hao Zhang
As with dense models, training MoEs requires answering the same question: given a training budget, what is the optimal allocation between model size and number of training tokens?
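As a rough illustration of the allocation question, the sketch below minimizes a Chinchilla-style parametric loss under a fixed compute budget. The coefficients are roughly the published dense-model (Chinchilla) fit and are used only to illustrate the optimization; the MoE-specific fit is what the paper studies.

```python
import numpy as np

# Hypothetical parametric loss L(N, D) = E + A/N^alpha + B/D^beta.
# Coefficients are roughly the dense-model Chinchilla fit, for illustration only.
E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28

def loss(N, D):
    return E + A / N**alpha + B / D**beta

budget_flops = 1e21           # fixed training compute budget
Ns = np.logspace(8, 11, 400)  # candidate model sizes (parameters)
Ds = budget_flops / (6 * Ns)  # tokens implied by C ~= 6 * N * D

best_N, best_D = min(zip(Ns, Ds), key=lambda nd: loss(*nd))
print(f"optimal N ~ {best_N:.2e} params, D ~ {best_D:.2e} tokens")
```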
no code implementations • 13 Mar 2024 • Yao Fu, Dong-Ki Kim, Jaekyeom Kim, Sungryull Sohn, Lajanugen Logeswaran, Kyunghoon Bae, Honglak Lee
The primary limitation of large language models (LLMs) is their restricted understanding of the world.
2 code implementations • 15 Feb 2024 • Yao Fu, Rameswar Panda, Xinyao Niu, Xiang Yue, Hannaneh Hajishirzi, Yoon Kim, Hao Peng
We demonstrate that continual pretraining of the full model on 1B-5B tokens of such data is an effective and affordable strategy for scaling the context length of language models to 128K.
1 code implementation • 29 Jan 2024 • Fuzhao Xue, Zian Zheng, Yao Fu, Jinjie Ni, Zangwei Zheng, Wangchunshu Zhou, Yang You
To help the open-source community better understand Mixture-of-Experts (MoE) based large language models (LLMs), we train and release OpenMoE, a series of fully open-sourced and reproducible decoder-only MoE LLMs, ranging from 650M to 34B parameters and trained on up to 1T+ tokens.
no code implementations • 25 Jan 2024 • Yao Fu, Leyang Xue, Yeqi Huang, Andrei-Octavian Brabete, Dmitrii Ustiugov, Yuvraj Patel, Luo Mai
This paper presents ServerlessLLM, a locality-enhanced serverless inference system for Large Language Models (LLMs).
1 code implementation • 25 Jan 2024 • Leyang Xue, Yao Fu, Zhan Lu, Luo Mai, Mahesh Marina
This paper presents MoE-Infinity, a cost-efficient mixture-of-expert (MoE) serving system that realizes activation-aware expert offloading.
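As a toy sketch of what activation-aware offloading can look like, the cache below keeps the most frequently activated experts resident and evicts the least-activated one when capacity is exceeded. The class, policy, and names are illustrative assumptions, not MoE-Infinity's actual design or API.

```python
from collections import Counter, OrderedDict

class ExpertCache:
    """Toy activation-aware cache: keep the most-activated experts resident,
    evict the least-activated one when capacity is exceeded."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.counts = Counter()        # how often each expert was activated
        self.resident = OrderedDict()  # expert_id -> weights (e.g. on GPU)

    def fetch(self, expert_id, load_fn):
        self.counts[expert_id] += 1
        if expert_id not in self.resident:
            if len(self.resident) >= self.capacity:
                # evict the resident expert with the lowest activation count
                victim = min(self.resident, key=lambda e: self.counts[e])
                self.resident.pop(victim)  # offload back to host memory / SSD
            self.resident[expert_id] = load_fn(expert_id)
        return self.resident[expert_id]

cache = ExpertCache(capacity=2)
for eid in [0, 1, 0, 2, 0, 1]:
    cache.fetch(eid, load_fn=lambda e: f"weights-of-expert-{e}")
print(list(cache.resident))  # experts kept resident under this toy policy
```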
no code implementations • 19 Jan 2024 • Xuekai Zhu, Yao Fu, BoWen Zhou, Zhouhan Lin
We formalize the phase transition under the grokking configuration into the Data Efficiency Hypothesis and identify data insufficiency, sufficiency, and surplus regimes in language model training dynamics.
1 code implementation • 15 Oct 2023 • Tianxiao Shen, Hao Peng, Ruoqi Shen, Yao Fu, Zaid Harchaoui, Yejin Choi
Language models have become the backbone of today's AI systems.
1 code implementation • 11 Sep 2023 • Xiang Yue, Xingwei Qu, Ge Zhang, Yao Fu, Wenhao Huang, Huan Sun, Yu Su, Wenhu Chen
The MAmmoTH models are trained on MathInstruct, our meticulously curated instruction tuning dataset.
1 code implementation • 25 Aug 2023 • Yao Fu, Run Peng, Honglak Lee
Efficient exploration is a challenging topic in reinforcement learning, especially for sparse reward tasks.
1 code implementation • 26 May 2023 • Yao Fu, Litu Ou, Mingyu Chen, Yuhao Wan, Hao Peng, Tushar Khot
As large language models (LLMs) are continuously being developed, their evaluation becomes increasingly important yet challenging.
1 code implementation • 17 May 2023 • Yao Fu, Hao Peng, Tushar Khot, Mirella Lapata
We study whether multiple large language models (LLMs) can autonomously improve each other in a negotiation game by playing, reflecting, and criticizing.
1 code implementation • NeurIPS 2023 • Yuzhen Huang, Yuzhuo Bai, Zhihao Zhu, Junlei Zhang, Jinghan Zhang, Tangjun Su, Junteng Liu, Chuancheng Lv, Yikai Zhang, Jiayi Lei, Yao Fu, Maosong Sun, Junxian He
We present C-Eval, the first comprehensive Chinese evaluation suite designed to assess advanced knowledge and reasoning abilities of foundation models in a Chinese context.
2 code implementations • 30 Jan 2023 • Yao Fu, Hao Peng, Litu Ou, Ashish Sabharwal, Tushar Khot
By paying the price of decreased generic ability, we can clearly lift the scaling curve of models smaller than 10B towards specialized multi-step math reasoning ability.
1 code implementation • 13 Nov 2022 • Jie Ren, Xidong Feng, Bo Liu, Xuehai Pan, Yao Fu, Luo Mai, Yaodong Yang
TorchOpt further provides a high-performance distributed execution runtime.
1 code implementation • 28 Oct 2022 • Yuling Gu, Yao Fu, Valentina Pyatkin, Ian Magnusson, Bhavana Dalvi Mishra, Peter Clark
We hypothesize that to perform this task well, the reader needs to mentally elaborate the scene being described to identify a sensible meaning of the language.
1 code implementation • 5 Oct 2022 • Tushar Khot, Harsh Trivedi, Matthew Finlayson, Yao Fu, Kyle Richardson, Peter Clark, Ashish Sabharwal
On symbolic reasoning tasks, we can further decompose sub-tasks that are hard for LLMs into even simpler solvable sub-tasks.
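A toy illustration of the decomposition idea on a symbolic task ("concatenate the first letter of each word"): in the paper each sub-task can itself be handled by a prompted LLM, whereas here the sub-solvers are plain functions for illustration.

```python
def split_words(text: str) -> list[str]:
    return text.split()

def first_letter(word: str) -> str:
    return word[0]

def concatenate(letters: list[str]) -> str:
    return "".join(letters)

def solve(text: str) -> str:
    words = split_words(text)                    # sub-task 1: split
    letters = [first_letter(w) for w in words]   # sub-task 2: per-item extraction
    return concatenate(letters)                  # sub-task 3: recombine

print(solve("decomposed prompting handles long inputs"))  # -> "dphli"
```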
no code implementations • 3 Oct 2022 • Yao Fu, Hao Peng, Ashish Sabharwal, Peter Clark, Tushar Khot
In this work, we propose complexity-based prompting, a simple and effective example selection scheme for multi-step reasoning.
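A minimal sketch of the example-selection idea: keep the in-context exemplars whose rationales have the most reasoning steps. Counting steps as non-empty rationale lines is a heuristic assumption here, not necessarily the paper's exact counting rule.

```python
def num_steps(chain_of_thought: str) -> int:
    # Assumption: each non-empty line of the rationale is one reasoning step.
    return sum(1 for line in chain_of_thought.splitlines() if line.strip())

def select_complex_examples(pool, k):
    """pool: list of (question, chain_of_thought, answer) candidates.
    Keep the k exemplars whose rationales have the most steps."""
    return sorted(pool, key=lambda ex: num_steps(ex[1]), reverse=True)[:k]

pool = [
    ("Q1", "step 1\nstep 2", "A1"),
    ("Q2", "step 1\nstep 2\nstep 3\nstep 4", "A2"),
    ("Q3", "step 1", "A3"),
]
prompt_examples = select_complex_examples(pool, k=2)
print([q for q, _, _ in prompt_examples])  # -> ['Q2', 'Q1']
```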
1 code implementation • 3 Jun 2022 • Yao Fu, Mirella Lapata
With the induced network, we: (1).
no code implementations • NAACL 2022 • Lajanugen Logeswaran, Yao Fu, Moontae Lee, Honglak Lee
Pre-trained large language models have shown successful progress in many language understanding benchmarks.
1 code implementation • 28 Feb 2022 • Ratish Puduppully, Yao Fu, Mirella Lapata
We consider the task of data-to-text generation, which aims to create textual output from non-linguistic input.
no code implementations • 15 Dec 2021 • Min-Gang Zhou, Xiao-Yu Cao, Yu-Shuo Lu, Yang Wang, Yu Bao, Zhao-Ying Jia, Yao Fu, Hua-Lei Yin, Zeng-Bing Chen
The information transmitted by the proposed game also breaks the classical limit.
1 code implementation • 7 Dec 2021 • Yao Fu, John P. Cunningham, Mirella Lapata
Here, we propose a family of randomized dynamic programming (RDP) algorithms for scaling structured models to tens of thousands of latent states.
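A minimal sketch of the randomization idea on an HMM forward pass: at each step, sum over only a sampled subset of predecessor states and rescale, which gives an unbiased estimate of the likelihood. This illustrates the general idea with a uniform subsampling scheme; it is not the paper's exact estimator.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward_exact(init, trans, emit, obs):
    """Standard HMM forward algorithm; returns the sequence likelihood."""
    alpha = init * emit[:, obs[0]]
    for t in obs[1:]:
        alpha = (alpha @ trans) * emit[:, t]
    return alpha.sum()

def forward_randomized(init, trans, emit, obs, m):
    """Randomized variant: sum over m uniformly sampled predecessor states
    per step and rescale by K/m (an unbiased estimate of each transition sum)."""
    K = trans.shape[0]
    alpha = init * emit[:, obs[0]]
    for t in obs[1:]:
        idx = rng.choice(K, size=m, replace=False)
        alpha = (K / m) * (alpha[idx] @ trans[idx, :]) * emit[:, t]
    return alpha.sum()

K, V, T = 50, 10, 6
init = np.full(K, 1.0 / K)
trans = rng.dirichlet(np.ones(K), size=K)  # K x K row-stochastic transitions
emit = rng.dirichlet(np.ones(V), size=K)   # K x V row-stochastic emissions
obs = rng.integers(0, V, size=T)

print(forward_exact(init, trans, emit, obs))
print(np.mean([forward_randomized(init, trans, emit, obs, m=10) for _ in range(200)]))
```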
no code implementations • NeurIPS 2021 • Souvik Kundu, Qirui Sun, Yao Fu, Massoud Pedram, Peter Beerel
Knowledge distillation (KD) has recently been identified as a method that can unintentionally leak private information regarding the details of a teacher model to an unauthorized student.
no code implementations • 29 Sep 2021 • Yao Fu, Mirella Lapata
We use RDP to analyze the representation space of pretrained language models, discovering a large-scale latent network in a fully unsupervised way.
1 code implementation • 5 Jul 2021 • Yipeng Zhou, Xuezheng Liu, Yao Fu, Di Wu, Chao Li, Shui Yu
In this work, we study a crucial question that has been largely overlooked by existing work: what are the optimal numbers of queries and replies in FL with DP such that the final model accuracy is maximized?
1 code implementation • NAACL 2021 • Kun Liu, Yao Fu, Chuanqi Tan, Mosha Chen, Ningyu Zhang, Songfang Huang, Sheng Gao
This work studies NER under a noisy labeled setting with calibrated confidence estimation.
1 code implementation • ICLR 2021 • Boli Chen, Yao Fu, Guangwei Xu, Pengjun Xie, Chuanqi Tan, Mosha Chen, Liping Jing
We introduce a Poincaré probe, a structural probe that projects these embeddings into a Poincaré subspace with explicitly defined hierarchies.
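The probe itself is a learned projection; the hyperbolic geometry it relies on is the standard Poincaré-ball distance, sketched below. This is only the metric for illustration, not the probe's training code.

```python
import numpy as np

def poincare_distance(u, v, eps=1e-9):
    """Distance on the Poincare ball:
    d(u, v) = arcosh(1 + 2 * ||u - v||^2 / ((1 - ||u||^2) * (1 - ||v||^2)))."""
    sq = np.sum((u - v) ** 2)
    denom = (1 - np.sum(u ** 2)) * (1 - np.sum(v ** 2))
    return np.arccosh(1 + 2 * sq / max(denom, eps))

root = np.array([0.0, 0.0])   # points near the origin behave like roots
leaf = np.array([0.0, 0.95])  # points near the boundary behave like leaves
print(poincare_distance(root, leaf))
```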
1 code implementation • ICLR 2021 • Ning Ding, Xiaobin Wang, Yao Fu, Guangwei Xu, Rui Wang, Pengjun Xie, Ying Shen, Fei Huang, Hai-Tao Zheng, Rui Zhang
This approach allows us to learn meaningful, interpretable prototypes for the final classification.
no code implementations • 11 Jan 2021 • Yao Fu, Yipeng Zhou, Di Wu, Shui Yu, Yonggang Wen, Chao Li
Then, we theoretically derive: 1) the conditions for DP-based FedAvg to converge as the number of global iterations (GI) approaches infinity; 2) the method for setting the number of local iterations (LI) to minimize the negative influence of DP noise.
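A minimal sketch of a generic DP FedAvg aggregation round (clip client updates, average, add Gaussian noise). The clipping norm and noise scale below are illustrative placeholders, not the calibration analyzed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def dp_fedavg_round(global_w, client_updates, clip_norm, noise_std):
    """One aggregation round of a generic DP FedAvg: clip each client update
    to `clip_norm`, average, then add Gaussian noise to the aggregate."""
    clipped = []
    for delta in client_updates:
        norm = np.linalg.norm(delta)
        clipped.append(delta * min(1.0, clip_norm / max(norm, 1e-12)))
    avg = np.mean(clipped, axis=0)
    noise = rng.normal(0.0, noise_std, size=avg.shape)
    return global_w + avg + noise

w = np.zeros(4)
updates = [rng.normal(size=4) for _ in range(8)]  # pretend local-SGD deltas
w = dp_fedavg_round(w, updates, clip_norm=1.0, noise_std=0.1)
print(w)
```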
1 code implementation • 15 Dec 2020 • Yao Fu, Chuanqi Tan, Mosha Chen, Songfang Huang, Fei Huang
With the TreeCRF, we obtain a unified way to jointly model the observed and the latent nodes.
Ranked #11 on Nested Named Entity Recognition on ACE 2005
1 code implementation • NeurIPS 2020 • Yao Fu, Chuanqi Tan, Bin Bi, Mosha Chen, Yansong Feng, Alexander M. Rush
Learning to control the structure of sentences is a challenging problem in text generation.
2 code implementations • NeurIPS 2019 • Yao Fu, Yansong Feng, John P. Cunningham
Inspired by variational autoencoders with discrete latent structures, in this work, we propose a latent bag of words (BOW) model for paraphrase generation.
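A rough sketch of the latent bag-of-words idea: predict, for each source position, a distribution over target words, average these into a soft bag of words, and mix the corresponding embeddings into a planning vector for the decoder. Shapes, names, and the plain-numpy setup are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

V, d = 1000, 32                # toy vocabulary size and embedding dimension
emb = rng.normal(size=(V, d))  # target-side word embeddings

def softmax(x):
    z = np.exp(x - x.max(axis=-1, keepdims=True))
    return z / z.sum(axis=-1, keepdims=True)

def latent_bow_plan(src_hidden, W):
    """Average per-position word distributions into a soft bag of words,
    then mix the corresponding embeddings into one planning vector."""
    word_dists = softmax(src_hidden @ W)  # (src_len, V): per-position neighbors
    bow = word_dists.mean(axis=0)         # (V,): soft bag of words
    return bow @ emb                      # (d,): mixed embedding "plan"

src_hidden = rng.normal(size=(5, d))      # pretend encoder states
W = rng.normal(size=(d, V))
plan = latent_bow_plan(src_hidden, W)
print(plan.shape)  # (32,)
```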
1 code implementation • WS 2019 • Yao Fu, Hao Zhou, Jiaze Chen, Lei Li
We apply this framework to existing datasets and models and show that: (1) the pivot words are strong features for the classification of sentence attributes; (2) to change the attribute of a sentence, many datasets only require changing certain pivot words; (3) consequently, many transfer models only perform lexical-level modification, while leaving higher-level sentence structures unchanged.
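A minimal sketch of one way to surface pivot words: rank words by smoothed log-odds between two attribute classes, so that words strongly indicative of one attribute score highest. The scoring rule is an assumption for illustration, not the paper's exact method.

```python
import math
from collections import Counter

def pivot_words(pos_sents, neg_sents, k=3, alpha=1.0):
    """Rank words by smoothed log-odds between two attribute classes;
    high-magnitude words act as attribute 'pivots'."""
    pos = Counter(w for s in pos_sents for w in s.split())
    neg = Counter(w for s in neg_sents for w in s.split())
    vocab = set(pos) | set(neg)
    n_pos, n_neg = sum(pos.values()), sum(neg.values())
    scores = {
        w: math.log((pos[w] + alpha) / (n_pos + alpha * len(vocab)))
           - math.log((neg[w] + alpha) / (n_neg + alpha * len(vocab)))
        for w in vocab
    }
    return sorted(scores, key=lambda w: abs(scores[w]), reverse=True)[:k]

print(pivot_words(["the food was great", "great service"],
                  ["the food was awful", "awful service"]))
# 'great' and 'awful' rank highest as pivots for sentiment
```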
no code implementations • NAACL 2018 • Yao Fu, Yansong Feng
The memory-augmented encoder-decoder framework has achieved promising progress on natural language generation tasks.