Search Results for author: Haokun Liu

Found 22 papers, 11 papers with code

Dense Training, Sparse Inference: Rethinking Training of Mixture-of-Experts Language Models

no code implementations • 8 Apr 2024 • Bowen Pan, Yikang Shen, Haokun Liu, Mayank Mishra, Gaoyuan Zhang, Aude Oliva, Colin Raffel, Rameswar Panda

Mixture-of-Experts (MoE) language models can reduce computational costs by 2-4× compared to dense models without sacrificing performance, making them more efficient in computation-bounded scenarios.

Learning to Route Among Specialized Experts for Zero-Shot Generalization

1 code implementation • 8 Feb 2024 • Mohammed Muqeeth, Haokun Liu, Yufan Liu, Colin Raffel

Unlike past methods that learn to route among specialized models, PHATGOOSE explores the possibility that zero-shot generalization will be improved if different experts can be adaptively chosen for each token and at each layer in the model.

Zero-shot Generalization
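
As a rough illustration of the idea described in the excerpt above (adaptively choosing among specialized experts for each token at each layer), the sketch below shows a generic per-token top-k router in PyTorch. The gate design, the top_k value, and the expert interface are assumptions made for the sketch, not the PHATGOOSE mechanism itself.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PerTokenRouter(nn.Module):
    """Generic per-token, per-layer routing over specialized expert modules.
    Illustrative only; not the PHATGOOSE gating mechanism."""

    def __init__(self, hidden_dim, experts, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList(experts)              # e.g. specialized adapter modules
        self.gate = nn.Linear(hidden_dim, len(self.experts), bias=False)
        self.top_k = top_k

    def forward(self, hidden_states):                      # (batch, seq, hidden)
        logits = self.gate(hidden_states)                  # one routing decision per token
        weights, idx = logits.topk(self.top_k, dim=-1)     # keep the top-k experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(hidden_states)
        # For clarity every expert is run densely and masked; a real implementation
        # would dispatch only the tokens routed to each expert.
        for e, expert in enumerate(self.experts):
            expert_out = expert(hidden_states)
            for k in range(self.top_k):
                mask = (idx[..., k] == e).unsqueeze(-1).float()
                out = out + mask * weights[..., k:k + 1] * expert_out
        return out

# Example: eight stand-in experts at one layer of a 1024-dim model.
# router = PerTokenRouter(1024, [nn.Linear(1024, 1024) for _ in range(8)])
```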

LLM-Based Human-Robot Collaboration Framework for Manipulation Tasks

no code implementations • 29 Aug 2023 • Haokun Liu, Yaonan Zhu, Kenji Kato, Izumi Kondo, Tadayoshi Aoyama, Yasuhisa Hasegawa

This paper presents a novel approach to enhancing autonomous robotic manipulation by using a Large Language Model (LLM) for logical inference, converting high-level language commands into sequences of executable motion functions.

Language Modelling • Large Language Model
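
Purely as an illustration of turning LLM output into executable motion functions, the sketch below dispatches a hypothetical plan string over a small library of motion primitives. The primitive names and the plan format are invented for the example and are not the paper's interface.

```python
# Hypothetical mapping from an LLM-produced plan to executable motion primitives.
# The primitive names and the newline-separated plan format are invented here.
MOTION_LIBRARY = {
    "move_to": lambda target: print(f"moving to {target}"),
    "grasp": lambda target: print(f"grasping {target}"),
    "release": lambda target: print(f"releasing {target}"),
}

def execute_plan(llm_output: str) -> None:
    """Execute a newline-separated plan one primitive at a time."""
    for step in llm_output.strip().splitlines():
        name, _, arg = step.partition(" ")
        if name not in MOTION_LIBRARY:
            raise ValueError(f"unknown motion primitive: {name}")
        MOTION_LIBRARY[name](arg)

execute_plan("move_to cup\ngrasp cup\nmove_to table\nrelease cup")
```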

Soft Merging of Experts with Adaptive Routing

no code implementations • 6 Jun 2023 • Mohammed Muqeeth, Haokun Liu, Colin Raffel

To address this issue, we introduce Soft Merging of Experts with Adaptive Routing (SMEAR), which avoids discrete routing by using a single "merged" expert constructed via a weighted average of all of the experts' parameters.
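
The merging step described above can be sketched directly: compute a routing distribution, build one merged expert as the weighted average of all experts' parameters, and apply it. In the PyTorch sketch below, the choice of linear experts and of mean-pooling the input to the router are simplifying assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftMergedExperts(nn.Module):
    """SMEAR-style soft merging sketch: route with a probability distribution,
    then run a single expert built from the weighted average of expert parameters."""

    def __init__(self, hidden_dim, num_experts):
        super().__init__()
        self.experts = nn.ModuleList([nn.Linear(hidden_dim, hidden_dim)
                                      for _ in range(num_experts)])
        self.router = nn.Linear(hidden_dim, num_experts)

    def forward(self, x):                                       # x: (batch, seq, hidden)
        # Routing distribution per example; mean-pooling the input is an assumption here.
        probs = F.softmax(self.router(x.mean(dim=1)), dim=-1)   # (batch, num_experts)
        weight = torch.stack([e.weight for e in self.experts])  # (E, out, in)
        bias = torch.stack([e.bias for e in self.experts])      # (E, out)
        merged_w = torch.einsum("be,eoi->boi", probs, weight)   # per-example merged weights
        merged_b = torch.einsum("be,eo->bo", probs, bias)
        return torch.einsum("bsi,boi->bso", x, merged_w) + merged_b.unsqueeze(1)
```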

Beyond Ensemble Averages: Leveraging Climate Model Ensembles for Subseasonal Forecasting

1 code implementation • 29 Nov 2022 • Elena Orlova, Haokun Liu, Raphael Rossellini, Benjamin Cash, Rebecca Willett

This study explores an application of machine learning (ML) models as post-processing tools for subseasonal forecasting.

Feature Importance • regression

Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning

2 code implementations • 11 May 2022 • Haokun Liu, Derek Tam, Mohammed Muqeeth, Jay Mohta, Tenghao Huang, Mohit Bansal, Colin Raffel

In-context learning (ICL) incurs substantial computational, memory, and storage costs because it involves processing all of the training examples every time a prediction is made.

Few-Shot Text Classification • In-Context Learning
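
This is the paper that proposes the (IA)^3 parameter-efficient fine-tuning method as a cheaper alternative to ICL. The sketch below shows (IA)^3-style rescaling in isolation: a small learned vector that elementwise scales a frozen layer's output. It is a simplified illustration; the paper applies such vectors to attention keys, values, and feed-forward activations inside a frozen Transformer.

```python
import torch
import torch.nn as nn

class IA3Linear(nn.Module):
    """Wrap a frozen linear layer with a learned (IA)^3-style rescaling vector.
    Simplified sketch, not the paper's full placement of the vectors."""

    def __init__(self, base_linear: nn.Linear):
        super().__init__()
        self.base = base_linear
        for p in self.base.parameters():
            p.requires_grad = False                      # base weights stay frozen
        self.scale = nn.Parameter(torch.ones(base_linear.out_features))

    def forward(self, x):
        return self.base(x) * self.scale                 # elementwise activation rescaling

# Only the rescaling vectors are trained, so the per-task storage footprint is tiny
# compared with keeping full fine-tuned copies of the model.
```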

Fine-Tuned Transformers Show Clusters of Similar Representations Across Layers

no code implementations • EMNLP (BlackboxNLP) 2021 • Jason Phang, Haokun Liu, Samuel R. Bowman

Despite the success of fine-tuning pretrained language encoders like BERT for downstream natural language understanding (NLU) tasks, it is still poorly understood how neural networks change after fine-tuning.

Natural Language Understanding

Comparing Test Sets with Item Response Theory

no code implementations • ACL 2021 • Clara Vania, Phu Mon Htut, William Huang, Dhara Mungra, Richard Yuanzhe Pang, Jason Phang, Haokun Liu, Kyunghyun Cho, Samuel R. Bowman

Recent years have seen numerous NLP datasets introduced to evaluate the performance of fine-tuned models on natural language understanding tasks.

Natural Language Understanding

Learning Which Features Matter: RoBERTa Acquires a Preference for Linguistic Generalizations (Eventually)

1 code implementation • EMNLP 2020 • Alex Warstadt, Yian Zhang, Haau-Sing Li, Haokun Liu, Samuel R. Bowman

One reason pretraining on self-supervised linguistic tasks is effective is that it teaches models features that are helpful for language understanding.

Binary Classification

Counterfactually-Augmented SNLI Training Data Does Not Yield Better Generalization Than Unaugmented Data

1 code implementation • EMNLP (insights) 2020 • William Huang, Haokun Liu, Samuel R. Bowman

A growing body of work shows that models exploit annotation artifacts to achieve state-of-the-art performance on standard crowdsourced benchmarks (datasets collected from crowdworkers to create an evaluation task) while still failing on out-of-domain examples for the same task.

counterfactual • Natural Language Inference +2

Precise Task Formalization Matters in Winograd Schema Evaluations

1 code implementation • EMNLP 2020 • Haokun Liu, William Huang, Dhara A. Mungra, Samuel R. Bowman

Performance on the Winograd Schema Challenge (WSC), a respected English commonsense reasoning benchmark, recently rocketed from chance accuracy to 89% on the SuperGLUE leaderboard, with relatively little corroborating evidence of a correspondingly large improvement in reasoning ability.

Language Modelling • Multiple-choice

English Intermediate-Task Training Improves Zero-Shot Cross-Lingual Transfer Too

no code implementations • AACL 2020 • Jason Phang, Iacer Calixto, Phu Mon Htut, Yada Pruksachatkun, Haokun Liu, Clara Vania, Katharina Kann, Samuel R. Bowman

Intermediate-task training (fine-tuning a pretrained model on an intermediate task before fine-tuning again on the target task) often improves model performance substantially on language understanding tasks in monolingual English settings.

Question Answering • Retrieval +3
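
The recipe is sequential fine-tuning, sketched below in generic PyTorch. The `.loss`-returning model interface, the data loaders, and the hyperparameters are placeholder assumptions rather than the paper's setup.

```python
import torch

def fine_tune(model, loader, epochs=3, lr=2e-5):
    """One fine-tuning stage; assumes a model whose forward pass returns a .loss."""
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for batch in loader:
            optimizer.zero_grad()
            loss = model(**batch).loss
            loss.backward()
            optimizer.step()
    return model

# Intermediate-task training: fine-tune on the English intermediate task first,
# then fine-tune the same weights again on the target task (here, a zero-shot
# cross-lingual target). `model` and the loaders are placeholders.
# model = fine_tune(model, intermediate_task_loader)
# model = fine_tune(model, target_task_loader)
```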

BLiMP: The Benchmark of Linguistic Minimal Pairs for English

4 code implementations • TACL 2020 • Alex Warstadt, Alicia Parrish, Haokun Liu, Anhad Mohananey, Wei Peng, Sheng-Fu Wang, Samuel R. Bowman

We introduce The Benchmark of Linguistic Minimal Pairs (shortened to BLiMP), a challenge set for evaluating what language models (LMs) know about major grammatical phenomena in English.
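
A minimal-pairs benchmark is typically scored by checking whether the LM assigns higher probability to the acceptable sentence of each pair. A short sketch with a Hugging Face causal LM is below; the choice of GPT-2 and the example pair are illustrative, not BLiMP's official evaluation code.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")             # illustrative choice of LM
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def sentence_logprob(sentence: str) -> float:
    """Total log-probability of a sentence under the causal LM."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    logprobs = torch.log_softmax(logits[:, :-1], dim=-1)      # predict token t+1 from its prefix
    targets = ids[:, 1:]
    return logprobs.gather(-1, targets.unsqueeze(-1)).sum().item()

# The LM "passes" an item if it prefers the acceptable member of the minimal pair.
good, bad = "The cats annoy Tim.", "The cats annoys Tim."
print(sentence_logprob(good) > sentence_logprob(bad))
```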

MEMD: A Diversity-Promoting Learning Framework for Short-Text Conversation

no code implementations • COLING 2018 • Meng Zou, Xihan Li, Haokun Liu, Zhi-Hong Deng

Neural encoder-decoder models have been widely applied to conversational response generation, which has been an active area of research in recent years.

Conversational Response Generation • Response Generation +1
