1 code implementation • 23 May 2024 • Chufan Shi, Cheng Yang, Xinyu Zhu, Jiahao Wang, Taiqiang Wu, Siheng Li, Deng Cai, Yujiu Yang, Yu Meng
In Mixture-of-Experts (MoE) models, each token in the input sequence activates a different subset of experts, determined by a routing mechanism.
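As a rough illustration of that routing step, the sketch below shows a minimal top-k MoE layer: a linear router scores the experts, each token keeps its top-k experts, and their outputs are mixed with the normalized router weights. The class and parameter names (SimpleMoE, num_experts, top_k) are illustrative and not taken from the paper.

```python
# Minimal sketch of top-k expert routing in a Mixture-of-Experts layer.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleMoE(nn.Module):
    def __init__(self, d_model=64, d_hidden=128, num_experts=4, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)   # routing logits per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                               # x: (tokens, d_model)
        logits = self.router(x)                         # (tokens, num_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)  # each token picks top-k experts
        weights = F.softmax(weights, dim=-1)            # normalize over the selected experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e in range(len(self.experts)):
                mask = idx[:, k] == e                   # tokens routed to expert e at slot k
                if mask.any():
                    out[mask] += weights[mask, k:k+1] * self.experts[e](x[mask])
        return out

tokens = torch.randn(10, 64)
print(SimpleMoE()(tokens).shape)  # torch.Size([10, 64])
```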
no code implementations • 18 Mar 2024 • Yifan Wang, Yafei Liu, Chufan Shi, Haoling Li, Chen Chen, Haonan Lu, Yujiu Yang
Instruction tuning effectively optimizes Large Language Models (LLMs) for downstream tasks.
no code implementations • 10 Feb 2024 • Chufan Shi, Deng Cai, Yujiu Yang
In the rapidly evolving field of text generation, the demand for more precise control mechanisms has become increasingly apparent.
no code implementations • 10 Feb 2024 • Chufan Shi, Haoran Yang, Deng Cai, Zhisong Zhang, Yifan Wang, Yujiu Yang, Wai Lam
Decoding methods play an indispensable role in converting language models from next-token predictors into practical task solvers.
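To make the distinction between decoding methods concrete, here is a small sketch contrasting greedy decoding with temperature sampling over the same next-token distribution. The next_token_logits function is a hypothetical stand-in for a real language model, and the toy vocabulary is invented for illustration.

```python
# Toy sketch contrasting greedy decoding with temperature sampling.
import numpy as np

rng = np.random.default_rng(0)
vocab = ["the", "cat", "sat", "on", "mat", "."]

def next_token_logits(prefix):
    # Hypothetical stand-in for a language model's next-token logits.
    return rng.normal(size=len(vocab)) + np.arange(len(vocab)) * 0.1

def decode(steps=5, temperature=None):
    prefix = []
    for _ in range(steps):
        logits = next_token_logits(prefix)
        if temperature is None:                  # greedy: always take the argmax token
            tok = int(np.argmax(logits))
        else:                                    # sampling: draw from softmax(logits / T)
            p = np.exp(logits / temperature)
            tok = int(rng.choice(len(vocab), p=p / p.sum()))
        prefix.append(vocab[tok])
    return " ".join(prefix)

print("greedy :", decode())
print("sampled:", decode(temperature=1.0))
```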
no code implementations • 3 Nov 2023 • Yifan Wang, Qingyan Guo, Xinzhe Ni, Chufan Shi, Lemao Liu, Haiyun Jiang, Yujiu Yang
In-context learning (ICL) ability has emerged with the increasing scale of large language models (LLMs), enabling them to learn input-label mappings from demonstrations and perform well on downstream tasks.
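The following minimal sketch shows what "learning input-label mappings from demonstrations" looks like in practice: labeled examples are concatenated into a prompt and the model is asked to continue with the label for a new input. The sentiment template and examples are illustrative, not taken from the paper.

```python
# Minimal sketch of an in-context learning prompt built from demonstrations.
demonstrations = [
    ("The movie was fantastic.", "positive"),
    ("I would not recommend this place.", "negative"),
]
query = "The food was delicious."

prompt = ""
for text, label in demonstrations:
    prompt += f"Review: {text}\nSentiment: {label}\n\n"
prompt += f"Review: {query}\nSentiment:"

print(prompt)  # an LLM is expected to continue with "positive"
```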
no code implementations • 23 Oct 2023 • Chufan Shi, Yixuan Su, Cheng Yang, Yujiu Yang, Deng Cai
Although instruction tuning has proven to be a data-efficient method for transforming LLMs into generalist models, their performance still lags behind specialist models trained exclusively for specific tasks.
no code implementations • 9 Jun 2023 • Hengyuan Zhang, Dawei Li, Yanran Li, Chenming Shang, Chufan Shi, Yong Jiang
The standard definition generation task requires automatically producing monolingual definitions (e.g., English definitions for English words), but ignores that the generated definitions may contain words that are themselves unfamiliar to language learners.