Search Results for author: Ang Lv

Found 9 papers, 8 papers with code

Interpreting Key Mechanisms of Factual Recall in Transformer-Based Language Models

1 code implementation • 28 Mar 2024 • Ang Lv, Kaiyi Zhang, Yuhan Chen, Yulong Wang, Lifeng Liu, Ji-Rong Wen, Jian Xie, Rui Yan

In this paper, we explore in depth the mechanisms that Transformer-based language models employ in factual recall tasks.

Masked Thought: Simply Masking Partial Reasoning Steps Can Improve Mathematical Reasoning Learning of Language Models

1 code implementation • 4 Mar 2024 • Changyu Chen, Xiting Wang, Ting-En Lin, Ang Lv, Yuchuan Wu, Xin Gao, Ji-Rong Wen, Rui Yan, Yongbin Li

In reasoning tasks, even a minor error can cascade into an inaccurate final result, leading to suboptimal performance of large language models in such domains (a minimal sketch of the masking idea follows this entry).

Data Augmentation • GSM8K • +2
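
The title states the core idea: during fine-tuning, randomly hide some tokens inside the intermediate reasoning steps of the training target. Below is a minimal sketch of that idea under stated assumptions: whitespace tokenization, an illustrative [MASK] placeholder, and an arbitrary masking probability, none of which are taken from the paper.

```python
import random

MASK_TOKEN = "[MASK]"  # illustrative placeholder, not the paper's exact token

def mask_reasoning_steps(question: str, reasoning: str, answer: str,
                         mask_prob: float = 0.2) -> str:
    """Randomly hide a fraction of the reasoning-step tokens while
    leaving the question and the final answer untouched."""
    tokens = reasoning.split()
    masked = [MASK_TOKEN if random.random() < mask_prob else t for t in tokens]
    return f"{question}\n{' '.join(masked)}\n{answer}"

# Fine-tuning on such targets discourages over-reliance on any single step.
print(mask_reasoning_steps(
    "Q: Three apples cost $6. How much does one apple cost?",
    "Three apples cost 6 dollars , so one apple costs 6 / 3 = 2 dollars .",
    "A: $2",
))
```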

Batch-ICL: Effective, Efficient, and Order-Agnostic In-Context Learning

1 code implementation • 12 Jan 2024 • Kaiyi Zhang, Ang Lv, Yuhan Chen, Hansen Ha, Tao Xu, Rui Yan

In this paper, by treating in-context learning (ICL) as a meta-optimization process, we explain why LLMs are sensitive to the order of ICL examples (a small sketch of this order sensitivity follows this entry).

In-Context Learning • Zero-Shot Learning
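
To make the order sensitivity concrete, the sketch below counts how many distinct N-shot prompts the same three demonstrations can produce, and builds the per-example 1-shot prompts that an order-agnostic, Batch-ICL-style aggregation would start from. The task and strings are illustrative, and the paper aggregates the model's internal states, which plain prompt construction does not reproduce.

```python
from itertools import permutations

examples = [("great movie", "positive"),
            ("terrible plot", "negative"),
            ("I loved it", "positive")]
query = "the acting was awful"

def n_shot_prompt(order, q=query):
    shots = "\n".join(f"Review: {x}\nLabel: {y}" for x, y in order)
    return f"{shots}\nReview: {q}\nLabel:"

# Standard N-shot ICL: every permutation yields a different prompt, and
# an LLM's prediction can change with the ordering.
distinct = {n_shot_prompt(p) for p in permutations(examples)}
print(len(distinct))  # 6 distinct prompts from only 3 examples

# An order-agnostic alternative in the spirit of Batch-ICL: one 1-shot
# pass per example, whose effects are then aggregated inside the model.
one_shot_prompts = [n_shot_prompt([ex]) for ex in examples]
```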

Fortify the Shortest Stave in Attention: Enhancing Context Awareness of Large Language Models for Effective Tool Use

1 code implementation • 7 Dec 2023 • Yuhan Chen, Ang Lv, Ting-En Lin, Changyu Chen, Yuchuan Wu, Fei Huang, Yongbin Li, Rui Yan

Specifically, crucial information in the context may be overlooked by the model when it falls in a trough zone of the attention waveform, leading to decreased performance (a minimal probe of this waveform follows this entry).

Trajectory Planning
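
A minimal probe of the attention waveform, assuming a HuggingFace causal LM; gpt2 is used purely as a small stand-in for the larger models the paper studies, and this only measures the waveform rather than applying the paper's fix.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")  # small stand-in model
model = AutoModelForCausalLM.from_pretrained("gpt2",
                                             output_attentions=True).eval()

text = "The meeting is at 3pm in room 204 on Thursday with the design team."
ids = tok(text, return_tensors="pt").input_ids
with torch.no_grad():
    attentions = model(ids).attentions  # per layer: (1, heads, seq, seq)

# Attention each context position receives from the final token, averaged
# over layers and heads; dips in this curve are the trough zones.
received = torch.stack([a[0, :, -1, :] for a in attentions]).mean(dim=(0, 1))
print("least-attended position:", received.argmin().item())
```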

Are We Falling in a Middle-Intelligence Trap? An Analysis and Mitigation of the Reversal Curse

1 code implementation • 13 Nov 2023 • Ang Lv, Kaiyi Zhang, Shufang Xie, Quan Tu, Yuhan Chen, Ji-Rong Wen, Rui Yan

Recent studies have highlighted a phenomenon in large language models (LLMs) known as "the reversal curse," in which the order of knowledge entities in the training data biases the models' comprehension (a tiny probe of this asymmetry follows this entry).

Denoising • Language Modelling
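
A tiny probe of the asymmetry behind the reversal curse, again with gpt2 as an illustrative stand-in: score the same fact phrased in both directions and compare per-token negative log-likelihood. This is only a demonstration device, not the paper's analysis or mitigation.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")  # illustrative stand-in model
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def mean_nll(text: str) -> float:
    """Average per-token negative log-likelihood under the LM."""
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        return model(ids, labels=ids).loss.item()

# A model that mostly saw "A is B" phrasings during training tends to
# score the reversed "B is A" direction as much less likely.
print(mean_nll("Valentina Tereshkova was the first woman in space."))
print(mean_nll("The first woman in space was Valentina Tereshkova."))
```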

DialoGPS: Dialogue Path Sampling in Continuous Semantic Space for Data Augmentation in Multi-Turn Conversations

no code implementations • 29 Jun 2023 • Ang Lv, Jinpeng Li, Yuhan Chen, Xing Gao, Ji Zhang, Rui Yan

In open-domain dialogue generation tasks, contexts and responses in most datasets are one-to-one mapped, violating an important many-to-many characteristic: a context leads to various responses, and a response answers multiple contexts.

Data Augmentation • Dialogue Generation • +2

GETMusic: Generating Any Music Tracks with a Unified Representation and Diffusion Framework

1 code implementation • 18 May 2023 • Ang Lv, Xu Tan, Peiling Lu, Wei Ye, Shikun Zhang, Jiang Bian, Rui Yan

Our proposed representation, coupled with the non-autoregressive generative model, empowers GETMusic to generate music with arbitrary source-target track combinations (a schematic of the grid representation follows this entry).

Denoising • Music Generation
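
A schematic of the unified representation as we read it from the abstract: a tracks-by-time grid of discrete tokens in which target tracks are masked and then filled in non-autoregressively. The track names, token ids, and grid size below are stand-ins, not the paper's actual vocabulary.

```python
import numpy as np

MASK = 1                       # illustrative special-token id
tracks = ["melody", "bass", "drums"]
T = 16                         # number of time steps

# Rows are tracks, columns are time positions, entries are discrete
# pitch/duration tokens (random stand-ins here).
score = np.random.randint(2, 100, size=(len(tracks), T))

# To generate bass and drums conditioned on melody, mask the target
# tracks; a non-autoregressive diffusion model fills them all in at
# once rather than left to right.
source_tracks = {"melody"}
for i, name in enumerate(tracks):
    if name not in source_tracks:
        score[i, :] = MASK

print(score)
```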

Re-creation of Creations: A New Paradigm for Lyric-to-Melody Generation

1 code implementation • 11 Aug 2022 • Ang Lv, Xu Tan, Tao Qin, Tie-Yan Liu, Rui Yan

These characteristics cannot be well handled by neural generation models that learn the lyric-to-melody mapping end to end, due to two issues: (1) a lack of aligned lyric-melody training data from which to sufficiently learn lyric-melody feature alignment, and (2) a lack of controllability in generation with which to explicitly align lyric and melody features.

Language Modelling • Retrieval

Target-Side Data Augmentation for Sequence Generation

1 code implementation • ICLR 2022 • Shufang Xie, Ang Lv, Yingce Xia, Lijun Wu, Tao Qin, Rui Yan, Tie-Yan Liu

Autoregressive sequence generation, a prevalent task in machine learning and natural language processing, generates every target token conditioned on both a source input and the previously generated target tokens (a short sketch of this factorization follows this entry).

Abstractive Text Summarization • Data Augmentation • +2
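
The sentence above describes the standard chain-rule factorization log p(y | x) = sum_t log p(y_t | x, y_<t). The sketch below scores a target sequence under that factorization, using gpt2 as a stand-in for a source-conditioned model by simply prepending the source; the source and target strings are illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")  # stand-in decoder-only LM
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

source = "Translate English to French: Hello"
target = " Bonjour"

src = tok(source, return_tensors="pt").input_ids
tgt = tok(target, return_tensors="pt").input_ids
ids = torch.cat([src, tgt], dim=1)
with torch.no_grad():
    log_probs = model(ids).logits.log_softmax(-1)

# log p(y | x) = sum_t log p(y_t | x, y_<t): logits at position p
# predict the token at position p + 1.
total = sum(log_probs[0, src.size(1) + t - 1, tgt[0, t]].item()
            for t in range(tgt.size(1)))
print(f"log p(target | source) = {total:.2f}")
```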
