no code implementations • NAACL (TeachingNLP) 2021 • Greg Durrett, Jifan Chen, Shrey Desai, Tanya Goyal, Lucas Kabela, Yasumasa Onoe, Jiacheng Xu
We present a series of programming assignments, adaptable to a range of experience levels from advanced undergraduate to PhD, to teach students design and implementation of modern NLP systems.
1 code implementation • 2 May 2024 • Prasann Singhal, Nathan Lambert, Scott Niekum, Tanya Goyal, Greg Durrett
Varied approaches for aligning language models have been proposed, including supervised fine-tuning, RLHF, and direct optimization methods such as DPO.
3 code implementations • 1 Apr 2024 • Yekyung Kim, Yapei Chang, Marzena Karpinska, Aparna Garimella, Varun Manjunatha, Kyle Lo, Tanya Goyal, Mohit Iyyer
While LLM-based auto-raters have proven reliable for factuality and coherence in other settings, we implement several LLM raters of faithfulness and find that none correlates strongly with human annotations, especially with regard to detecting unfaithful claims.
1 code implementation • 11 Oct 2023 • Zhiyuan Zeng, Jiatong Yu, Tianyu Gao, Yu Meng, Tanya Goyal, Danqi Chen
As research in large language models (LLMs) continues to accelerate, LLM-based evaluation has emerged as a scalable and cost-effective alternative to human evaluations for comparing the ever-increasing list of models.
1 code implementation • 5 Oct 2023 • Prasann Singhal, Tanya Goyal, Jiacheng Xu, Greg Durrett
Furthermore, we find that even running RLHF with a reward based solely on length can reproduce most of the downstream improvements over the initial policy model, showing that reward models in these settings have a long way to go.
2 code implementations • 1 Oct 2023 • Yapei Chang, Kyle Lo, Tanya Goyal, Mohit Iyyer
We find that closed-source LLMs such as GPT-4 and Claude 2 produce summaries with higher BooookScore than those generated by open-source models.
1 code implementation • 2 Mar 2023 • Ryo Kamoi, Tanya Goyal, Juan Diego Rodriguez, Greg Durrett
Textual entailment models are increasingly applied in settings like fact-checking, presupposition verification in question answering, or summary evaluation.
1 code implementation • 13 Oct 2022 • Ryo Kamoi, Tanya Goyal, Greg Durrett
Despite recent progress in abstractive summarization, models often generate summaries with factual errors.
1 code implementation • 26 Sep 2022 • Tanya Goyal, Junyi Jessy Li, Greg Durrett
Finally, we evaluate models on a setting beyond generic summarization, specifically keyword-based summarization, and show how dominant fine-tuning approaches compare to prompting.
1 code implementation • 25 May 2022 • Liyan Tang, Tanya Goyal, Alexander R. Fabbri, Philippe Laban, Jiacheng Xu, Semih Yavuz, Wojciech Kryściński, Justin F. Rousseau, Greg Durrett
We compare the performance of state-of-the-art factuality metrics, including recent ChatGPT-based metrics, on this stratified benchmark and show that their performance varies significantly across different types of summarization models.
1 code implementation • 19 May 2022 • Tanya Goyal, Junyi Jessy Li, Greg Durrett
In this work, we introduce SNaC, a narrative coherence evaluation framework rooted in fine-grained annotations for long summaries.
2 code implementations • 6 Dec 2021 • Kaustubh D. Dhole, Varun Gangal, Sebastian Gehrmann, Aadesh Gupta, Zhenhao Li, Saad Mahamood, Abinaya Mahendiran, Simon Mille, Ashish Shrivastava, Samson Tan, Tongshuang Wu, Jascha Sohl-Dickstein, Jinho D. Choi, Eduard Hovy, Ondrej Dusek, Sebastian Ruder, Sajant Anand, Nagender Aneja, Rabin Banjade, Lisa Barthe, Hanna Behnke, Ian Berlot-Attwell, Connor Boyle, Caroline Brun, Marco Antonio Sobrevilla Cabezudo, Samuel Cahyawijaya, Emile Chapuis, Wanxiang Che, Mukund Choudhary, Christian Clauss, Pierre Colombo, Filip Cornell, Gautier Dagan, Mayukh Das, Tanay Dixit, Thomas Dopierre, Paul-Alexis Dray, Suchitra Dubey, Tatiana Ekeinhor, Marco Di Giovanni, Tanya Goyal, Rishabh Gupta, Louanes Hamla, Sang Han, Fabrice Harel-Canada, Antoine Honore, Ishan Jindal, Przemyslaw K. Joniak, Denis Kleyko, Venelin Kovatchev, Kalpesh Krishna, Ashutosh Kumar, Stefan Langer, Seungjae Ryan Lee, Corey James Levinson, Hualou Liang, Kaizhao Liang, Zhexiong Liu, Andrey Lukyanenko, Vukosi Marivate, Gerard de Melo, Simon Meoni, Maxime Meyer, Afnan Mir, Nafise Sadat Moosavi, Niklas Muennighoff, Timothy Sum Hon Mun, Kenton Murray, Marcin Namysl, Maria Obedkova, Priti Oli, Nivranshu Pasricha, Jan Pfister, Richard Plant, Vinay Prabhu, Vasile Pais, Libo Qin, Shahab Raji, Pawan Kumar Rajpoot, Vikas Raunak, Roy Rinberg, Nicolas Roberts, Juan Diego Rodriguez, Claude Roux, Vasconcellos P. H. S., Ananya B. Sai, Robin M. Schmidt, Thomas Scialom, Tshephisho Sefara, Saqib N. Shamsi, Xudong Shen, Haoyue Shi, Yiwen Shi, Anna Shvets, Nick Siegel, Damien Sileo, Jamie Simon, Chandan Singh, Roman Sitelew, Priyank Soni, Taylor Sorensen, William Soto, Aman Srivastava, KV Aditya Srivatsa, Tony Sun, Mukund Varma T, A Tabassum, Fiona Anting Tan, Ryan Teehan, Mo Tiwari, Marie Tolkiehn, Athena Wang, Zijian Wang, Gloria Wang, Zijie J. Wang, Fuxuan Wei, Bryan Wilie, Genta Indra Winata, Xinyi Wu, Witold Wydmański, Tianbao Xie, Usama Yaseen, Michael A. Yee, Jing Zhang, Yue Zhang
Data augmentation is an important component in the robustness evaluation of models in natural language processing (NLP) and in enhancing the diversity of the data they are trained on.
no code implementations • Findings (ACL) 2022 • Tanya Goyal, Jiacheng Xu, Junyi Jessy Li, Greg Durrett
Across different datasets (CNN/DM, XSum, MediaSum) and summary properties, such as abstractiveness and hallucination, we study what the model learns at different stages of its fine-tuning process.
1 code implementation • 8 Oct 2021 • Tanya Goyal, Nazneen Fatema Rajani, Wenhao Liu, Wojciech Kryściński
Summarization systems make numerous "decisions" about summary properties during inference, e.g., the degree of copying, specificity, and length of outputs.
no code implementations • 29 Sep 2021 • Tanya Goyal, Nazneen Rajani, Wenhao Liu, Wojciech Maciej Kryscinski
Existing abstractive summarization models lack explicit control mechanisms that would allow users to influence the stylistic features of the model outputs.
2 code implementations • NAACL 2021 • Tanya Goyal, Greg Durrett
Recent pre-trained abstractive summarization systems have started to achieve credible performance, but a major barrier to their use in practice is their propensity to output summaries that are not faithful to the input and that contain factual errors.
1 code implementation • Findings (EMNLP) 2020 • Tanya Goyal, Greg Durrett
Experiments show that our dependency arc entailment model trained on this data can identify factual inconsistencies in paraphrasing and summarization better than sentence-level methods or those based on question generation, while additionally localizing the erroneous parts of the generation.
2 code implementations • ACL 2020 • Tanya Goyal, Greg Durrett
Paraphrasing natural language sentences is a multifaceted process: it might involve replacing individual words or short phrases, local rearrangement of content, or high-level restructuring like topicalization or passivization.
3 code implementations • ACL 2019 • Tanya Goyal, Greg Durrett
Data-driven models have demonstrated state-of-the-art performance in inferring the temporal ordering of events in text.
no code implementations • WS 2018 • Niyati Chhaya, Kushal Chawla, Tanya Goyal, Projjal Chanda, Jaya Singh
We present a novel approach to model human frustration in text.
no code implementations • EMNLP 2017 • Tanya Goyal, Sachin Kelkar, Manas Agarwal, Jeenu Grover
In this paper, we present a novel approach to infer significance of various textual edits to documents.