no code implementations • 7 May 2024 • Wenhao Wu, Yizhong Wang, Yao Fu, Xiang Yue, Dawei Zhu, Sujian Li
Effectively handling instructions with extremely long contexts remains a challenge for Large Language Models (LLMs), typically necessitating high-quality long-context data and substantial computational resources.
1 code implementation • 24 Apr 2024 • Wenhao Wu, Yizhong Wang, Guangxuan Xiao, Hao Peng, Yao Fu
Despite the recent progress in long-context language models, it remains elusive how transformer-based models exhibit the capability to retrieve relevant information from arbitrary locations within the long context.
1 code implementation • 19 Mar 2024 • Rahul Nadkarni, Yizhong Wang, Noah A. Smith
Language model-based instruction-following systems have recently shown improved performance on many benchmark tasks, demonstrating their ability to adapt to a broad variety of instructions.
no code implementations • 18 Mar 2024 • Kevin Xu, Yeganeh Kordi, Kate Sanders, Yizhong Wang, Adam Byerly, Jack Zhang, Benjamin Van Durme, Daniel Khashabi
We evaluate the performance of state-of-the-art models, including language-only, vision-only, and layout-only models, and their combinations, on this benchmark.
1 code implementation • 26 Feb 2024 • Bowen Zhao, Zander Brumbaugh, Yizhong Wang, Hannaneh Hajishirzi, Noah A. Smith
We then develop several methods, from prompting to finetuning, to align LMs to use their most recent knowledge when answering questions, and investigate various factors in this alignment.
1 code implementation • 22 Feb 2024 • Qiyuan He, Yizhong Wang, Wenya Wang
Large language models (LLMs) have demonstrated remarkable proficiency in understanding and generating responses to complex queries through large-scale pre-training.
2 code implementations • 1 Feb 2024 • Dirk Groeneveld, Iz Beltagy, Pete Walsh, Akshita Bhagia, Rodney Kinney, Oyvind Tafjord, Ananya Harsh Jha, Hamish Ivison, Ian Magnusson, Yizhong Wang, Shane Arora, David Atkinson, Russell Authur, Khyathi Raghavi Chandu, Arman Cohan, Jennifer Dumas, Yanai Elazar, Yuling Gu, Jack Hessel, Tushar Khot, William Merrill, Jacob Morrison, Niklas Muennighoff, Aakanksha Naik, Crystal Nam, Matthew E. Peters, Valentina Pyatkin, Abhilasha Ravichander, Dustin Schwenk, Saurabh Shah, Will Smith, Emma Strubell, Nishant Subramani, Mitchell Wortsman, Pradeep Dasigi, Nathan Lambert, Kyle Richardson, Luke Zettlemoyer, Jesse Dodge, Kyle Lo, Luca Soldaini, Noah A. Smith, Hannaneh Hajishirzi
Given the importance of these details in scientifically studying these models, including their biases and potential risks, we believe it is essential for the research community to have access to powerful, truly open LMs.
2 code implementations • 16 Jan 2024 • Alisa Liu, Xiaochuang Han, Yizhong Wang, Yulia Tsvetkov, Yejin Choi, Noah A. Smith
Despite the general capabilities of large pretrained language models, they consistently benefit from further adaptation to better achieve desired behaviors.
no code implementations • 12 Jan 2024 • Abhika Mishra, Akari Asai, Vidhisha Balachandran, Yizhong Wang, Graham Neubig, Yulia Tsvetkov, Hannaneh Hajishirzi
On our benchmark, our automatic and human evaluations show that FAVA significantly outperforms ChatGPT and GPT-4 on fine-grained hallucination detection, and edits suggested by FAVA improve the factuality of LM-generated text.
2 code implementations • 17 Nov 2023 • Hamish Ivison, Yizhong Wang, Valentina Pyatkin, Nathan Lambert, Matthew Peters, Pradeep Dasigi, Joel Jang, David Wadden, Noah A. Smith, Iz Beltagy, Hannaneh Hajishirzi
Since the release of TÜLU [Wang et al., 2023b], open resources for instruction tuning have developed quickly, from better base models to new finetuning techniques.
2 code implementations • 17 Oct 2023 • Akari Asai, Zeqiu Wu, Yizhong Wang, Avirup Sil, Hannaneh Hajishirzi
Our framework trains a single arbitrary LM that adaptively retrieves passages on-demand, and generates and reflects on retrieved passages and its own generations using special tokens, called reflection tokens.
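The reflection-token mechanism described above can be made concrete with a small inference loop: generate segment by segment, and retrieve evidence only when the model emits a retrieval token. The following is a hedged sketch, not the paper's released code; `lm_generate` and `retrieve` are hypothetical stand-ins for a trained model and a passage retriever, and the token names are illustrative.

```python
# Hypothetical sketch of a Self-RAG-style inference loop.
# `lm_generate` and `retrieve` are placeholder hooks, not the paper's API:
# a real implementation would back them with a trained LM and a retriever.

RETRIEVE = "[Retrieve]"       # reflection token: fetch evidence before continuing
NO_RETRIEVE = "[No Retrieve]"

def lm_generate(prompt: str) -> tuple[str, str]:
    """Placeholder: returns (next_segment, reflection_token)."""
    return "stub segment", NO_RETRIEVE

def retrieve(query: str, k: int = 3) -> list[str]:
    """Placeholder retriever returning k passages."""
    return [f"passage {i} for: {query}" for i in range(k)]

def self_rag_answer(question: str, max_segments: int = 5) -> str:
    """Generate segment by segment, retrieving only when the LM asks for it."""
    context, answer = question, []
    for _ in range(max_segments):
        segment, token = lm_generate(context)
        if token == RETRIEVE:
            # In the full method, each candidate continuation is re-generated
            # conditioned on a passage and ranked by critique tokens (omitted).
            passages = retrieve(context)
            context += "\n" + passages[0]
            segment, _ = lm_generate(context)
        answer.append(segment)
        context += " " + segment
    return " ".join(answer)
```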
1 code implementation • 17 Oct 2023 • Joel Jang, Seungone Kim, Bill Yuchen Lin, Yizhong Wang, Jack Hessel, Luke Zettlemoyer, Hannaneh Hajishirzi, Yejin Choi, Prithviraj Ammanabrolu
In this work, we study the Reinforcement Learning from Personalized Human Feedback (RLPHF) problem, wherein LLMs are aligned to multiple (sometimes conflicting) preferences by modeling alignment as a Multi-Objective Reinforcement Learning (MORL) problem.
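One common way to reduce such a MORL formulation to standard RL training is linear scalarization of per-preference reward models. The sketch below illustrates that reduction only; the reward functions and weights are invented placeholders, and the paper's actual approach (e.g., how it combines per-preference policies) may differ.

```python
# Minimal sketch of one standard MORL reduction: linear scalarization of
# per-preference rewards. The reward models here are illustrative stand-ins,
# not the paper's trained components.

from typing import Callable

RewardFn = Callable[[str, str], float]  # (prompt, response) -> scalar reward

def scalarized_reward(reward_fns: list[RewardFn], weights: list[float]) -> RewardFn:
    """Combine per-preference rewards into a single RL training signal."""
    assert len(reward_fns) == len(weights)
    def combined(prompt: str, response: str) -> float:
        return sum(w * r(prompt, response) for w, r in zip(weights, reward_fns))
    return combined

# Example: trade off two hypothetical preference dimensions.
concise = lambda p, y: -float(len(y.split()))        # shorter is better
friendly = lambda p, y: float("thanks" in y.lower())
reward = scalarized_reward([concise, friendly], weights=[0.3, 0.7])
print(reward("Explain RLPHF.", "Thanks for asking! It aligns LMs to preferences."))
```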
1 code implementation • 2 Oct 2023 • Qingqing Cao, Sewon Min, Yizhong Wang, Hannaneh Hajishirzi
Retrieval augmentation addresses many critical problems in large language models such as hallucination, staleness, and privacy leaks.
1 code implementation • NeurIPS 2023 • Yizhong Wang, Hamish Ivison, Pradeep Dasigi, Jack Hessel, Tushar Khot, Khyathi Raghavi Chandu, David Wadden, Kelsey MacMillan, Noah A. Smith, Iz Beltagy, Hannaneh Hajishirzi
Our evaluations show that the best model in any given evaluation reaches on average 87% of ChatGPT performance, and 73% of GPT-4 performance, suggesting that further investment in building better base models and instruction-tuning data is required to close the gap.
1 code implementation • ICCV 2023 • Yushi Hu, Benlin Liu, Jungo Kasai, Yizhong Wang, Mari Ostendorf, Ranjay Krishna, Noah A. Smith
We introduce TIFA (Text-to-Image Faithfulness evaluation with question Answering), an automatic evaluation metric that measures the faithfulness of a generated image to its text input via visual question answering (VQA).
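The recipe in the abstract translates naturally into a small scoring function: derive question-answer pairs from the text prompt, answer each question with a VQA model over the image, and report the fraction answered correctly. In this hedged sketch, `generate_qa_pairs` and `vqa_answer` are hypothetical stubs for an LM-based question generator and a VQA model, not the released TIFA implementation.

```python
# Hedged sketch of a TIFA-style faithfulness score.

def generate_qa_pairs(prompt: str) -> list[tuple[str, str]]:
    """Placeholder: derive (question, expected_answer) pairs from the prompt."""
    return [("Is there a dog?", "yes"), ("What color is the dog?", "brown")]

def vqa_answer(image, question: str) -> str:
    """Placeholder VQA model answering a question about the image."""
    return "yes"

def tifa_score(image, prompt: str) -> float:
    """Faithfulness = fraction of prompt-derived questions answered correctly."""
    qa_pairs = generate_qa_pairs(prompt)
    correct = sum(vqa_answer(image, q).strip().lower() == a for q, a in qa_pairs)
    return correct / len(qa_pairs)

print(tifa_score(None, "a brown dog in a park"))  # stub run -> 0.5
```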
no code implementations • 20 Dec 2022 • Hamish Ivison, Akshita Bhagia, Yizhong Wang, Hannaneh Hajishirzi, Matthew Peters
By converting instructions into modules, HINT models can effectively disregard the length of instructions and few-shot example inputs in terms of compute usage.
16 code implementations • 20 Dec 2022 • Yizhong Wang, Yeganeh Kordi, Swaroop Mishra, Alisa Liu, Noah A. Smith, Daniel Khashabi, Hannaneh Hajishirzi
Applying our method to the vanilla GPT-3, we demonstrate a 33% absolute improvement over the original model on Super-NaturalInstructions, on par with the performance of InstructGPT-001, which was trained with private user data and human annotations.
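As a rough illustration of this kind of instruction bootstrapping, the sketch below grows an instruction pool by prompting a base model with sampled demonstrations and filtering near-duplicates. `lm_complete` is a hypothetical LM call, and the unigram-Jaccard filter is a crude stand-in for the ROUGE-L deduplication used in practice; this sketches the general loop, not the paper's exact pipeline.

```python
# Simplified sketch of a Self-Instruct-style bootstrapping loop.
import random

def lm_complete(prompt: str) -> str:
    """Placeholder for a base-LM completion call (e.g. vanilla GPT-3)."""
    return "Summarize the given paragraph in one sentence."

def too_similar(new: str, pool: list[str], threshold: float = 0.7) -> bool:
    """Crude unigram-Jaccard filter standing in for a ROUGE-L check."""
    new_tokens = set(new.lower().split())
    for old in pool:
        old_tokens = set(old.lower().split())
        jaccard = len(new_tokens & old_tokens) / max(len(new_tokens | old_tokens), 1)
        if jaccard > threshold:
            return True
    return False

def bootstrap_instructions(seed_tasks: list[str], target: int = 100,
                           max_steps: int = 1000) -> list[str]:
    """Grow the instruction pool by prompting the model with sampled demos."""
    pool = list(seed_tasks)
    for _ in range(max_steps):
        if len(pool) >= target:
            break
        demos = random.sample(pool, k=min(8, len(pool)))
        prompt = "Come up with a new task:\n" + "\n".join(demos) + "\nNew task:"
        candidate = lm_complete(prompt).strip()
        if candidate and not too_similar(candidate, pool):
            pool.append(candidate)
    return pool
```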
3 code implementations • 19 Dec 2022 • Hongjin Su, Weijia Shi, Jungo Kasai, Yizhong Wang, Yushi Hu, Mari Ostendorf, Wen-tau Yih, Noah A. Smith, Luke Zettlemoyer, Tao Yu
Our analysis suggests that INSTRUCTOR is robust to changes in instructions, and that instruction finetuning mitigates the challenge of training a single model on diverse datasets.
7 code implementations • 16 Apr 2022 • Yizhong Wang, Swaroop Mishra, Pegah Alipoormolabashi, Yeganeh Kordi, Amirreza Mirzaei, Anjana Arunkumar, Arjun Ashok, Arut Selvan Dhanasekaran, Atharva Naik, David Stap, Eshaan Pathak, Giannis Karamanolakis, Haizhi Gary Lai, Ishan Purohit, Ishani Mondal, Jacob Anderson, Kirby Kuznia, Krima Doshi, Maitreya Patel, Kuntal Kumar Pal, Mehrad Moradshahi, Mihir Parmar, Mirali Purohit, Neeraj Varshney, Phani Rohitha Kaza, Pulkit Verma, Ravsehaj Singh Puri, Rushang Karia, Shailaja Keyur Sampat, Savan Doshi, Siddhartha Mishra, Sujan Reddy, Sumanta Patro, Tanay Dixit, Xudong Shen, Chitta Baral, Yejin Choi, Noah A. Smith, Hannaneh Hajishirzi, Daniel Khashabi
This large and diverse collection of tasks enables rigorous benchmarking of cross-task generalization under instructions -- training models to follow instructions on a subset of tasks and evaluating them on the remaining unseen ones.
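The evaluation protocol is simple to state in code: hold out entire tasks rather than individual examples, so that every evaluation instruction is unseen during training. A minimal sketch, with placeholder task IDs:

```python
# Sketch of a cross-task generalization split: evaluation tasks are fully
# unseen during training. Task names and the fraction are illustrative.
import random

def split_tasks(task_names: list[str], held_out_frac: float = 0.2, seed: int = 0):
    """Partition tasks so evaluation tasks never appear in training."""
    rng = random.Random(seed)
    shuffled = task_names[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - held_out_frac))
    return shuffled[:cut], shuffled[cut:]

tasks = [f"task_{i:04d}" for i in range(1600)]  # placeholder task IDs
train_tasks, unseen_tasks = split_tasks(tasks)
assert not set(train_tasks) & set(unseen_tasks)  # no task-level leakage
print(len(train_tasks), "training tasks;", len(unseen_tasks), "held-out tasks")
```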
1 code implementation • Findings (EMNLP) 2021 • Leo Z. Liu, Yizhong Wang, Jungo Kasai, Hannaneh Hajishirzi, Noah A. Smith
Language models trained on very large corpora have been shown to be useful for NLP.
no code implementations • ICLR 2021 • Alon Talmor, Ori Yoran, Amnon Catav, Dan Lahav, Yizhong Wang, Akari Asai, Gabriel Ilharco, Hannaneh Hajishirzi, Jonathan Berant
When answering complex questions, people can seamlessly combine information from visual, textual and tabular sources.
1 code implementation • 23 Dec 2020 • Yue Guo, Wei Qiu, Yizhong Wang, Trevor Cohen
Health literacy has emerged as a crucial factor in making appropriate health decisions and ensuring treatment outcomes.
2 code implementations • CCL 2020 • Qianying Liu, Sicong Jiang, Yizhong Wang, Sujian Li
In this paper, we introduce LiveQA, a new question answering dataset constructed from play-by-play live broadcasts.
6 code implementations • EMNLP 2020 • Swabha Swayamdipta, Roy Schwartz, Nicholas Lourie, Yizhong Wang, Hannaneh Hajishirzi, Noah A. Smith, Yejin Choi
Experiments across four datasets show that these model-dependent measures reveal three distinct regions in the data map, each with pronounced characteristics.
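These measures can be sketched concretely: for each example, track the model's probability of the gold label across training epochs, then summarize it as confidence (the mean) and variability (the standard deviation). The probabilities and region thresholds below are synthetic and purely illustrative, not the paper's settings.

```python
# Hedged sketch of data-map style training-dynamics measures.
import numpy as np

# probs[e, i] = model's probability of example i's gold label after epoch e
rng = np.random.default_rng(0)
probs = rng.uniform(size=(5, 1000))  # 5 epochs, 1000 examples (synthetic)

confidence = probs.mean(axis=0)   # high, with low variability -> easy-to-learn
variability = probs.std(axis=0)   # high variability            -> ambiguous
# low confidence + low variability -> hard-to-learn (often mislabeled)

for name, mask in [
    ("easy-to-learn", (confidence > 0.75) & (variability < 0.15)),
    ("ambiguous", variability > 0.25),
    ("hard-to-learn", (confidence < 0.25) & (variability < 0.15)),
]:
    print(f"{name}: {mask.sum()} examples")
```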
1 code implementation • IJCNLP 2019 • Eric Wallace, Yizhong Wang, Sujian Li, Sameer Singh, Matt Gardner
The ability to understand and work with numbers (numeracy) is critical for many complex reasoning tasks.
no code implementations • 30 Jun 2019 • Xin Zhang, An Yang, Sujian Li, Yizhong Wang
Machine reading comprehension aims to teach machines to understand text as humans do, and is a challenging new direction in Artificial Intelligence.
3 code implementations • NAACL 2019 • Dheeru Dua, Yizhong Wang, Pradeep Dasigi, Gabriel Stanovsky, Sameer Singh, Matt Gardner
We introduce a new English reading comprehension benchmark, DROP, which requires Discrete Reasoning Over the content of Paragraphs.
Ranked #14 on Question Answering on DROP Test
1 code implementation • EMNLP 2018 • Yizhong Wang, Sujian Li, Jingfeng Yang
Discourse segmentation, which segments texts into Elementary Discourse Units, is a fundamental step in discourse analysis.
1 code implementation • ACL 2018 • Shuming Ma, Xu Sun, Yizhong Wang, Junyang Lin
However, most existing neural machine translation models use only one of the correct translations as the target, while the other correct translations are penalized as if they were incorrect during training.
no code implementations • ACL 2018 • Yizhong Wang, Kai Liu, Jing Liu, Wei He, Yajuan Lyu, Hua Wu, Sujian Li, Haifeng Wang
Machine reading comprehension (MRC) on real web data usually requires the machine to answer a question by analyzing multiple passages retrieved by a search engine.
Ranked #3 on Question Answering on MS MARCO
no code implementations • IJCNLP 2017 • Yizhong Wang, Sujian Li, Jingfeng Yang, Xu Sun, Houfeng Wang
Identifying implicit discourse relations between text spans is a challenging task because it requires understanding the meaning of the text.
3 code implementations • WS 2018 • Wei He, Kai Liu, Jing Liu, Yajuan Lyu, Shiqi Zhao, Xinyan Xiao, Yuan Liu, Yizhong Wang, Hua Wu, Qiaoqiao She, Xuan Liu, Tian Wu, Haifeng Wang
Experiments show that human performance is well above current state-of-the-art baseline systems, leaving plenty of room for the community to make improvements.
1 code implementation • ACL 2017 • Yizhong Wang, Sujian Li, Houfeng Wang
Previous work introduced transition-based algorithms to form a unified architecture of parsing rhetorical structures (including span, nuclearity and relation), but did not achieve satisfactory performance.
Ranked #5 on Discourse Parsing on RST-DT
no code implementations • COLING 2016 • Wenjing Fang, Kenny Zhu, Yizhong Wang, Jia Tan
This paper presents a novel high-order dependency parsing framework that targets non-projective treebanks.