Search Results for author: Xiang Yue

Found 31 papers, 21 papers with code

VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?

no code implementations 9 Apr 2024 Junpeng Liu, YiFan Song, Bill Yuchen Lin, Wai Lam, Graham Neubig, Yuanzhi Li, Xiang Yue

Multimodal Large Language models (MLLMs) have shown promise in web-related tasks, but evaluating their performance in the web domain remains a challenge due to the lack of comprehensive benchmarks.

Optical Character Recognition (OCR)

Long-context LLMs Struggle with Long In-context Learning

1 code implementation 2 Apr 2024 Tianle Li, Ge Zhang, Quy Duc Do, Xiang Yue, Wenhu Chen

Our study reveals that long context understanding and reasoning is still a challenging task for the existing LLMs.

2k In-Context Learning +1

Trial and Error: Exploration-Based Trajectory Optimization for LLM Agents

1 code implementation 4 Mar 2024 YiFan Song, Da Yin, Xiang Yue, Jie Huang, Sujian Li, Bill Yuchen Lin

This iterative cycle of exploration and training fosters continued improvement in the agents.

Contrastive Learning

StructLM: Towards Building Generalist Models for Structured Knowledge Grounding

no code implementations 26 Feb 2024 Alex Zhuang, Ge Zhang, Tianyu Zheng, Xinrun Du, Junjie Wang, Weiming Ren, Stephen W. Huang, Jie Fu, Xiang Yue, Wenhu Chen

Utilizing this dataset, we train a series of models, referred to as StructLM, based on the Mistral and the CodeLlama model family, ranging from 7B to 34B parameters.

Machine Unlearning of Pre-trained Large Language Models

1 code implementation 23 Feb 2024 Jin Yao, Eli Chien, Minxin Du, Xinyao Niu, Tianhao Wang, Zezhou Cheng, Xiang Yue

This study investigates the concept of the 'right to be forgotten' within the context of large language models (LLMs).

Machine Unlearning

AttributionBench: How Hard is Automatic Attribution Evaluation?

1 code implementation 23 Feb 2024 Yifei Li, Xiang Yue, Zeyi Liao, Huan Sun

Modern generative search engines enhance the reliability of large language model (LLM) responses by providing cited evidence.

Binary Classification Language Modelling +1

OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement

no code implementations 22 Feb 2024 Tianyu Zheng, Ge Zhang, Tianhao Shen, Xueling Liu, Bill Yuchen Lin, Jie Fu, Wenhu Chen, Xiang Yue

However, open-source models often lack the execution capabilities and iterative refinement of advanced systems like the GPT-4 Code Interpreter.

Code Generation

Data Engineering for Scaling Language Models to 128K Context

2 code implementations 15 Feb 2024 Yao Fu, Rameswar Panda, Xinyao Niu, Xiang Yue, Hannaneh Hajishirzi, Yoon Kim, Hao Peng

We demonstrate that continual pretraining of the full model on 1B-5B tokens of such data is an effective and affordable strategy for scaling the context length of language models to 128K.

4k Continual Pretraining

VIEScore: Towards Explainable Metrics for Conditional Image Synthesis Evaluation

no code implementations 22 Dec 2023 Max Ku, Dongfu Jiang, Cong Wei, Xiang Yue, Wenhu Chen

We evaluate VIESCORE on seven prominent tasks in conditional image tasks and found: (1) VIESCORE (GPT4-v) achieves a high Spearman correlation of 0.3 with human evaluations, while the human-to-human correlation is 0.45.

Conditional Image Generation General Knowledge

TableLlama: Towards Open Large Generalist Models for Tables

no code implementations 15 Nov 2023 Tianshu Zhang, Xiang Yue, Yifei Li, Huan Sun

Towards that end, we construct TableInstruct, a new dataset with a variety of realistic tables and tasks, for instruction tuning and evaluating LLMs.

MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning

1 code implementation 11 Sep 2023 Xiang Yue, Xingwei Qu, Ge Zhang, Yao Fu, Wenhao Huang, Huan Sun, Yu Su, Wenhu Chen

The MAmmoTH models are trained on MathInstruct, our meticulously curated instruction tuning dataset.

Math Mathematical Reasoning

Can ChatGPT Defend its Belief in Truth? Evaluating LLM Reasoning via Debate

no code implementations 22 May 2023 Boshi Wang, Xiang Yue, Huan Sun

Large language models (LLMs) such as ChatGPT and GPT-4 have shown impressive performance in complex reasoning tasks.

Benchmarking Math +1

Automatic Evaluation of Attribution by Large Language Models

1 code implementation 10 May 2023 Xiang Yue, Boshi Wang, Ziru Chen, Kai Zhang, Yu Su, Huan Sun

We manually curate a set of test examples covering 12 domains from a generative search engine, New Bing.

Fact Checking Language Modelling +3

Synthetic Text Generation with Differential Privacy: A Simple and Practical Recipe

1 code implementation 25 Oct 2022 Xiang Yue, Huseyin A. Inan, Xuechen Li, Girish Kumar, Julia McAnallen, Hoda Shajari, Huan Sun, David Levitan, Robert Sim

Privacy concerns have attracted increasing attention in data-driven products due to the tendency of machine learning models to memorize sensitive training data.

Language Modelling Text Generation

Bootstrapping a User-Centered Task-Oriented Dialogue System

no code implementations 11 Jul 2022 Shijie Chen, Ziru Chen, Xiang Deng, Ashley Lewis, Lingbo Mo, Samuel Stevens, Zhen Wang, Xiang Yue, Tianshu Zhang, Yu Su, Huan Sun

We present TacoBot, a task-oriented dialogue system built for the inaugural Alexa Prize TaskBot Challenge, which assists users in completing multi-step cooking and home improvement tasks.

Data Augmentation Dialogue Management +2

Synthetic Question Value Estimation for Domain Adaptation of Question Answering

1 code implementation ACL 2022 Xiang Yue, Ziyu Yao, Huan Sun

Synthesizing QA pairs with a question generator (QG) on the target domain has become a popular approach for domain adaptation of question answering (QA) models.

Domain Adaptation Question Answering

Differential Privacy for Text Analytics via Natural Text Sanitization

1 code implementation Findings (ACL) 2021 Xiang Yue, Minxin Du, Tianhao Wang, Yaliang Li, Huan Sun, Sherman S. M. Chow

The sanitized texts also contribute to our sanitization-aware pretraining and fine-tuning, enabling privacy-preserving natural language processing over the BERT language model with promising utility.

Language Modelling Privacy Preserving

CliniQG4QA: Generating Diverse Questions for Domain Adaptation of Clinical Question Answering

2 code implementations 30 Oct 2020 Xiang Yue, Xinliang Frederick Zhang, Ziyu Yao, Simon Lin, Huan Sun

Clinical question answering (QA) aims to automatically answer questions from medical professionals based on clinical texts.

Domain Adaptation Question Answering +2

COUGH: A Challenge Dataset and Models for COVID-19 FAQ Retrieval

1 code implementation EMNLP 2021 Xinliang Frederick Zhang, Heming Sun, Xiang Yue, Simon Lin, Huan Sun

For evaluation, we introduce Query Bank and Relevance Set, where the former contains 1,236 human-paraphrased queries while the latter contains ~32 human-annotated FAQ items for each query.

16k Retrieval

Practical Annotation Strategies for Question Answering Datasets

no code implementations 6 Mar 2020 Bernhard Kratzwald, Xiang Yue, Huan Sun, Stefan Feuerriegel

Here, remarkably, annotating a stratified subset with only 1.2% of the original training set achieves 97.7% of the performance as if the complete dataset was annotated.

Question Answering

Towards Making the Most of Context in Neural Machine Translation

1 code implementation 19 Feb 2020 Zaixiang Zheng, Xiang Yue, Shu-Jian Huang, Jia-Jun Chen, Alexandra Birch

Document-level machine translation manages to outperform sentence-level models by a small margin, but has failed to be widely adopted.

Document Level Machine Translation Machine Translation +3

Tensor Decomposition with Relational Constraints for Predicting Multiple Types of MicroRNA-disease Associations

1 code implementation 13 Nov 2019 Feng Huang, Xiang Yue, Zhankun Xiong, Zhouxin Yu, Wen Zhang

To this end, we innovatively represent miRNA-disease-type triplets as a tensor and introduce Tensor Decomposition methods to solve the prediction task.

Knowledge Graphs Link Prediction +1

SurfCon: Synonym Discovery on Privacy-Aware Clinical Data

1 code implementation 21 Jun 2019 Zhen Wang, Xiang Yue, Soheil Moosavinasab, Yungui Huang, Simon Lin, Huan Sun

To solve the problem, we propose a new framework, SurfCon, that leverages two important types of information in the privacy-aware clinical data, i.e., the surface form information and the global context information, for synonym discovery.

Graph Embedding on Biomedical Networks: Methods, Applications, and Evaluations

4 code implementations 12 Jun 2019 Xiang Yue, Zhen Wang, Jingong Huang, Srinivasan Parthasarathy, Soheil Moosavinasab, Yungui Huang, Simon M. Lin, Wen Zhang, Ping Zhang, Huan Sun

Our experimental results demonstrate that the recent graph embedding methods achieve promising results and deserve more attention in the future biomedical graph analysis.

Graph Embedding Link Prediction +2
