Hallucination Evaluation

11 papers with code • 0 benchmarks • 1 dataset

Evaluate the ability of LLMs to generate text free of hallucinations, or assess their capability to recognize hallucinated content.
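As a minimal sketch of the recognition side of this task: given responses labeled as hallucinated or faithful, a judge classifies each one and its accuracy is reported. The dataset fields and the toy judge below are illustrative assumptions, not the interface of any particular benchmark.

```python
# Minimal sketch of a discrimination-style hallucination evaluation loop.
# The sample fields and the judge callable are illustrative assumptions.

from typing import Callable, Dict, List

def evaluate_recognition(
    samples: List[Dict[str, str]],       # each: {"context", "response", "label"}
    judge: Callable[[str, str], str],    # returns "hallucinated" or "faithful"
) -> float:
    """Accuracy of a judge at recognizing hallucinated responses."""
    correct = 0
    for s in samples:
        prediction = judge(s["context"], s["response"])
        correct += int(prediction == s["label"])
    return correct / len(samples)

if __name__ == "__main__":
    # Toy stand-in judge: flags a response as hallucinated if it introduces
    # tokens absent from the context (a real setup would prompt an LLM here).
    def naive_judge(context: str, response: str) -> str:
        unsupported = set(response.lower().split()) - set(context.lower().split())
        return "hallucinated" if unsupported else "faithful"

    data = [
        {"context": "Paris is the capital of France.",
         "response": "Paris is the capital of France.", "label": "faithful"},
        {"context": "Paris is the capital of France.",
         "response": "Paris is the capital of Spain.", "label": "hallucinated"},
    ]
    print(f"recognition accuracy: {evaluate_recognition(data, naive_judge):.2f}")
```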

Most implemented papers

HaluEval: A Large-Scale Hallucination Evaluation Benchmark for Large Language Models

RUCAIBox/HaluEval 19 May 2023

Large language models (LLMs), such as ChatGPT, are prone to generating hallucinations, i.e., content that conflicts with the source or cannot be verified against factual knowledge.

Alleviating Hallucinations of Large Language Models through Induced Hallucinations

hiyouga/llama-factory 25 Dec 2023

Experimental results on both discrimination-based and generation-based hallucination evaluation benchmarks, such as TruthfulQA and FActScore, demonstrate that our proposed ICD methods can effectively enhance the factuality of LLMs across various model sizes and families.
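ICD contrasts the original model against a model whose hallucinations have been deliberately induced. The sketch below shows a generic contrastive adjustment of next-token logits in that spirit; the weighting scheme and toy values are assumptions for illustration, not the paper's exact formulation.

```python
# Generic contrastive-decoding sketch: penalize the next-token distribution
# of a hallucination-induced model against the original model's distribution.
# The weighting scheme here is an illustrative assumption.

import numpy as np

def contrastive_next_token_logits(
    base_logits: np.ndarray,        # logits from the original LLM
    induced_logits: np.ndarray,     # logits from the hallucination-induced LLM
    alpha: float = 1.0,             # strength of the contrastive penalty
) -> np.ndarray:
    """Amplify tokens the base model prefers over the induced model."""
    return (1.0 + alpha) * base_logits - alpha * induced_logits

if __name__ == "__main__":
    vocab = ["Paris", "London", "Rome"]
    base = np.array([3.0, 1.0, 0.5])       # base model favors "Paris"
    induced = np.array([1.0, 2.5, 0.5])    # induced model drifts toward "London"
    adjusted = contrastive_next_token_logits(base, induced)
    probs = np.exp(adjusted) / np.exp(adjusted).sum()
    print(dict(zip(vocab, probs.round(3))))
```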

MindMap: Knowledge Graph Prompting Sparks Graph of Thoughts in Large Language Models

wyl-willing/MindMap 17 Aug 2023

Large language models (LLMs) have achieved remarkable performance in natural language understanding and generation tasks.

Evaluation and Analysis of Hallucination in Large Vision-Language Models

junyangwang0410/haelm 29 Aug 2023

In this paper, we propose Hallucination Evaluation based on Large Language Models (HaELM), an LLM-based hallucination evaluation framework.
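Frameworks of this kind prompt a judge LLM to decide whether a generated response contains content unsupported by a reference. The prompt wording and the `ask_llm` callable below are assumptions for illustration, not HaELM's actual prompts or API.

```python
# Sketch of an LLM-as-judge check for hallucination in a vision-language
# model's response. The prompt and the `ask_llm` callable are hypothetical.

from typing import Callable

JUDGE_PROMPT = (
    "Reference description of the image:\n{reference}\n\n"
    "Model response:\n{response}\n\n"
    "Does the model response mention objects, attributes, or facts that are "
    "not supported by the reference description? Answer 'yes' or 'no'."
)

def is_hallucinated(reference: str, response: str,
                    ask_llm: Callable[[str], str]) -> bool:
    """Return True if the judge LLM flags unsupported content."""
    answer = ask_llm(JUDGE_PROMPT.format(reference=reference, response=response))
    return answer.strip().lower().startswith("yes")

if __name__ == "__main__":
    # Stub judge for demonstration; replace with a real LLM call.
    stub = lambda prompt: "yes" if "red bicycle" in prompt else "no"
    print(is_hallucinated(
        reference="A dog sleeping on a sofa.",
        response="A dog sleeping on a sofa next to a red bicycle.",
        ask_llm=stub,
    ))
```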

Analyzing and Mitigating Object Hallucination in Large Vision-Language Models

yiyangzhou/lure 1 Oct 2023

Large vision-language models (LVLMs) have shown remarkable abilities in understanding visual information with human languages.

AMBER: An LLM-free Multi-dimensional Benchmark for MLLMs Hallucination Evaluation

junyangwang0410/amber 13 Nov 2023

Despite significant progress on multi-modal tasks, current Multi-modal Large Language Models (MLLMs) still face the challenge of hallucination, which may lead to harmful consequences.

Investigating Hallucinations in Pruned Large Language Models for Abstractive Summarization

casszhao/prunehall 15 Nov 2023

Despite the remarkable performance of generative large language models (LLMs) on abstractive summarization, they face two significant challenges: their considerable size and tendency to hallucinate.

HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data

yuqifan1117/hallucidoctor 22 Nov 2023

Multi-modal Large Language Models (MLLMs) tuned on machine-generated instruction-following data have demonstrated remarkable performance in various multi-modal understanding and generation tasks.

UHGEval: Benchmarking the Hallucination of Chinese Large Language Models via Unconstrained Generation

IAAR-Shanghai/UHGEval 26 Nov 2023

The benchmark's construction techniques encompass directed hallucination induction as well as strategies that deliberately alter authentic text to produce hallucinations.
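One simple instance of altering authentic text is perturbing the numbers in a reference sentence so that the result contradicts its source. The heuristic below is an illustrative sketch, not UHGEval's actual construction pipeline.

```python
# Minimal sketch of deliberately altering authentic text to produce a
# hallucinated variant: replace each number with a different random value.
# This is an illustrative heuristic, not the benchmark's pipeline.

import random
import re

def perturb_numbers(text: str, seed: int = 0) -> str:
    """Return a factually inconsistent variant of the text by swapping numbers."""
    rng = random.Random(seed)

    def swap(match: re.Match) -> str:
        original = int(match.group())
        candidate = original
        while candidate == original:
            candidate = rng.randint(max(0, original - 50), original + 50)
        return str(candidate)

    return re.sub(r"\d+", swap, text)

if __name__ == "__main__":
    source = "The company reported revenue of 120 million in 2021."
    # Prints a variant whose figures contradict the source sentence.
    print(perturb_numbers(source))
```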

Mitigating Fine-Grained Hallucination by Fine-Tuning Large Vision-Language Models with Caption Rewrites

anonymousanoy/fohe 4 Dec 2023

Fine-grained object attributes and behaviors that do not exist in the image may still be generated, yet remain unmeasured by current evaluation methods.