Hallucination Evaluation
11 papers with code • 0 benchmarks • 1 dataset
Evaluate the ability of LLMs to generate non-hallucinated text, or assess their capability to recognize hallucinations in given text.
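The second mode (recognition) is typically scored as a binary classification task: the model labels candidate outputs as hallucinated or not, and accuracy is computed against gold labels. A generic metric sketch (not tied to any specific benchmark's protocol; the function name is illustrative):

```python
def recognition_accuracy(predictions, labels):
    """Fraction of candidates the model correctly flags (or clears)
    as hallucinated. Assumes boolean predictions and gold labels."""
    assert len(predictions) == len(labels) and labels
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

# e.g. three judgments, two of them correct:
print(recognition_accuracy([True, False, True], [True, True, True]))  # 0.666...
```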
Most implemented papers
HaluEval: A Large-Scale Hallucination Evaluation Benchmark for Large Language Models
Large language models (LLMs), such as ChatGPT, are prone to generating hallucinations, i.e., content that conflicts with the source or cannot be verified against factual knowledge.
Alleviating Hallucinations of Large Language Models through Induced Hallucinations
Experimental results on both discrimination-based and generation-based hallucination evaluation benchmarks, such as TruthfulQA and FActScore, demonstrate that our proposed ICD methods can effectively enhance the factuality of LLMs across various model sizes and families.
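The core idea of induce-then-contrast decoding is to penalize tokens favored by a deliberately hallucination-induced model relative to the base model at each decoding step. A minimal sketch of one such step, assuming logits from both models are available and using an illustrative contrast strength `alpha` (not the paper's exact formulation or hyperparameters):

```python
import numpy as np

def contrastive_decode_step(base_logits, induced_logits, alpha=0.5):
    """One greedy decoding step: amplify the base model's preferences
    while subtracting those of the hallucination-induced model."""
    contrast = (1 + alpha) * base_logits - alpha * induced_logits
    return int(np.argmax(contrast))

# toy example: token 1 is strongly boosted by the induced (hallucinating) model,
# so the contrasted distribution falls back to token 0 instead
base = np.array([2.0, 2.1, 0.5])
induced = np.array([0.0, 3.0, 0.0])
print(contrastive_decode_step(base, induced))  # 0 (plain argmax of base would be 1)
```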
MindMap: Knowledge Graph Prompting Sparks Graph of Thoughts in Large Language Models
Large language models (LLMs) have achieved remarkable performance in natural language understanding and generation tasks.
Evaluation and Analysis of Hallucination in Large Vision-Language Models
In this paper, we propose Hallucination Evaluation based on Large Language Models (HaELM), an LLM-based hallucination evaluation framework.
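An LLM-based evaluation framework of this kind generally works by prompting a judge LLM with the reference and the candidate response, then parsing its verdict. A hedged sketch of such a harness (the prompt template and helper names are illustrative, not HaELM's actual implementation):

```python
def build_judge_prompt(reference: str, response: str) -> str:
    """Assemble a yes/no hallucination-judgment prompt for a judge LLM."""
    return (
        "Reference description:\n" + reference + "\n\n"
        "Model response:\n" + response + "\n\n"
        "Does the response contain content not supported by the reference? "
        "Answer Yes or No."
    )

def parse_judgment(judge_output: str) -> bool:
    """Map the judge LLM's free-form answer to a hallucination flag."""
    return judge_output.strip().lower().startswith("yes")

# usage with a hypothetical judge-model call (judge_llm is not defined here):
prompt = build_judge_prompt("A dog runs on a beach.",
                            "A dog and a cat run on a beach.")
print(parse_judgment("Yes, the cat is not in the reference."))  # True
```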
Analyzing and Mitigating Object Hallucination in Large Vision-Language Models
Large vision-language models (LVLMs) have shown remarkable abilities in understanding visual information with human languages.
AMBER: An LLM-free Multi-dimensional Benchmark for MLLMs Hallucination Evaluation
Despite making significant progress in multi-modal tasks, current Multi-modal Large Language Models (MLLMs) encounter the significant challenge of hallucinations, which may lead to harmful consequences.
Investigating Hallucinations in Pruned Large Language Models for Abstractive Summarization
Despite the remarkable performance of generative large language models (LLMs) on abstractive summarization, they face two significant challenges: their considerable size and tendency to hallucinate.
HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data
Multi-modal Large Language Models (MLLMs) tuned on machine-generated instruction-following data have demonstrated remarkable performance in various multi-modal understanding and generation tasks.
UHGEval: Benchmarking the Hallucination of Chinese Large Language Models via Unconstrained Generation
These techniques include directed hallucination induction and strategies that deliberately alter authentic text to produce hallucinations.
Mitigating Fine-Grained Hallucination by Fine-Tuning Large Vision-Language Models with Caption Rewrites
Fine-grained object attributes and behaviors that do not exist in the image may still be generated, yet go unmeasured by current evaluation methods.