Long-Context Understanding

12 papers with code • 2 benchmarks • 0 datasets

Most implemented papers

GLM-130B: An Open Bilingual Pre-trained Model

thudm/glm-130b 5 Oct 2022

We introduce GLM-130B, a bilingual (English and Chinese) pre-trained language model with 130 billion parameters.

GPT-4 Technical Report

openai/evals Preprint 2023

We report the development of GPT-4, a large-scale, multimodal model which can accept image and text inputs and produce text outputs.

Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena

lm-sys/fastchat NeurIPS 2023

Evaluating large language model (LLM) based chat assistants is challenging due to their broad capabilities and the inadequacy of existing benchmarks in measuring human preferences.

FABLES: Evaluating faithfulness and content selection in book-length summarization

mungg/fables 1 Apr 2024

While LLM-based auto-raters have proven reliable for factuality and coherence in other settings, we implement several LLM raters of faithfulness and find that none correlates strongly with human annotations, especially with regard to detecting unfaithful claims.

S3Eval: A Synthetic, Scalable, Systematic Evaluation Suite for Large Language Models

lfy79001/s3eval 23 Oct 2023

The rapid development of Large Language Models (LLMs) has led to great strides in model capabilities like long-context understanding and reasoning.

LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding

thudm/longbench 28 Aug 2023

In this paper, we introduce LongBench, the first bilingual, multi-task benchmark for long context understanding, enabling a more rigorous evaluation of long context understanding.

LooGLE: Can Long-Context Language Models Understand Long Contexts?

bigai-nlco/loogle 8 Nov 2023

In this paper, we present LooGLE, a Long Context Generic Language Evaluation benchmark for LLMs' long context understanding.

InternLM2 Technical Report

internlm/internlm 26 Mar 2024

The evolution of Large Language Models (LLMs) like ChatGPT and GPT-4 has sparked discussions on the advent of Artificial General Intelligence (AGI).

Long-context LLMs Struggle with Long In-context Learning

tiger-ai-lab/longiclbench 2 Apr 2024

Our study reveals that long context understanding and reasoning is still a challenging task for the existing LLMs.

Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks

open-compass/ada-leval 9 Apr 2024

Recently, the large language model (LLM) community has shown increasing interest in enhancing LLMs' capability to handle extremely long documents.