MPII Human Pose Descriptions

Introduced by Khan et al. in FocusCLIP: Multimodal Subject-Level Guidance for Zero-Shot Transfer in Human-Centric Tasks

The MPII Human Pose Descriptions dataset extends the widely used MPII Human Pose Dataset with rich textual annotations. These annotations are generated by several state-of-the-art large language models (LLMs) and describe the activity being performed, the number of people present, and their specific poses.

The dataset uses the same image splits as MMPose, with 14,644 training samples and 2,723 validation samples. Each image is accompanied by one or more pose descriptions generated by different LLMs. Each description comes with additional annotation information (activity type, people count, and pose keypoints) derived from the original MPII Human Pose Dataset annotations.
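To make the structure described above concrete, here is a minimal Python sketch of how one sample might be represented and flattened into image-text pairs for multimodal training. The class name, field names, LLM keys, and the toy record are all hypothetical and chosen for illustration; they are not the released schema. Only the split sizes come from the description above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Split sizes as stated in the dataset description.
SPLIT_SIZES = {"train": 14644, "val": 2723}


@dataclass
class PoseDescriptionSample:
    """Hypothetical record layout; field names are assumptions, not the released schema."""
    image: str                                   # image file name
    activity: str                                # activity type from the MPII annotations
    people_count: int                            # number of annotated people
    keypoints: List[List[Tuple[float, float]]]   # one list of (x, y) joints per person
    descriptions: Dict[str, str] = field(default_factory=dict)  # LLM name -> description


def to_image_text_pairs(samples: List[PoseDescriptionSample]) -> List[Tuple[str, str]]:
    """Flatten samples into (image, description) pairs, one pair per LLM description."""
    return [(s.image, text) for s in samples for text in s.descriptions.values()]


# Made-up toy record for illustration only; not taken from the dataset.
toy = PoseDescriptionSample(
    image="example.jpg",
    activity="running",
    people_count=1,
    keypoints=[[(10.0, 20.0), (12.0, 35.0)]],
    descriptions={
        "llm_a": "A person is running.",
        "llm_b": "One runner mid-stride.",
    },
)

pairs = to_image_text_pairs([toy])
total_samples = sum(SPLIT_SIZES.values())
```

Because an image can carry several descriptions, flattening yields more pairs than images; a training loop could instead sample one description per image per epoch if balanced image coverage matters.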

By adding textual annotations to the existing human pose dataset, this extended version supports novel research in multi-modal learning, where both visual and textual cues can be explored.

Similar Datasets

EMOTIC

LAGENDA
