Naamapadam

Introduced by Mhaske et al. in Naamapadam: A Large-Scale Named Entity Annotated Data for Indic Languages

Naamapadam is a Named Entity Recognition (NER) dataset for the 11 major Indian languages from two language families. For 9 of the 11 languages, it contains more than 400k sentences each, annotated with a total of at least 100k entities from three standard entity categories (Person, Location and Organization). The training dataset was created automatically from the Samanantar parallel corpus by projecting automatically tagged entities from each English sentence onto the corresponding Indian-language sentence.

Source: Naamapadam: A Large-Scale Named Entity Annotated Data for Indic Languages
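
The projection step in the description can be illustrated with a small sketch. It assumes that a word alignment between the English and the target sentence is already available as (English index, target index) pairs, and that tags use a BIO scheme with PER, LOC and ORG labels. The function name `project_tags`, the example sentence, and the alignment are hypothetical and chosen only for illustration; the dataset's actual pipeline may differ in its details.

```python
from typing import List, Tuple


def project_tags(
    en_tags: List[str],
    alignment: List[Tuple[int, int]],
    target_len: int,
) -> List[str]:
    """Copy entity labels from English tokens to aligned target tokens.

    Hypothetical helper: a simplified illustration of annotation projection,
    not the dataset authors' implementation.
    """
    # Start with every target token outside any entity.
    types = ["O"] * target_len
    for en_idx, tgt_idx in alignment:
        tag = en_tags[en_idx]
        if tag != "O":
            types[tgt_idx] = tag[2:]  # drop the "B-" / "I-" prefix

    # Rebuild BIO prefixes from contiguous runs in target word order.
    # Simplification: two adjacent entities of the same type are merged.
    projected = []
    previous = "O"
    for entity_type in types:
        if entity_type == "O":
            projected.append("O")
        elif entity_type == previous:
            projected.append("I-" + entity_type)
        else:
            projected.append("B-" + entity_type)
        previous = entity_type
    return projected


# Assumed example: an English sentence, its tags, and a hand-written alignment.
en_tokens = ["Sachin", "Tendulkar", "lives", "in", "Mumbai"]
en_tags = ["B-PER", "I-PER", "O", "O", "B-LOC"]
hi_tokens = ["सचिन", "तेंदुलकर", "मुंबई", "में", "रहते", "हैं"]
alignment = [(0, 0), (1, 1), (2, 4), (3, 3), (4, 2)]

hi_tags = project_tags(en_tags, alignment, len(hi_tokens))
for token, tag in zip(hi_tokens, hi_tags):
    print(token, tag)
```

Rebuilding the B-/I- prefixes on the target side, rather than copying them, keeps the tags well-formed when word order differs between the two languages.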

Tasks

Named Entity Recognition (NER)

Similar Datasets

IndicCorp

Samanantar

License

Creative Commons CC0 license (“no rights reserved”)

Modalities

Texts
