The dataset is taken from the First shared task on Information Extractor for Conversational Systems in Indian Languages (IECSIL) . It consists of 15,48,570 Hindi words in Devanagari script and corresponding NER labels. Each sentence end is marked by \newline" tag. Fig. 1 shows a snapshot of one sentence in the dataset. Our Dataset has nine classes, namely, Datenum, Event, Location, Name, Number, Occupation, Organization, Other, Things.
1 PAPER • 1 BENCHMARK