Vashantor: A Large-scale Multilingual Benchmark Dataset for Automated Translation of Bangla Regional Dialects to Bangla Language

Introduced by Faria et al. in Vashantor: A Large-scale Multilingual Benchmark Dataset for Automated Translation of Bangla Regional Dialects to Bangla Language

The Vashantor dataset consists of 32,500 sentences: a core set plus regional sets covering five regions (Chittagong, Noakhali, Sylhet, Barishal, and Mymensingh). The regional data comes in two language formats, "Bangla" and "Banglish"; the core data is additionally provided in English. Each region and language combination has a fixed number of training, testing, and validation samples. The dataset details are as follows:

Specifics of the Core Data:

Split        Bangla   Banglish   English
Train        1875     1875       1875
Test         375      375        375
Validation   250      250        250

Specifics of the Regional Data:

Region       Type       Train   Test   Validation
Chittagong   Bangla     1875    375    250
Chittagong   Banglish   1875    375    250
Noakhali     Bangla     1875    375    250
Noakhali     Banglish   1875    375    250
Sylhet       Bangla     1875    375    250
Sylhet       Banglish   1875    375    250
Barishal     Bangla     1875    375    250
Barishal     Banglish   1875    375    250
Mymensingh   Bangla     1875    375    250
Mymensingh   Banglish   1875    375    250
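
As a quick sanity check, the split sizes in the two tables can be summed to confirm they account for the stated 32,500 sentences. The sketch below only restates the table values; the variable names are illustrative and not part of any official loader.

```python
# Split sizes shared by every region/format combination (from the tables above).
SPLITS = {"train": 1875, "test": 375, "validation": 250}

# Language formats listed for the core data and for the regional data.
CORE_FORMATS = ["Bangla", "Banglish", "English"]
REGIONAL_FORMATS = ["Bangla", "Banglish"]
REGIONS = ["Chittagong", "Noakhali", "Sylhet", "Barishal", "Mymensingh"]

# Sentences per single region/format combination.
per_combination = sum(SPLITS.values())

core_total = per_combination * len(CORE_FORMATS)
regional_total = per_combination * len(REGIONS) * len(REGIONAL_FORMATS)
total = core_total + regional_total

# Share of each split within one combination.
ratios = {name: size / per_combination for name, size in SPLITS.items()}

print(per_combination, core_total, regional_total, total)
print(ratios)
```

Under these figures each combination holds 2,500 sentences split 75% / 15% / 10% across train, test, and validation, and the core (7,500) and regional (25,000) portions together give 32,500.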
