no code implementations • RANLP 2021 • Wazir Ali, Zenglin Xu, Jay Kumar
In this paper, we introduce the SiPOS dataset for part-of-speech tagging in the low-resource Sindhi language, along with quality baselines.
no code implementations • EACL (WASSA) 2021 • Wazir Ali, Naveed Ali, Yong Dai, Jay Kumar, Saifullah Tumrani, Zenglin Xu
In this paper, we develop a Sindhi subjective lexicon by merging existing English resources: the NRC lexicon, a list of opinion words, SentiWordNet, a Sindhi-English bilingual dictionary, and a collection of Sindhi modifiers.
no code implementations • 30 Dec 2020 • Wazir Ali, Jay Kumar, Zenglin Xu, Congjian Luo, Junyu Lu, Junming Shao, Rajesh Kumar, Yazhou Ren
Word segmentation is a fundamental and inevitable prerequisite for processing many languages.
no code implementations • ACL 2020 • Jay Kumar, Junming Shao, Salah Uddin, Wazir Ali
Clustering short text streams is a challenging task due to the unique properties of such streams: infinite length, sparse data representation, and cluster evolution.
no code implementations • LREC 2020 • Wazir Ali, Junyu Lu, Zenglin Xu
We introduce SiNER, a named entity recognition (NER) dataset for the low-resource Sindhi language, along with quality baselines.
no code implementations • 28 Nov 2019 • Wazir Ali, Jay Kumar, Junyu Lu, Zenglin Xu
Our intrinsic evaluation results demonstrate the high quality of the Sindhi word embeddings we generated using SG, CBoW, and GloVe, compared with SdfastText word representations.