Bangla Parts-of-Speech Tagging using Bangla Stemmer and Rule based Analyzer

Parts-of-Speech (POS) tagging plays vital roles in the field of Natural Language Processing (NLP), such as - machine translation, spell checker, information retrieval, speech processing, emotion analysis and so on. Bangla is a very inflectional language that induces many variants from a single word. Although there is a few POS Tagger in Bangla language,very small of them address the essence of suffices to identify tag of the words. In this regard, we propose an automated POS Tagging system for Bangla language based on word-suffixes. In our system, we use our own stemming technique to retrieve a possible minimum root words and apply rules according to different forms of suffixes. Moreover, we incorporate a Bangla vocabulary that contains more than 45,000 words with their default tag and a patterned based verb-data-set. These facilitate to improve tagging efficiency of Bangla POS Tagger. We experiment our proposed system on a Bangla text corpus. The result shows that our proposed Bangla POS Tagger has outperformed the known related tagging systems.

PDF

Datasets


  Add Datasets introduced or used in this paper

Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods


No methods listed for this paper. Add relevant methods here