Agglomerative Clustering of Handwritten Numerals to Determine Similarity of Different Languages

22 Nov 2020  ·  Md. Rahat-uz-Zaman, Shadmaan Hye ·

Handwritten numerals of different languages have various characteristics. Similarities and dissimilarities of the languages can be measured by analyzing the extracted features of the numerals. Handwritten numeral datasets are available and accessible for many renowned languages of different regions. In this paper, several handwritten numeral datasets of different languages are collected. Then they are used to find the similarity among those written languages through determining and comparing the similitude of each handwritten numerals. This will help to find which languages have the same or adjacent parent language. Firstly, a similarity measure of two numeral images is constructed with a Siamese network. Secondly, the similarity of the numeral datasets is determined with the help of the Siamese network and a new random sample with replacement similarity averaging technique. Finally, an agglomerative clustering is done based on the similarities of each dataset. This clustering technique shows some very interesting properties of the datasets. The property focused in this paper is the regional resemblance of the datasets. By analyzing the clusters, it becomes easy to identify which languages are originated from similar regions.

PDF Abstract
No code implementations yet. Submit your code now

Datasets


  Add Datasets introduced or used in this paper

Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods