EXmatcher: Combining Features Based on Reference Strings and Segments to Enhance Citation Matching

11 Jun 2019  ·  Behnam Ghavimi, Wolfgang Otto, Philipp Mayr

Citation matching is a challenging task because of problems such as the variety of citation styles, mistakes in reference strings and the quality of identified reference segments. The classic citation matching configuration used in this paper is a combination of a blocking technique and a binary classifier. Three possible inputs (reference strings, reference segments and a combination of reference strings and segments) were tested to find the most efficient strategy for citation matching. In the classification step, we describe the effect that the probabilities of reference segments can have on citation matching. Our evaluation on a manually curated gold standard showed that input data consisting of the combination of reference segments and reference strings leads to the best result. In addition, using the segmentation probabilities slightly improves the result.
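
The configuration described in the abstract (blocking followed by a binary classifier over features from both reference strings and reference segments) can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not name the blocking method, the feature set or the classifier, so the token-overlap blocking, the field names (`title`, `authors`, `year`), the fixed weights, the threshold and the sample records below are all assumptions made for illustration. A trained classifier would normally replace the hand-weighted `is_match` function.

```python
import re
from collections import defaultdict
from difflib import SequenceMatcher


def tokens(text):
    """Lowercase alphanumeric tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_block_index(records):
    """Blocking: inverted index from token to the ids of records containing it."""
    index = defaultdict(set)
    for rec_id, rec in records.items():
        for tok in tokens(rec["title"]) | tokens(rec["authors"]) | {rec["year"]}:
            index[tok].add(rec_id)
    return index


def candidates(ref_string, index, min_shared=3):
    """Keep only records sharing at least `min_shared` tokens with the reference."""
    counts = defaultdict(int)
    for tok in tokens(ref_string):
        for rec_id in index.get(tok, ()):
            counts[rec_id] += 1
    return [rec_id for rec_id, n in counts.items() if n >= min_shared]


def features(ref_string, segments, rec):
    """Combine a string-level feature with segment-level features.

    `segments` maps a field name to (text, probability), where the probability
    is the segmenter's confidence in that segment (hypothetical format).
    """
    rec_text = f'{rec["authors"]} {rec["title"]} {rec["year"]}'
    ref_toks, rec_toks = tokens(ref_string), tokens(rec_text)
    union = ref_toks | rec_toks
    feats = {"string_jaccard": len(ref_toks & rec_toks) / len(union) if union else 0.0}
    for field in ("title", "year"):
        text, prob = segments.get(field, ("", 0.0))
        sim = SequenceMatcher(None, text.lower(), rec[field].lower()).ratio()
        # Down-weight similarity that comes from a low-confidence segment.
        feats[f"{field}_sim"] = sim * prob
    return feats


def is_match(feats, threshold=0.6):
    """Stand-in binary classifier: fixed (assumed) weights plus a threshold."""
    weights = {"string_jaccard": 0.4, "title_sim": 0.4, "year_sim": 0.2}
    score = sum(weights[name] * value for name, value in feats.items())
    return score >= threshold


# Hypothetical sample data, not taken from the paper's gold standard.
records = {
    "r1": {"authors": "Smith J", "title": "Citation matching with blocking", "year": "2015"},
    "r2": {"authors": "Lee K", "title": "Deep learning for image segmentation", "year": "2017"},
}
ref_string = "Smith, J. (2015). Citation matching with blocking. Journal of Examples."
segments = {"title": ("Citation matching with blocking", 0.95), "year": ("2015", 0.99)}

index = build_block_index(records)
blocked = candidates(ref_string, index)
matches = [rid for rid in blocked if is_match(features(ref_string, segments, records[rid]))]
print(matches)
```

In this sketch the blocking step discards `r2` before any feature is computed, so the classifier only scores the one remaining candidate. Multiplying each segment similarity by the segment probability is one simple way to let segmentation confidence influence the decision; the paper may use the probabilities differently.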
