Distill and Calibrate: Denoising Inconsistent Labeling Instances for Chinese Named Entity Recognition

ACL ARR November 2021 · Anonymous

Data-driven supervised models for named entity recognition (NER) have made significant improvements on standard benchmarks. However, such models often suffer severe performance degradation on large-scale noisy data. Thus, a practical and challenging question arises: can we leverage only a small amount of relatively clean data to guide an NER model in learning from large-scale noisy data? To answer this question, we focus on the problem of inconsistent labeling instances. We observe that inconsistent labeling instances can be classified into five types of noise, each of which substantially hinders model performance in our experiments. Based on this observation, we propose a simple yet effective denoising framework named Distillation and Calibration for Chinese NER (DCNER). DCNER consists of two components: (1) a Dual-stream Label Distillation mechanism for distilling the five types of inconsistent labeling instances from the noisy data; and (2) a Consistency-aware Label Calibration network for calibrating inconsistent labeling instances based on relatively clean data. Additionally, we propose the first benchmark for validating the ability of Chinese NER models to resist inconsistent labeling instances. Finally, detailed experiments show that our method consistently and significantly outperforms previous methods on the proposed benchmark.
