Effective Self-Training for Parsing

We present a simple, but surprisingly effective, method of self-training a two-phase parser-reranker system using readily available unlabeled data. We show that this type of bootstrapping is possible for parsing when the bootstrapped parses are processed by a discriminative reranker. Our improved model achieves an f-score of 92.1%, an absolute 1.1% improvement (12% error reduction) over the previous best result for Wall Street Journal parsing. Finally, we provide some analysis to better understand the phenomenon.
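
To make the procedure concrete, here is a minimal, hypothetical sketch of the self-training loop the abstract describes: train a generative first-stage parser on treebank trees, parse unlabeled sentences, let a discriminative reranker pick the best candidate from each n-best list, and retrain the first-stage parser on the combined data. The `Parser` and `Reranker` interfaces, the function names, and the 50-best list size are illustrative assumptions, not the authors' implementation (the paper builds on a generative first-stage parser paired with a discriminative reranker).

```python
# Hypothetical sketch of reranker-filtered self-training for parsing.
# The Parser and Reranker classes below are placeholders, not the
# actual parser-reranker system used in the paper.

from typing import List, Tuple


class Parser:
    """Placeholder for a generative first-stage parser."""

    def train(self, trees: List[str]) -> None:
        """Estimate the parsing model from a list of parse trees."""
        ...

    def n_best(self, sentence: str, n: int = 50) -> List[Tuple[str, float]]:
        """Return up to n candidate parses with first-stage scores."""
        return []


class Reranker:
    """Placeholder for a discriminative reranker over n-best lists."""

    def best(self, candidates: List[Tuple[str, float]]) -> str:
        """Pick the candidate parse the reranker scores highest."""
        return candidates[0][0] if candidates else ""


def self_train(gold_trees: List[str],
               unlabeled_sentences: List[str],
               parser: Parser,
               reranker: Reranker) -> Parser:
    # 1. Train the first-stage parser on the hand-annotated treebank.
    parser.train(gold_trees)

    # 2. Parse the unlabeled text; for each sentence, keep only the
    #    parse the discriminative reranker prefers.
    auto_trees = [reranker.best(parser.n_best(s))
                  for s in unlabeled_sentences]

    # 3. Retrain the first-stage parser on the union of gold and
    #    automatically produced trees (in this sketch the reranker
    #    itself is reused unchanged).
    parser.train(gold_trees + auto_trees)
    return parser
```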

Datasets

Penn Treebank

Results from the Paper


Ranked #23 on Constituency Parsing on Penn Treebank (using extra training data)

Task: Constituency Parsing
Dataset: Penn Treebank
Model: Self-training
Metric: F1 score
Metric value: 92.1
Global rank: #23
Uses extra training data: Yes
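
For reference, the abstract's 12% error-reduction figure follows from these numbers: a 1.1% absolute gain over a previous best F1 of 92.1 − 1.1 = 91.0 removes

$$\frac{92.1 - 91.0}{100 - 91.0} = \frac{1.1}{9.0} \approx 12.2\%$$

of the remaining error.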

Methods


Self-Training
Discriminative Reranking