Effective Self-Training for Parsing
We present a simple, but surprisingly effective, method of self-training a two-phase parser-reranker system using readily available unlabeled data. We show that this type of bootstrapping is possible for parsing when the bootstrapped parses are processed by a discriminative reranker. Our improved model achieves an f-score of 92.1%, an absolute 1.1% improvement (12% error reduction) over the previous best result for Wall Street Journal parsing. Finally, we provide some analysis to better understand the phenomenon.
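The "12% error reduction" figure follows from the two numbers quoted in the abstract. The sketch below reproduces that arithmetic, assuming that error is measured as 100 minus the f-score and that the previous best result is the reported 92.1 minus the 1.1 absolute gain. The function and variable names are illustrative, not taken from the paper.

```python
def relative_error_reduction(old_f1: float, new_f1: float) -> float:
    """Return the fraction of the remaining error (100 - F1) that was removed.

    Assumption: "error" means 100 minus the f-score, in percentage points.
    """
    old_error = 100.0 - old_f1
    new_error = 100.0 - new_f1
    return (old_error - new_error) / old_error


# Figures quoted in the abstract.
new_f1 = 92.1          # f-score of the self-trained model
absolute_gain = 1.1    # absolute improvement over the previous best

# Implied previous best (an inference from the two figures above).
old_f1 = new_f1 - absolute_gain

reduction = relative_error_reduction(old_f1, new_f1)
print(f"previous best f-score: {old_f1:.1f}")
print(f"relative error reduction: {reduction:.1%}")
```

Under these assumptions the remaining error drops from about 9.0 to 7.9 points, which is roughly a 12% relative reduction and matches the figure in the abstract.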
Results from the Paper
Ranked #23 on Constituency Parsing on Penn Treebank (using extra training data)
Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
---|---|---|---|---|---|
Constituency Parsing | Penn Treebank | Self-training | F1 score | 92.1 | # 23 |