no code implementations • AACL (NLP-TEA) 2020 • Yikang Luo, Zuyi Bao, Chen Li, Rui Wang
For the correction subtask, we utilize the masked language model, the seq2seq model, and the spelling-check model to generate corrections based on the detection results.
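The correction outputs of several models must be merged into one final prediction. A minimal sketch of one plausible merging strategy, majority voting over (span, correction) pairs, is shown below; the function name, the span-keyed output format, and the example model outputs are illustrative assumptions, not the paper's actual systems.

```python
from collections import Counter

def merge_corrections(candidates, min_votes=2):
    """Merge per-model correction outputs by majority vote.

    candidates: list of dicts, one per model, mapping a character span
    (start, end) to the proposed correction string.
    Returns only corrections proposed by at least `min_votes` models.
    """
    votes = Counter()
    for output in candidates:
        for span, correction in output.items():
            votes[(span, correction)] += 1
    return {span: corr for (span, corr), n in votes.items() if n >= min_votes}

# Toy example: three hypothetical model outputs for one sentence.
outputs = [
    {(2, 3): "在"},                 # e.g. masked-LM candidate
    {(2, 3): "在", (5, 6): "做"},   # e.g. seq2seq candidate
    {(2, 3): "再"},                 # e.g. spelling-check candidate
]
print(merge_corrections(outputs))  # {(2, 3): '在'}
```

Requiring agreement between at least two models trades recall for precision, which is a common choice when combining heterogeneous GEC systems.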
1 code implementation • 22 Oct 2022 • Yue Zhang, Bo Zhang, Zhenghua Li, Zuyi Bao, Chen Li, Min Zhang
We then obtain parse trees for the incorrect source sentences by projecting the trees of the correct target sentences.
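Tree projection of this kind can be illustrated with a small sketch: given a dependency tree over the target sentence and a word alignment to the source sentence, each head relation is carried over through the alignment. The function name and the one-to-one alignment assumption are illustrative simplifications, not the paper's actual algorithm.

```python
def project_tree(target_heads, align):
    """Project a dependency tree through a word alignment.

    target_heads: list where target_heads[i] is the head index of
    target token i (-1 marks the root).
    align: dict mapping a target token index to its aligned source
    token index (assumed one-to-one here for simplicity).
    Returns a dict of source token index -> projected head index.
    """
    source_heads = {}
    for t_idx, h_idx in enumerate(target_heads):
        if t_idx not in align:
            continue  # target token has no source counterpart
        s_idx = align[t_idx]
        if h_idx == -1:
            source_heads[s_idx] = -1       # root stays root
        elif h_idx in align:
            source_heads[s_idx] = align[h_idx]
    return source_heads

# Toy example: target "he likes apples" with root "likes",
# aligned token-for-token to a three-token source sentence.
heads = [1, -1, 1]
alignment = {0: 0, 1: 1, 2: 2}
print(project_tree(heads, alignment))  # {0: 1, 1: -1, 2: 1}
```

Tokens without an aligned counterpart simply receive no head, which a downstream parser or heuristic would then need to resolve.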
2 code implementations • 23 Jun 2022 • Yue Zhang, Haochen Jiang, Zuyi Bao, Bo Zhang, Chen Li, Zhenghua Li
We have accumulated 1,119 error templates for Chinese GEC based on this method.
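One simple way to realize error templates is as pattern/replacement pairs applied to a sentence. The sketch below uses regular expressions for this purpose; the template format and the toy doubled-particle examples are assumptions for illustration and are not drawn from the paper's 1,119 templates.

```python
import re

# Hypothetical error templates: (error pattern, suggested replacement).
# The examples correct accidentally doubled function words.
TEMPLATES = [
    (re.compile(r"的的"), "的"),   # doubled attributive particle
    (re.compile(r"了了"), "了"),   # doubled aspect particle
]

def apply_templates(sentence):
    """Apply each template in order, rewriting matched error patterns."""
    for pattern, replacement in TEMPLATES:
        sentence = pattern.sub(replacement, sentence)
    return sentence

print(apply_templates("我的的书"))  # -> 我的书
```

In practice such templates would be mined from error-annotated corpora rather than written by hand, and each match would typically be verified by a language model before being applied.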
2 code implementations • NAACL 2022 • Yue Zhang, Zhenghua Li, Zuyi Bao, Jiacheng Li, Bo Zhang, Chen Li, Fei Huang, Min Zhang
This paper presents MuCGEC, a multi-reference multi-source evaluation dataset for Chinese Grammatical Error Correction (CGEC), consisting of 7,063 sentences collected from three Chinese-as-a-Second-Language (CSL) learner sources.
no code implementations • EMNLP 2021 • Yue Zhang, Bo Zhang, Rui Wang, Junjie Cao, Chen Li, Zuyi Bao
Previous work on key information extraction from visually rich documents (VRDs) mainly focuses on labeling the text within each bounding box (i.e., semantic entity), while the relations between entities remain largely unexplored.
Ranked #4 on Entity Linking on FUNSD
no code implementations • Findings of the Association for Computational Linguistics 2020 • Zuyi Bao, Chen Li, Rui Wang
Chinese spelling check is a challenging task due to characteristics of the Chinese language such as its large character set, the lack of word boundaries, and short word lengths.
Optical Character Recognition (OCR) +1
1 code implementation • IJCNLP 2019 • Zuyi Bao, Rui Huang, Chen Li, Kenny Q. Zhu
Previous work on cross-lingual sequence labeling tasks either requires parallel data or bridges the two languages through word-by-word matching.
no code implementations • ICLR 2020 • Wei Wang, Bin Bi, Ming Yan, Chen Wu, Zuyi Bao, Jiangnan Xia, Liwei Peng, Luo Si
Recently, the pre-trained language model BERT (and its robustly optimized variant, RoBERTa) has attracted considerable attention in natural language understanding (NLU), achieving state-of-the-art accuracy on various NLU tasks such as sentiment classification, natural language inference, semantic textual similarity, and question answering.
Ranked #1 on Natural Language Inference on QNLI
no code implementations • WS 2018 • Chen Li, Junpei Zhou, Zuyi Bao, Hengyou Liu, Guangwei Xu, Linlin Li
In the correction stage, candidates were generated by the three GEC models and then merged to output the final corrections for M and S types.
no code implementations • WS 2017 • Zuyi Bao, Si Li, Weiran Xu, Sheng Gao
For Chinese word segmentation, the large-scale annotated corpora mainly cover newswire, and only a small amount of annotated data is available in other domains such as patents and literature.
no code implementations • WS 2017 • Jianbo Zhao, Hao Liu, Zuyi Bao, Xiaopeng Bai, Si Li, Zhiqing Lin
Detection and correction of Chinese grammatical errors are two major challenges in Chinese automatic grammatical error diagnosis. This paper presents an N-gram model for automatic detection and correction of Chinese grammatical errors in the NLPTEA 2017 shared task.
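The core of an N-gram detector can be sketched in a few lines: estimate character-bigram counts from a reference corpus and flag positions whose bigram was never observed. The tiny corpus, the bigram order, and the zero-count threshold below are toy assumptions for illustration, not the paper's actual configuration.

```python
from collections import Counter

def train_bigrams(corpus):
    """Count character bigrams over a list of sentences."""
    counts = Counter()
    for sent in corpus:
        for a, b in zip(sent, sent[1:]):
            counts[a + b] += 1
    return counts

def detect_errors(sentence, counts):
    """Return start positions of character bigrams unseen in training,
    a crude proxy for likely grammatical or spelling errors."""
    return [i for i in range(len(sentence) - 1)
            if counts[sentence[i:i + 2]] == 0]

# Toy corpus; a real system would train on a large reference corpus.
corpus = ["我喜欢苹果", "我喜欢你"]
model = train_bigrams(corpus)
print(detect_errors("我喜次苹果", model))  # [1, 2]
```

A real detector would smooth the counts and use log-probability thresholds rather than raw unseen bigrams, but the flag-low-probability-span idea is the same.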