Deep Neural Networks (DNNs) have been widely employed in industry to address various Natural Language Processing (NLP) tasks.
Research in natural language processing proceeds, in part, by demonstrating that new models achieve superior performance (e.g., accuracy) on held-out test data compared to previous results.
However, this approach requires that both sentences be fed into the network together, which causes massive computational overhead: finding the most similar pair in a collection of 10,000 sentences requires about 50 million inference computations (~65 hours) with BERT.
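The ~50 million figure is just the number of unordered sentence pairs, C(10,000, 2). A minimal sketch of the arithmetic (the per-second inference rate below is an illustrative assumption chosen to be consistent with the ~65-hour figure, not a measured benchmark):

```python
import math

# Number of unordered pairs among n sentences: C(n, 2) = n * (n - 1) / 2.
n = 10_000
pairs = math.comb(n, 2)  # 49,995,000 — roughly 50 million comparisons

# Hypothetical throughput: ~214 cross-encoder inference steps per second
# (an assumption for illustration, not a figure from the source).
rate_per_sec = 214
hours = pairs / rate_per_sec / 3600
print(f"{pairs:,} pairs, ~{hours:.0f} hours")
```

Because the pair count grows quadratically in n, doubling the collection size roughly quadruples the inference cost, which is why feeding every pair through the network does not scale.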
We conclude that a variety of methods is necessary to reveal all relevant aspects of a model's grammatical knowledge in a given domain.
Existing works, including ELMo and BERT, have revealed the importance of pre-training for NLP tasks.
Obtaining large-scale annotated data for NLP tasks in the scientific domain is challenging and expensive.