Collaboration or Corporate Capture? Quantifying NLP's Reliance on Industry Artifacts and Contributions

6 Dec 2023  ·  Will Aitken, Mohamed Abdalla, Karen Rudie, Catherine Stinson

The advent of transformers, higher computational budgets, and big data has engendered remarkable progress in Natural Language Processing (NLP). Impressive performance of industry pre-trained models has garnered public attention in recent years and made news headlines. That these are industry models is noteworthy. Rarely, if ever, are academic institutes producing exciting new NLP models. Using these models is critical for competing on NLP benchmarks and, correspondingly, for staying relevant in NLP research. We surveyed 100 papers published at EMNLP 2022 to determine whether this phenomenon constitutes a reliance on industry for NLP publications. We find that there is indeed a substantial reliance. Citations of industry artifacts and contributions across categories are at least three times greater than industry publication rates per year. Quantifying this reliance does not settle how we ought to interpret the results. We discuss two possible perspectives: 1) Is collaboration with industry still collaboration in the absence of an alternative? Or 2) has free NLP inquiry been captured by the motivations and research direction of private corporations?
