SOMD (SOftware Mention Detection)

The dataset contains the training and test data for the SOftware Mention Detection challenge. The data is derived from the SoMeSci Knowledge Graph of software mentions.

Subtask 1 deals with the recognition of software mentions and, at the same time, the classification of their mention types (e.g. Usage, Creation, ...) and software types (e.g. Application, PlugIn, ...).

Subtask 2 requires the recognition of additional metadata of software mentions (e.g. Version, Developer, URL, ...).

Subtask 3 deals with extracting the relations between the different entities of interest (e.g. Version_of, License_of, ...).

A detailed description of the dataset, including its creation and a baseline for the different subtasks, can be found in the following article:

D. Schindler, F. Bensmann, S. Dietze, and F. Krüger, “SoMeSci—A 5 Star Open Data Gold Standard Knowledge Graph of Software Mentions in Scientific Articles,” in Proceedings of the 30th ACM International Conference on Information and Knowledge Management (CIKM ’21), Virtual Event, QLD, Australia: Association for Computing Machinery, Nov. 2021. doi: 10.1145/3459637.3482017.
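
The three subtasks described above can be pictured as layers of annotation over the same sentence. The following is a minimal sketch, not the official data format: the example sentence, the BIO tagging scheme, the combined label name Application_Usage, the tuple layout for relations, and the helper bio_to_spans are all assumptions made for illustration. Only the type names (Application, Usage, Version, Developer, Version_of) come from the description above.

```python
from typing import List, Tuple

def bio_to_spans(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Collapse BIO tags into (start, end_exclusive, label) spans."""
    spans = []
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing span
        inside = tag.startswith("I-") and label == tag[2:]
        if start is not None and not inside:
            spans.append((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans

# Hypothetical sentence, invented for illustration (not taken from the dataset).
tokens = ["We", "used", "SPSS", "25", "(", "IBM", ")", "for", "the", "analysis", "."]

# Subtask 1 (assumed encoding): software mention with software type and mention type.
subtask1_tags = ["O", "O", "B-Application_Usage", "O", "O", "O", "O", "O", "O", "O", "O"]

# Subtask 2 (assumed encoding): additional metadata attached to the mention.
subtask2_tags = ["O", "O", "O", "B-Version", "O", "B-Developer", "O", "O", "O", "O", "O"]

# Subtask 3 (assumed encoding): (relation, source token span, target token span).
relations = [("Version_of", (3, 4), (2, 3))]

software_spans = bio_to_spans(subtask1_tags)
metadata_spans = bio_to_spans(subtask2_tags)

for start, end, label in software_spans + metadata_spans:
    print(label, tokens[start:end])
for name, (s0, s1), (t0, t1) in relations:
    print(name, tokens[s0:s1], "->", tokens[t0:t1])
```

Keeping each subtask in its own layer mirrors the way the subtasks build on one another: the relations of Subtask 3 refer to spans found in Subtasks 1 and 2.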




