Keyword extraction is tasked with the automatic identification of terms that best describe the subject of a document (Source: Wikipedia).
Keyword extraction is used to summarize the content of a document and supports efficient document retrieval; as such, it is an indispensable part of modern text-based systems.
Corpus2graph is an open-source NLP-application-oriented tool that generates a word co-occurrence network from a large corpus.
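To make the idea of a word co-occurrence network concrete, the following is a minimal sketch in plain Python. It does not use or reproduce the Corpus2graph API; the function name `cooccurrence_edges`, the `window` parameter, the whitespace tokenization, and the toy corpus are all assumptions made for illustration.

```python
from collections import Counter


def cooccurrence_edges(sentences, window=2):
    """Count undirected co-occurrence edges between distinct words.

    Two words are linked when the second appears at most `window`
    positions after the first within the same sentence.
    (Hypothetical helper, not part of Corpus2graph.)
    """
    edges = Counter()
    for sentence in sentences:
        # Naive tokenization: lowercase and split on whitespace.
        tokens = sentence.lower().split()
        for i, left in enumerate(tokens):
            for right in tokens[i + 1 : i + 1 + window]:
                if left != right:
                    # Sort the pair so (a, b) and (b, a) map to the same edge.
                    edges[tuple(sorted((left, right)))] += 1
    return edges


# Toy corpus, invented for this example.
corpus = ["graph based keyword extraction", "keyword extraction from text"]
edges = cooccurrence_edges(corpus, window=2)
```

Each key in `edges` is an undirected word pair and each value is its edge weight, that is, how often the pair co-occurred within the window. A real tool for large corpora would additionally need proper tokenization, vocabulary filtering, and parallel processing.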
We present a fully unsupervised, extractive text summarization system that leverages a submodularity framework introduced by past research.
Combining the proposed graph construction and scoring methods yields a novel, parameterless keyword extraction method (sCAKE) based on the semantic connectivity of words in the document.
This shows that the proposed method is independent of the domain, collection, and language of the training corpora.
With growing amounts of available textual data, the development of algorithms capable of automatic analysis, categorization and summarization of these data has become a necessity.
Keyword extraction has received increasing attention as an important research topic that can lead to advancements in diverse applications such as document context categorization, text indexing and document classification.