ColLex.en: Automatically Generating and Evaluating a Full-form Lexicon for English

The paper describes a procedure for the automatic generation of a large full-form lexicon of English. We put emphasis on two statistical methods for lexicon extension and adjustment: a letter-based HMM and a detector of spelling variants and misspellings. The resulting resource, ColLex.en, is evaluated with respect to two tasks: text categorization and lexical coverage, using the SUSANNE corpus and the Open ANC as examples.
