Exploring Optimal Voting in Native Language Identification
We describe the submissions entered by the National Research Council Canada in the NLI-2017 evaluation. We mainly explored the use of voting and various ways to optimize the choice and number of voting systems. We also explored features that rely on no linguistic preprocessing. Long character n-grams obtained from raw text yielded the best performance on all textual input (written essays and speech transcripts). Voting ensembles produced small performance gains, with little difference between the optimization strategies we tried. Our top systems achieved accuracies of 87% on the essay track, 84% on the speech track, and close to 92% by combining essays, speech and i-vectors in the fusion track.
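As a rough illustration of the two ideas named in the abstract, the sketch below extracts character n-grams from raw text and combines the outputs of several systems by plurality voting. It is not the authors' implementation: the function names, the tie-breaking rule and the language labels are assumptions made for the example, and the abstract does not specify how votes were weighted or how the voting systems were selected.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count all character n-grams of length n in raw, unpreprocessed text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def plurality_vote(predictions):
    """Return the label predicted by the most systems.

    Ties go to the label that appears first in `predictions`
    (an assumption for this sketch, not taken from the paper).
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


# Hypothetical outputs of three systems for one essay (labels are illustrative).
system_outputs = ["FRE", "GER", "FRE"]
ensemble_label = plurality_vote(system_outputs)
```

Here each system casts one vote per document. Optimizing the ensemble, as the abstract describes, would amount to choosing which systems contribute to `system_outputs` and how many of them to include.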