Text Data Augmentation: Towards better detection of spear-phishing emails

4 Jul 2020Mehdi ReginaMaxime MeyerSébastien Goutal

Text data augmentation, i.e. the creation of synthetic textual data from an original text, is challenging as augmentation transformations should take into account language complexity while being relevant to the target Natural Language Processing (NLP) task (e.g. Machine Translation, Question Answering, Text Classification, etc.). Motivated by a business application of Business Email Compromise (BEC) detection, we propose a corpus and task agnostic text augmentation framework combining different methods, utilizing BERT language model, multi-step back-translation and heuristics... (read more)

PDF Abstract

Results from the Paper

  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods used in the Paper