20 Dec 2019 • Mukul Kumar, Youna Hu, Will Headden, Rahul Goutam, Heran Lin, Bing Yin
Recent work such as BERT has demonstrated the success of large Transformer encoder architectures with language-model pre-training across a variety of NLP tasks.
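As a minimal sketch of the pattern the abstract refers to (not code from the paper), the snippet below loads a pre-trained BERT encoder via the Hugging Face transformers library and produces contextual token representations; the model name and example sentence are illustrative assumptions.

```python
import torch
from transformers import BertTokenizer, BertModel

# Load a pre-trained BERT encoder and its tokenizer (illustrative checkpoint).
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

# Encode an example sentence into token IDs and run the encoder.
inputs = tokenizer("BERT encodes text into contextual vectors.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One hidden vector per token; downstream tasks fine-tune a small head on top.
print(outputs.last_hidden_state.shape)  # (1, num_tokens, 768)
```

In the pre-train/fine-tune paradigm the abstract describes, these contextual representations are the starting point: a task-specific head is added and the whole network is fine-tuned on labeled data for the target NLP task.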