I-BERT: Inductive Generalization of Transformer to Arbitrary Context Lengths

18 Jun 2020
Hyoungwook Nam, Seung Byum Seo, Vikram Sharma Mailthody, Noor Michael, Lan Li

Self-attention has emerged as a vital component of state-of-the-art sequence-to-sequence models for natural language processing in recent years, brought to the forefront by pre-trained bi-directional Transformer models. Its effectiveness is partly due to its non-sequential architecture, which promotes scalability and parallelism but limits the model to inputs of a bounded length...

Methods used in the Paper