Attention Is All You Need

NeurIPS 2017 Ashish VaswaniNoam ShazeerNiki ParmarJakob UszkoreitLlion JonesAidan N. GomezLukasz KaiserIllia Polosukhin

The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism... (read more)

PDF Abstract