no code implementations • Findings (ACL) 2022 • Kai Hui, Honglei Zhuang, Tao Chen, Zhen Qin, Jing Lu, Dara Bahri, Ji Ma, Jai Prakash Gupta, Cicero Nogueira dos santos, Yi Tay, Don Metzler
This results in significant inference time speedups since the decoder-only architecture only needs to learn to interpret static encoder embeddings during inference.
1 code implementation • ACL 2021 • Yi Tay, Mostafa Dehghani, Jai Prakash Gupta, Vamsi Aribandi, Dara Bahri, Zhen Qin, Donald Metzler
In the context of language models, are convolutional models competitive to Transformers when pre-trained?