no code implementations • 13 Mar 2024 • Ben Athiwaratkun, Shiqi Wang, Mingyue Shang, Yuchen Tian, Zijian Wang, Sujan Kumar Gonugondla, Sanjay Krishna Gouda, Rob Kwiatowski, Ramesh Nallapati, Bing Xiang
Generative models, widely utilized in various applications, can often struggle with prompts corresponding to partial tokens.
no code implementations • 13 Mar 2024 • Ben Athiwaratkun, Sujan Kumar Gonugondla, Sanjay Krishna Gouda, Haifeng Qian, Hantian Ding, Qing Sun, Jun Wang, Jiacheng Guo, Liangfu Chen, Parminder Bhatia, Ramesh Nallapati, Sudipta Sengupta, Bing Xiang
In our study, we present bifurcated attention, a method developed for language model inference in single-context batch sampling contexts.
no code implementations • 9 Mar 2023 • Xiaokai Wei, Sujan Gonugondla, Wasi Ahmad, Shiqi Wang, Baishakhi Ray, Haifeng Qian, Xiaopeng Li, Varun Kumar, Zijian Wang, Yuchen Tian, Qing Sun, Ben Athiwaratkun, Mingyue Shang, Murali Krishna Ramanathan, Parminder Bhatia, Bing Xiang
Such large models incur significant resource usage (in terms of memory, latency, and dollars) as well as carbon footprint.
2 code implementations • 26 Oct 2022 • Ben Athiwaratkun, Sanjay Krishna Gouda, Zijian Wang, Xiaopeng Li, Yuchen Tian, Ming Tan, Wasi Uddin Ahmad, Shiqi Wang, Qing Sun, Mingyue Shang, Sujan Kumar Gonugondla, Hantian Ding, Varun Kumar, Nathan Fulton, Arash Farahani, Siddhartha Jain, Robert Giaquinto, Haifeng Qian, Murali Krishna Ramanathan, Ramesh Nallapati, Baishakhi Ray, Parminder Bhatia, Sudipta Sengupta, Dan Roth, Bing Xiang
Using these benchmarks, we are able to assess the performance of code generation models in a multi-lingual fashion, and we discover the generalization ability of language models on out-of-domain languages, the advantages of multi-lingual models over mono-lingual ones, the ability of few-shot prompting to teach the model new languages, and zero-shot translation abilities even in mono-lingual settings.
no code implementations • 11 May 2021 • Yang Li, Ben Athiwaratkun, Cicero Nogueira dos Santos, Bing Xiang
In this work, we propose to leverage the prior information embedded in pretrained language models (LM) to improve generalization for intent classification and slot labeling tasks with limited training data.
no code implementations • EMNLP 2021 • Dheeru Dua, Cicero Nogueira dos Santos, Patrick Ng, Ben Athiwaratkun, Bing Xiang, Matt Gardner, Sameer Singh
Compositional reasoning tasks like multi-hop question answering require making latent decisions to get the final answer, given a question.
1 code implementation • ICLR 2021 • Giovanni Paolini, Ben Athiwaratkun, Jason Krone, Jie Ma, Alessandro Achille, Rishita Anubhai, Cicero Nogueira dos Santos, Bing Xiang, Stefano Soatto
We propose a new framework, Translation between Augmented Natural Languages (TANL), to solve many structured prediction language tasks including joint entity and relation extraction, nested named entity recognition, relation classification, semantic role labeling, event extraction, coreference resolution, and dialogue state tracking.
Ranked #3 on Relation Classification on TACRED
no code implementations • EMNLP 2020 • Ben Athiwaratkun, Cicero Nogueira dos Santos, Jason Krone, Bing Xiang
We set a new state-of-the-art for few-shot slot labeling, improving substantially upon the previous 5-shot ($75.0\% \rightarrow 90.9\%$) and 1-shot ($70.4\% \rightarrow 81.0\%$) state-of-the-art results.
2 code implementations • ICLR 2019 • Ben Athiwaratkun, Marc Finzi, Pavel Izmailov, Andrew Gordon Wilson
Presently the most successful approaches to semi-supervised learning are based on consistency regularization, whereby a model is trained to be robust to small perturbations of its inputs and parameters.
1 code implementation • ACL 2018 • Ben Athiwaratkun, Andrew Gordon Wilson, Anima Anandkumar
We introduce Probabilistic FastText, a new model for word embeddings that can capture multiple word senses, sub-word structure, and uncertainty information.
2 code implementations • ICLR 2018 • Ben Athiwaratkun, Andrew Gordon Wilson
By representing words with probability densities rather than point vectors, probabilistic word embeddings can capture rich and interpretable semantic information and uncertainty.
Ranked #2 on Lexical Entailment on HyperLex
2 code implementations • ACL 2017 • Ben Athiwaratkun, Andrew Gordon Wilson
Word embeddings provide point representations of words containing useful semantic information.
2 code implementations • TACL 2018 • Xilun Chen, Yu Sun, Ben Athiwaratkun, Claire Cardie, Kilian Weinberger
To tackle the sentiment classification problem in low-resource languages without adequate annotated data, we propose an Adversarial Deep Averaging Network (ADAN) to transfer the knowledge learned from labeled data on a resource-rich source language to low-resource languages where only unlabeled data exists.
no code implementations • 8 Jul 2015 • Ben Athiwaratkun, Keegan Kang
Our results show that CNN feature maps can be used with Random Forests and SVM to yield classification results that outperform the original CNN.