1 code implementation • 14 Aug 2023 • Nan Ding, Tomer Levinboim, Jialin Wu, Sebastian Goodman, Radu Soricut
Recent empirical evidence indicates that transformer-based in-context learning performs better with a prefix language model (prefixLM), in which all in-context samples can attend to each other, than with a causal language model (causalLM), whose auto-regressive attention prevents in-context samples from attending to future samples.
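The distinction is concrete at the level of the attention mask. A minimal NumPy sketch (the split point `prefix_len` is an illustrative parameter, not notation from the paper):

```python
import numpy as np

def causal_mask(seq_len: int) -> np.ndarray:
    """Lower-triangular causalLM mask: position i attends only to positions <= i."""
    return np.tril(np.ones((seq_len, seq_len), dtype=bool))

def prefix_mask(seq_len: int, prefix_len: int) -> np.ndarray:
    """PrefixLM mask: the first `prefix_len` positions (the in-context samples)
    attend to each other bidirectionally; the rest stay auto-regressive."""
    mask = causal_mask(seq_len)
    mask[:prefix_len, :prefix_len] = True  # full attention inside the prefix
    return mask

# Example: 6 tokens, the first 4 forming the in-context prefix.
print(prefix_mask(6, 4).astype(int))
```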
no code implementations • CVPR 2023 • Zifan Wang, Nan Ding, Tomer Levinboim, Xi Chen, Radu Soricut
Recent research in robust optimization has shown an overfitting-like phenomenon in which models trained against adversarial attacks exhibit higher robustness on the training set compared to the test set.
1 code implementation • 14 Sep 2022 • Xi Chen, Xiao Wang, Soravit Changpinyo, AJ Piergiovanni, Piotr Padlewski, Daniel Salz, Sebastian Goodman, Adam Grycner, Basil Mustafa, Lucas Beyer, Alexander Kolesnikov, Joan Puigcerver, Nan Ding, Keran Rong, Hassan Akbari, Gaurav Mishra, Linting Xue, Ashish Thapliyal, James Bradbury, Weicheng Kuo, Mojtaba Seyedhosseini, Chao Jia, Burcu Karagol Ayan, Carlos Riquelme, Andreas Steiner, Anelia Angelova, Xiaohua Zhai, Neil Houlsby, Radu Soricut
PaLI generates text based on visual and textual inputs, and with this interface performs many vision, language, and multimodal tasks, in many languages.
2 code implementations • NAACL 2022 • Soravit Changpinyo, Doron Kukliansky, Idan Szpektor, Xi Chen, Nan Ding, Radu Soricut
Visual Question Answering (VQA) has benefited from increasingly sophisticated models, but has not enjoyed the same level of engagement in terms of data creation.
1 code implementation • 10 Mar 2022 • Nan Ding, Xi Chen, Tomer Levinboim, Soravit Changpinyo, Radu Soricut
With the growing abundance of pretrained models in recent years, the problem of selecting the best pretrained checkpoint for a particular downstream classification task has attracted increasing attention.
Ranked #5 on Transferability on classification benchmark
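As a rough illustration of what a checkpoint-selection metric computes, here is a sketch of LEEP (Nguyen et al., 2020), a simple log-likelihood-based transferability score; the paper's own PACTran metrics are PAC-Bayesian and are not reproduced here:

```python
import numpy as np

def leep_score(probs: np.ndarray, labels: np.ndarray) -> float:
    """LEEP transferability score: the average log-likelihood of an
    "empirical predictor" mapping source-model predictions to target labels.
    probs:  (n, num_source_classes) source-checkpoint predictions on target data
    labels: (n,) integer target labels"""
    n = probs.shape[0]
    num_target = int(labels.max()) + 1
    joint = np.zeros((num_target, probs.shape[1]))  # empirical joint P(y, z)
    for y, p in zip(labels, probs):
        joint[y] += p / n
    p_z = joint.sum(axis=0, keepdims=True)          # marginal P(z)
    cond = joint / np.clip(p_z, 1e-12, None)        # conditional P(y | z)
    pred = np.clip(probs @ cond.T, 1e-12, None)     # predicted P(y | x_i) for every y
    return float(np.mean(np.log(pred[np.arange(n), labels])))
```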
no code implementations • NeurIPS 2021 • Nan Ding, Xi Chen, Tomer Levinboim, Sebastian Goodman, Radu Soricut
Despite recent advances in its theoretical understanding, there remains a significant gap in the ability of existing PAC-Bayesian theories of meta-learning to explain performance improvements in the few-shot learning setting, where the number of training examples in the target tasks is severely limited.
1 code implementation • EMNLP 2021 • Sharan Narang, Hyung Won Chung, Yi Tay, William Fedus, Thibault Fevry, Michael Matena, Karishma Malkan, Noah Fiedel, Noam Shazeer, Zhenzhong Lan, Yanqi Zhou, Wei Li, Nan Ding, Jake Marcus, Adam Roberts, Colin Raffel
The research community has proposed copious modifications to the Transformer architecture since it was introduced over three years ago, relatively few of which have seen widespread adoption.
3 code implementations • CVPR 2021 • Soravit Changpinyo, Piyush Sharma, Nan Ding, Radu Soricut
The availability of large-scale image captioning and visual question answering datasets has contributed significantly to recent successes in vision-and-language pre-training.
Ranked #9 on Image Captioning on nocaps-val-out-domain
no code implementations • 6 Dec 2020 • Xiaotong Guo, Qiusheng Gu, Nan Ding, Xiaoling Yu, Yongyun Chen
We also find that CT AGNs have a higher Eddington ratio than non-CT AGNs, and that both CT AGNs and non-CT AGNs show similar properties of host galaxies.
Astrophysics of Galaxies • High Energy Astrophysical Phenomena
no code implementations • EMNLP (Eval4NLP) 2020 • Xi Chen, Nan Ding, Tomer Levinboim, Radu Soricut
Recent advances in automatic evaluation metrics for text have shown that deep contextualized word representations, such as those generated by BERT encoders, are helpful for designing metrics that correlate well with human judgements.
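To make the underlying idea concrete, here is a sketch of greedy cosine matching over contextualized token embeddings, in the spirit of BERTScore; the learned metric proposed in the paper itself is not reproduced here:

```python
import numpy as np

def greedy_matching_score(cand_emb: np.ndarray, ref_emb: np.ndarray) -> float:
    """Score a candidate text against a reference by greedily matching
    BERT-style token embeddings under cosine similarity.
    cand_emb: (m, d) candidate token embeddings; ref_emb: (n, d) reference."""
    c = cand_emb / np.linalg.norm(cand_emb, axis=1, keepdims=True)
    r = ref_emb / np.linalg.norm(ref_emb, axis=1, keepdims=True)
    sim = c @ r.T                        # (m, n) pairwise cosine similarities
    recall = sim.max(axis=0).mean()      # each reference token's best match
    precision = sim.max(axis=1).mean()   # each candidate token's best match
    return 2 * precision * recall / (precision + recall)  # F1 combination
```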
no code implementations • EMNLP 2020 • Sebastian Goodman, Nan Ding, Radu Soricut
Sequence generation models trained with teacher-forcing suffer from issues related to exposure bias and lack of differentiability across timesteps.
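A minimal PyTorch sketch of the train/test mismatch behind exposure bias (the encoder-decoder `model` is hypothetical):

```python
import torch

def teacher_forced_loss(model, src, tgt, loss_fn):
    """Teacher forcing: at every step the decoder conditions on the *gold*
    prefix tgt[:, :t], never on its own predictions."""
    logits = model(src, tgt[:, :-1])  # hypothetical encoder-decoder, returns (B, T-1, V)
    return loss_fn(logits.reshape(-1, logits.size(-1)), tgt[:, 1:].reshape(-1))

@torch.no_grad()
def free_running_decode(model, src, bos_id, steps):
    """At test time the decoder conditions on its own (possibly wrong)
    outputs, so early mistakes compound: this is exposure bias."""
    out = torch.full((src.size(0), 1), bos_id, dtype=torch.long)
    for _ in range(steps):
        next_tok = model(src, out)[:, -1].argmax(dim=-1, keepdim=True)
        out = torch.cat([out, next_tok], dim=1)
    return out
```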
no code implementations • 29 Sep 2020 • Nan Ding, Xinjie Fan, Zhenzhong Lan, Dale Schuurmans, Radu Soricut
Models based on the Transformer architecture have achieved better accuracy than models based on competing architectures across a large set of tasks.
4 code implementations • 5 Mar 2020 • Noam Shazeer, Zhenzhong Lan, Youlong Cheng, Nan Ding, Le Hou
We introduce "talking-heads attention", a variation on multi-head attention that includes linear projections across the attention-heads dimension, immediately before and after the softmax operation. While inserting only a small number of additional parameters and a moderate amount of additional computation, talking-heads attention leads to better perplexities on masked language modeling tasks, as well as better quality when transfer-learning to language comprehension and question answering tasks.
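A NumPy sketch of the described mechanism, simplified so that the number of heads is the same before and after each projection (the paper also allows these counts to differ); with both projections set to the identity it reduces to standard multi-head attention:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def talking_heads_attention(q, k, v, P_logits, P_weights):
    """q, k, v: (heads, seq, dim) per-head queries/keys/values.
    P_logits, P_weights: (heads, heads) learned projections that mix
    information across heads before and after the softmax."""
    logits = np.einsum("hqd,hkd->hqk", q, k) / np.sqrt(q.shape[-1])
    logits = np.einsum("ij,jqk->iqk", P_logits, logits)     # talk before softmax
    weights = softmax(logits, axis=-1)
    weights = np.einsum("ij,jqk->iqk", P_weights, weights)  # talk after softmax
    return np.einsum("hqk,hkd->hqd", weights, v)
```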
1 code implementation • 12 Feb 2020 • Alberto Zeni, Giulia Guidi, Marquita Ellis, Nan Ding, Marco D. Santambrogio, Steven Hofmeyr, Aydın Buluç, Leonid Oliker, Katherine Yelick
To highlight the impact of our work on a real-world application, we couple LOGAN with BELLA, a many-to-many long-read aligner, and demonstrate that our implementation improves the overall BELLA runtime by up to 10.6x.
no code implementations • 7 Feb 2020 • Qian Liu, Dongyang Cai, Jie Liu, Nan Ding, Tao Wang
The standard non-local (NL) module is effective at aggregating frame-level features for video classification, but suffers from low parameter efficiency and high computational cost.
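For reference, a sketch of the standard NL module in its embedded-Gaussian form (Wang et al., 2018), the baseline the abstract refers to; the paper's cheaper variant is not reproduced here:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def nonlocal_block(x, W_theta, W_phi, W_g, W_out):
    """Embedded-Gaussian non-local block over flattened spatio-temporal
    positions: every position aggregates features from every other one.
    x: (positions, channels); the W_* are learned projection matrices."""
    theta, phi, g = x @ W_theta, x @ W_phi, x @ W_g
    attn = softmax(theta @ phi.T)      # (positions, positions) pairwise affinities
    return x + (attn @ g) @ W_out      # residual connection, as in the original module
```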
2 code implementations • ACL 2018 • Piyush Sharma, Nan Ding, Sebastian Goodman, Radu Soricut
We present a new dataset of image caption annotations, Conceptual Captions, which contains an order of magnitude more images than the MS-COCO dataset (Lin et al., 2014) and represents a wider variety of both images and image caption styles.
no code implementations • NAACL 2018 • Ye Zhang, Nan Ding, Radu Soricut
Supervised training of abstractive language generation models results in learning conditional probabilities over language sequences based on the supervised training signal.
1 code implementation • NeurIPS 2017 • Nan Ding, Radu Soricut
Policy-gradient approaches to reinforcement learning have two common and undesirable overhead procedures, namely warm-start training and sample variance reduction.
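For contrast, a sketch of the standard REINFORCE surrogate loss with the two overheads made explicit; the paper's softmax policy gradient, which removes them, is not reproduced here:

```python
import torch

def pg_surrogate_loss(logps, rewards, baseline):
    """REINFORCE surrogate loss with a baseline. In the standard setup the
    policy is first warm-started with maximum-likelihood training, and the
    baseline must be estimated purely to reduce gradient variance.
    logps:    (n,) log-probabilities of sampled outputs (requires grad)
    rewards:  (n,) task rewards for those samples
    baseline: scalar or (n,) variance-reduction estimate"""
    advantages = (rewards - baseline).detach()
    return -(advantages * logps).mean()  # minimizing this ascends the PG objective
```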
no code implementations • 22 Dec 2016 • Nan Ding, Sebastian Goodman, Fei Sha, Radu Soricut
We introduce a new multi-modal task for computer systems, posed as a combined vision-language comprehension challenge: identifying the most suitable text describing a scene, given several similar options.
no code implementations • 14 Dec 2016 • Radu Soricut, Nan Ding
We present a family of neural-network-inspired models for computing continuous word representations, specifically designed to exploit both monolingual and multilingual text.
1 code implementation • 13 Dec 2016 • Radu Soricut, Nan Ding
We present a dual contribution to the task of machine reading comprehension: a technique for creating large machine-comprehension (MC) datasets using paragraph-vector models; and a novel, hybrid neural-network architecture that combines the representation power of recurrent neural networks with the discriminative power of fully-connected multi-layered networks.
no code implementations • NeurIPS 2016 • Changyou Chen, Nan Ding, Chunyuan Li, Yizhe Zhang, Lawrence Carin
In this paper we develop theory to show that while the bias and MSE of an SG-MCMC algorithm depend on the staleness of stochastic gradients, its estimation variance (relative to the expected estimate, based on a prescribed number of samples) is independent of it.
no code implementations • NeurIPS 2015 • Changyou Chen, Nan Ding, Lawrence Carin
Our theoretical results show faster convergence rates and more accurate invariant measures for SG-MCMCs with higher-order integrators.
2 code implementations • 31 Jul 2016 • Sergio Boixo, Sergei V. Isakov, Vadim N. Smelyanskiy, Ryan Babbush, Nan Ding, Zhang Jiang, Michael J. Bremner, John M. Martinis, Hartmut Neven
We study the task of sampling from the output distributions of (pseudo-)random quantum circuits, a natural task for benchmarking quantum computers.
Quantum Physics
no code implementations • NeurIPS 2015 • Farzaneh Mirzazadeh, Siamak Ravanbakhsh, Nan Ding, Dale Schuurmans
A key bottleneck in structured output prediction is the need for inference during training and testing, usually requiring some form of dynamic programming.
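The bottleneck is the per-example dynamic program itself; for instance, chain-structured MAP inference runs a Viterbi recursion for every training and test example. A standard sketch of that recursion (not the paper's method, which avoids such inference):

```python
import numpy as np

def viterbi(unary, pairwise):
    """Viterbi dynamic program for chain-structured MAP inference.
    unary:    (T, K) per-position label scores
    pairwise: (K, K) transition scores between adjacent labels"""
    T, K = unary.shape
    score, back = unary[0].copy(), np.zeros((T, K), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + pairwise   # (prev_label, cur_label) candidates
        back[t] = cand.argmax(axis=0)      # best predecessor for each label
        score = cand.max(axis=0) + unary[t]
    path = [int(score.argmax())]           # backtrack the best final label
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]
```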
no code implementations • 7 Apr 2015 • Vasil S. Denchev, Nan Ding, Shin Matsushima, S. V. N. Vishwanathan, Hartmut Neven
If actual quantum optimization were to be used with this algorithm in the future, we would expect equivalent or superior results at much smaller time and energy costs during training.
no code implementations • ICCV 2015 • Nan Ding, Jia Deng, Kevin Murphy, Hartmut Neven
In this paper, we extend the HEX model to allow for soft or probabilistic relations between labels, which is useful when there is uncertainty about the relationship between two labels (e.g., an antelope is "sort of" furry, but not to the same degree as a grizzly bear).
no code implementations • NeurIPS 2014 • Nan Ding, Youhan Fang, Ryan Babbush, Changyou Chen, Robert D. Skeel, Hartmut Neven
To remedy this problem, we show that one can leverage a small number of additional variables in order to stabilize momentum fluctuations induced by the unknown noise.
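A sketch of the resulting update, assuming a noisy gradient oracle `grad_U` for the negative log posterior; the scalar thermostat `xi` is the additional variable that adaptively damps the momentum so the unknown gradient noise does not inflate the kinetic energy:

```python
import numpy as np

def sgnht_step(theta, p, xi, grad_U, h=1e-3, A=1.0, rng=np.random):
    """One step of stochastic-gradient Nose-Hoover thermostat dynamics.
    theta: parameters; p: momentum; xi: scalar thermostat; h: step size;
    A: injected-noise scale."""
    n = theta.size
    p = p - xi * p * h - grad_U(theta) * h \
        + np.sqrt(2.0 * A * h) * rng.standard_normal(n)
    theta = theta + p * h
    xi = xi + (p @ p / n - 1.0) * h  # drives mean kinetic energy toward 1
    return theta, p, xi
```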
no code implementations • 17 Jun 2014 • Ryan Babbush, Vasil Denchev, Nan Ding, Sergei Isakov, Hartmut Neven
Quantum annealing is a heuristic quantum algorithm which exploits quantum resources to minimize an objective function embedded as the energy levels of a programmable physical system.
no code implementations • NeurIPS 2011 • Nan Ding, Yuan Qi, S. V. N. Vishwanathan
Approximate inference is an important technique for dealing with large, intractable graphical models based on the exponential family of distributions.
no code implementations • NeurIPS 2010 • Nan Ding, S. V. N. Vishwanathan
We extend logistic regression by using t-exponential families which were introduced recently in statistical physics.
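For concreteness, the t-exponential and its inverse, which replace exp and log in the logistic model; the model's normalization term, computed numerically in the paper, is omitted here:

```python
import numpy as np

def exp_t(x, t):
    """t-exponential from statistical physics: reduces to exp(x) as t -> 1,
    with heavier tails for t > 1."""
    if t == 1.0:
        return np.exp(x)
    return np.maximum(1.0 + (1.0 - t) * x, 0.0) ** (1.0 / (1.0 - t))

def log_t(x, t):
    """Inverse of exp_t on its positive range."""
    if t == 1.0:
        return np.log(x)
    return (x ** (1.0 - t) - 1.0) / (1.0 - t)
```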