We find that binning-based estimators with bins of equal mass (number of instances) have lower bias than estimators with bins of equal width.
We introduce a similarity index that measures the relationship between representational similarity matrices and does not suffer from this limitation.
The estimation of an f-divergence between two probability distributions based on samples is a fundamental problem in statistics and machine learning.
From this observation, we propose extended rank and sort operators by considering optimal transport (OT) problems (the natural relaxation for assignments) where the auxiliary measure can be any weighted measure supported on $m$ increasing values, where $m \ne n$.
We propose an empirical measure of the approximate accuracy of feature importance estimates in deep neural networks.
We present a general framework for solving a large class of learning problems with non-linear functions of classification rates.
Adaptive gradient-based optimizers such as Adagrad and Adam are crucial for achieving state-of-the-art performance in machine translation and language modeling.
We introduce a general control algorithm that combines the strengths of planning and reinforcement learning to effectively solve these tasks.
In this paper, we propose a new model inductive bias that learns a subword tokenization end-to-end as part of the model.
Ranked #3 on Paraphrase Identification on Quora Question Pairs
Meta-learning algorithms aim to learn two components: a model that predicts targets for a task, and a base learner that quickly updates that model when given examples from a new task.