no code implementations • 1 Aug 2022 • Ben London, Levi Lu, Ted Sandler, Thorsten Joachims
We propose the first boosting algorithm for off-policy learning from logged bandit feedback.
no code implementations • 29 Jun 2018 • Ben London, Ted Sandler
We present a Bayesian view of counterfactual risk minimization (CRM) for offline learning from logged bandit feedback.
no code implementations • NeurIPS 2008 • Ted Sandler, John Blitzer, Partha P. Talukdar, Lyle H. Ungar
Here we present a framework for regularized learning in settings where one has prior knowledge about which features are expected to have similar and dissimilar weights.