no code implementations • 21 Apr 2020 • Jason Rhuggenaath, Alp Akcay, Yingqian Zhang, Uzay Kaymak
In this paper, we study a slate bandit problem where the function that determines the slate-level reward is non-separable: the optimal value of the function cannot be determined by learning the optimal action for each slot.
1 code implementation • 3 Apr 2020 • Paulo R. de O. da Costa, Jason Rhuggenaath, Yingqian Zhang, Alp Akcay
We propose a policy gradient algorithm to learn a stochastic policy that selects 2-opt operations given a current solution.
no code implementations • 17 Jul 2019 • Dylan Rijnen, Jason Rhuggenaath, Paulo R. de O. da Costa, Yingqian Zhang
In many situations, simulation models are developed to handle complex real-world business optimisation problems.