22 Oct 2021 • William W. Howard, R. M. Buehrer, Anthony Martone
We model a radar network as an adversarial bandit problem, where the environment pre-selects reward sequences for each of several actions available to the network.
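To make the adversarial bandit setting concrete, here is a minimal sketch in which the full reward sequence for every action is fixed before play begins and the learner observes only the reward of the action it picks. The text above does not say which algorithm the authors use; Exp3, a standard adversarial bandit algorithm, stands in purely as an illustration. The function name `exp3`, the exploration rate `gamma`, and the toy reward table are all assumptions made for this example, not details from the paper.

```python
import math
import random


def exp3(reward_table, gamma=0.1, seed=0):
    """Run Exp3 against a pre-selected reward table (illustrative sketch).

    reward_table[t][a] is the reward in [0, 1] for action a at round t,
    chosen by the environment before the first round is played.
    Returns (list of chosen actions, total collected reward).
    """
    rng = random.Random(seed)
    n_actions = len(reward_table[0])
    weights = [1.0] * n_actions
    chosen, total = [], 0.0

    for rewards in reward_table:
        # Mix the weight-proportional distribution with uniform exploration.
        w_sum = sum(weights)
        probs = [(1 - gamma) * w / w_sum + gamma / n_actions for w in weights]

        action = rng.choices(range(n_actions), weights=probs)[0]
        reward = rewards[action]  # bandit feedback: only this entry is seen

        # Importance-weighted estimate keeps the reward estimate unbiased.
        estimate = reward / probs[action]
        weights[action] *= math.exp(gamma * estimate / n_actions)

        # Rescale so the weights cannot overflow over long horizons.
        largest = max(weights)
        weights = [w / largest for w in weights]

        chosen.append(action)
        total += reward

    return chosen, total


if __name__ == "__main__":
    # Hypothetical environment: action 1 always pays 1, the others pay 0.
    table = [[0.0, 1.0, 0.0] for _ in range(500)]
    actions, collected = exp3(table)
    print("total reward:", collected)
    print("share of rounds on action 1:", actions.count(1) / len(actions))
```

In a radar network reading of this sketch, each action could correspond to a choice available to the network and each table row to one decision interval, but that mapping is an assumption here rather than something stated in the abstract.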