no code implementations • 27 May 2023 • Jueming Hu, Jean-Raphael Gaglione, Yanze Wang, Zhe Xu, Ufuk Topcu, Yongming Liu
We develop an algorithm, Q-learning with reward machines for stochastic games (QRM-SG), that learns each agent's best-response strategy at a Nash equilibrium.
1 code implementation • 24 Jan 2022 • Weijun Chen, Yanze Wang, Chengshuo Du, Zhenglong Jia, Feng Liu, Ran Chen
However, current models do not account for the trade-off between efficiency and flexibility, and their graph structure learning algorithms are designed without guidance from domain knowledge.