Search Results for author: Mitsuki Sakamoto

Found 5 papers, 3 papers with code

Filtered Direct Preference Optimization

1 code implementation · 22 Apr 2024 · Tetsuro Morimura, Mitsuki Sakamoto, Yuu Jinnai, Kenshi Abe, Kaito Ariu

This paper addresses the issue of text quality within preference datasets by focusing on Direct Preference Optimization (DPO), an increasingly adopted reward-model-free RLHF method.
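For context, the standard DPO objective that this work builds on can be sketched as follows. This is the generic per-pair DPO loss, not the paper's data-filtering procedure; the function and argument names are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one preference pair.

    Arguments are the policy's and the frozen reference model's
    log-probabilities of the chosen and rejected responses.
    """
    # Implicit reward margin: beta-scaled difference of log-ratios
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # Negative log-likelihood of the preference under a Bradley-Terry model
    return -np.log(sigmoid(margin))
```

When the policy assigns the chosen response a much higher log-ratio than the rejected one, the loss approaches zero; when the two log-ratios are equal, the loss is ln 2.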

Slingshot Perturbation to Learning in Monotone Games

no code implementations · 26 May 2023 · Kenshi Abe, Kaito Ariu, Mitsuki Sakamoto, Atsushi Iwasaki

This paper addresses the problem of learning Nash equilibria in monotone games, where the gradient of the payoff functions is monotone in the strategy profile space and may contain additive noise.

Last-Iterate Convergence with Full and Noisy Feedback in Two-Player Zero-Sum Games

1 code implementation · 21 Aug 2022 · Kenshi Abe, Kaito Ariu, Mitsuki Sakamoto, Kentaro Toyoshima, Atsushi Iwasaki

This paper proposes Mutation-Driven Multiplicative Weights Update (M2WU) for learning an equilibrium in two-player zero-sum normal-form games and proves that it exhibits the last-iterate convergence property in both full and noisy feedback settings.

Multi-agent Reinforcement Learning
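As a rough illustration of the multiplicative-weights core of such a method, the sketch below runs standard MWU on a zero-sum matrix game and mixes each iterate toward a fixed reference strategy as a simplified stand-in for the paper's mutation term. The mixing form, parameter names, and default values are assumptions, not the paper's exact M2WU update:

```python
import numpy as np

def mwu_with_mutation(A, eta=0.1, mu=0.05, steps=5000, x0=None, y0=None):
    """Simplified mutation-style MWU sketch for a two-player zero-sum
    normal-form game with payoff matrix A (row player maximizes x^T A y)."""
    n, m = A.shape
    x = np.ones(n) / n if x0 is None else x0.astype(float).copy()
    y = np.ones(m) / m if y0 is None else y0.astype(float).copy()
    rx = np.ones(n) / n  # reference ("mutation target") strategies
    ry = np.ones(m) / m
    for _ in range(steps):
        # Standard multiplicative weights update for both players
        x_new = x * np.exp(eta * (A @ y))
        x_new /= x_new.sum()
        y_new = y * np.exp(-eta * (A.T @ x))
        y_new /= y_new.sum()
        # Mutation step: pull the iterate toward the reference strategy
        # (a simplified stand-in for the paper's mutation term)
        x = (1 - mu) * x_new + mu * rx
        y = (1 - mu) * y_new + mu * ry
    return x, y
```

Plain MWU cycles around the equilibrium on games like matching pennies, while the mutation pull contracts the iterates, which is the intuition behind last-iterate convergence here.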

Mutation-Driven Follow the Regularized Leader for Last-Iterate Convergence in Zero-Sum Games

1 code implementation · 18 Jun 2022 · Kenshi Abe, Mitsuki Sakamoto, Atsushi Iwasaki

In this study, we consider a variant of the Follow the Regularized Leader (FTRL) dynamics in two-player zero-sum games.
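For background, a minimal entropy-regularized FTRL (Hedge) loop for a zero-sum matrix game is sketched below. It reports time-averaged strategies, whose convergence to equilibrium (while the last iterates may cycle) is the classical behavior that mutation-driven variants aim to improve to last-iterate convergence. All names and parameter values are illustrative:

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def ftrl_average(A, eta=0.02, steps=10000):
    """Entropy-regularized FTRL (Hedge) for a two-player zero-sum game
    with payoff matrix A; returns the time-averaged strategies."""
    n, m = A.shape
    gx = np.zeros(n)  # cumulative payoffs for the row player
    gy = np.zeros(m)  # cumulative payoffs for the column player
    x_avg = np.zeros(n)
    y_avg = np.zeros(m)
    for _ in range(steps):
        x = softmax(eta * gx)  # entropic FTRL = softmax of cumulative payoffs
        y = softmax(eta * gy)
        gx += A @ y            # row player maximizes x^T A y
        gy -= A.T @ x          # column player minimizes it
        x_avg += x
        y_avg += y
    return x_avg / steps, y_avg / steps
```

The averaged strategies approach a Nash equilibrium because each player's time-averaged regret bounds the duality gap; the individual iterates, by contrast, need not converge.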
