1 code implementation • 22 Apr 2024 • Tetsuro Morimura, Mitsuki Sakamoto, Yuu Jinnai, Kenshi Abe, Kaito Ariu
This paper addresses the issue of text quality within the preference dataset by focusing on Direct Preference Optimization (DPO), an increasingly adopted reward-model-free RLHF method.
no code implementations • 26 May 2023 • Kenshi Abe, Kaito Ariu, Mitsuki Sakamoto, Atsushi Iwasaki
This paper addresses the problem of learning Nash equilibria in monotone games, where the gradient of the payoff functions is monotone in the strategy profile space and may contain additive noise.
1 code implementation • 21 Aug 2022 • Kenshi Abe, Kaito Ariu, Mitsuki Sakamoto, Kentaro Toyoshima, Atsushi Iwasaki
This paper proposes Mutation-Driven Multiplicative Weights Update (M2WU) for learning an equilibrium in two-player zero-sum normal-form games and proves that it exhibits the last-iterate convergence property in both full and noisy feedback settings.
1 code implementation • 18 Jun 2022 • Kenshi Abe, Mitsuki Sakamoto, Atsushi Iwasaki
In this study, we consider a variant of the Follow the Regularized Leader (FTRL) dynamics in two-player zero-sum games.
no code implementations • 27 Jun 2019 • Mitsuki Sakamoto, Yuta Hiasa, Yoshito Otake, Masaki Takao, Yuki Suzuki, Nobuhiko Sugano, Yoshinobu Sato
Our goal was to develop an automated method for segmenting the bones and muscles in postoperative CT images.