Search Results for author: Mohammadi Zaki

Found 6 papers, 0 papers with code

Isometric Neural Machine Translation using Phoneme Count Ratio Reward-based Reinforcement Learning

no code implementations • 20 Mar 2024 • Shivam Ratnakant Mhaskar, Nirmesh J. Shah, Mohammadi Zaki, Ashishkumar P. Gudmalwar, Pankaj Wasnik, Rajiv Ratn Shah

In this paper, we present the development of an isometric NMT system using Reinforcement Learning (RL), with a focus on optimizing the alignment of phoneme counts in the source and target language sentence pairs.

Automatic Speech Recognition (ASR) +6

Actor-Critic based Improper Reinforcement Learning

no code implementations • 19 Jul 2022 • Mohammadi Zaki, Avinash Mohan, Aditya Gopalan, Shie Mannor

For the actor-critic (AC) approach, we provide convergence rate guarantees to a stationary point in the basic AC case and to a global optimum in the natural actor-critic (NAC) case.

Reinforcement Learning (RL)

Improper Reinforcement Learning with Gradient-based Policy Optimization

no code implementations • 16 Feb 2021 • Mohammadi Zaki, Avinash Mohan, Aditya Gopalan, Shie Mannor

We consider an improper reinforcement learning setting where a learner is given $M$ base controllers for an unknown Markov decision process, and wishes to combine them optimally to produce a potentially new controller that can outperform each of the base ones.

Reinforcement Learning (RL)

Explicit Best Arm Identification in Linear Bandits Using No-Regret Learners

no code implementations • 13 Jun 2020 • Mohammadi Zaki, Avi Mohan, Aditya Gopalan

We study the problem of best arm identification in linearly parameterised multi-armed bandits.

Multi-Armed Bandits

Towards Optimal and Efficient Best Arm Identification in Linear Bandits

no code implementations • 5 Nov 2019 • Mohammadi Zaki, Avinash Mohan, Aditya Gopalan

We give a new algorithm for best arm identification in linearly parameterised bandits in the fixed confidence setting.

Low-rank Bandits with Latent Mixtures

no code implementations • 6 Sep 2016 • Aditya Gopalan, Odalric-Ambrym Maillard, Mohammadi Zaki

This induces a low-rank structure on the matrix of expected rewards $r_{a,b}$ from recommending item $a$ to user $b$.

Recommendation Systems
