Search Results for author: Surya Kanoria

Found 3 papers, 1 paper with code

Soft Preference Optimization: Aligning Language Models to Expert Distributions

no code implementations • 30 Apr 2024 • Arsalan Sharifnassab, Sina Ghiassian, Saber Salehkaleybar, Surya Kanoria, Dale Schuurmans

We propose Soft Preference Optimization (SPO), a method for aligning generative models, such as Large Language Models (LLMs), with human preferences without the need for a reward model.

Computational Efficiency
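As a rough illustration of the reward-model-free idea the abstract describes, the PyTorch sketch below treats the policy's own sequence log-likelihoods for a preferred/dispreferred pair as logits in a softmax and maximizes the probability of the preferred response. The function name, the `beta` temperature, and the toy inputs are hypothetical; this is not claimed to be SPO's exact objective.

```python
import torch
import torch.nn.functional as F

def softmax_preference_loss(logp_chosen: torch.Tensor,
                            logp_rejected: torch.Tensor,
                            beta: float = 1.0) -> torch.Tensor:
    """Illustrative reward-model-free preference loss (not the paper's
    exact SPO objective): use the policy's own scaled sequence
    log-likelihoods as logits and push probability mass toward the
    preferred response."""
    logits = beta * torch.stack([logp_chosen, logp_rejected], dim=-1)
    # Cross-entropy against index 0, i.e. the preferred response.
    target = torch.zeros(logits.shape[0], dtype=torch.long)
    return F.cross_entropy(logits, target)

# Toy usage with made-up sequence log-likelihoods for a batch of 2 pairs.
logp_c = torch.tensor([-12.3, -8.1], requires_grad=True)
logp_r = torch.tensor([-11.9, -9.4], requires_grad=True)
loss = softmax_preference_loss(logp_c, logp_r, beta=0.5)
loss.backward()
print(float(loss))
```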

Automatic Music Playlist Generation via Simulation-based Reinforcement Learning

no code implementations • 13 Oct 2023 • Federico Tomasi, Joseph Cauteruccio, Surya Kanoria, Kamil Ciosek, Matteo Rinaldi, Zhenwen Dai

In this paper, we present a reinforcement learning framework that addresses these limitations by directly optimizing user satisfaction metrics through a simulated playlist-generation environment.

Collaborative Filtering • Reinforcement Learning
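A minimal sketch of the simulation-based setup described above: a track-selection policy is trained against a stand-in user model that emits a satisfaction signal, here with a tiny REINFORCE loop. The simulator, the reward definition, and all names (`simulate_user`, `true_pref`) are invented for illustration and are not the paper's environment or algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
N_TRACKS, EPISODES, LR = 20, 500, 0.1

# Hypothetical simulated user: hidden per-track satisfaction probabilities.
true_pref = rng.uniform(size=N_TRACKS)

def simulate_user(track: int) -> float:
    """Stand-in for a learned user-behavior model: returns a noisy
    satisfaction signal (e.g., 1 = track completed, 0 = skipped)."""
    return float(rng.random() < true_pref[track])

theta = np.zeros(N_TRACKS)              # policy logits over tracks
for _ in range(EPISODES):
    probs = np.exp(theta - theta.max())
    probs /= probs.sum()
    track = rng.choice(N_TRACKS, p=probs)
    reward = simulate_user(track)       # satisfaction from the simulator
    grad = -probs                       # REINFORCE gradient of log pi
    grad[track] += 1.0
    theta += LR * reward * grad

print("top tracks:", np.argsort(-theta)[:5])
```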

What to Learn, and How: Toward Effective Learning from Rationales

1 code implementation • Findings (ACL) 2022 • Samuel Carton, Surya Kanoria, Chenhao Tan

Learning from rationales seeks to augment model prediction accuracy using human-annotated rationales (i.e., subsets of input tokens) that justify their chosen labels, often in the form of intermediate or multitask supervision.
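To make the multitask-supervision setup concrete, here is a toy PyTorch classifier with a label head plus a per-token rationale head trained with a joint loss. The architecture, the 0.5 loss weight, and all names are assumptions for illustration, not the paper's model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RationaleAugmentedClassifier(nn.Module):
    """Toy encoder that predicts both a label and per-token rationale
    scores, trained with joint (multitask) supervision."""
    def __init__(self, vocab=1000, dim=32, n_labels=2):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.token_head = nn.Linear(dim, 1)       # is this token a rationale?
        self.label_head = nn.Linear(dim, n_labels)

    def forward(self, tokens):
        h = self.emb(tokens)                          # (B, T, dim)
        tok_logits = self.token_head(h).squeeze(-1)   # (B, T)
        label_logits = self.label_head(h.mean(dim=1)) # (B, n_labels)
        return tok_logits, label_logits

model = RationaleAugmentedClassifier()
tokens = torch.randint(0, 1000, (4, 16))
rationale = torch.randint(0, 2, (4, 16)).float()  # human token annotations
labels = torch.randint(0, 2, (4,))

tok_logits, label_logits = model(tokens)
# Multitask objective: label cross-entropy + token-level rationale loss.
loss = (F.cross_entropy(label_logits, labels)
        + 0.5 * F.binary_cross_entropy_with_logits(tok_logits, rationale))
loss.backward()
```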
