Search Results for author: Jan-Philipp Fränken

Found 6 papers, 4 papers with code

Self-Supervised Alignment with Mutual Information: Learning to Follow Principles without Preference Labels

1 code implementation • 22 Apr 2024 • Jan-Philipp Fränken, Eric Zelikman, Rafael Rafailov, Kanishk Gandhi, Tobias Gerstenberg, Noah D. Goodman

On single-turn dialogue and summarization, a SAMI-trained mistral-7b outperforms the initial pretrained model, with win rates between 66% and 77%.

Language Modelling

Procedural Dilemma Generation for Evaluating Moral Reasoning in Humans and Language Models

1 code implementation • 17 Apr 2024 • Jan-Philipp Fränken, Kanishk Gandhi, Tori Qiu, Ayesha Khawaja, Noah D. Goodman, Tobias Gerstenberg

We collected moral permissibility and intention judgments from human participants for a subset of our items and compared these judgments to those from two language models (GPT-4 and Claude-2) across eight conditions.

Decision Making • Language Modelling +1

STaR-GATE: Teaching Language Models to Ask Clarifying Questions

1 code implementation • 28 Mar 2024 • Chinmaya Andukuri, Jan-Philipp Fränken, Tobias Gerstenberg, Noah D. Goodman

After two iterations of self-improvement, the Questioner asks better questions, allowing it to generate responses that are preferred over responses from the initial model on 72% of tasks.

Language Modelling

Social Contract AI: Aligning AI Assistants with Implicit Group Norms

1 code implementation • 26 Oct 2023 • Jan-Philipp Fränken, Sam Kwok, Peixuan Ye, Kanishk Gandhi, Dilip Arumugam, Jared Moore, Alex Tamkin, Tobias Gerstenberg, Noah D. Goodman

We explore the idea of aligning an AI assistant by inverting a model of users' (unknown) preferences from observed interactions.

Modeling infant object perception as program induction

no code implementations • 28 Aug 2023 • Jan-Philipp Fränken, Christopher G. Lucas, Neil R. Bramley, Steven T. Piantadosi

Infants expect physical objects to be rigid and to persist through space and time, even in spite of occlusion.

Attribute • Object +2

Understanding Social Reasoning in Language Models with Language Models

no code implementations • NeurIPS 2023 • Kanishk Gandhi, Jan-Philipp Fränken, Tobias Gerstenberg, Noah D. Goodman

Using our framework, we create a new social reasoning benchmark (BigToM) for LLMs which consists of 25 controls and 5,000 model-written evaluations.
