Search Results for author: Mohammad Taufeeque

Found 3 papers, 3 papers with code

Exploiting Novel GPT-4 APIs

1 code implementation • 21 Dec 2023 • Kellin Pelrine, Mohammad Taufeeque, Michał Zając, Euan McLean, Adam Gleave

Language model attacks typically assume one of two extreme threat models: full white-box access to model weights, or black-box access limited to a text generation API.

Language Modelling, Retrieval, +1

Codebook Features: Sparse and Discrete Interpretability for Neural Networks

1 code implementation • 26 Oct 2023 • Alex Tamkin, Mohammad Taufeeque, Noah D. Goodman

In this setting, our approach overcomes the superposition problem by assigning states to distinct codes, and we find that we can make the neural network behave as if it is in a different state by activating the code for that state.

Quantization

imitation: Clean Imitation Learning Implementations

2 code implementations • 22 Nov 2022 • Adam Gleave, Mohammad Taufeeque, Juan Rocamonde, Erik Jenner, Steven H. Wang, Sam Toyer, Maximilian Ernestus, Nora Belrose, Scott Emmons, Stuart Russell

imitation provides open-source implementations of imitation and reward learning algorithms in PyTorch.

Imitation Learning, reinforcement-learning, +1

1,143
