Search Results for author: Mohammad Taufeeque

Found 3 papers, 3 papers with code

Exploiting Novel GPT-4 APIs

1 code implementation • 21 Dec 2023 • Kellin Pelrine, Mohammad Taufeeque, Michał Zając, Euan McLean, Adam Gleave

Language model attacks typically assume one of two extreme threat models: full white-box access to model weights, or black-box access limited to a text generation API.

Language Modelling Retrieval +1

Codebook Features: Sparse and Discrete Interpretability for Neural Networks

1 code implementation • 26 Oct 2023 • Alex Tamkin, Mohammad Taufeeque, Noah D. Goodman

In this setting, our approach overcomes the superposition problem by assigning states to distinct codes, and we find that we can make the neural network behave as if it is in a different state by activating the code for that state.

Quantization
