1 code implementation • 3 Apr 2024 • Stephen Casper, Jieun Yun, Joonhyuk Baek, Yeseong Jung, Minhwan Kim, Kiwan Kwon, Saerom Park, Hayden Moore, David Shriver, Marissa Connor, Keltin Grimes, Angus Nicolson, Arush Tagade, Jessica Rumbelow, Hieu Minh Nguyen, Dylan Hadfield-Menell
Interpretability techniques are valuable for helping humans understand and oversee AI systems.

no code implementations • 6 Nov 2023 • Rusheb Shah, Quentin Feuillade--Montixi, Soroush Pour, Arush Tagade, Stephen Casper, Javier Rando
Despite efforts to align large language models to produce harmless responses, they are still vulnerable to jailbreak prompts that elicit unrestricted behaviour.

1 code implementation • 29 Sep 2023 • Arush Tagade, Jessica Rumbelow
We introduce Prototype Generation, a stricter and more robust form of feature visualisation for model-agnostic, data-independent interpretability of image classification models.

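Prototype Generation belongs to the broader family of activation-maximisation feature visualisation: optimise an input by gradient ascent so that it maximally activates a chosen class output. The sketch below illustrates only that general idea, not the paper's actual method; the tiny linear "classifier", the `visualise_class` helper, and all hyperparameters are illustrative stand-ins.

```python
import numpy as np

# Illustrative stand-in for a trained image classifier: a linear map
# from 16 input features to 3 class logits. (A real use would wrap a CNN.)
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 16))

def logits(x):
    return W @ x

def visualise_class(c, steps=200, lr=0.1, l2=0.01):
    """Gradient-ascend an input to maximise the logit of class c.

    Objective: w_c . x - l2 * ||x||^2, whose gradient w.r.t. x is
    w_c - 2 * l2 * x. The L2 penalty keeps the optimised input bounded,
    a common regulariser in feature-visualisation methods.
    """
    x = rng.normal(scale=0.01, size=16)  # start from a near-zero input
    for _ in range(steps):
        grad = W[c] - 2 * l2 * x
        x = x + lr * grad
    return x

proto = visualise_class(0)
# The optimised input should score higher on class 0 than a blank input.
improved = logits(proto)[0] > logits(np.zeros(16))[0]
```

Because the objective here is concave, the ascent converges toward `x* = w_c / (2 * l2)`; with a deep network the same loop is run through autodiff instead of the closed-form gradient.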
no code implementations • 3 Jul 2023 • Vinoth Nandakumar, Arush Tagade, Tongliang Liu
Over the past decade, deep learning has revolutionized the field of computer vision, with convolutional neural network models proving highly effective on image classification benchmarks.