Search Results for author: Nikhil Prakash

Found 4 papers, 0 papers with code

Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking

no code implementations • 22 Feb 2024 • Nikhil Prakash, Tamar Rott Shaham, Tal Haklay, Yonatan Belinkov, David Bau

We identify the mechanism that enables entity tracking and show that (i) in both the original model and its fine-tuned versions primarily the same circuit implements entity tracking.

Code Generation • Instruction Following

Discovering Variable Binding Circuitry with Desiderata

no code implementations • 7 Jul 2023 • Xander Davies, Max Nadeau, Nikhil Prakash, Tamar Rott Shaham, David Bau

Recent work has shown that computation in language models may be human-understandable, with successful efforts to localize and intervene on both single-unit features and input-output circuits.

Conceptualization and Framework of Hybrid Intelligence Systems

no code implementations • 11 Dec 2020 • Nikhil Prakash, Kory W. Mathewson

As artificial intelligence (AI) systems are becoming ubiquitous within our society, issues related to their fairness, accountability, and transparency are increasing rapidly.

Fairness
