no code implementations • 5 Apr 2024 • Mahesh Lorik Yadav, Harish Guruprasad Ramaswamy, Chandrashekar Lakshminarayanan
Unlike deep linear networks, the DLGN is capable of learning non-linear features (which are then linearly combined), and unlike ReLU networks these features are ultimately simple -- each feature is effectively an indicator function for a region compactly described as an intersection of half-spaces in the input space, one half-space per layer.
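A minimal sketch of this feature geometry (illustrative names and values, not the paper's code): each feature is the indicator of the intersection of one half-space per layer.

```python
import numpy as np

def dlgn_feature(x, W, b):
    """Indicator feature: 1 iff x lies in the intersection of all
    half-spaces {z : W[l] @ z + b[l] > 0}, one half-space per layer."""
    return float(np.all(W @ x + b > 0))

# Example: a region cut out by 3 layers in a 2-D input space.
W = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
b = np.array([0.0, 0.0, 1.5])
print(dlgn_feature(np.array([0.5, 0.5]), W, b))  # 1.0: inside all three half-spaces
print(dlgn_feature(np.array([2.0, 2.0]), W, b))  # 0.0: violates the third half-space
```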
no code implementations • 20 Nov 2023 • Lakshmi Mandal, Chandrashekar Lakshminarayanan, Shalabh Bhatnagar
In this work, we consider a cooperative multi-agent Markov decision process (MDP) involving m agents.
no code implementations • 1 Mar 2022 • Chandrashekar Lakshminarayanan, Amit Vikram Singh, Arun Rajkumar
Using the dual view, in this paper we rethink the conventional interpretations of DNNs, thereby making explicit the implicit interpretability of DNNs.
no code implementations • 6 Oct 2021 • Chandrashekar Lakshminarayanan, Amit Vikram Singh
To address this `black box'-ness, we propose a novel interpretable counterpart of DNNs with ReLUs, namely the deep linearly gated network (DLGN): the pre-activations to the gates are generated by a deep linear network, and the gates are then applied as external masks to learn the weights in a different network.
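A minimal sketch of this two-network structure, assuming a fully connected architecture with soft sigmoid gates and hypothetical shapes (not the paper's code): the gate pre-activations come from a purely linear network and are applied as external masks in a separate value network.

```python
import numpy as np

def dlgn_forward(x, gate_Ws, value_Ws, beta=10.0):
    """Sketch of a DLGN forward pass (fully connected, no biases).
    Pre-activations for the gates come from a *linear* network; the
    resulting gates mask the units of a separate value network."""
    g = x.copy()  # deep linear network computing gate pre-activations
    h = x.copy()  # value network whose units are masked by the gates
    for Wg, Wv in zip(gate_Ws, value_Ws):
        g = Wg @ g                              # linear: no non-linearity here
        gate = 1.0 / (1.0 + np.exp(-beta * g))  # soft on/off gate
        h = gate * (Wv @ h)                     # gate applied as an external mask
    return h

rng = np.random.default_rng(0)
dims = [4, 8, 8, 2]
gate_Ws = [rng.standard_normal((dims[i + 1], dims[i])) for i in range(3)]
value_Ws = [rng.standard_normal((dims[i + 1], dims[i])) for i in range(3)]
print(dlgn_forward(rng.standard_normal(4), gate_Ws, value_Ws))
```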
no code implementations • NeurIPS 2020 • Chandrashekar Lakshminarayanan, Amit Vikram Singh
To this end, we encode the on/off states of the gates for a given input in a novel 'neural path feature' (NPF), and the weights of the DNN are encoded in a novel 'neural path value' (NPV).
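A minimal sketch (hypothetical one-hidden-layer ReLU net, illustrative values) of the path decomposition these encodings support: the network output equals the inner product of the NPF (input signal times gate activity along each path) and the NPV (product of weights along each path).

```python
import numpy as np

rng = np.random.default_rng(1)
W1 = rng.standard_normal((3, 2))   # hidden layer weights
W2 = rng.standard_normal((1, 3))   # output layer weights
x = rng.standard_normal(2)

# Standard forward pass of the ReLU network.
pre = W1 @ x
gates = (pre > 0).astype(float)    # on/off state of each ReLU gate
output = W2 @ (gates * pre)

# Path decomposition: one path per (input i, hidden unit j) pair.
npf = np.array([x[i] * gates[j] for i in range(2) for j in range(3)])
npv = np.array([W1[j, i] * W2[0, j] for i in range(2) for j in range(3)])
assert np.allclose(output, npf @ npv)  # output = <NPF, NPV>
```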
no code implementations • 10 Feb 2020 • Chandrashekar Lakshminarayanan, Amit Vikram Singh
In DGNs, a single neuronal unit has two components, namely the pre-activation input (equal to the inner product of the layer's weights and the previous layer's outputs) and a gating value in $[0, 1]$; the output of the neuronal unit equals the product of the pre-activation input and the gating value.
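A minimal sketch of a single such unit, with illustrative inputs:

```python
import numpy as np

def dgn_unit(prev_layer_out, w, gate):
    """Single DGN unit: output = gating value * pre-activation input,
    where the pre-activation is <w, previous layer outputs> and the
    gate is assumed to lie in [0, 1]."""
    pre_activation = np.dot(w, prev_layer_out)
    return gate * pre_activation

print(dgn_unit(np.array([1.0, -2.0, 0.5]), np.array([0.3, 0.1, 0.6]), gate=0.8))
```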
no code implementations • 12 Sep 2017 • Chandrashekar Lakshminarayanan, Csaba Szepesvári
For a given linear stochastic approximation (LSA) algorithm with Polyak-Ruppert (PR) averaging, and a data distribution $P$ satisfying the said assumptions, we show that there exists a range of constant step-sizes such that its MSE decays as $O(\frac{1}{t})$.
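A minimal simulation of this regime, with illustrative matrices, noise, and step-size (none taken from the paper): constant step-size LSA iterates are averaged in the Polyak-Ruppert fashion, and the squared error of the averaged iterate shrinks roughly as $1/t$.

```python
import numpy as np

rng = np.random.default_rng(2)
d, T, alpha = 2, 100_000, 0.05          # constant step-size, assumed in the stable range
A = np.array([[1.0, 0.2], [0.2, 1.0]])  # E[A_t]: positive definite
theta_star = np.array([1.0, -1.0])
b = A @ theta_star                      # E[b_t], so the fixed point is theta_star

theta = np.zeros(d)
theta_bar = np.zeros(d)
for t in range(1, T + 1):
    A_t = A + 0.1 * rng.standard_normal((d, d))  # noisy samples of A and b
    b_t = b + 0.1 * rng.standard_normal(d)
    theta += alpha * (b_t - A_t @ theta)         # constant step-size LSA iterate
    theta_bar += (theta - theta_bar) / t         # Polyak-Ruppert running average
    if t in (1_000, 10_000, 100_000):
        print(t, np.sum((theta_bar - theta_star) ** 2))  # error shrinks roughly as 1/t
```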