Search Results for author: Noam Razin

Found 10 papers, 8 papers with code

Implicit Bias of Policy Gradient in Linear Quadratic Control: Extrapolation to Unseen Initial States

1 code implementation • 12 Feb 2024 • Noam Razin, Yotam Alexander, Edo Cohen-Karlik, Raja Giryes, Amir Globerson, Nadav Cohen

This paper theoretically studies the implicit bias of policy gradient in terms of extrapolation to unseen initial states.
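
As a rough illustration of the setting (not the paper's analysis), the sketch below runs policy gradient on a toy linear quadratic control problem in numpy: a linear policy u = -Kx is trained from two initial states and then evaluated on an initial state it never saw during training. The system matrices, horizon, step size, and the finite-difference gradient are all illustrative assumptions.

```python
import numpy as np

# Toy LQR problem (illustrative assumption, not the paper's system).
A = np.array([[1.0, 0.1], [0.0, 1.0]])   # state transition
B = np.array([[0.0], [0.1]])             # control matrix
Q, R = np.eye(2), 0.1 * np.eye(1)        # quadratic state / control costs
H = 20                                    # horizon

def cost(K, x0):
    """Finite-horizon quadratic cost of the linear policy u = -K x from x0."""
    x, c = x0, 0.0
    for _ in range(H):
        u = -K @ x
        c += x @ Q @ x + u @ R @ u
        x = A @ x + B @ u
    return c

def grad(K, x0, eps=1e-5):
    """Numerical policy gradient (finite differences, for simplicity)."""
    g = np.zeros_like(K)
    for idx in np.ndindex(K.shape):
        E = np.zeros_like(K); E[idx] = eps
        g[idx] = (cost(K + E, x0) - cost(K - E, x0)) / (2 * eps)
    return g

# Train only on initial states along the first coordinate direction.
train_x0 = [np.array([1.0, 0.0]), np.array([-1.0, 0.0])]
K = np.zeros((1, 2))
for _ in range(500):
    K -= 1e-3 * np.mean([grad(K, x0) for x0 in train_x0], axis=0)

# Evaluate extrapolation to an initial state not seen during training.
unseen_x0 = np.array([0.0, 1.0])
print("train cost       :", np.mean([cost(K, x0) for x0 in train_x0]))
print("unseen-state cost:", cost(K, unseen_x0))
```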

Vanishing Gradients in Reinforcement Finetuning of Language Models

1 code implementation • 31 Oct 2023 • Noam Razin, Hattie Zhou, Omid Saremi, Vimal Thilak, Arwen Bradley, Preetum Nakkiran, Joshua Susskind, Etai Littwin

Pretrained language models are commonly aligned with human preferences and downstream tasks via reinforcement finetuning (RFT), which refers to maximizing a (possibly learned) reward function using policy gradient algorithms.
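
To make the RFT setup concrete, here is a minimal numpy sketch of a REINFORCE-style policy gradient update on a toy categorical "policy", with a hand-coded stand-in for the reward model; the vocabulary, sequence length, reward, and hyperparameters are assumptions for illustration, not the paper's setup. Note that the update scales with how much sampled rewards vary within a batch, which is the kind of effect the paper's vanishing-gradient analysis concerns.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, LEN = 5, 4                 # toy vocabulary size and sequence length (assumptions)
logits = np.zeros((LEN, VOCAB))   # "policy": independent per-position categorical dists

def reward(seq):
    # Stand-in for a (possibly learned) reward model: prefer token 0 everywhere.
    return float(np.sum(seq == 0))

def sample_batch(n=64):
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    seqs = np.stack([[rng.choice(VOCAB, p=probs[t]) for t in range(LEN)] for _ in range(n)])
    return seqs, probs

# REINFORCE: estimate the gradient of expected reward via reward-weighted grad log-prob.
for _ in range(200):
    seqs, probs = sample_batch()
    rewards = np.array([reward(s) for s in seqs])
    adv = rewards - rewards.mean()            # baseline; the update shrinks if rewards barely vary
    grad = np.zeros_like(logits)
    for s, a in zip(seqs, adv):
        for t, tok in enumerate(s):
            g = -probs[t].copy(); g[tok] += 1.0   # d log p(tok) / d logits[t]
            grad[t] += a * g
    logits += 0.5 * grad / len(seqs)

seqs, _ = sample_batch()
print("mean reward after RFT-style updates:", np.mean([reward(s) for s in seqs]))
```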

What Algorithms can Transformers Learn? A Study in Length Generalization

no code implementations • 24 Oct 2023 • Hattie Zhou, Arwen Bradley, Etai Littwin, Noam Razin, Omid Saremi, Josh Susskind, Samy Bengio, Preetum Nakkiran

Large language models exhibit surprising emergent generalization properties, yet also struggle on many simple reasoning tasks such as arithmetic and parity.
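
For a concrete sense of the length-generalization protocol such studies use, the sketch below builds a parity task with disjoint train and test length ranges; the lengths, sample counts, and the placeholder predictor are purely illustrative assumptions (the paper evaluates trained transformers, not this baseline).

```python
import numpy as np

rng = np.random.default_rng(0)

def parity_example(length):
    """One parity example: a random bit string and its label (sum of bits mod 2)."""
    bits = rng.integers(0, 2, size=length)
    return bits, int(bits.sum() % 2)

def make_split(lengths, n_per_length=100):
    return [parity_example(L) for L in lengths for _ in range(n_per_length)]

# Length-generalization split: train and test lengths are disjoint, with every
# test sequence strictly longer than any training sequence.
train = make_split(range(1, 21))
test  = make_split(range(21, 41))

def accuracy(predict, data):
    return float(np.mean([predict(bits) == label for bits, label in data]))

# Placeholder predictor standing in for a trained model (hypothetical): it just
# guesses the most common training label, so both accuracies hover near chance.
guess = int(np.mean([label for _, label in train]) >= 0.5)
print("in-distribution accuracy      :", accuracy(lambda bits: guess, train))
print("length-generalization accuracy:", accuracy(lambda bits: guess, test))
```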

What Makes Data Suitable for a Locally Connected Neural Network? A Necessary and Sufficient Condition Based on Quantum Entanglement

1 code implementation • 20 Mar 2023 • Yotam Alexander, Nimrod De La Vega, Noam Razin, Nadav Cohen

Focusing on locally connected neural networks (a prevalent family of architectures that includes convolutional and recurrent neural networks as well as local self-attention models), we address this problem by adopting theoretical tools from quantum physics.
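
As a loose illustration of the kind of quantity involved, the sketch below computes a standard matricization-based entanglement entropy of a toy data tensor under two different partitions of its axes; the tensor, the partitions, and this particular entropy are illustrative assumptions and may differ from the paper's precise data-dependent measure.

```python
import numpy as np

def entanglement_entropy(tensor, side_A_axes):
    """
    Entanglement entropy of a tensor with respect to a partition of its axes:
    matricize along the partition, take singular values, and return the entropy
    of the normalized squared spectrum. (Generic definition; the paper's
    data-dependent measure may differ.)
    """
    side_B_axes = [ax for ax in range(tensor.ndim) if ax not in side_A_axes]
    M = np.transpose(tensor, list(side_A_axes) + side_B_axes)
    M = M.reshape(int(np.prod([tensor.shape[a] for a in side_A_axes])), -1)
    s = np.linalg.svd(M, compute_uv=False)
    p = (s ** 2) / np.sum(s ** 2)
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

# Toy 4-feature "data tensor" (illustrative assumption), compared under a local
# partition {0,1} | {2,3} of the kind a locally connected architecture respects,
# and under an interleaved partition {0,2} | {1,3}.
rng = np.random.default_rng(0)
data_tensor = rng.standard_normal((2, 2, 2, 2))
print("entanglement across {0,1} | {2,3}:", entanglement_entropy(data_tensor, [0, 1]))
print("entanglement across {0,2} | {1,3}:", entanglement_entropy(data_tensor, [0, 2]))
```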

On the Ability of Graph Neural Networks to Model Interactions Between Vertices

1 code implementation • NeurIPS 2023 • Noam Razin, Tom Verbin, Nadav Cohen

Formalizing strength of interactions through an established measure known as separation rank, we quantify the ability of certain GNNs to model interaction between a given subset of vertices and its complement, i.e. between the sides of a given partition of input vertices.
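
Separation rank itself admits a simple numerical proxy, sketched below: evaluate a function on pairs of inputs drawn from the two sides of the partition and take the rank of the resulting evaluation matrix. The sample functions and sample sizes are illustrative assumptions; the paper's results are theoretical bounds for GNNs, not numerical estimates.

```python
import numpy as np

def separation_rank_proxy(f, sample_A, sample_B, tol=1e-8):
    """
    Numerical proxy for the separation rank of f with respect to a partition of
    its inputs into groups A and B: the rank of the evaluation matrix
    F[i, j] = f(a_i, b_j) over sampled values (a lower bound on the true
    separation rank).
    """
    F = np.array([[f(a, b) for b in sample_B] for a in sample_A])
    return int(np.linalg.matrix_rank(F, tol=tol))

rng = np.random.default_rng(0)
A_samples = rng.standard_normal((10, 2))   # toy inputs for vertex group A
B_samples = rng.standard_normal((10, 2))   # toy inputs for vertex group B

separable = lambda a, b: np.sin(a.sum()) * np.cos(b.sum())   # rank-1 interaction
entangled = lambda a, b: np.sin(a.sum() + b.sum())           # stronger interaction
print("separable function:", separation_rank_proxy(separable, A_samples, B_samples))
print("entangled function:", separation_rank_proxy(entangled, A_samples, B_samples))
```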

Implicit Regularization in Hierarchical Tensor Factorization and Deep Convolutional Neural Networks

1 code implementation • 27 Jan 2022 • Noam Razin, Asaf Maman, Nadav Cohen

In the pursuit of explaining implicit regularization in deep learning, prominent focus was given to matrix and tensor factorizations, which correspond to simplified neural networks.

Implicit Regularization in Tensor Factorization

1 code implementation • 19 Feb 2021 • Noam Razin, Asaf Maman, Nadav Cohen

Recent efforts to unravel the mystery of implicit regularization in deep learning have led to a theoretical focus on matrix factorization, i.e. matrix completion via linear neural networks.

Matrix Completion
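
A minimal numpy sketch of that setting, under toy assumptions (the dimensions, observation pattern, depth, initialization scale, and learning rate are all illustrative): matrix completion by fitting a depth-3 matrix factorization, i.e. a linear neural network, to the observed entries with plain gradient descent, then inspecting the errors and singular values of the recovered matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy matrix completion setup (illustrative, not the paper's experiments):
# a low-rank ground-truth matrix with about half of its entries observed.
d, rank = 10, 2
target = rng.standard_normal((d, rank)) @ rng.standard_normal((rank, d)) / np.sqrt(d)
mask = rng.random((d, d)) < 0.5

# Depth-3 matrix factorization W = W3 W2 W1, i.e. a linear neural network,
# fit to the observed entries by gradient descent from small initialization.
Ws = [0.1 * rng.standard_normal((d, d)) for _ in range(3)]
lr = 0.01

for _ in range(3000):
    W = Ws[2] @ Ws[1] @ Ws[0]
    residual = mask * (W - target)            # loss = 0.5 * ||mask * (W - target)||^2
    grads = [
        (Ws[2] @ Ws[1]).T @ residual,         # dL/dW1
        Ws[2].T @ residual @ Ws[0].T,         # dL/dW2
        residual @ (Ws[1] @ Ws[0]).T,         # dL/dW3
    ]
    for Wi, g in zip(Ws, grads):
        Wi -= lr * g

W = Ws[2] @ Ws[1] @ Ws[0]
print("observed-entry error  :", np.abs(mask * (W - target)).mean())
print("unobserved-entry error:", np.abs(~mask * (W - target)).mean())
print("singular values of recovered W:", np.round(np.linalg.svd(W, compute_uv=False), 3))
```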

Implicit Regularization in Deep Learning May Not Be Explainable by Norms

1 code implementation • NeurIPS 2020 • Noam Razin, Nadav Cohen

Mathematically characterizing the implicit regularization induced by gradient-based optimization is a longstanding pursuit in the theory of deep learning.

Matrix Completion • Open-Ended Question Answering

Scalable Attentive Sentence-Pair Modeling via Distilled Sentence Embedding

1 code implementation • 14 Aug 2019 • Oren Barkan, Noam Razin, Itzik Malkiel, Ori Katz, Avi Caciularu, Noam Koenigstein

In this paper, we introduce Distilled Sentence Embedding (DSE), a model based on knowledge distillation from cross-attentive models, focusing on sentence-pair tasks.

Knowledge Distillation • Natural Language Understanding • +4
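
To illustrate the distillation idea in miniature (this is not the DSE architecture): a toy student embeds each "sentence" independently via a linear map over mean word vectors and is trained so that dot products of its embeddings match the scores of a stand-in teacher. The teacher function, dimensions, and hyperparameters below are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8

def encode(word_vecs, W):
    """Student sentence encoder (toy): linear projection of the mean word vector."""
    return W @ word_vecs.mean(axis=0)

def teacher_score(a_vecs, b_vecs):
    """Stand-in for an expensive cross-attentive teacher scoring a sentence pair."""
    return float(np.tanh(a_vecs.mean(axis=0) @ b_vecs.mean(axis=0)))

# Toy "sentences": small bags of random word vectors (illustrative assumption).
pairs = [(rng.standard_normal((rng.integers(3, 8), dim)),
          rng.standard_normal((rng.integers(3, 8), dim))) for _ in range(100)]

def mean_gap(W):
    return float(np.mean([abs(encode(a, W) @ encode(b, W) - teacher_score(a, b))
                          for a, b in pairs]))

W = 0.2 * rng.standard_normal((dim, dim))
print("student-teacher gap before distillation:", mean_gap(W))

lr = 0.02
for _ in range(500):
    grad = np.zeros_like(W)
    for a_vecs, b_vecs in pairs:
        a_bar, b_bar = a_vecs.mean(axis=0), b_vecs.mean(axis=0)
        err = (W @ a_bar) @ (W @ b_bar) - teacher_score(a_vecs, b_vecs)
        # d/dW of (W a).(W b) = W (a b^T + b a^T); chain rule with squared error.
        grad += err * (W @ (np.outer(a_bar, b_bar) + np.outer(b_bar, a_bar)))
    W -= lr * grad / len(pairs)

print("student-teacher gap after distillation :", mean_gap(W))
```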
