Search Results for author: Heejune Sheen

Found 3 papers, 0 papers with code

Implicit Regularization of Gradient Flow on One-Layer Softmax Attention

no code implementations • 13 Mar 2024 • Heejune Sheen, Siyu Chen, Tianhao Wang, Harrison H. Zhou

Under a separability assumption on the data, we show that when gradient flow achieves the minimal loss value, it further implicitly minimizes the nuclear norm of the product of the key and query weight matrices.
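
A schematic rendering of this claim may help; the notation here (W_K and W_Q for the key and query weight matrices, L for the training loss) is assumed rather than taken from the paper, and the precise statement likely carries additional conditions. Among all parameter products attaining the minimal loss value, gradient flow selects one of minimal nuclear norm:

\[
W_K^{\top} W_Q \;\in\; \arg\min \big\{ \, \lVert W \rVert_{*} \;:\; L(W) = \min L \, \big\}.
\]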

Training Dynamics of Multi-Head Softmax Attention for In-Context Learning: Emergence, Convergence, and Optimality

no code implementations • 29 Feb 2024 • Siyu Chen, Heejune Sheen, Tianhao Wang, Zhuoran Yang

In addition, we prove that an interesting "task allocation" phenomenon emerges during the gradient flow dynamics, where each attention head focuses on solving a single task of the multi-task model.

In-Context Learning
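
As a sketch of the setting (the notation is mine, not the paper's): given an in-context prompt (x_1, y_1), …, (x_N, y_N), x_query, an H-head softmax attention layer predicts

\[
\hat{y} \;=\; \sum_{h=1}^{H} W_O^{h} \sum_{i=1}^{N} \operatorname{softmax}_i\!\big( \langle W_K^{h} x_i, \, W_Q^{h} x_{\mathrm{query}} \rangle \big)\, W_V^{h} y_i ,
\]

and the "task allocation" claim says that, at convergence, the softmax weights of each head h concentrate on the in-context examples belonging to a single task. This rendering is an assumption about the architecture; the paper's exact parameterization may differ.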

Tensor Kernel Recovery for Spatio-Temporal Hawkes Processes

no code implementations • 24 Nov 2020 • Heejune Sheen, Xiaonan Zhu, Yao Xie

We estimate general influence functions for spatio-temporal Hawkes processes using a tensor recovery approach: the location-dependent influence function, which captures the influence of historical events, is formulated as a tensor kernel.
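
For context, the standard spatio-temporal Hawkes intensity takes the form below; the symbols μ (background rate) and k (influence kernel) are conventional choices of mine rather than notation from the paper:

\[
\lambda(t, s) \;=\; \mu(s) \;+\; \sum_{i \,:\, t_i < t} k\big(s, \, s_i, \, t - t_i\big),
\]

where the sum runs over historical events (t_i, s_i). As described in the abstract, the paper represents the discretized kernel k as a tensor and recovers it with tensor-recovery techniques.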
