no code implementations • 13 Mar 2024 • Heejune Sheen, Siyu Chen, Tianhao Wang, Harrison H. Zhou
Under a separability assumption on the data, we show that when gradient flow achieves the minimal loss value, it further implicitly minimizes the nuclear norm of the product of the key and query weight matrices.
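The quantity being implicitly minimized can be made concrete with a small sketch. A minimal illustration, assuming random weights of hypothetical dimensions (the matrices and sizes below are not from the paper): the attention logits depend on the product of the key and query weight matrices, and its nuclear norm is the sum of its singular values.

```python
import numpy as np

# Hypothetical dimensions: d-dimensional embeddings, rank-r key/query maps.
rng = np.random.default_rng(0)
d, r = 8, 4
W_K = rng.standard_normal((r, d))  # key weight matrix (illustrative values)
W_Q = rng.standard_normal((r, d))  # query weight matrix

# The attention logits depend on W_K^T W_Q, so the implicit-regularization
# statement concerns the nuclear norm of this product: the sum of its
# singular values.
product = W_K.T @ W_Q
nuclear_norm = np.linalg.norm(product, ord="nuc")
print(nuclear_norm)
```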
no code implementations • 29 Feb 2024 • Siyu Chen, Heejune Sheen, Tianhao Wang, Zhuoran Yang
In addition, we prove that an interesting "task allocation" phenomenon emerges during the gradient flow dynamics, where each attention head focuses on solving a single task of the multi-task model.
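The "task allocation" phenomenon can be illustrated with a toy diagnostic, not the paper's construction: given softmax attention weights from several heads over tokens grouped by task, one can measure how much attention mass each head places on each task. The boosted logits below are synthetic, chosen so that each head favors one task.

```python
import numpy as np

# Toy diagnostic (illustrative, not the paper's setup): H heads attend over
# tokens belonging to T tasks; "task allocation" means each head's attention
# mass concentrates on a single task.
rng = np.random.default_rng(1)
H, T, n = 3, 3, 12                   # heads, tasks, tokens per task
labels = np.repeat(np.arange(T), n)  # task label of each token

# Synthetic logits in which head h happens to favor task h's tokens.
logits = rng.standard_normal((H, T * n))
for h in range(H):
    logits[h, labels == h] += 5.0
attn = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# Per-head attention mass on each task; rows sum to 1.
mass = np.stack([attn[:, labels == t].sum(axis=1) for t in range(T)], axis=1)
print(mass.argmax(axis=1))  # which task each head concentrates on
```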
no code implementations • 24 Nov 2020 • Heejune Sheen, Xiaonan Zhu, Yao Xie
We estimate general influence functions for spatio-temporal Hawkes processes via a tensor recovery approach, formulating the location-dependent influence function, which captures the influence of historical events, as a tensor kernel.
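The role of a location-dependent influence function can be sketched as follows. This is a minimal illustration under assumed simplifications (a discrete spatial grid, an exponential temporal decay, and a random nonnegative influence matrix as one slice of the tensor kernel), not the paper's estimator: the conditional intensity at a location is the baseline plus the decayed influence of each past event.

```python
import numpy as np

# Assumed discretization: space is a grid of cells, and the spatial part of
# the influence kernel is a nonnegative matrix K (one slice of the tensor
# kernel); the temporal part decays exponentially at rate beta.
rng = np.random.default_rng(2)
n_cells, beta, mu = 5, 1.0, 0.1
K = np.abs(rng.standard_normal((n_cells, n_cells))) * 0.1  # spatial influence

def intensity(t, cell, events):
    """Conditional intensity lambda(t, s): baseline mu plus the decaying
    influence K[s_i, s] * exp(-beta * (t - t_i)) of each past event."""
    lam = mu
    for t_i, s_i in events:
        if t_i < t:
            lam += K[s_i, cell] * np.exp(-beta * (t - t_i))
    return lam

events = [(0.2, 1), (0.5, 3)]  # (time, grid cell) of past events
print(intensity(1.0, 2, events))
```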