Search Results for author: Chiwun Yang

Found 8 papers, 0 papers with code

Attention is Naturally Sparse with Gaussian Distributed Input

no code implementations • 3 Apr 2024 • Yichuan Deng, Zhao Song, Chiwun Yang

The computational intensity of Large Language Models (LLMs) is a critical bottleneck, primarily due to the $O(n^2)$ complexity of the attention mechanism in transformer architectures.
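
The quadratic term comes from materializing the full $n \times n$ softmax score matrix. As a minimal illustration (not the authors' analysis), the NumPy sketch below builds exact attention for Gaussian-distributed $Q, K, V$ and reports how much of each row's softmax mass is concentrated in its largest entries, the sparsity behaviour the title refers to; the sizes $n$, $d$ and the top-$k$ cutoff are arbitrary choices.

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Exact single-head attention; the n x n score matrix is the O(n^2) bottleneck."""
    d = Q.shape[1]
    scores = Q @ K.T / np.sqrt(d)                   # (n, n): quadratic in sequence length
    scores -= scores.max(axis=1, keepdims=True)     # for numerical stability
    A = np.exp(scores)
    A /= A.sum(axis=1, keepdims=True)               # row-wise softmax
    return A @ V, A

rng = np.random.default_rng(0)
n, d, k = 1024, 64, 32                              # illustrative sizes and top-k cutoff
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))   # Gaussian-distributed input
_, A = softmax_attention(Q, K, V)

# How much of each row's softmax mass sits in its k largest entries.
topk_mass = np.sort(A, axis=1)[:, -k:].sum(axis=1)
print(f"mean softmax mass in the top {k} of {n} entries: {topk_mass.mean():.3f}")
```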

Computational Efficiency

Enhancing Stochastic Gradient Descent: A Unified Framework and Novel Acceleration Methods for Faster Convergence

no code implementations • 2 Feb 2024 • Yichuan Deng, Zhao Song, Chiwun Yang

Based on SGD, previous works have proposed many algorithms that have improved convergence speed and generalization in stochastic optimization, such as SGDm, AdaGrad, Adam, etc.
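
For reference, here is a minimal sketch of vanilla SGD and the momentum variant (SGDm) on a toy quadratic; it illustrates only the baseline update rule, not the unified framework or the acceleration methods proposed in the paper.

```python
import numpy as np

def sgd_momentum(grad_fn, w0, lr=0.1, beta=0.9, steps=200):
    """Heavy-ball momentum: v <- beta * v + grad, w <- w - lr * v (beta=0 gives vanilla SGD)."""
    w = np.asarray(w0, dtype=float)
    v = np.zeros_like(w)
    for _ in range(steps):
        v = beta * v + grad_fn(w)
        w = w - lr * v
    return w

# Toy objective f(w) = 0.5 * ||w||^2, whose gradient is simply w.
w_final = sgd_momentum(lambda w: w, w0=[5.0, -3.0])
print(w_final)   # close to the minimizer [0, 0]
```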

Stochastic Optimization

One Pass Streaming Algorithm for Super Long Token Attention Approximation in Sublinear Space

no code implementations • 24 Nov 2023 • Raghav Addanki, Chenyang Li, Zhao Song, Chiwun Yang

Considering a single-layer self-attention with Query, Key, and Value matrices $Q, K, V \in \mathbb{R}^{n \times d}$, the polynomial method approximates the attention output $T \in \mathbb{R}^{n \times d}$.
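
In the standard formulation used in this line of work (the exact scaling and normalization are assumptions here, not taken from the paper), the attention output being approximated is

$$A := \exp(QK^\top) \in \mathbb{R}^{n \times n}, \qquad D := \operatorname{diag}(A \mathbf{1}_n), \qquad T := D^{-1} A V \in \mathbb{R}^{n \times d},$$

where $\exp(\cdot)$ is applied entrywise and $\mathbf{1}_n$ is the all-ones vector. Materializing $A$ takes $\Theta(n^2)$ space, which is the cost a one-pass, sublinear-space streaming algorithm has to avoid.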

Attribute

A Theoretical Insight into Attack and Defense of Gradient Leakage in Transformer

no code implementations • 22 Nov 2023 • Chenyang Li, Zhao Song, Weixin Wang, Chiwun Yang

The Deep Leakage from Gradient (DLG) attack has emerged as a prevalent and highly effective method for extracting sensitive training data by inspecting exchanged gradients.
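
As background, the sketch below shows the generic DLG recipe on a tiny linear model rather than the transformer setting analyzed in the paper: the attacker initializes dummy data and optimizes it so that its gradients match the gradients shared by the victim. The model, sizes, and optimizer choices are illustrative.

```python
import torch

torch.manual_seed(0)
d = 8
model = torch.nn.Linear(d, 1)

# "Private" sample and the gradient a victim would share during collaborative training.
x_true, y_true = torch.randn(1, d), torch.randn(1, 1)
loss = torch.nn.functional.mse_loss(model(x_true), y_true)
true_grads = torch.autograd.grad(loss, model.parameters())

# Attacker: optimize dummy data so its gradients match the shared gradients.
x_dummy = torch.randn(1, d, requires_grad=True)
y_dummy = torch.randn(1, 1, requires_grad=True)
opt = torch.optim.LBFGS([x_dummy, y_dummy])

def closure():
    opt.zero_grad()
    dummy_loss = torch.nn.functional.mse_loss(model(x_dummy), y_dummy)
    dummy_grads = torch.autograd.grad(dummy_loss, model.parameters(), create_graph=True)
    match = sum(((dg - tg) ** 2).sum() for dg, tg in zip(dummy_grads, true_grads))
    match.backward()
    return match

for _ in range(30):
    opt.step(closure)

print("reconstruction error:", (x_dummy - x_true).norm().item())
```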

Privacy Preserving

Unmasking Transformers: A Theoretical Approach to Data Recovery via Attention Weights

no code implementations • 19 Oct 2023 • Yichuan Deng, Zhao Song, Shenghao Xie, Chiwun Yang

In the realm of deep learning, transformers have emerged as a dominant architecture, particularly in natural language processing tasks.

An Automatic Learning Rate Schedule Algorithm for Achieving Faster Convergence and Steeper Descent

no code implementations • 17 Oct 2023 • Zhao Song, Chiwun Yang

The delta-bar-delta algorithm is recognized as a learning rate adaptation technique that enhances the convergence speed of the training process in optimization by dynamically scheduling the learning rate based on the difference between the current and previous weight updates.
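
For context, here is a minimal sketch of the classic delta-bar-delta rule (the hyperparameter names and values are illustrative, and this is not the automatic schedule proposed in the paper): each coordinate's learning rate grows additively while the current gradient agrees in sign with a smoothed average of past gradients, and shrinks multiplicatively when they disagree.

```python
import numpy as np

def delta_bar_delta(grad_fn, w0, lr0=0.005, kappa=0.001, phi=0.5, theta=0.7, steps=200):
    """Per-coordinate learning rates: add kappa while the current gradient agrees in sign
    with the smoothed past gradient, multiply by phi when the signs disagree."""
    w = np.asarray(w0, dtype=float)
    lr = np.full_like(w, lr0)           # one learning rate per coordinate
    bar_delta = np.zeros_like(w)        # exponential average of past gradients
    for _ in range(steps):
        g = grad_fn(w)
        agree = g * bar_delta
        lr = np.where(agree > 0, lr + kappa, np.where(agree < 0, lr * phi, lr))
        w = w - lr * g
        bar_delta = (1 - theta) * g + theta * bar_delta
    return w

# Badly scaled quadratic: f(w) = 0.5 * (w1^2 + 100 * w2^2).
grad = lambda w: np.array([1.0, 100.0]) * w
print(delta_bar_delta(grad, w0=[3.0, 3.0]))   # both coordinates shrink toward 0
```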

Scheduling

Fine-tune Language Models to Approximate Unbiased In-context Learning

no code implementations • 5 Oct 2023 • Timothy Chu, Zhao Song, Chiwun Yang

To address this issue, we introduce a reweighted algorithm called RICL (Reweighted In-context Learning).

In-Context Learning

How to Protect Copyright Data in Optimization of Large Language Models?

no code implementations • 23 Aug 2023 • Timothy Chu, Zhao Song, Chiwun Yang

Large language models (LLMs) and generative AI have played a transformative role in computer research and applications.

Language Modelling • Large Language Model • +1
