Search Results for author: Chiwun Yang

Found 8 papers, 0 papers with code

Attention is Naturally Sparse with Gaussian Distributed Input

no code implementations • 3 Apr 2024 • Yichuan Deng, Zhao Song, Chiwun Yang

The computational intensity of Large Language Models (LLMs) is a critical bottleneck, primarily due to the $O(n^2)$ complexity of the attention mechanism in transformer architectures.
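
The quadratic term comes from materializing the full $n \times n$ softmax score matrix. As a minimal illustration (not the authors' analysis), the NumPy sketch below builds exact attention for Gaussian-distributed $Q, K, V$ and reports how much of each row's softmax mass is concentrated in its largest entries, the sparsity behaviour the title refers to; the sizes $n$, $d$ and the top-$k$ cutoff are arbitrary choices.

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Exact single-head attention; the n x n score matrix is the O(n^2) bottleneck."""
    d = Q.shape[1]
    scores = Q @ K.T / np.sqrt(d)                   # (n, n): quadratic in sequence length
    scores -= scores.max(axis=1, keepdims=True)     # for numerical stability
    A = np.exp(scores)
    A /= A.sum(axis=1, keepdims=True)               # row-wise softmax
    return A @ V, A

rng = np.random.default_rng(0)
n, d, k = 1024, 64, 32                              # illustrative sizes and top-k cutoff
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))   # Gaussian-distributed input
_, A = softmax_attention(Q, K, V)

# How much of each row's softmax mass sits in its k largest entries.
topk_mass = np.sort(A, axis=1)[:, -k:].sum(axis=1)
print(f"mean softmax mass in the top {k} of {n} entries: {topk_mass.mean():.3f}")
```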

Computational Efficiency

Enhancing Stochastic Gradient Descent: A Unified Framework and Novel Acceleration Methods for Faster Convergence

no code implementations • 2 Feb 2024 • Yichuan Deng, Zhao Song, Chiwun Yang

Based on SGD, previous works have proposed many algorithms that have improved convergence speed and generalization in stochastic optimization, such as SGDm, AdaGrad, Adam, etc.
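
For reference, here is a minimal sketch of vanilla SGD and the momentum variant (SGDm) on a toy quadratic; it illustrates only the baseline update rule, not the unified framework or the acceleration methods proposed in the paper.

```python
import numpy as np

def sgd_momentum(grad_fn, w0, lr=0.1, beta=0.9, steps=200):
    """Heavy-ball momentum: v <- beta * v + grad, w <- w - lr * v (beta=0 gives vanilla SGD)."""
    w = np.asarray(w0, dtype=float)
    v = np.zeros_like(w)
    for _ in range(steps):
        v = beta * v + grad_fn(w)
        w = w - lr * v
    return w

# Toy objective f(w) = 0.5 * ||w||^2, whose gradient is simply w.
w_final = sgd_momentum(lambda w: w, w0=[5.0, -3.0])
print(w_final)   # close to the minimizer [0, 0]
```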

Stochastic Optimization

One Pass Streaming Algorithm for Super Long Token Attention Approximation in Sublinear Space

no code implementations • 24 Nov 2023 • Raghav Addanki, Chenyang Li, Zhao Song, Chiwun Yang

Considering a single-layer self-attention with Query, Key, and Value matrices $Q, K, V \in \mathbb{R}^{n \times d}$, the polynomial method approximates the attention output $T \in \mathbb{R}^{n \times d}$.
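
In the standard formulation used in this line of work (the exact scaling and normalization are assumptions here, not taken from the paper), the attention output being approximated is

$$A := \exp(QK^\top) \in \mathbb{R}^{n \times n}, \qquad D := \operatorname{diag}(A \mathbf{1}_n), \qquad T := D^{-1} A V \in \mathbb{R}^{n \times d},$$

where $\exp(\cdot)$ is applied entrywise and $\mathbf{1}_n$ is the all-ones vector. Materializing $A$ takes $\Theta(n^2)$ space, which is the cost a one-pass, sublinear-space streaming algorithm has to avoid.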

Attribute

A Theoretical Insight into Attack and Defense of Gradient Leakage in Transformer

no code implementations • 22 Nov 2023 • Chenyang Li, Zhao Song, Weixin Wang, Chiwun Yang

The Deep Leakage from Gradient (DLG) attack has emerged as a prevalent and highly effective method for extracting sensitive training data by inspecting exchanged gradients.
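
As background, the sketch below shows the generic DLG recipe on a tiny linear model rather than the transformer setting analyzed in the paper: the attacker initializes dummy data and optimizes it so that its gradients match the gradients shared by the victim. The model, sizes, and optimizer choices are illustrative.

```python
import torch

torch.manual_seed(0)
d = 8
model = torch.nn.Linear(d, 1)

# "Private" sample and the gradient a victim would share during collaborative training.
x_true, y_true = torch.randn(1, d), torch.randn(1, 1)
loss = torch.nn.functional.mse_loss(model(x_true), y_true)
true_grads = torch.autograd.grad(loss, model.parameters())

# Attacker: optimize dummy data so its gradients match the shared gradients.
x_dummy = torch.randn(1, d, requires_grad=True)
y_dummy = torch.randn(1, 1, requires_grad=True)
opt = torch.optim.LBFGS([x_dummy, y_dummy])

def closure():
    opt.zero_grad()
    dummy_loss = torch.nn.functional.mse_loss(model(x_dummy), y_dummy)
    dummy_grads = torch.autograd.grad(dummy_loss, model.parameters(), create_graph=True)
    match = sum(((dg - tg) ** 2).sum() for dg, tg in zip(dummy_grads, true_grads))
    match.backward()
    return match

for _ in range(30):
    opt.step(closure)

print("reconstruction error:", (x_dummy - x_true).norm().item())
```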

Privacy Preserving

Unmasking Transformers: A Theoretical Approach to Data Recovery via Attention Weights

no code implementations • 19 Oct 2023 • Yichuan Deng, Zhao Song, Shenghao Xie, Chiwun Yang

In the realm of deep learning, transformers have emerged as a dominant architecture, particularly in natural language processing tasks.

An Automatic Learning Rate Schedule Algorithm for Achieving Faster Convergence and Steeper Descent

no code implementations • 17 Oct 2023 • Zhao Song, Chiwun Yang

The delta-bar-delta algorithm is recognized as a learning rate adaptation technique that enhances the convergence speed of the training process in optimization by dynamically scheduling the learning rate based on the difference between the current and previous weight updates.
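
For context, here is a minimal sketch of the classic delta-bar-delta rule (the hyperparameter names and values are illustrative, and this is not the automatic schedule proposed in the paper): each coordinate's learning rate grows additively while the current gradient agrees in sign with a smoothed average of past gradients, and shrinks multiplicatively when they disagree.

```python
import numpy as np

def delta_bar_delta(grad_fn, w0, lr0=0.005, kappa=0.001, phi=0.5, theta=0.7, steps=200):
    """Per-coordinate learning rates: add kappa while the current gradient agrees in sign
    with the smoothed past gradient, multiply by phi when the signs disagree."""
    w = np.asarray(w0, dtype=float)
    lr = np.full_like(w, lr0)           # one learning rate per coordinate
    bar_delta = np.zeros_like(w)        # exponential average of past gradients
    for _ in range(steps):
        g = grad_fn(w)
        agree = g * bar_delta
        lr = np.where(agree > 0, lr + kappa, np.where(agree < 0, lr * phi, lr))
        w = w - lr * g
        bar_delta = (1 - theta) * g + theta * bar_delta
    return w

# Badly scaled quadratic: f(w) = 0.5 * (w1^2 + 100 * w2^2).
grad = lambda w: np.array([1.0, 100.0]) * w
print(delta_bar_delta(grad, w0=[3.0, 3.0]))   # both coordinates shrink toward 0
```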

Scheduling

Fine-tune Language Models to Approximate Unbiased In-context Learning

no code implementations • 5 Oct 2023 • Timothy Chu, Zhao Song, Chiwun Yang

To address this issue, we introduce a reweighted algorithm called RICL (Reweighted In-context Learning).

In-Context Learning

How to Protect Copyright Data in Optimization of Large Language Models?

no code implementations • 23 Aug 2023 • Timothy Chu, Zhao Song, Chiwun Yang

Large language models (LLMs) and generative AI have played a transformative role in computer research and applications.

Language Modelling • Large Language Model • +1
