no code implementations • 3 Apr 2024 • Yichuan Deng, Zhao Song, Chiwun Yang
The computational intensity of Large Language Models (LLMs) is a critical bottleneck, primarily due to the $O(n^2)$ complexity of the attention mechanism in transformer architectures.
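The quadratic cost referenced above comes from materializing the full $n \times n$ score matrix. A minimal NumPy sketch of standard softmax attention makes this explicit (all names here are illustrative, not from the paper):

```python
import numpy as np

def naive_attention(Q, K, V):
    """Standard softmax attention. The (n, n) score matrix makes
    both time and memory scale as O(n^2) in sequence length n."""
    n, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)                 # (n, n): the quadratic bottleneck
    scores -= scores.max(axis=1, keepdims=True)   # subtract row max for stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True) # rows sum to 1
    return weights @ V                            # (n, d)

n, d = 512, 64
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
out = naive_attention(Q, K, V)
print(out.shape)  # (512, 64)
```

Doubling $n$ quadruples the size of `scores`, which is why long-context inference is memory- and compute-bound at this step.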
no code implementations • 2 Feb 2024 • Yichuan Deng, Zhao Song, Chiwun Yang
Building on SGD, prior work has proposed many algorithms that improve convergence speed and generalization in stochastic optimization, such as SGD with momentum (SGDm), AdaGrad, and Adam.
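For reference, the textbook forms of two of the variants named above can be sketched as single update steps (these are the standard formulations, not the specific algorithm analyzed in the paper; hyperparameter values are illustrative defaults):

```python
import numpy as np

def sgd_momentum_step(w, grad, v, lr=0.01, beta=0.9):
    """One SGD-with-momentum (SGDm) update: v accumulates a
    decaying sum of past gradients."""
    v = beta * v + grad
    return w - lr * v, v

def adam_step(w, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: bias-corrected first and second moment
    estimates rescale each coordinate's step."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# Sanity check: minimize f(w) = ||w||^2 / 2, whose gradient is w.
w = np.array([1.0, -2.0])
v = np.zeros_like(w)
for _ in range(100):
    w, v = sgd_momentum_step(w, w, v)
print(np.linalg.norm(w))  # much smaller than the initial norm sqrt(5)
```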
no code implementations • 24 Nov 2023 • Raghav Addanki, Chenyang Li, Zhao Song, Chiwun Yang
Considering a single-layer self-attention with Query, Key, and Value matrices $Q, K, V \in \mathbb{R}^{n \times d}$, the polynomial method approximates the attention output $T \in \mathbb{R}^{n \times d}$.
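The idea behind polynomial approximation can be illustrated by replacing the entrywise exponential in softmax attention with its Taylor polynomial. This sketch only demonstrates approximation quality on bounded entries; it does not show the factoring step that yields the actual speedup, and it is not the paper's exact construction:

```python
import math
import numpy as np

def exact_attention(Q, K, V):
    S = Q @ K.T / math.sqrt(Q.shape[1])
    A = np.exp(S - S.max(axis=1, keepdims=True))
    return (A / A.sum(axis=1, keepdims=True)) @ V

def poly_attention(Q, K, V, degree=8):
    """Replace exp(s) entrywise with its degree-`degree` Taylor
    polynomial; accurate when the score entries are bounded."""
    S = Q @ K.T / math.sqrt(Q.shape[1])
    P = sum(S ** k / math.factorial(k) for k in range(degree + 1))
    return (P / P.sum(axis=1, keepdims=True)) @ V

rng = np.random.default_rng(1)
n, d = 64, 16
Q, K, V = (0.3 * rng.standard_normal((n, d)) for _ in range(3))
err = np.abs(exact_attention(Q, K, V) - poly_attention(Q, K, V)).max()
print(err)  # small when score entries are well inside the Taylor radius
```

The practical gain in the polynomial method comes from the fact that a low-degree polynomial of $QK^\top$ admits a low-rank factorization, so the $n \times n$ matrix never has to be formed explicitly.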
no code implementations • 22 Nov 2023 • Chenyang Li, Zhao Song, Weixin Wang, Chiwun Yang
The Deep Leakage from Gradient (DLG) attack has emerged as a prevalent and highly effective method for extracting sensitive training data by inspecting exchanged gradients.
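The core leakage that DLG-style attacks exploit can be seen in closed form for a single fully connected layer: with $z = Wx + b$, the chain rule gives $\partial L/\partial W = (\partial L/\partial z)\, x^\top$ and $\partial L/\partial b = \partial L/\partial z$, so any nonzero bias-gradient coordinate reveals the input exactly. A minimal sketch (this shows the single-layer observation, not the full iterative DLG optimization):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(5)           # private training input
W = rng.standard_normal((3, 5))
b = rng.standard_normal(3)
dz = rng.standard_normal(3)          # upstream gradient dL/dz (arbitrary)

grad_W = np.outer(dz, x)             # dL/dW = dz x^T  (what gets shared)
grad_b = dz                          # dL/db = dz

i = np.argmax(np.abs(grad_b))        # pick a row with large bias gradient
x_recovered = grad_W[i] / grad_b[i]  # divide out dz_i to recover x
print(np.allclose(x, x_recovered))   # True
```

The full DLG attack generalizes this by optimizing dummy inputs so that their gradients match the observed ones across all layers.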
no code implementations • 19 Oct 2023 • Yichuan Deng, Zhao Song, Shenghao Xie, Chiwun Yang
In the realm of deep learning, transformers have emerged as a dominant architecture, particularly in natural language processing tasks.
no code implementations • 17 Oct 2023 • Zhao Song, Chiwun Yang
The delta-bar-delta algorithm is a learning-rate adaptation technique that speeds up convergence during training by dynamically scheduling the learning rate according to the agreement between the current and previous weight updates.
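The classic delta-bar-delta rule maintains a per-parameter learning rate that grows additively when the current gradient agrees in sign with a running average of past gradients, and shrinks multiplicatively when they disagree. A minimal sketch of one step (hyperparameter values are illustrative; this is the textbook rule, not the paper's analyzed variant):

```python
import numpy as np

def delta_bar_delta_step(w, grad, lr, bar_delta,
                         kappa=0.01, phi=0.1, theta=0.7):
    """One delta-bar-delta update: per-parameter learning rates
    increase by kappa on sign agreement with the running average
    bar_delta, and are multiplied by (1 - phi) on disagreement."""
    agree = grad * bar_delta > 0
    disagree = grad * bar_delta < 0
    lr = np.where(agree, lr + kappa, lr)
    lr = np.where(disagree, lr * (1 - phi), lr)
    bar_delta = (1 - theta) * grad + theta * bar_delta
    return w - lr * grad, lr, bar_delta

# Sanity check: minimize f(w) = ||w||^2 / 2 (gradient is w itself).
w = np.array([3.0, -1.5])
lr = np.full_like(w, 0.05)
bar_delta = np.zeros_like(w)
for _ in range(50):
    w, lr, bar_delta = delta_bar_delta_step(w, w, lr, bar_delta)
print(np.linalg.norm(w))  # far below the initial norm
```

The additive increase / multiplicative decrease asymmetry is the design choice that makes the rule cautious: a single sign flip undoes many small increases.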
no code implementations • 5 Oct 2023 • Timothy Chu, Zhao Song, Chiwun Yang
To address this issue, we introduce a reweighted algorithm called RICL (Reweighted In-context Learning).
no code implementations • 23 Aug 2023 • Timothy Chu, Zhao Song, Chiwun Yang
Large language models (LLMs) and generative AI have played a transformative role in computer research and applications.