2 code implementations • 4 Jan 2024 • Peiyuan Zhang, Guangtao Zeng, Tianduo Wang, Wei Lu
We present TinyLlama, a compact 1.1B language model pretrained on around 1 trillion tokens for approximately 3 epochs.
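For readers who want to try the released model, below is a minimal loading sketch with Hugging Face transformers; the repository id is an assumption based on the public release and may not match the exact checkpoint described in the paper.

```python
# Minimal sketch: loading a TinyLlama checkpoint with Hugging Face transformers.
# The repo id below is an assumption; substitute the checkpoint you actually want.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

inputs = tokenizer("TinyLlama is a compact language model that", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```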
1 code implementation • 7 Nov 2023 • Bo Li, Peiyuan Zhang, Jingkang Yang, Yuanhan Zhang, Fanyi Pu, Ziwei Liu
In this paper, we present OtterHD-8B, an innovative multimodal model evolved from Fuyu-8B, specifically engineered to interpret high-resolution visual inputs with granular precision.
Ranked #85 on Visual Question Answering on MM-Vet
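Since OtterHD-8B evolves from Fuyu-8B, the general shape of the interface can be conveyed with the Hugging Face Fuyu classes; this is a sketch of the base Fuyu-8B pipeline used as a stand-in, not OtterHD's own release, and the checkpoint id, file name, and prompt format are assumptions.

```python
# Sketch of Fuyu-style image-text inference (the architecture OtterHD-8B builds on).
# Uses the base adept/fuyu-8b checkpoint as a stand-in; OtterHD's own weights may differ.
from transformers import FuyuProcessor, FuyuForCausalLM
from PIL import Image

processor = FuyuProcessor.from_pretrained("adept/fuyu-8b")
model = FuyuForCausalLM.from_pretrained("adept/fuyu-8b")

image = Image.open("high_res_chart.png")  # hypothetical high-resolution input
inputs = processor(text="Describe the fine-grained details in this image.\n",
                   images=image, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(outputs[:, -64:], skip_special_tokens=True))
```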
1 code implementation • 28 May 2023 • Guangtao Zeng, Peiyuan Zhang, Wei Lu
Fine-tuning pre-trained language models for multiple tasks tends to be expensive in terms of storage.
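As a rough illustration of the storage problem (not this paper's method), the sketch below contrasts keeping a full fine-tuned copy per task with keeping one shared backbone plus a small per-task parameter delta; the backbone size and delta fraction are assumed example values.

```python
# Illustrative back-of-the-envelope comparison (not the paper's approach):
# one full fine-tuned copy per task vs. a shared backbone plus small per-task deltas.
backbone_params = 1_100_000_000   # assumed backbone size (~1.1B parameters)
num_tasks = 20
bytes_per_param = 2               # fp16

full_copies = num_tasks * backbone_params * bytes_per_param
per_task_delta = 0.005 * backbone_params  # assume ~0.5% of parameters stored per task
shared = (backbone_params + num_tasks * per_task_delta) * bytes_per_param

print(f"Full fine-tuned copies:     {full_copies / 1e9:.1f} GB")
print(f"Shared backbone + deltas:   {shared / 1e9:.1f} GB")
```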
no code implementations • 19 Mar 2023 • Peiyuan Zhang, Jiaye Teng, Jingzhao Zhang
Our paper examines this observation by providing excess risk lower bounds for GD and SGD in two realizable settings: (1) $\eta T = \mathcal{O}(n)$, and (2) $\eta T = \Omega(n)$, where $n$ is the size of the dataset.
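For reference, the excess risk in question is the standard quantity below; in a realizable setting the minimum population risk is zero, so the excess risk reduces to the population risk of the learned predictor.

```latex
% Standard definition of excess risk for a learned predictor \hat{w}.
\mathcal{E}(\hat{w}) \;=\; \mathbb{E}\big[L(\hat{w})\big] \;-\; \min_{w} L(w)
\;\;\overset{\text{realizable}}{=}\;\; \mathbb{E}\big[L(\hat{w})\big].
```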
1 code implementation • 25 Oct 2022 • Peiyuan Zhang, Wei Lu
Our experiments show that our approach leads to improved class representations, yielding significantly better results on the few-shot relation extraction task.
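As one common way to form class representations in few-shot relation extraction, each class can be represented by the mean of its support-set embeddings; the sketch below is a generic prototypical-network baseline, not necessarily the construction used in this paper.

```python
# Generic prototypical-network style class representations for few-shot classification.
# Illustrative baseline only; not necessarily the paper's exact construction.
import torch

def class_prototypes(support_embeddings: torch.Tensor,
                     support_labels: torch.Tensor,
                     num_classes: int) -> torch.Tensor:
    """Return one prototype per class: the mean embedding of its support examples."""
    dim = support_embeddings.size(-1)
    protos = torch.zeros(num_classes, dim)
    for c in range(num_classes):
        protos[c] = support_embeddings[support_labels == c].mean(dim=0)
    return protos

def classify(query_embeddings: torch.Tensor, prototypes: torch.Tensor) -> torch.Tensor:
    """Assign each query to the nearest class prototype by Euclidean distance."""
    dists = torch.cdist(query_embeddings, prototypes)  # [num_queries, num_classes]
    return dists.argmin(dim=-1)
```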
no code implementations • 13 Feb 2022 • Peiyuan Zhang, Jingzhao Zhang, Suvrit Sra
Deciding whether saddle points exist or are approximable for nonconvex-nonconcave problems is usually intractable.
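For reference, a (global) saddle point of a function $f(x, y)$ to be minimized over $x$ and maximized over $y$ is a point $(x^*, y^*)$ satisfying the standard inequalities below; deciding whether such a point exists is what is described above as intractable in general.

```latex
% Standard definition of a saddle point for the min-max problem \min_x \max_y f(x, y).
f(x^*, y) \;\le\; f(x^*, y^*) \;\le\; f(x, y^*)
\qquad \text{for all } x \in \mathcal{X},\; y \in \mathcal{Y}.
```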
no code implementations • NeurIPS 2021 • Peiyuan Zhang, Antonio Orvieto, Hadi Daneshmand
The continuous-time model of Nesterov's momentum provides a thought-provoking perspective for understanding the nature of the acceleration phenomenon in convex optimization.
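The continuous-time model referred to here is, under the standard formulation of Su, Boyd, and Candès (2014), the second-order ODE reproduced below for reference.

```latex
% Continuous-time limit of Nesterov's accelerated gradient method (Su, Boyd, Candès, 2014).
\ddot{X}(t) \;+\; \frac{3}{t}\,\dot{X}(t) \;+\; \nabla f\big(X(t)\big) \;=\; 0,
\qquad X(0) = x_0,\;\; \dot{X}(0) = 0.
```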
no code implementations • 23 Feb 2021 • Peiyuan Zhang, Antonio Orvieto, Hadi Daneshmand, Thomas Hofmann, Roy Smith
Viewing optimization methods as numerical integrators for ordinary differential equations (ODEs) provides a thought-provoking modern framework for studying accelerated first-order optimizers.
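To make the integrator viewpoint concrete, a minimal sketch: gradient descent is exactly the explicit (forward) Euler discretization of the gradient-flow ODE $\dot{x} = -\nabla f(x)$, with the step size playing the role of the learning rate. The toy objective below is an arbitrary choice for illustration.

```python
# Minimal illustration: gradient descent as forward Euler on the gradient flow dx/dt = -grad f(x).
import numpy as np

def grad_f(x):
    return 2.0 * x  # gradient of the toy objective f(x) = ||x||^2

def forward_euler_gradient_flow(x0, step_size, num_steps):
    """Forward Euler on dx/dt = -grad f(x); each step is one gradient-descent update."""
    x = np.asarray(x0, dtype=float)
    for _ in range(num_steps):
        x = x - step_size * grad_f(x)  # identical to a GD step with learning rate = step_size
    return x

print(forward_euler_gradient_flow(x0=[1.0, -2.0], step_size=0.1, num_steps=50))
```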
no code implementations • 31 Oct 2019 • Peiyuan Zhang, Hadi Daneshmand, Thomas Hofmann
We study the mixing properties for stochastic accelerated gradient descent (SAGD) on least-squares regression.
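A minimal sketch of what an accelerated stochastic gradient method on least squares can look like is given below (a generic Nesterov-style update with single-sample gradients); the data, step sizes, and momentum value are illustrative assumptions, and the exact SAGD variant analyzed in the paper may differ.

```python
# Generic Nesterov-style stochastic accelerated gradient descent on least squares.
# Illustrative only; the precise SAGD variant and step sizes in the paper may differ.
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 10
A = rng.normal(size=(n, d))
w_star = rng.normal(size=d)
b = A @ w_star                                # realizable least-squares problem

w, w_prev = np.zeros(d), np.zeros(d)
eta, momentum = 0.01, 0.5

for t in range(10_000):
    i = rng.integers(n)                       # sample one row per step
    y = w + momentum * (w - w_prev)           # momentum (look-ahead) point
    grad = (A[i] @ y - b[i]) * A[i]           # stochastic gradient of 0.5*(a_i^T w - b_i)^2
    w_prev, w = w, y - eta * grad

print("final least-squares error:", 0.5 * np.mean((A @ w - b) ** 2))
```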