no code implementations • 10 Apr 2024 • Thomas Merth, Qichen Fu, Mohammad Rastegari, Mahyar Najibi
Despite the successes of large language models (LLMs), they exhibit significant drawbacks, particularly when processing long contexts.
no code implementations • 15 Dec 2023 • Chien-Yu Lin, Qichen Fu, Thomas Merth, Karren Yang, Anurag Ranjan
Compared to existing NeRF+SR methods, our pipeline mitigates the SR computing overhead and can be trained up to 23x faster, making it feasible to run on consumer devices such as the Apple MacBook.
no code implementations • 2 Oct 2023 • Duc N. M Hoang, Minsik Cho, Thomas Merth, Mohammad Rastegari, Zhangyang Wang
We start by proposing two conjectures about the nature of the damage: the first holds that certain knowledge is forgotten (or erased) after LLM compression, so the compressed model must (re)learn it from data using additional parameters; the second presumes that the knowledge is merely displaced internally, so recovering knowledge-related performance requires only "inference re-direction" via input-side augmentation such as prompting.
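A minimal sketch of what the second conjecture's remedy could look like, assuming a generic `compressed_generate` callable for the compressed model and a hypothetical `retrieve_facts` helper; it only illustrates input-side augmentation (prepending task-relevant text to the prompt), not the paper's actual recovery method.

```python
# Sketch of "inference re-direction" via input-side augmentation (prompting).
# `compressed_generate` and `retrieve_facts` are hypothetical stand-ins, not
# APIs from the paper: the compressed model's weights stay untouched and only
# the input is augmented.

def redirect_inference(question: str, compressed_generate, retrieve_facts) -> str:
    # Gather knowledge-bearing text from an external source (e.g. a corpus)
    # instead of re-training the compressed model.
    facts = retrieve_facts(question)

    # Prepend the facts so the compressed model can condition on them at
    # inference time.
    prompt = "Background:\n" + "\n".join(facts) + f"\n\nQuestion: {question}\nAnswer:"
    return compressed_generate(prompt)


# Usage with toy stand-ins:
if __name__ == "__main__":
    answer = redirect_inference(
        "What year did Apollo 11 land on the Moon?",
        compressed_generate=lambda p: "1969",  # placeholder model call
        retrieve_facts=lambda q: ["Apollo 11 landed on the Moon in 1969."],
    )
    print(answer)
```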
no code implementations • 8 Sep 2023 • Elvis Nunez, Thomas Merth, Anish Prabhu, Mehrdad Farajtabar, Mohammad Rastegari, Sachin Mehta, Maxwell Horton
Multi-scale resolution training has seen increasing adoption across multiple vision tasks, including classification and detection.
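A minimal PyTorch-style sketch of multi-scale resolution training, assuming a standard classification loop; the candidate resolutions and the per-batch random sampling policy are illustrative choices, not the schedule used in the paper.

```python
import random
import torch
import torch.nn.functional as F

# Illustrative multi-scale training loop: each batch is resized to a randomly
# chosen resolution before the forward pass. The resolution set below is an
# assumption, not the paper's exact recipe.
RESOLUTIONS = [160, 192, 224, 256]

def train_one_epoch(model, loader, optimizer, device="cuda"):
    model.train()
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)

        # Sample a target resolution for this batch and resize the images.
        size = random.choice(RESOLUTIONS)
        images = F.interpolate(images, size=(size, size),
                               mode="bilinear", align_corners=False)

        logits = model(images)
        loss = F.cross_entropy(logits, labels)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```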
1 code implementation • 21 Jul 2022 • Chien-Yu Lin, Anish Prabhu, Thomas Merth, Sachin Mehta, Anurag Ranjan, Maxwell Horton, Mohammad Rastegari
In this paper, we perform an empirical evaluation on methods for sharing parameters in isotropic networks (SPIN).
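As a rough illustration of what sharing parameters in an isotropic network can look like, here is a small PyTorch sketch in which a single block is reused across all layers; this one-block-for-all-layers pattern is just one possible sharing strategy and an assumption for illustration, not the specific SPIN configurations evaluated in the paper.

```python
import torch
import torch.nn as nn

class SharedBlockNet(nn.Module):
    """Isotropic network where every 'layer' reuses the same block's weights."""

    def __init__(self, dim: int = 256, depth: int = 12):
        super().__init__()
        # One set of block parameters, applied `depth` times (weight sharing).
        self.block = nn.Sequential(
            nn.LayerNorm(dim),
            nn.Linear(dim, dim * 4),
            nn.GELU(),
            nn.Linear(dim * 4, dim),
        )
        self.depth = depth

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for _ in range(self.depth):
            x = x + self.block(x)  # residual connection around the shared block
        return x


# A 12-"layer" model with the parameter count of a single block:
model = SharedBlockNet()
print(sum(p.numel() for p in model.parameters()))
```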