no code implementations • 19 Feb 2024 • Baohao Liao, Christof Monz
With the growing size of large language models, the role of quantization becomes increasingly significant.
no code implementations • 7 Feb 2024 • Baohao Liao, Christof Monz
Memory-efficient finetuning of large language models (LLMs) has recently attracted considerable attention as LLMs grow in size, primarily due to the constraints posed by GPU memory and the comparable performance of these methods to full finetuning.
no code implementations • 22 Oct 2023 • Baohao Liao, Michael Kozielski, Sanjika Hewavitharana, Jiangbo Yuan, Shahram Khadivi, Tomer Lancewicki
Teaching a model to learn embeddings from different modalities without neglecting information from the less dominant modality is challenging.
no code implementations • 20 Oct 2023 • Quinten Bolding, Baohao Liao, Brandon James Denis, Jun Luo, Christof Monz
Lastly, experiments on C-MTNT showcased its effectiveness in evaluating the robustness of NMT models, highlighting the potential of advanced language models for data cleaning and emphasizing C-MTNT as a valuable resource.
1 code implementation • NeurIPS 2023 • Baohao Liao, Shaomu Tan, Christof Monz
One effective way to reduce the activation memory is to apply a reversible model, so that intermediate activations do not need to be cached and can instead be recomputed.
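As a rough illustration of that idea (not the paper's exact architecture), an additive-coupling reversible block can reconstruct its inputs from its outputs, so activations can be recomputed in the backward pass rather than stored; the sketch below assumes PyTorch and illustrative layer choices:

```python
import torch
import torch.nn as nn

class ReversibleBlock(nn.Module):
    """Additive-coupling block: outputs can be inverted to recover inputs,
    so intermediate activations need not be cached during training."""
    def __init__(self, dim):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
        self.g = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, x1, x2):
        y1 = x1 + self.f(x2)
        y2 = x2 + self.g(y1)
        return y1, y2

    def inverse(self, y1, y2):
        # Recompute the inputs from the outputs instead of storing them.
        x2 = y2 - self.g(y1)
        x1 = y1 - self.f(x2)
        return x1, x2
```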
no code implementations • 26 May 2023 • Baohao Liao, Yan Meng, Christof Monz
Parameter-efficient fine-tuning (PEFT) of pre-trained language models has recently demonstrated remarkable achievements, effectively matching the performance of full fine-tuning while using significantly fewer trainable parameters and thereby easing storage and communication constraints.
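For illustration only, one widely used PEFT pattern adds a small trainable low-rank (LoRA-style) update to a frozen pre-trained layer; the sketch below is a generic PyTorch example and is not necessarily the method proposed in this paper:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen dense layer plus a small trainable low-rank update,
    a common parameter-efficient fine-tuning pattern (illustrative only)."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)   # freeze pre-trained weights
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        self.lora_a = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x):
        # Frozen base output plus the scaled low-rank correction.
        return self.base(x) + (x @ self.lora_a.T @ self.lora_b.T) * self.scaling
```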
1 code implementation • 9 Nov 2022 • Baohao Liao, David Thulke, Sanjika Hewavitharana, Hermann Ney, Christof Monz
We show: (1) [MASK]s can indeed be appended at a later layer, disentangled from the word embeddings; (2) contextualized information from unmasked tokens can be gathered within only a few layers.
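A hedged sketch of point (1): the [MASK] embedding is injected at an intermediate encoder layer rather than at the input, keeping it disentangled from the word embeddings; the function name, arguments, and layer index below are illustrative assumptions, not the paper's exact procedure:

```python
import torch
import torch.nn as nn

def encode_with_late_masks(layers, embeddings, mask_embedding,
                           token_ids, mask_positions, insert_layer):
    """Run an encoder stack, but insert the [MASK] embedding at
    `insert_layer` instead of at the input (illustrative sketch)."""
    h = embeddings(token_ids)                  # (batch, seq, dim) from word embeddings
    for i, layer in enumerate(layers):
        if i == insert_layer:
            h = h.clone()
            h[mask_positions] = mask_embedding  # replace hidden states at masked positions
        h = layer(h)
    return h
```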
no code implementations • 28 Oct 2022 • Mathis Bode, Michael Gauding, Jens Henrik Göbbert, Baohao Liao, Jenia Jitsev, Heinz Pitsch
In this paper, deep learning (DL) methods are evaluated in the context of turbulent flows.
1 code implementation • WMT (EMNLP) 2021 • Baohao Liao, Shahram Khadivi, Sanjika Hewavitharana
Surprisingly, smaller vocabularies perform better, and the extensive monolingual English data offers only a modest improvement.
no code implementations • COLING 2020 • Yingbo Gao, Baohao Liao, Hermann Ney
Soft contextualized data augmentation is a recent method that replaces one-hot representation of words with soft posterior distributions of an external language model, smoothing the input of neural machine translation systems.
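A minimal sketch of that idea, assuming PyTorch: the hard one-hot embedding lookup is replaced by the expected embedding under the external language model's posterior over the vocabulary (function and argument names are illustrative):

```python
import torch
import torch.nn.functional as F

def soft_augmented_embeddings(token_ids, lm_logits, embedding_matrix, temperature=1.0):
    """Replace the hard (one-hot) embedding lookup with the expectation of the
    embedding table under an external LM's posterior (illustrative sketch)."""
    posterior = F.softmax(lm_logits / temperature, dim=-1)   # (batch, seq, vocab)
    soft_embed = posterior @ embedding_matrix                # (batch, seq, dim)
    hard_embed = embedding_matrix[token_ids]                 # ordinary one-hot lookup
    return soft_embed, hard_embed
```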
no code implementations • Findings of the Association for Computational Linguistics 2020 • Baohao Liao, Yingbo Gao, Hermann Ney
Mutual learning, where multiple agents learn collaboratively and teach one another, has been shown to be an effective way to distill knowledge for image classification tasks.
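As a generic illustration of mutual learning (not this paper's exact sentence- or token-level objective), each agent is trained on the labels plus a KL term toward its peer's predicted distribution:

```python
import torch
import torch.nn.functional as F

def mutual_learning_losses(logits_a, logits_b, targets):
    """Deep-mutual-learning style objective: each agent matches the labels
    and the other agent's predictions (generic illustration)."""
    ce_a = F.cross_entropy(logits_a, targets)
    ce_b = F.cross_entropy(logits_b, targets)
    kl_a = F.kl_div(F.log_softmax(logits_a, dim=-1),
                    F.softmax(logits_b, dim=-1).detach(), reduction="batchmean")
    kl_b = F.kl_div(F.log_softmax(logits_b, dim=-1),
                    F.softmax(logits_a, dim=-1).detach(), reduction="batchmean")
    return ce_a + kl_a, ce_b + kl_b
```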