no code implementations • Findings (EMNLP) 2021 • Tianda Li, Ahmad Rashid, Aref Jafari, Pranav Sharma, Ali Ghodsi, Mehdi Rezagholizadeh
Knowledge Distillation (KD) is a model compression algorithm that helps transfer the knowledge in a large neural network into a smaller one.
no code implementations • EMNLP 2021 • Yimeng Wu, Mehdi Rezagholizadeh, Abbas Ghaddar, Md Akmal Haidar, Ali Ghodsi
Intermediate layer matching is shown as an effective approach for improving knowledge distillation (KD).
no code implementations • Findings (EMNLP) 2021 • Peng Lu, Abbas Ghaddar, Ahmad Rashid, Mehdi Rezagholizadeh, Ali Ghodsi, Philippe Langlais
Knowledge Distillation (KD) is extensively used in Natural Language Processing to compress the pre-training and task-specific fine-tuning phases of large neural language models.
no code implementations • NAACL 2022 • Marzieh Tahaei, Ella Charlaix, Vahid Nia, Ali Ghodsi, Mehdi Rezagholizadeh
We push the limits of state-of-the-art Transformer-based pre-trained language model compression using Kronecker decomposition.
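As a rough illustration of why Kronecker decomposition compresses a weight matrix, the sketch below builds a large matrix from two small factors and compares parameter counts. The function names and the toy factors are hypothetical; this is the textbook Kronecker product, not the paper's compression pipeline.

```python
def kronecker(a, b):
    """Kronecker product of two matrices given as lists of rows.

    For a of shape (m, n) and b of shape (p, q), the result has shape (m*p, n*q).
    """
    return [
        [a_ij * b_kl for a_ij in a_row for b_kl in b_row]
        for a_row in a
        for b_row in b
    ]


def parameter_counts(a, b):
    """Return (parameters stored as two factors, parameters of the full matrix)."""
    factored = len(a) * len(a[0]) + len(b) * len(b[0])
    full = (len(a) * len(b)) * (len(a[0]) * len(b[0]))
    return factored, full


# Toy factors chosen only for illustration.
A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
W = kronecker(A, B)  # 4 x 4 matrix represented by 8 stored numbers
```

Storing only the two factors is what saves memory: the gap between the factored and full counts grows quickly with the factor sizes.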
no code implementations • 14 Apr 2024 • Jerry Huang, Prasanna Parthasarathi, Mehdi Rezagholizadeh, Sarath Chandar
Large language models (LLMs) show an innate skill for solving language-based tasks.
1 code implementation • 13 Mar 2024 • Heitor R. Guimarães, Arthur Pimentel, Anderson R. Avila, Mehdi Rezagholizadeh, Boxing Chen, Tiago H. Falk
Lastly, we show that the proposed recipe can be applied to other distillation methodologies, such as the recent DPWavLM.
1 code implementation • 29 Feb 2024 • Suyuchen Wang, Ivan Kobyzev, Peng Lu, Mehdi Rezagholizadeh, Bang Liu
This paper addresses the challenge of train-short-test-long (TSTL) scenarios in Large Language Models (LLMs) equipped with Rotary Position Embedding (RoPE), where models pre-trained on shorter sequences face difficulty with out-of-distribution (OOD) token positions in longer sequences.
no code implementations • 16 Feb 2024 • Hossein Rajabzadeh, Mojtaba Valipour, Tianshu Zhu, Marzieh Tahaei, Hyock Ju Kwon, Ali Ghodsi, Boxing Chen, Mehdi Rezagholizadeh
Finetuning large language models requires huge GPU memory, which restricts the choice of larger models.
no code implementations • 3 Feb 2024 • Xindi Wang, Mahsa Salmani, Parsa Omidi, Xiangyu Ren, Mehdi Rezagholizadeh, Armaghan Eshaghi
Recently, large language models (LLMs) have shown remarkable capabilities including understanding context, engaging in logical reasoning, and generating responses.
1 code implementation • 15 Jan 2024 • Abbas Ghaddar, Philippe Langlais, Mehdi Rezagholizadeh, Boxing Chen
Pretraining monolingual language models has been proven to be vital for performance in Arabic Natural Language Processing (NLP) tasks.
1 code implementation • 18 Dec 2023 • Nandan Thakur, Luiz Bonifacio, Xinyu Zhang, Odunayo Ogundepo, Ehsan Kamalloo, David Alfonso-Hermelo, Xiaoguang Li, Qun Liu, Boxing Chen, Mehdi Rezagholizadeh, Jimmy Lin
We measure LLM robustness using two metrics: (i) hallucination rate, measuring the model's tendency to hallucinate an answer when the answer is not present in the passages of the non-relevant subset, and (ii) error rate, measuring the model's failure to recognize relevant passages in the relevant subset.
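The two metrics can be sketched as simple proportions, assuming each model response has already been reduced to a boolean flag. The function names and the flag encoding are hypothetical and are not taken from the paper's released code.

```python
def _rate(flags):
    """Fraction of True values in a non-empty list of booleans."""
    if not flags:
        raise ValueError("need at least one response")
    return sum(1 for flag in flags if flag) / len(flags)


def hallucination_rate(answered_on_non_relevant):
    """Share of non-relevant-subset queries where the model still gave an answer.

    Each flag is True if the model produced an answer even though no
    passage shown to it contained one (assumed encoding).
    """
    return _rate(answered_on_non_relevant)


def error_rate(missed_on_relevant):
    """Share of relevant-subset queries where the model failed to
    recognize the relevant passages (assumed encoding: True = failure)."""
    return _rate(missed_on_relevant)
```

Lower is better for both rates; a model that always abstains would score 0 on the first metric and 1 on the second.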
no code implementations • 25 Sep 2023 • Arthur Pimentel, Heitor Guimarães, Anderson R. Avila, Mehdi Rezagholizadeh, Tiago H. Falk
Recent advances in self-supervised learning have allowed speech recognition systems to achieve state-of-the-art (SOTA) word error rates (WER) while requiring only a fraction of the labeled training data needed by their predecessors.
no code implementations • 16 Sep 2023 • Parsa Kavehzadeh, Mojtaba Valipour, Marzieh Tahaei, Ali Ghodsi, Boxing Chen, Mehdi Rezagholizadeh
We extend SortedNet to generative NLP tasks, making large language models dynamic without any pre-training and by only replacing Standard Fine-Tuning (SFT) with Sorted Fine-Tuning (SoFT).
no code implementations • 1 Sep 2023 • Mojtaba Valipour, Mehdi Rezagholizadeh, Hossein Rajabzadeh, Parsa Kavehzadeh, Marzieh Tahaei, Boxing Chen, Ali Ghodsi
Deep neural networks (DNNs) must cater to a variety of users with different performance needs and budgets, leading to the costly practice of training, storing, and maintaining numerous specific models.
no code implementations • 11 Jul 2023 • Runcheng Liu, Ahmad Rashid, Ivan Kobyzev, Mehdi Rezagholizadeh, Pascal Poupart
Prompt-tuning has become an increasingly popular parameter-efficient method for adapting large pretrained language models to downstream tasks.
no code implementations • 12 Jun 2023 • Anderson R. Avila, Mehdi Rezagholizadeh, Chao Xing
In this work, we investigate impacts of this ASR error propagation on state-of-the-art NLU systems based on pre-trained language models (PLM), such as BERT and RoBERTa.
no code implementations • 11 Jun 2023 • Asaad Alghamdi, Xinyu Duan, Wei Jiang, Zhenhai Wang, Yimeng Wu, Qingrong Xia, Zhefeng Wang, Yi Zheng, Mehdi Rezagholizadeh, Baoxing Huai, Peilun Cheng, Abbas Ghaddar
Developing monolingual large Pre-trained Language Models (PLMs) is shown to be very successful in handling different tasks in Natural Language Processing (NLP).
no code implementations • 24 May 2023 • Amirhossein Kazemnejad, Mehdi Rezagholizadeh, Prasanna Parthasarathi, Sarath Chandar
We propose a systematic framework to measure parametric knowledge utilization in PLMs.
no code implementations • 23 May 2023 • Vamsikrishna Chemudupati, Marzieh Tahaei, Heitor Guimaraes, Arthur Pimentel, Anderson Avila, Mehdi Rezagholizadeh, Boxing Chen, Tiago Falk
Large self-supervised pre-trained speech models have achieved remarkable success across various speech-processing tasks.
no code implementations • 10 May 2023 • Ehsan Kamalloo, Xinyu Zhang, Odunayo Ogundepo, Nandan Thakur, David Alfonso-Hermelo, Mehdi Rezagholizadeh, Jimmy Lin
The ever-increasing size of language models curtails their widespread availability to the community, thereby galvanizing many companies into offering access to large language models through APIs.
no code implementations • 9 May 2023 • Heitor Guimarães, Arthur Pimentel, Anderson Avila, Mehdi Rezagholizadeh, Tiago H. Falk
Later, these representations serve as input to downstream models to solve a number of tasks, such as keyword spotting or emotion recognition.
no code implementations • 8 May 2023 • Peng Lu, Ahmad Rashid, Ivan Kobyzev, Mehdi Rezagholizadeh, Philippe Langlais
Label Smoothing (LS) is another simple, versatile and efficient regularization which can be applied to various supervised classification tasks.
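For readers unfamiliar with label smoothing, the sketch below shows the standard uniform variant, in which a fraction epsilon of the probability mass is spread evenly over all classes. The function name and the default epsilon are assumptions for illustration; the paper may study a different formulation.

```python
def smooth_labels(target_index, num_classes, epsilon=0.1):
    """Return a smoothed target distribution for one training example.

    The true class gets (1 - epsilon) + epsilon / num_classes,
    every other class gets epsilon / num_classes.
    """
    if not 0 <= target_index < num_classes:
        raise ValueError("target_index out of range")
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must be in [0, 1]")
    uniform = epsilon / num_classes
    return [
        (1.0 - epsilon) + uniform if i == target_index else uniform
        for i in range(num_classes)
    ]
```

The result replaces the one-hot target in the cross-entropy loss, which discourages the model from producing over-confident predictions.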
no code implementations • 3 Apr 2023 • Jimmy Lin, David Alfonso-Hermelo, Vitor Jeronymo, Ehsan Kamalloo, Carlos Lassance, Rodrigo Nogueira, Odunayo Ogundepo, Mehdi Rezagholizadeh, Nandan Thakur, Jheng-Hong Yang, Xinyu Zhang
The advent of multilingual language models has generated a resurgence of interest in cross-lingual information retrieval (CLIR), which is the task of searching documents in one language with queries from another.
no code implementations • 18 Feb 2023 • Heitor R. Guimarães, Arthur Pimentel, Anderson R. Avila, Mehdi Rezagholizadeh, Boxing Chen, Tiago H. Falk
The proposed layer-wise distillation recipe is evaluated on top of three well-established universal representations, as well as with three downstream tasks.
no code implementations • 27 Jan 2023 • Aref Jafari, Mehdi Rezagholizadeh, Ali Ghodsi
Augmenting the training set by adding this auxiliary improves the performance of KD significantly and leads to a closer match between the student and the teacher.
no code implementations • 20 Dec 2022 • Ali Edalati, Marzieh Tahaei, Ivan Kobyzev, Vahid Partovi Nia, James J. Clark, Mehdi Rezagholizadeh
We apply the proposed methods for fine-tuning T5 on the GLUE benchmark to show that incorporating the Kronecker-based modules can outperform state-of-the-art PET methods.
no code implementations • 12 Dec 2022 • Peng Lu, Ivan Kobyzev, Mehdi Rezagholizadeh, Ahmad Rashid, Ali Ghodsi, Philippe Langlais
Moreover, we observe that this simple optimization technique is able to outperform the state-of-the-art KD methods for compact models.
no code implementations • 12 Dec 2022 • Aref Jafari, Ivan Kobyzev, Mehdi Rezagholizadeh, Pascal Poupart, Ali Ghodsi
Knowledge Distillation (KD) has been extensively used for natural language understanding (NLU) tasks to improve a small model's (a student) generalization by transferring the knowledge from a larger model (a teacher).
no code implementations • 12 Nov 2022 • Heitor R. Guimarães, Arthur Pimentel, Anderson R. Avila, Mehdi Rezagholizadeh, Tiago H. Falk
Self-supervised speech representation learning aims to extract meaningful factors from the speech signal that can later be used across different downstream tasks, such as speech and/or emotion recognition.
1 code implementation • 18 Oct 2022 • Xinyu Zhang, Nandan Thakur, Odunayo Ogundepo, Ehsan Kamalloo, David Alfonso-Hermelo, Xiaoguang Li, Qun Liu, Mehdi Rezagholizadeh, Jimmy Lin
MIRACL (Multilingual Information Retrieval Across a Continuum of Languages) is a multilingual dataset we have built for the WSDM 2023 Cup challenge that focuses on ad hoc retrieval across 18 different languages, which collectively encompass over three billion native speakers around the world.
2 code implementations • 14 Oct 2022 • Mojtaba Valipour, Mehdi Rezagholizadeh, Ivan Kobyzev, Ali Ghodsi
Our DyLoRA method trains LoRA blocks for a range of ranks instead of a single rank by sorting the representation learned by the adapter module at different ranks during training.
no code implementations • 20 Sep 2022 • Mohammadreza Tayaranian, Alireza Ghaffari, Marzieh S. Tahaei, Mehdi Rezagholizadeh, Masoud Asgharian, Vahid Partovi Nia
Previously, researchers focused on lower bit-width integer data types for the forward propagation of language models to save memory and computation.
1 code implementation • 30 Jun 2022 • Kira Selby, Ahmad Rashid, Ivan Kobyzev, Mehdi Rezagholizadeh, Pascal Poupart
We propose a general deep architecture for learning functions on multiple permutation-invariant sets.
no code implementations • 25 May 2022 • Ivan Kobyzev, Aref Jafari, Mehdi Rezagholizadeh, Tianda Li, Alan Do-Omri, Peng Lu, Pascal Poupart, Ali Ghodsi
Knowledge Distillation (KD) is a prominent neural model compression technique that heavily relies on teacher network predictions to guide the training of a student model.
no code implementations • 21 May 2022 • Abbas Ghaddar, Yimeng Wu, Sunyam Bagga, Ahmad Rashid, Khalil Bibi, Mehdi Rezagholizadeh, Chao Xing, Yasheng Wang, Duan Xinyu, Zhefeng Wang, Baoxing Huai, Xin Jiang, Qun Liu, Philippe Langlais
There is a growing body of work in recent years to develop pre-trained language models (PLMs) for the Arabic language.
no code implementations • COLING 2022 • Joyce Zheng, Mehdi Rezagholizadeh, Peyman Passban
To solve this problem, position embeddings are defined exclusively for each time step to enrich word information.
no code implementations • COLING 2022 • Md Akmal Haidar, Mehdi Rezagholizadeh, Abbas Ghaddar, Khalil Bibi, Philippe Langlais, Pascal Poupart
Knowledge distillation (KD) is an efficient framework for compressing large-scale pre-trained language models.
1 code implementation • Findings (ACL) 2022 • Ehsan Kamalloo, Mehdi Rezagholizadeh, Ali Ghodsi
From a pre-generated pool of augmented samples, Glitter adaptively selects a subset of worst-case samples with maximal loss, analogous to adversarial DA.
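The selection step described above can be sketched as a top-k choice by loss. This is a minimal illustration, assuming a per-sample loss function is available; `select_worst_case` and its arguments are hypothetical names, not Glitter's actual interface.

```python
import heapq


def select_worst_case(augmented_pool, loss_fn, k):
    """Pick the k augmented samples with the highest loss.

    augmented_pool: pre-generated augmented samples for one example.
    loss_fn: callable mapping a sample to its current loss (assumed given).
    Returns the selected samples, highest loss first.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    return heapq.nlargest(k, augmented_pool, key=loss_fn)
```

Because the losses depend on the current model, the selected subset changes as training progresses, which is what makes the selection adaptive.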
1 code implementation • 8 Dec 2021 • Abbas Ghaddar, Yimeng Wu, Ahmad Rashid, Khalil Bibi, Mehdi Rezagholizadeh, Chao Xing, Yasheng Wang, Duan Xinyu, Zhefeng Wang, Baoxing Huai, Xin Jiang, Qun Liu, Philippe Langlais
Language-specific pre-trained models have proven to be more accurate than multilingual ones in a monolingual evaluation setting, and Arabic is no exception.
no code implementations • 9 Nov 2021 • David Alfonso-Hermelo, Ahmad Rashid, Abbas Ghaddar, Philippe Langlais, Mehdi Rezagholizadeh
We apply NATURE to common slot-filling and intent detection benchmarks and demonstrate that simple perturbations from the standard evaluation set by NATURE can deteriorate model performance significantly.
no code implementations • 16 Oct 2021 • Tianda Li, Yassir El Mesbahi, Ivan Kobyzev, Ahmad Rashid, Atif Mahmud, Nithin Anchuri, Habib Hajimolahoseini, Yang Liu, Mehdi Rezagholizadeh
Pre-trained Language Models (PLMs) have been successful for a wide range of natural language processing (NLP) tasks.
no code implementations • COLING 2022 • Mehdi Rezagholizadeh, Aref Jafari, Puneeth Salad, Pranav Sharma, Ali Saheb Pasand, Ali Ghodsi
A case in point is that the best performing checkpoint of the teacher might not necessarily be the best teacher for training the student in KD.
no code implementations • ACL 2022 • Ali Edalati, Marzieh Tahaei, Ahmad Rashid, Vahid Partovi Nia, James J. Clark, Mehdi Rezagholizadeh
GPT is an auto-regressive Transformer-based pre-trained language model which has attracted a lot of attention in the natural language processing (NLP) domain due to its state-of-the-art performance in several downstream tasks.
no code implementations • 29 Sep 2021 • Peng Lu, Ahmad Rashid, Ivan Kobyzev, Mehdi Rezagholizadeh, Philippe Langlais
Knowledge Distillation (KD) is an algorithm that transfers the knowledge of a trained, typically larger, neural network into another model under training.
no code implementations • Findings (NAACL) 2022 • Md Akmal Haidar, Nithin Anchuri, Mehdi Rezagholizadeh, Abbas Ghaddar, Philippe Langlais, Pascal Poupart
To address these problems, we propose a RAndom Intermediate Layer Knowledge Distillation (RAIL-KD) approach in which intermediate layers from the teacher model are selected randomly to be distilled into the intermediate layers of the student model.
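The random layer selection can be sketched as follows. This is a minimal illustration, assuming one teacher layer is drawn per student layer and that the drawn layers are sorted to preserve depth order; both choices, along with the function name, are assumptions rather than details confirmed by the excerpt.

```python
import random


def sample_layer_mapping(num_teacher_layers, num_student_layers, rng=None):
    """Randomly map each student layer to a distinct teacher layer.

    Returns a list of (student_layer, teacher_layer) index pairs.
    Sorting keeps shallower student layers paired with shallower
    teacher layers (an assumption of this sketch).
    """
    if num_student_layers > num_teacher_layers:
        raise ValueError("student cannot have more layers than teacher")
    rng = rng if rng is not None else random
    chosen = sorted(rng.sample(range(num_teacher_layers), num_student_layers))
    return list(enumerate(chosen))
```

Calling the function once per training step or epoch would give a fresh mapping each time, so that over training every teacher layer has a chance to be distilled.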
no code implementations • WNUT (ACL) 2021 • Shivendra Bhardwaj, Abbas Ghaddar, Ahmad Rashid, Khalil Bibi, Chengyang Li, Ali Ghodsi, Philippe Langlais, Mehdi Rezagholizadeh
Knowledge Distillation (KD) is extensively used to compress and deploy large pre-trained language models on edge devices for real-world applications.
no code implementations • 13 Sep 2021 • Marzieh S. Tahaei, Ella Charlaix, Vahid Partovi Nia, Ali Ghodsi, Mehdi Rezagholizadeh
We present our KroneckerBERT, a compressed version of the BERT_BASE model obtained using this framework.
1 code implementation • 13 Sep 2021 • Tianda Li, Ahmad Rashid, Aref Jafari, Pranav Sharma, Ali Ghodsi, Mehdi Rezagholizadeh
Knowledge Distillation (KD) is a model compression algorithm that helps transfer the knowledge of a large neural network into a smaller one.
no code implementations • Findings (ACL) 2021 • Abbas Ghaddar, Philippe Langlais, Mehdi Rezagholizadeh, Ahmad Rashid
Existing Natural Language Understanding (NLU) models have been shown to incorporate dataset biases leading to strong performance on in-distribution (ID) test sets but poor performance on out-of-distribution (OOD) ones.
no code implementations • 24 Jul 2021 • Abbas Ghaddar, Philippe Langlais, Ahmad Rashid, Mehdi Rezagholizadeh
In this work, we examine the ability of NER models to use contextual information when predicting the type of an ambiguous entity.
1 code implementation • Findings (ACL) 2021 • Ehsan Kamalloo, Mehdi Rezagholizadeh, Peyman Passban, Ali Ghodsi
We exploit a semi-supervised approach based on KD to train a model on augmented data.
1 code implementation • ACL 2021 • Ahmad Rashid, Vasileios Lioutas, Mehdi Rezagholizadeh
We present MATE-KD, a novel text-based adversarial training algorithm which improves the performance of knowledge distillation.
no code implementations • 18 Apr 2021 • Krtin Kumar, Peyman Passban, Mehdi Rezagholizadeh, Yiu Sing Lau, Qun Liu
Embedding matrices are key components in neural natural language processing (NLP) models that are responsible for providing numerical representations of input tokens. (In this paper, words and subwords are referred to as tokens, and the term embedding only refers to embeddings of inputs.)
no code implementations • 17 Apr 2021 • Kira A. Selby, Yinong Wang, Ruizhe Wang, Peyman Passban, Ahmad Rashid, Mehdi Rezagholizadeh, Pascal Poupart
Despite recent monumental advances in the field, many Natural Language Processing (NLP) models still struggle to perform adequately on noisy domains.
1 code implementation • EACL 2021 • Aref Jafari, Mehdi Rezagholizadeh, Pranav Sharma, Ali Ghodsi
Knowledge distillation (KD) is a prominent model compression technique for deep neural networks in which the knowledge of a trained large teacher model is transferred to a smaller student model.
no code implementations • 17 Mar 2021 • Md Akmal Haidar, Chao Xing, Mehdi Rezagholizadeh
End-to-end automatic speech recognition (ASR), unlike conventional ASR, does not have modules to learn the semantic representation from the speech encoder.
Ranked #12 on Speech Recognition on LibriSpeech test-clean
no code implementations • 10 Mar 2021 • Md Akmal Haidar, Mehdi Rezagholizadeh
In this paper, we introduce a novel framework for fine-tuning a pre-trained ASR model using the GAN objective where the ASR model acts as a generator and a discriminator tries to distinguish the ASR output from the real data.
no code implementations • EMNLP 2021 • Ahmad Rashid, Vasileios Lioutas, Abbas Ghaddar, Mehdi Rezagholizadeh
Knowledge Distillation (KD) is a common knowledge transfer algorithm used for model compression across a variety of deep learning based natural language processing (NLP) solutions.
no code implementations • 27 Dec 2020 • Peyman Passban, Yimeng Wu, Mehdi Rezagholizadeh, Qun Liu
Knowledge distillation is considered as a training and compression strategy in which two neural networks, namely a teacher and a student, are coupled together during training.
no code implementations • 10 Nov 2020 • Ahmad Rashid, Alan Do-Omri, Md. Akmal Haidar, Qun Liu, Mehdi Rezagholizadeh
B-GAN is able to generate a distributed latent space representation which can be paired with an attention based decoder to generate fluent sentences.
4 code implementations • 9 Nov 2019 • Alex Bie, Bharat Venkitesh, Joao Monteiro, Md. Akmal Haidar, Mehdi Rezagholizadeh
While significant improvements have been made in recent years in terms of end-to-end automatic speech recognition (ASR) performance, such improvements were obtained through the use of very large neural networks, unfit for embedded use on edge devices.
no code implementations • Findings of the Association for Computational Linguistics 2020 • Gabriele Prato, Ella Charlaix, Mehdi Rezagholizadeh
State-of-the-art neural machine translation methods employ massive amounts of parameters.
no code implementations • Findings of the Association for Computational Linguistics 2020 • Vasileios Lioutas, Ahmad Rashid, Krtin Kumar, Md. Akmal Haidar, Mehdi Rezagholizadeh
Word-embeddings are vital components of Natural Language Processing (NLP) models and have been extensively explored.
no code implementations • 25 Sep 2019 • Vasileios Lioutas, Ahmad Rashid, Krtin Kumar, Md Akmal Haidar, Mehdi Rezagholizadeh
Word-embeddings are a vital component of Natural Language Processing (NLP) systems and have been extensively researched.
1 code implementation • ACL 2019 • Yue Dong, Zichao Li, Mehdi Rezagholizadeh, Jackie Chi Kit Cheung
We present the first sentence simplification model that learns explicit edit operations (ADD, DELETE, and KEEP) via a neural programmer-interpreter approach.
Ranked #2 on Text Simplification on PWKP / WikiSmall (SARI metric)
1 code implementation • 23 Apr 2019 • Md. Akmal Haidar, Mehdi Rezagholizadeh
Text generation is of particular interest in many NLP applications such as machine translation, language modeling, and text summarization.
no code implementations • NAACL 2019 • Md. Akmal Haidar, Mehdi Rezagholizadeh, Alan Do-Omri, Ahmad Rashid
This soft representation will be used in GAN discrimination to synthesize similar soft-texts.
no code implementations • WS 2019 • Ahmad Rashid, Alan Do-Omri, Md. Akmal Haidar, Qun Liu, Mehdi Rezagholizadeh
Latent space based GAN methods and attention based sequence to sequence models have achieved impressive results in text generation and unsupervised machine translation respectively.
no code implementations • 13 Nov 2018 • Mehdi Rezagholizadeh, Md Akmal Haidar
We performed several experiments on a publicly available driving dataset to evaluate our proposed method, and the results are very promising.
no code implementations • 28 Sep 2018 • Jules Gagnon-Marchand, Hamed Sadeghi, Md. Akmal Haidar, Mehdi Rezagholizadeh
Inspired by the success of the self-attention mechanism and the Transformer architecture in sequence transduction and image generation applications, we propose novel self-attention-based architectures to improve the performance of adversarial latent code-based schemes in text generation.