Search Results for author: Mehdi Rezagholizadeh

Found 70 papers, 16 papers with code

How to Select One Among All? An Empirical Study Towards the Robustness of Knowledge Distillation in Natural Language Understanding

no code implementations • Findings (EMNLP) 2021 • Tianda Li, Ahmad Rashid, Aref Jafari, Pranav Sharma, Ali Ghodsi, Mehdi Rezagholizadeh

Knowledge Distillation (KD) is a model compression algorithm that helps transfer the knowledge in a large neural network into a smaller one.

Adversarial Robustness Data Augmentation +4

Universal-KD: Attention-based Output-Grounded Intermediate Layer Knowledge Distillation

no code implementations • EMNLP 2021 • Yimeng Wu, Mehdi Rezagholizadeh, Abbas Ghaddar, Md Akmal Haidar, Ali Ghodsi

Intermediate layer matching has been shown to be an effective approach for improving knowledge distillation (KD).

Knowledge Distillation

RW-KD: Sample-wise Loss Terms Re-Weighting for Knowledge Distillation

no code implementations • Findings (EMNLP) 2021 • Peng Lu, Abbas Ghaddar, Ahmad Rashid, Mehdi Rezagholizadeh, Ali Ghodsi, Philippe Langlais

Knowledge Distillation (KD) is extensively used in Natural Language Processing to compress the pre-training and task-specific fine-tuning phases of large neural language models.

Knowledge Distillation

KroneckerBERT: Significant Compression of Pre-trained Language Models Through Kronecker Decomposition and Knowledge Distillation

no code implementations • NAACL 2022 • Marzieh Tahaei, Ella Charlaix, Vahid Nia, Ali Ghodsi, Mehdi Rezagholizadeh

We push the limits of state-of-the-art Transformer-based pre-trained language model compression using Kronecker decomposition.

Knowledge Distillation Language Modelling +1

Towards Practical Tool Usage for Continually Learning LLMs

no code implementations • 14 Apr 2024 • Jerry Huang, Prasanna Parthasarathi, Mehdi Rezagholizadeh, Sarath Chandar

Large language models (LLMs) show an innate skill for solving language-based tasks.

Continual Learning

An Efficient End-to-End Approach to Noise Invariant Speech Features via Multi-Task Learning

1 code implementation • 13 Mar 2024 • Heitor R. Guimarães, Arthur Pimentel, Anderson R. Avila, Mehdi Rezagholizadeh, Boxing Chen, Tiago H. Falk

Lastly, we show that the proposed recipe can be applied to other distillation methodologies, such as the recent DPWavLM.

Denoising Knowledge Distillation +2

Resonance RoPE: Improving Context Length Generalization of Large Language Models

1 code implementation • 29 Feb 2024 • Suyuchen Wang, Ivan Kobyzev, Peng Lu, Mehdi Rezagholizadeh, Bang Liu

This paper addresses the challenge of train-short-test-long (TSTL) scenarios in Large Language Models (LLMs) equipped with Rotary Position Embedding (RoPE), where models pre-trained on shorter sequences face difficulty with out-of-distribution (OOD) token positions in longer sequences.

Language Modelling Position

QDyLoRA: Quantized Dynamic Low-Rank Adaptation for Efficient Large Language Model Tuning

no code implementations • 16 Feb 2024 • Hossein Rajabzadeh, Mojtaba Valipour, Tianshu Zhu, Marzieh Tahaei, Hyock Ju Kwon, Ali Ghodsi, Boxing Chen, Mehdi Rezagholizadeh

Finetuning large language models requires huge GPU memory, which restricts the choice of larger models.

Language Modelling Large Language Model +1

Beyond the Limits: A Survey of Techniques to Extend the Context Length in Large Language Models

no code implementations • 3 Feb 2024 • Xindi Wang, Mahsa Salmani, Parsa Omidi, Xiangyu Ren, Mehdi Rezagholizadeh, Armaghan Eshaghi

Recently, large language models (LLMs) have shown remarkable capabilities including understanding context, engaging in logical reasoning, and generating responses.

Logical Reasoning Long-Context Understanding

On the importance of Data Scale in Pretraining Arabic Language Models

1 code implementation • 15 Jan 2024 • Abbas Ghaddar, Philippe Langlais, Mehdi Rezagholizadeh, Boxing Chen

Pretraining monolingual language models has been proven to be vital for performance in Arabic Natural Language Processing (NLP) tasks.

Decoder Language Modelling

NoMIRACL: Knowing When You Don't Know for Robust Multilingual Retrieval-Augmented Generation

1 code implementation • 18 Dec 2023 • Nandan Thakur, Luiz Bonifacio, Xinyu Zhang, Odunayo Ogundepo, Ehsan Kamalloo, David Alfonso-Hermelo, Xiaoguang Li, Qun Liu, Boxing Chen, Mehdi Rezagholizadeh, Jimmy Lin

We measure LLM robustness using two metrics: (i) hallucination rate, which measures the model's tendency to hallucinate an answer when no answer is present in the passages of the non-relevant subset, and (ii) error rate, which measures the model's failure to recognize relevant passages in the relevant subset.

Hallucination Language Modelling +2
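
The two metrics described in this entry can be read as simple rates over binary model decisions. The sketch below is only an illustration, not the NoMIRACL evaluation code: it assumes each model response has already been reduced to a boolean (`True` when the model claims the passages contain an answer), and both function names are hypothetical.

```python
from typing import Sequence


def hallucination_rate(claims_answer: Sequence[bool]) -> float:
    """Rate over the NON-RELEVANT subset (assumed input).

    Share of queries where the model claims an answer even though
    no passage contains one.
    """
    if not claims_answer:
        raise ValueError("non-relevant subset is empty")
    return sum(claims_answer) / len(claims_answer)


def error_rate(claims_answer: Sequence[bool]) -> float:
    """Rate over the RELEVANT subset (assumed input).

    Share of queries where the model fails to recognize that a
    relevant passage is present, i.e. it does not claim an answer.
    """
    if not claims_answer:
        raise ValueError("relevant subset is empty")
    return sum(not c for c in claims_answer) / len(claims_answer)
```

Under these assumptions, lower is better for both rates; a model that always abstains would score 0 on the first and 1 on the second.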

On the Impact of Quantization and Pruning of Self-Supervised Speech Models for Downstream Speech Recognition Tasks "In-the-Wild"

no code implementations • 25 Sep 2023 • Arthur Pimentel, Heitor Guimarães, Anderson R. Avila, Mehdi Rezagholizadeh, Tiago H. Falk

Recent advances in self-supervised learning have allowed speech recognition systems to achieve state-of-the-art (SOTA) word error rates (WER) while requiring only a fraction of the labeled training data needed by their predecessors.

Data Augmentation Model Compression +4

Sorted LLaMA: Unlocking the Potential of Intermediate Layers of Large Language Models for Dynamic Inference

no code implementations • 16 Sep 2023 • Parsa Kavehzadeh, Mojtaba Valipour, Marzieh Tahaei, Ali Ghodsi, Boxing Chen, Mehdi Rezagholizadeh

We extend SortedNet to generative NLP tasks, making large language models dynamic without any pre-training, only by replacing Standard Fine-Tuning (SFT) with Sorted Fine-Tuning (SoFT).

Instruction Following Question Answering +1

SortedNet, a Place for Every Network and Every Network in its Place: Towards a Generalized Solution for Training Many-in-One Neural Networks

no code implementations • 1 Sep 2023 • Mojtaba Valipour, Mehdi Rezagholizadeh, Hossein Rajabzadeh, Parsa Kavehzadeh, Marzieh Tahaei, Boxing Chen, Ali Ghodsi

Deep neural networks (DNNs) must cater to a variety of users with different performance needs and budgets, leading to the costly practice of training, storing, and maintaining numerous specific models.

Image Classification Model Selection

Attribute Controlled Dialogue Prompting

no code implementations • 11 Jul 2023 • Runcheng Liu, Ahmad Rashid, Ivan Kobyzev, Mehdi Rezagholizadeh, Pascal Poupart

Prompt-tuning has become an increasingly popular parameter-efficient method for adapting large pretrained language models to downstream tasks.

Attribute Dialogue Generation

Multimodal Audio-textual Architecture for Robust Spoken Language Understanding

no code implementations • 12 Jun 2023 • Anderson R. Avila, Mehdi Rezagholizadeh, Chao Xing

In this work, we investigate impacts of this ASR error propagation on state-of-the-art NLU systems based on pre-trained language models (PLM), such as BERT and RoBERTa.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

AraMUS: Pushing the Limits of Data and Model Scale for Arabic Natural Language Processing

no code implementations • 11 Jun 2023 • Asaad Alghamdi, Xinyu Duan, Wei Jiang, Zhenhai Wang, Yimeng Wu, Qingrong Xia, Zhefeng Wang, Yi Zheng, Mehdi Rezagholizadeh, Baoxing Huai, Peilun Cheng, Abbas Ghaddar

Developing monolingual large Pre-trained Language Models (PLMs) has been shown to be very successful in handling different tasks in Natural Language Processing (NLP).

Few-Shot Learning

Measuring the Knowledge Acquisition-Utilization Gap in Pretrained Language Models

no code implementations • 24 May 2023 • Amirhossein Kazemnejad, Mehdi Rezagholizadeh, Prasanna Parthasarathi, Sarath Chandar

We propose a systematic framework to measure parametric knowledge utilization in PLMs.

On the Transferability of Whisper-based Representations for "In-the-Wild" Cross-Task Downstream Speech Applications

no code implementations • 23 May 2023 • Vamsikrishna Chemudupati, Marzieh Tahaei, Heitor Guimaraes, Arthur Pimentel, Anderson Avila, Mehdi Rezagholizadeh, Boxing Chen, Tiago Falk

Large self-supervised pre-trained speech models have achieved remarkable success across various speech-processing tasks.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Evaluating Embedding APIs for Information Retrieval

no code implementations • 10 May 2023 • Ehsan Kamalloo, Xinyu Zhang, Odunayo Ogundepo, Nandan Thakur, David Alfonso-Hermelo, Mehdi Rezagholizadeh, Jimmy Lin

The ever-increasing size of language models curtails their widespread availability to the community, thereby galvanizing many companies into offering access to large language models through APIs.

Domain Generalization Information Retrieval +2

An Exploration into the Performance of Unsupervised Cross-Task Speech Representations for "In the Wild" Edge Applications

no code implementations • 9 May 2023 • Heitor Guimarães, Arthur Pimentel, Anderson Avila, Mehdi Rezagholizadeh, Tiago H. Falk

Later, these representations serve as input to downstream models to solve a number of tasks, such as keyword spotting or emotion recognition.

Emotion Recognition intent-classification +2

LABO: Towards Learning Optimal Label Regularization via Bi-level Optimization

no code implementations • 8 May 2023 • Peng Lu, Ahmad Rashid, Ivan Kobyzev, Mehdi Rezagholizadeh, Philippe Langlais

Label Smoothing (LS) is another simple, versatile and efficient regularization which can be applied to various supervised classification tasks.

Image Classification Machine Translation

Simple Yet Effective Neural Ranking and Reranking Baselines for Cross-Lingual Information Retrieval

no code implementations • 3 Apr 2023 • Jimmy Lin, David Alfonso-Hermelo, Vitor Jeronymo, Ehsan Kamalloo, Carlos Lassance, Rodrigo Nogueira, Odunayo Ogundepo, Mehdi Rezagholizadeh, Nandan Thakur, Jheng-Hong Yang, Xinyu Zhang

The advent of multilingual language models has generated a resurgence of interest in cross-lingual information retrieval (CLIR), which is the task of searching documents in one language with queries from another.

Cross-Lingual Information Retrieval Retrieval

RobustDistiller: Compressing Universal Speech Representations for Enhanced Environment Robustness

no code implementations • 18 Feb 2023 • Heitor R. Guimarães, Arthur Pimentel, Anderson R. Avila, Mehdi Rezagholizadeh, Boxing Chen, Tiago H. Falk

The proposed layer-wise distillation recipe is evaluated on top of three well-established universal representations, as well as with three downstream tasks.

Knowledge Distillation Multi-Task Learning

Improved knowledge distillation by utilizing backward pass knowledge in neural networks

no code implementations • 27 Jan 2023 • Aref Jafari, Mehdi Rezagholizadeh, Ali Ghodsi

Augmenting the training set by adding this auxiliary improves the performance of KD significantly and leads to a closer match between the student and the teacher.

Knowledge Distillation Model Compression

KronA: Parameter Efficient Tuning with Kronecker Adapter

no code implementations • 20 Dec 2022 • Ali Edalati, Marzieh Tahaei, Ivan Kobyzev, Vahid Partovi Nia, James J. Clark, Mehdi Rezagholizadeh

We apply the proposed methods for fine-tuning T5 on the GLUE benchmark to show that incorporating the Kronecker-based modules can outperform state-of-the-art PET methods.

Language Modelling

Improving Generalization of Pre-trained Language Models via Stochastic Weight Averaging

no code implementations • 12 Dec 2022 • Peng Lu, Ivan Kobyzev, Mehdi Rezagholizadeh, Ahmad Rashid, Ali Ghodsi, Philippe Langlais

Moreover, we observe that this simple optimization technique is able to outperform the state-of-the-art KD methods for compact models.

Knowledge Distillation Question Answering +2

Continuation KD: Improved Knowledge Distillation through the Lens of Continuation Optimization

no code implementations • 12 Dec 2022 • Aref Jafari, Ivan Kobyzev, Mehdi Rezagholizadeh, Pascal Poupart, Ali Ghodsi

Knowledge Distillation (KD) has been extensively used for natural language understanding (NLU) tasks to improve a small model's (a student) generalization by transferring the knowledge from a larger model (a teacher).

Knowledge Distillation Natural Language Understanding

Improving the Robustness of DistilHuBERT to Unseen Noisy Conditions via Data Augmentation, Curriculum Learning, and Multi-Task Enhancement

no code implementations • 12 Nov 2022 • Heitor R. Guimarães, Arthur Pimentel, Anderson R. Avila, Mehdi Rezagholizadeh, Tiago H. Falk

Self-supervised speech representation learning aims to extract meaningful factors from the speech signal that can later be used across different downstream tasks, such as speech and/or emotion recognition.

Data Augmentation Emotion Recognition +2

Making a MIRACL: Multilingual Information Retrieval Across a Continuum of Languages

1 code implementation • 18 Oct 2022 • Xinyu Zhang, Nandan Thakur, Odunayo Ogundepo, Ehsan Kamalloo, David Alfonso-Hermelo, Xiaoguang Li, Qun Liu, Mehdi Rezagholizadeh, Jimmy Lin

MIRACL (Multilingual Information Retrieval Across a Continuum of Languages) is a multilingual dataset we have built for the WSDM 2023 Cup challenge that focuses on ad hoc retrieval across 18 different languages, which collectively encompass over three billion native speakers around the world.

Information Retrieval Retrieval

DyLoRA: Parameter Efficient Tuning of Pre-trained Models using Dynamic Search-Free Low-Rank Adaptation

2 code implementations • 14 Oct 2022 • Mojtaba Valipour, Mehdi Rezagholizadeh, Ivan Kobyzev, Ali Ghodsi

Our DyLoRA method trains LoRA blocks for a range of ranks instead of a single rank by sorting the representation learned by the adapter module at different ranks during training.

Natural Language Understanding Text Generation

Towards Fine-tuning Pre-trained Language Models with Integer Forward and Backward Propagation

no code implementations • 20 Sep 2022 • Mohammadreza Tayaranian, Alireza Ghaffari, Marzieh S. Tahaei, Mehdi Rezagholizadeh, Masoud Asgharian, Vahid Partovi Nia

Previously researchers were focused on lower bit-width integer data types for the forward propagation of language models to save memory and computation.

Learning Functions on Multiple Sets using Multi-Set Transformers

1 code implementation • 30 Jun 2022 • Kira Selby, Ahmad Rashid, Ivan Kobyzev, Mehdi Rezagholizadeh, Pascal Poupart

We propose a general deep architecture for learning functions on multiple permutation-invariant sets.

Do we need Label Regularization to Fine-tune Pre-trained Language Models?

no code implementations • 25 May 2022 • Ivan Kobyzev, Aref Jafari, Mehdi Rezagholizadeh, Tianda Li, Alan Do-Omri, Peng Lu, Pascal Poupart, Ali Ghodsi

Knowledge Distillation (KD) is a prominent neural model compression technique that heavily relies on teacher network predictions to guide the training of a student model.

Knowledge Distillation Model Compression

Revisiting Pre-trained Language Models and their Evaluation for Arabic Natural Language Understanding

no code implementations • 21 May 2022 • Abbas Ghaddar, Yimeng Wu, Sunyam Bagga, Ahmad Rashid, Khalil Bibi, Mehdi Rezagholizadeh, Chao Xing, Yasheng Wang, Duan Xinyu, Zhefeng Wang, Baoxing Huai, Xin Jiang, Qun Liu, Philippe Langlais

There is a growing body of work in recent years to develop pre-trained language models (PLMs) for the Arabic language.

Natural Language Understanding

Dynamic Position Encoding for Transformers

no code implementations • COLING 2022 • Joyce Zheng, Mehdi Rezagholizadeh, Peyman Passban

To solve this problem, position embeddings are defined exclusively for each time step to enrich word information.

Machine Translation NMT +1

CILDA: Contrastive Data Augmentation using Intermediate Layer Knowledge Distillation

no code implementations • COLING 2022 • Md Akmal Haidar, Mehdi Rezagholizadeh, Abbas Ghaddar, Khalil Bibi, Philippe Langlais, Pascal Poupart

Knowledge distillation (KD) is an efficient framework for compressing large-scale pre-trained language models.

Contrastive Learning Data Augmentation +1

When Chosen Wisely, More Data Is What You Need: A Universal Sample-Efficient Strategy For Data Augmentation

1 code implementation • Findings (ACL) 2022 • Ehsan Kamalloo, Mehdi Rezagholizadeh, Ali Ghodsi

From a pre-generated pool of augmented samples, Glitter adaptively selects a subset of worst-case samples with maximal loss, analogous to adversarial DA.

Data Augmentation Knowledge Distillation
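
The selection step described in this entry (keeping the highest-loss samples from a pre-generated pool) can be sketched as a top-k pick. This is a minimal illustration under stated assumptions, not the authors' Glitter implementation: `select_worst_case` and `loss_fn` are hypothetical names, and the loss is assumed to be a per-sample scalar computed by the current model.

```python
import heapq
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def select_worst_case(
    pool: Sequence[T], loss_fn: Callable[[T], float], k: int
) -> List[T]:
    """Return the k samples from `pool` with the largest loss.

    `loss_fn` is a stand-in for evaluating the current model on one
    augmented sample (an assumption made for this sketch).
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    # heapq.nlargest returns the items in descending order of the key.
    return heapq.nlargest(k, pool, key=loss_fn)


# Toy usage: each sample carries a precomputed loss value.
pool = [("aug-1", 0.1), ("aug-2", 0.9), ("aug-3", 0.5)]
hardest = select_worst_case(pool, loss_fn=lambda s: s[1], k=2)
```

In an actual training loop the selection would presumably be repeated as the model changes, since the "worst-case" samples depend on the current loss.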

JABER and SABER: Junior and Senior Arabic BERt

1 code implementation • 8 Dec 2021 • Abbas Ghaddar, Yimeng Wu, Ahmad Rashid, Khalil Bibi, Mehdi Rezagholizadeh, Chao Xing, Yasheng Wang, Duan Xinyu, Zhefeng Wang, Baoxing Huai, Xin Jiang, Qun Liu, Philippe Langlais

Language-specific pre-trained models have proven to be more accurate than multilingual ones in a monolingual evaluation setting, and Arabic is no exception.

Language Modelling NER

NATURE: Natural Auxiliary Text Utterances for Realistic Spoken Language Evaluation

no code implementations • 9 Nov 2021 • David Alfonso-Hermelo, Ahmad Rashid, Abbas Ghaddar, Philippe Langlais, Mehdi Rezagholizadeh

We apply NATURE to common slot-filling and intent detection benchmarks and demonstrate that simple perturbations from the standard evaluation set by NATURE can deteriorate model performance significantly.

Intent Detection slot-filling +1

A Short Study on Compressing Decoder-Based Language Models

no code implementations • 16 Oct 2021 • Tianda Li, Yassir El Mesbahi, Ivan Kobyzev, Ahmad Rashid, Atif Mahmud, Nithin Anchuri, Habib Hajimolahoseini, Yang Liu, Mehdi Rezagholizadeh

Pre-trained Language Models (PLMs) have been successful for a wide range of natural language processing (NLP) tasks.

Decoder Knowledge Distillation +1

Pro-KD: Progressive Distillation by Following the Footsteps of the Teacher

no code implementations • COLING 2022 • Mehdi Rezagholizadeh, Aref Jafari, Puneeth Salad, Pranav Sharma, Ali Saheb Pasand, Ali Ghodsi

A case in point is that the best performing checkpoint of the teacher might not necessarily be the best teacher for training the student in KD.

Image Classification Knowledge Distillation +3

Kronecker Decomposition for GPT Compression

no code implementations • ACL 2022 • Ali Edalati, Marzieh Tahaei, Ahmad Rashid, Vahid Partovi Nia, James J. Clark, Mehdi Rezagholizadeh

GPT is an auto-regressive Transformer-based pre-trained language model which has attracted a lot of attention in the natural language processing (NLP) domain due to its state-of-the-art performance in several downstream tasks.

Knowledge Distillation Language Modelling +1

Pseudo Knowledge Distillation: Towards Learning Optimal Instance-specific Label Smoothing Regularization

no code implementations • 29 Sep 2021 • Peng Lu, Ahmad Rashid, Ivan Kobyzev, Mehdi Rezagholizadeh, Philippe Langlais

Knowledge Distillation (KD) is an algorithm that transfers the knowledge of a trained, typically larger, neural network into another model under training.

Image Classification Knowledge Distillation +1

RAIL-KD: RAndom Intermediate Layer Mapping for Knowledge Distillation

no code implementations • Findings (NAACL) 2022 • Md Akmal Haidar, Nithin Anchuri, Mehdi Rezagholizadeh, Abbas Ghaddar, Philippe Langlais, Pascal Poupart

To address these problems, we propose a RAndom Intermediate Layer Knowledge Distillation (RAIL-KD) approach in which intermediate layers from the teacher model are selected randomly to be distilled into the intermediate layers of the student model.

Knowledge Distillation
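
The random layer selection described in this entry can be sketched as sampling teacher layer indices and pairing them with student layers. This is an illustrative sketch only, not the RAIL-KD code: the function name is hypothetical, and sorting the sampled indices so that deeper student layers pair with deeper teacher layers is an assumption made here.

```python
import random
from typing import List, Tuple


def sample_layer_mapping(
    num_teacher_layers: int, num_student_layers: int, rng: random.Random
) -> List[Tuple[int, int]]:
    """Randomly pick one teacher layer per student layer.

    Returns (student_layer, teacher_layer) index pairs. Sampling is
    without replacement; the sort is an assumption of this sketch so
    that layer order is preserved.
    """
    if num_student_layers > num_teacher_layers:
        raise ValueError("student cannot have more layers than the teacher")
    teacher_ids = sorted(
        rng.sample(range(num_teacher_layers), num_student_layers)
    )
    return list(enumerate(teacher_ids))


# Toy usage: a 12-layer teacher and a 6-layer student; a fresh mapping
# could be drawn at whatever interval the training schedule calls for.
mapping = sample_layer_mapping(12, 6, random.Random(0))
```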

Knowledge Distillation with Noisy Labels for Natural Language Understanding

no code implementations • WNUT (ACL) 2021 • Shivendra Bhardwaj, Abbas Ghaddar, Ahmad Rashid, Khalil Bibi, Chengyang Li, Ali Ghodsi, Philippe Langlais, Mehdi Rezagholizadeh

Knowledge Distillation (KD) is extensively used to compress and deploy large pre-trained language models on edge devices for real-world applications.

Knowledge Distillation Natural Language Understanding

KroneckerBERT: Learning Kronecker Decomposition for Pre-trained Language Models via Knowledge Distillation

no code implementations • 13 Sep 2021 • Marzieh S. Tahaei, Ella Charlaix, Vahid Partovi Nia, Ali Ghodsi, Mehdi Rezagholizadeh

We present our KroneckerBERT, a compressed version of the BERT_BASE model obtained using this framework.

Knowledge Distillation Language Modelling +1

How to Select One Among All? An Extensive Empirical Study Towards the Robustness of Knowledge Distillation in Natural Language Understanding

1 code implementation • 13 Sep 2021 • Tianda Li, Ahmad Rashid, Aref Jafari, Pranav Sharma, Ali Ghodsi, Mehdi Rezagholizadeh

Knowledge Distillation (KD) is a model compression algorithm that helps transfer the knowledge of a large neural network into a smaller one.

Adversarial Robustness Data Augmentation +4

End-to-End Self-Debiasing Framework for Robust NLU Training

no code implementations • Findings (ACL) 2021 • Abbas Ghaddar, Philippe Langlais, Mehdi Rezagholizadeh, Ahmad Rashid

Existing Natural Language Understanding (NLU) models have been shown to incorporate dataset biases leading to strong performance on in-distribution (ID) test sets but poor performance on out-of-distribution (OOD) ones.

Natural Language Understanding

Context-aware Adversarial Training for Name Regularity Bias in Named Entity Recognition

no code implementations • 24 Jul 2021 • Abbas Ghaddar, Philippe Langlais, Ahmad Rashid, Mehdi Rezagholizadeh

In this work, we examine the ability of NER models to use contextual information when predicting the type of an ambiguous entity.

Data Augmentation named-entity-recognition +2

Not Far Away, Not So Close: Sample Efficient Nearest Neighbour Data Augmentation via MiniMax

1 code implementation • Findings (ACL) 2021 • Ehsan Kamalloo, Mehdi Rezagholizadeh, Peyman Passban, Ali Ghodsi

We exploit a semi-supervised approach based on KD to train a model on augmented data.

Data Augmentation Knowledge Distillation +2

MATE-KD: Masked Adversarial TExt, a Companion to Knowledge Distillation

1 code implementation • ACL 2021 • Ahmad Rashid, Vasileios Lioutas, Mehdi Rezagholizadeh

We present MATE-KD, a novel text-based adversarial training algorithm which improves the performance of knowledge distillation.

Adversarial Text Data Augmentation +2

From Fully Trained to Fully Random Embeddings: Improving Neural Machine Translation with Compact Word Embedding Tables

no code implementations • 18 Apr 2021 • Krtin Kumar, Peyman Passban, Mehdi Rezagholizadeh, Yiu Sing Lau, Qun Liu

Embedding matrices are key components in neural natural language processing (NLP) models that are responsible for providing numerical representations of input tokens. (In this paper, words and subwords are referred to as tokens, and the term embedding refers only to embeddings of inputs.)

Machine Translation NMT +2

Robust Embeddings Via Distributions

no code implementations • 17 Apr 2021 • Kira A. Selby, Yinong Wang, Ruizhe Wang, Peyman Passban, Ahmad Rashid, Mehdi Rezagholizadeh, Pascal Poupart

Despite recent monumental advances in the field, many Natural Language Processing (NLP) models still struggle to perform adequately on noisy domains.

Annealing Knowledge Distillation

1 code implementation • EACL 2021 • Aref Jafari, Mehdi Rezagholizadeh, Pranav Sharma, Ali Ghodsi

Knowledge distillation (KD) is a prominent model compression technique for deep neural networks in which the knowledge of a trained large teacher model is transferred to a smaller student model.

Image Classification Knowledge Distillation +1

Transformer-based ASR Incorporating Time-reduction Layer and Fine-tuning with Self-Knowledge Distillation

no code implementations • 17 Mar 2021 • Md Akmal Haidar, Chao Xing, Mehdi Rezagholizadeh

End-to-end automatic speech recognition (ASR), unlike conventional ASR, does not have modules to learn the semantic representation from the speech encoder.

Ranked #12 on Speech Recognition on LibriSpeech test-clean

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Fine-tuning of Pre-trained End-to-end Speech Recognition with Generative Adversarial Networks

no code implementations • 10 Mar 2021 • Md Akmal Haidar, Mehdi Rezagholizadeh

In this paper, we introduce a novel framework for fine-tuning a pre-trained ASR model using the GAN objective where the ASR model acts as a generator and a discriminator tries to distinguish the ASR output from the real data.

speech-recognition Speech Recognition

Towards Zero-Shot Knowledge Distillation for Natural Language Processing

no code implementations • EMNLP 2021 • Ahmad Rashid, Vasileios Lioutas, Abbas Ghaddar, Mehdi Rezagholizadeh

Knowledge Distillation (KD) is a common knowledge transfer algorithm used for model compression across a variety of deep learning based natural language processing (NLP) solutions.

Knowledge Distillation Model Compression +1

ALP-KD: Attention-Based Layer Projection for Knowledge Distillation

no code implementations • 27 Dec 2020 • Peyman Passban, Yimeng Wu, Mehdi Rezagholizadeh, Qun Liu

Knowledge distillation is considered a training and compression strategy in which two neural networks, namely a teacher and a student, are coupled together during training.

Knowledge Distillation

From Unsupervised Machine Translation To Adversarial Text Generation

no code implementations • 10 Nov 2020 • Ahmad Rashid, Alan Do-Omri, Md. Akmal Haidar, Qun Liu, Mehdi Rezagholizadeh

B-GAN is able to generate a distributed latent space representation which can be paired with an attention based decoder to generate fluent sentences.

Adversarial Text Decoder +3

A Simplified Fully Quantized Transformer for End-to-end Speech Recognition

4 code implementations • 9 Nov 2019 • Alex Bie, Bharat Venkitesh, Joao Monteiro, Md. Akmal Haidar, Mehdi Rezagholizadeh

While significant improvements have been made in recent years in terms of end-to-end automatic speech recognition (ASR) performance, such improvements were obtained through the use of very large neural networks, unfit for embedded use on edge devices.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Fully Quantized Transformer for Machine Translation

no code implementations • Findings of the Association for Computational Linguistics 2020 • Gabriele Prato, Ella Charlaix, Mehdi Rezagholizadeh

State-of-the-art neural machine translation methods employ massive amounts of parameters.

Machine Translation Quantization +1

Improving Word Embedding Factorization for Compression Using Distilled Nonlinear Neural Decomposition

no code implementations • Findings of the Association for Computational Linguistics 2020 • Vasileios Lioutas, Ahmad Rashid, Krtin Kumar, Md. Akmal Haidar, Mehdi Rezagholizadeh

Word-embeddings are vital components of Natural Language Processing (NLP) models and have been extensively explored.

Knowledge Distillation Language Modelling +3

Distilled embedding: non-linear embedding factorization using knowledge distillation

no code implementations • 25 Sep 2019 • Vasileios Lioutas, Ahmad Rashid, Krtin Kumar, Md Akmal Haidar, Mehdi Rezagholizadeh

Word-embeddings are a vital component of Natural Language Processing (NLP) systems and have been extensively researched.

Knowledge Distillation Machine Translation +2

EditNTS: An Neural Programmer-Interpreter Model for Sentence Simplification through Explicit Editing

1 code implementation • ACL 2019 • Yue Dong, Zichao Li, Mehdi Rezagholizadeh, Jackie Chi Kit Cheung

We present the first sentence simplification model that learns explicit edit operations (ADD, DELETE, and KEEP) via a neural programmer-interpreter approach.

Ranked #2 on Text Simplification on PWKP / WikiSmall (SARI metric)

Machine Translation Sentence +2

TextKD-GAN: Text Generation using Knowledge Distillation and Generative Adversarial Networks

1 code implementation • 23 Apr 2019 • Md. Akmal Haidar, Mehdi Rezagholizadeh

Text generation is of particular interest in many NLP applications such as machine translation, language modeling, and text summarization.

Image Generation Knowledge Distillation +5

Latent Code and Text-based Generative Adversarial Networks for Soft-text Generation

no code implementations • NAACL 2019 • Md. Akmal Haidar, Mehdi Rezagholizadeh, Alan Do-Omri, Ahmad Rashid

This soft representation will be used in GAN discrimination to synthesize similar soft-texts.

Text Generation

Bilingual-GAN: A Step Towards Parallel Text Generation

no code implementations • WS 2019 • Ahmad Rashid, Alan Do-Omri, Md. Akmal Haidar, Qun Liu, Mehdi Rezagholizadeh

Latent space based GAN methods and attention based sequence to sequence models have achieved impressive results in text generation and unsupervised machine translation respectively.

Decoder Denoising +3

Semi-Supervised Regression with Generative Adverserial Networks for End to End Learning in Autonomous Driving

no code implementations • 13 Nov 2018 • Mehdi Rezagholizadeh, Md Akmal Haidar

We performed several experiments on a publicly available driving dataset to evaluate our proposed method, and the results are very promising.

Autonomous Driving regression

SALSA-TEXT: self attentive latent space based adversarial text generation

no code implementations • 28 Sep 2018 • Jules Gagnon-Marchand, Hamed Sadeghi, Md. Akmal Haidar, Mehdi Rezagholizadeh

Inspired by the success of the self-attention mechanism and the Transformer architecture in sequence transduction and image generation applications, we propose novel self-attention-based architectures to improve the performance of adversarial latent code-based schemes in text generation.

Adversarial Text Image Generation +3
