no code implementations • 19 Jul 2023 • James O'Neill, Sourav Dutta
We introduce GradDrop and variants thereof, a class of gradient sparsification methods that mask gradients during the backward pass, acting as gradient noise.
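The idea of masking gradients during the backward pass can be illustrated with a small PyTorch sketch; the Bernoulli mask and drop rate below are illustrative assumptions, not the paper's exact GradDrop variants.

```python
import torch
import torch.nn as nn

def make_grad_mask_hook(drop_rate: float):
    """Return a backward hook that randomly zeroes a fraction of gradient entries."""
    def hook(grad):
        # Keep each gradient entry with probability 1 - drop_rate.
        mask = (torch.rand_like(grad) >= drop_rate).to(grad.dtype)
        return grad * mask
    return hook

# Toy model: register the hook on every parameter so masking happens
# during the backward pass, acting as gradient noise.
model = nn.Linear(16, 4)
for p in model.parameters():
    p.register_hook(make_grad_mask_hook(drop_rate=0.5))

x, y = torch.randn(8, 16), torch.randint(0, 4, (8,))
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()  # gradients arriving in p.grad are already sparsified
```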
no code implementations • 12 Jul 2023 • James O'Neill, Sourav Dutta
We investigate the effects of post-training quantization and quantization-aware training on the generalization of Transformer language models.
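For illustration, post-training quantization converts an already-trained model with no further gradient updates, e.g. PyTorch's dynamic int8 quantization of linear layers, whereas quantization-aware training simulates quantization during fine-tuning so the weights adapt to the reduced precision. The toy feed-forward block below is a stand-in, not the paper's experimental setup.

```python
import torch
import torch.nn as nn

# Stand-in for a trained Transformer feed-forward sub-layer.
model = nn.Sequential(nn.Linear(256, 1024), nn.ReLU(), nn.Linear(1024, 256))

# Post-training (dynamic) quantization: nn.Linear weights are converted to
# int8 after training, with no quantization-aware fine-tuning.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

x = torch.randn(1, 256)
print(quantized(x).shape)  # forward pass now uses int8 weight kernels
```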
no code implementations • Findings (ACL) 2022 • James O'Neill, Sourav Dutta, Haytham Assem
While various avenues of research have been explored for iterative pruning, little is known about what effect pruning has on zero-shot test performance and its potential implications for the choice of pruning criteria.
no code implementations • 30 Sep 2021 • James O'Neill, Sourav Dutta, Haytham Assem
Pruning aims to reduce the number of parameters while maintaining performance close to the original network.
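As a rough sketch of the setting (not the criteria studied in the paper), magnitude pruning zeroes the smallest-magnitude weights while keeping the layer's shape, e.g. with PyTorch's pruning utilities:

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

layer = nn.Linear(512, 512)

# Mask out the 50% of weights with smallest L1 magnitude; pruned entries are
# set to zero while the layer keeps its original shape.
prune.l1_unstructured(layer, name="weight", amount=0.5)

sparsity = (layer.weight == 0).float().mean().item()
print(f"weight sparsity: {sparsity:.2f}")
```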
no code implementations • 29 Sep 2021 • James O'Neill, Sourav Dutta, Haytham Assem
Pruning aims to reduce the number of parameters while maintaining performance close to the original network.
no code implementations • 12 Feb 2021 • James O'Neill, Danushka Bollegala
In the knowledge distillation setting, (1) the performance of student networks increases by 4.56 percentage points on Tiny-ImageNet-200 and 3.29 percentage points on CIFAR-100 over student networks trained with no teacher, and (2) by 1.23 and 1.72 percentage points respectively over a hard-to-beat baseline (Hinton et al., 2015).
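The baseline refers to the temperature-scaled distillation loss of Hinton et al. (2015); a minimal sketch of that loss is below, with the temperature T and mixing weight alpha chosen purely for illustration.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Hinton-style distillation: soft-target KL term plus hard-label cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                      # rescale the soft term for the temperature
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Toy usage with random logits for a 100-class problem (e.g. CIFAR-100).
s, t = torch.randn(8, 100, requires_grad=True), torch.randn(8, 100)
y = torch.randint(0, 100, (8,))
print(distillation_loss(s, t, y).item())
```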
no code implementations • 22 Jan 2021 • James O'Neill, Danushka Bollegala
At test time, a sequence predictor is required to make predictions given past predictions as the input, instead of the past targets that are provided during training.
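This train/test mismatch (often called exposure bias) can be seen in a toy autoregressive loop: during training the next step is conditioned on the gold target (teacher forcing), whereas at test time it must be conditioned on the model's own previous prediction. The GRU-based predictor below is only a stand-in for the models studied.

```python
import torch
import torch.nn as nn

# Toy autoregressive predictor over a 10-symbol vocabulary.
vocab, hidden = 10, 32
embed = nn.Embedding(vocab, hidden)
rnn = nn.GRUCell(hidden, hidden)
out = nn.Linear(hidden, vocab)

targets = torch.randint(0, vocab, (5,))   # one toy target sequence
h = torch.zeros(1, hidden)
prev = torch.zeros(1, dtype=torch.long)   # start symbol

for t in range(len(targets)):
    h = rnn(embed(prev), h)
    logits = out(h)
    # Training (teacher forcing): condition the next step on the gold target.
    # prev = targets[t].view(1)
    # Test time: condition on the model's own prediction instead; this is the
    # mismatch that this line of work addresses.
    prev = logits.argmax(dim=-1)
```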
no code implementations • 29 Jul 2020 • James O'Neill, Greg Ver Steeg, Aram Galstyan
This paper proposes layer fusion - a model compression technique that discovers which weights to combine and then fuses the weights of similar fully-connected, convolutional and attention layers.
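A minimal sketch of the idea, assuming a naive cosine-similarity test and simple weight averaging rather than the paper's actual alignment and fusion procedure:

```python
import torch
import torch.nn as nn

def fuse_if_similar(a: nn.Linear, b: nn.Linear, threshold: float = 0.9):
    """If two same-shaped linear layers have similar weights, return a single
    averaged layer; otherwise return None. A toy similarity test only."""
    sim = torch.nn.functional.cosine_similarity(
        a.weight.flatten(), b.weight.flatten(), dim=0
    )
    if sim < threshold:
        return None
    fused = nn.Linear(a.in_features, a.out_features)
    with torch.no_grad():
        fused.weight.copy_((a.weight + b.weight) / 2)
        fused.bias.copy_((a.bias + b.bias) / 2)
    return fused

a, b = nn.Linear(64, 64), nn.Linear(64, 64)
print(fuse_if_similar(a, b))  # likely None for random init; similar layers fuse
```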
no code implementations • 5 Jun 2020 • James O'Neill
Thus, in recent years there has been a resurgence in model compression techniques, particularly for deep convolutional neural networks and self-attention based networks such as the Transformer.
no code implementations • 9 Sep 2019 • James O'Neill, Danushka Bollegala
However, we argue that current n-gram overlap based measures that are used as rewards can be improved by using model-based rewards transferred from tasks that directly compare the similarity of sentence pairs.
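A sketch of a model-based reward under strong assumptions: the transferred sentence-pair similarity model is replaced here by a toy mean-of-embeddings encoder, and the reward is the cosine similarity between the generated and reference sentence representations, used in place of an n-gram overlap score.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a sentence-pair similarity model transferred from
# a similarity task; the real setup would use a trained model, not this stub.
embed = nn.Embedding(1000, 64)

def similarity_reward(generated_ids, reference_ids):
    g = embed(generated_ids).mean(dim=0)
    r = embed(reference_ids).mean(dim=0)
    return torch.cosine_similarity(g, r, dim=0)   # model-based reward in [-1, 1]

gen = torch.randint(0, 1000, (12,))
ref = torch.randint(0, 1000, (10,))
print(similarity_reward(gen, ref).item())
```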
no code implementations • 24 Mar 2019 • James O'Neill
However, transferring all parameters, some of which are irrelevant to the target task, can lead to sub-optimal results and can have a negative effect on performance, referred to as negative transfer.
no code implementations • 21 Jan 2019 • James O'Neill, Danushka Bollegala
We propose a novel neural sequence prediction method based on error-correcting output codes that avoids exact softmax normalization and allows for a tradeoff between speed and performance.
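A minimal sketch of the idea, assuming a random binary codebook (the paper concerns principled error-correcting codes): each output class is assigned a short codeword, the model predicts code bits rather than a distribution over the full vocabulary, and decoding picks the nearest codeword.

```python
import torch
import torch.nn as nn

vocab_size, code_bits, hidden = 1024, 16, 128

# Assign each vocabulary item a random binary codeword (illustrative only).
codebook = torch.randint(0, 2, (vocab_size, code_bits)).float()

decoder = nn.Linear(hidden, code_bits)   # predicts bits, not a full softmax

h = torch.randn(1, hidden)
bit_probs = torch.sigmoid(decoder(h))                 # (1, code_bits)
# Decode: pick the codeword nearest to the predicted bits.
dists = ((bit_probs - codebook) ** 2).sum(dim=-1)     # (vocab_size,)
predicted_token = dists.argmin().item()
print(predicted_token)
```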
no code implementations • 2 Nov 2018 • James O'Neill, Danushka Bollegala
Moreover, we propose an extension of variational dropout to concrete dropout and curriculum dropout with varying schedules.
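A sketch of a curriculum-style dropout schedule, where the dropout rate is annealed over training steps; the exponential schedule and its hyperparameters are illustrative assumptions, not the schedules compared in the paper.

```python
import math
import torch.nn as nn

def curriculum_dropout_rate(step, p_target=0.5, gamma=1e-3):
    """Illustrative curriculum schedule: start with little dropout and anneal
    towards p_target as training progresses."""
    return p_target * (1.0 - math.exp(-gamma * step))

dropout = nn.Dropout(p=0.0)
for step in range(0, 5000, 1000):
    dropout.p = curriculum_dropout_rate(step)   # update the rate in place
    print(step, round(dropout.p, 3))
```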
no code implementations • 16 Sep 2018 • James O'Neill, Danushka Bollegala
At test time, a language model is required to make predictions given past predictions as input, instead of the past targets that are provided during training.
no code implementations • 16 Sep 2018 • James O'Neill, Danushka Bollegala
For intrinsic task evaluation, supervision comes from various labeled word similarity datasets.
no code implementations • 13 Aug 2018 • James O'Neill, Danushka Bollegala
This work compares meta-embeddings trained with different losses, namely loss functions that account for the angular distance between the reconstructed embedding and the target, and those that account for normalized distances based on vector length.
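The two families of losses can be sketched as follows, with angular_loss penalising the angle between the reconstructed and target embeddings and normalized_distance_loss comparing length-normalised vectors; both functions are illustrative, not the paper's exact objectives.

```python
import torch
import torch.nn.functional as F

def angular_loss(pred, target):
    # Penalise the angle between reconstructed and target embeddings,
    # ignoring their lengths.
    return 1.0 - F.cosine_similarity(pred, target, dim=-1).mean()

def normalized_distance_loss(pred, target):
    # Squared error after length-normalising both vectors.
    return ((F.normalize(pred, dim=-1) - F.normalize(target, dim=-1)) ** 2).sum(-1).mean()

pred, target = torch.randn(32, 300), torch.randn(32, 300)
print(angular_loss(pred, target).item(), normalized_distance_loss(pred, target).item())
```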
no code implementations • ICLR 2019 • James O'Neill
Capsule Networks have shown encouraging results on de facto benchmark computer vision datasets such as MNIST, CIFAR and smallNORB.
no code implementations • 23 Apr 2018 • James O'Neill, Danushka Bollegala
We also compare against models that are fully trained on the target task in the standard supervised learning setup.