no code implementations • 19 Jul 2023 • James O'Neill, Sourav Dutta
We introduce GradDrop and variants thereof, a class of gradient sparsification methods that mask gradients during the backward pass, acting as gradient noise.
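The idea of masking gradients during the backward pass can be illustrated with a small PyTorch sketch; the Bernoulli mask and drop rate below are illustrative assumptions, not the paper's exact GradDrop variants.

```python
import torch
import torch.nn as nn

def make_grad_mask_hook(drop_rate: float):
    """Return a backward hook that randomly zeroes a fraction of gradient entries."""
    def hook(grad):
        # Keep each gradient entry with probability 1 - drop_rate.
        mask = (torch.rand_like(grad) >= drop_rate).to(grad.dtype)
        return grad * mask
    return hook

# Toy model: register the hook on every parameter so masking happens
# during the backward pass, acting as gradient noise.
model = nn.Linear(16, 4)
for p in model.parameters():
    p.register_hook(make_grad_mask_hook(drop_rate=0.5))

x, y = torch.randn(8, 16), torch.randint(0, 4, (8,))
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()  # gradients arriving in p.grad are already sparsified
```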
no code implementations • 12 Jul 2023 • James O'Neill, Sourav Dutta
We investigate the effects of post-training quantization and quantization-aware training on the generalization of Transformer language models.
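For illustration, post-training quantization converts an already-trained model with no further gradient updates, e.g. PyTorch's dynamic int8 quantization of linear layers, whereas quantization-aware training simulates quantization during fine-tuning so the weights adapt to the reduced precision. The toy feed-forward block below is a stand-in, not the paper's experimental setup.

```python
import torch
import torch.nn as nn

# Stand-in for a trained Transformer feed-forward sub-layer.
model = nn.Sequential(nn.Linear(256, 1024), nn.ReLU(), nn.Linear(1024, 256))

# Post-training (dynamic) quantization: nn.Linear weights are converted to
# int8 after training, with no quantization-aware fine-tuning.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

x = torch.randn(1, 256)
print(quantized(x).shape)  # forward pass now uses int8 weight kernels
```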
no code implementations • Findings (ACL) 2022 • James O'Neill, Sourav Dutta, Haytham Assem
While various avenues of research have been explored for iterative pruning, little is known about what effect pruning has on zero-shot test performance and its potential implications for the choice of pruning criteria.
no code implementations • 30 Sep 2021 • James O'Neill, Sourav Dutta, Haytham Assem
Pruning aims to reduce the number of parameters while maintaining performance close to the original network.
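As a rough sketch of the setting (not the criteria studied in the paper), magnitude pruning zeroes the smallest-magnitude weights while keeping the layer's shape, e.g. with PyTorch's pruning utilities:

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

layer = nn.Linear(512, 512)

# Mask out the 50% of weights with smallest L1 magnitude; pruned entries are
# set to zero while the layer keeps its original shape.
prune.l1_unstructured(layer, name="weight", amount=0.5)

sparsity = (layer.weight == 0).float().mean().item()
print(f"weight sparsity: {sparsity:.2f}")
```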
no code implementations • 29 Sep 2021 • James O'Neill, Sourav Dutta, Haytham Assem
Pruning aims to reduce the number of parameters while maintaining performance close to the original network.
no code implementations • 12 Feb 2021 • James O'Neill, Danushka Bollegala
In the knowledge distillation setting, (1) the performance of student networks increases by 4.56 percentage points on Tiny-ImageNet-200 and 3.29 percentage points on CIFAR-100 over student networks trained with no teacher, and (2) by 1.23 and 1.72 percentage points respectively over a hard-to-beat baseline (Hinton et al., 2015).
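The baseline refers to the temperature-scaled distillation loss of Hinton et al. (2015); a minimal sketch of that loss is below, with the temperature T and mixing weight alpha chosen purely for illustration.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Hinton-style distillation: soft-target KL term plus hard-label cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                      # rescale the soft term for the temperature
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Toy usage with random logits for a 100-class problem (e.g. CIFAR-100).
s, t = torch.randn(8, 100, requires_grad=True), torch.randn(8, 100)
y = torch.randint(0, 100, (8,))
print(distillation_loss(s, t, y).item())
```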
no code implementations • 22 Jan 2021 • James O'Neill, Danushka Bollegala
At test time, a sequence predictor is required to make predictions given past predictions as the input, instead of the past targets that are provided during training.
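This train/test mismatch (often called exposure bias) can be seen in a toy autoregressive loop: during training the next step is conditioned on the gold target (teacher forcing), whereas at test time it must be conditioned on the model's own previous prediction. The GRU-based predictor below is only a stand-in for the models studied.

```python
import torch
import torch.nn as nn

# Toy autoregressive predictor over a 10-symbol vocabulary.
vocab, hidden = 10, 32
embed = nn.Embedding(vocab, hidden)
rnn = nn.GRUCell(hidden, hidden)
out = nn.Linear(hidden, vocab)

targets = torch.randint(0, vocab, (5,))   # one toy target sequence
h = torch.zeros(1, hidden)
prev = torch.zeros(1, dtype=torch.long)   # start symbol

for t in range(len(targets)):
    h = rnn(embed(prev), h)
    logits = out(h)
    # Training (teacher forcing): condition the next step on the gold target.
    # prev = targets[t].view(1)
    # Test time: condition on the model's own prediction instead; this is the
    # mismatch that this line of work addresses.
    prev = logits.argmax(dim=-1)
```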
no code implementations • 29 Jul 2020 • James O'Neill, Greg Ver Steeg, Aram Galstyan
This paper proposes layer fusion - a model compression technique that discovers which weights to combine and then fuses the weights of similar fully-connected, convolutional and attention layers.
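A minimal sketch of the idea, assuming a naive cosine-similarity test and simple weight averaging rather than the paper's actual alignment and fusion procedure:

```python
import torch
import torch.nn as nn

def fuse_if_similar(a: nn.Linear, b: nn.Linear, threshold: float = 0.9):
    """If two same-shaped linear layers have similar weights, return a single
    averaged layer; otherwise return None. A toy similarity test only."""
    sim = torch.nn.functional.cosine_similarity(
        a.weight.flatten(), b.weight.flatten(), dim=0
    )
    if sim < threshold:
        return None
    fused = nn.Linear(a.in_features, a.out_features)
    with torch.no_grad():
        fused.weight.copy_((a.weight + b.weight) / 2)
        fused.bias.copy_((a.bias + b.bias) / 2)
    return fused

a, b = nn.Linear(64, 64), nn.Linear(64, 64)
print(fuse_if_similar(a, b))  # likely None for random init; similar layers fuse
```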
no code implementations • 5 Jun 2020 • James O'Neill
Thus, in recent years there has been a resurgence in model compression techniques, particularly for deep convolutional neural networks and self-attention based networks such as the Transformer.
no code implementations • 9 Sep 2019 • James O'Neill, Danushka Bollegala
However, we argue that current n-gram overlap based measures that are used as rewards can be improved by using model-based rewards transferred from tasks that directly compare the similarity of sentence pairs.
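A sketch of a model-based reward under strong assumptions: the transferred sentence-pair similarity model is replaced here by a toy mean-of-embeddings encoder, and the reward is the cosine similarity between the generated and reference sentence representations, used in place of an n-gram overlap score.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a sentence-pair similarity model transferred from
# a similarity task; the real setup would use a trained model, not this stub.
embed = nn.Embedding(1000, 64)

def similarity_reward(generated_ids, reference_ids):
    g = embed(generated_ids).mean(dim=0)
    r = embed(reference_ids).mean(dim=0)
    return torch.cosine_similarity(g, r, dim=0)   # model-based reward in [-1, 1]

gen = torch.randint(0, 1000, (12,))
ref = torch.randint(0, 1000, (10,))
print(similarity_reward(gen, ref).item())
```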
no code implementations • 24 Mar 2019 • James O'Neill
However, transferring all parameters, some of which are irrelevant to the target task, can lead to sub-optimal results and can have a negative effect on performance, referred to as negative transfer.
no code implementations • 21 Jan 2019 • James O'Neill, Danushka Bollegala
We propose a novel neural sequence prediction method based on error-correcting output codes that avoids exact softmax normalization and allows for a tradeoff between speed and performance.
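A minimal sketch of the idea, assuming a random binary codebook (the paper concerns principled error-correcting codes): each output class is assigned a short codeword, the model predicts code bits rather than a distribution over the full vocabulary, and decoding picks the nearest codeword.

```python
import torch
import torch.nn as nn

vocab_size, code_bits, hidden = 1024, 16, 128

# Assign each vocabulary item a random binary codeword (illustrative only).
codebook = torch.randint(0, 2, (vocab_size, code_bits)).float()

decoder = nn.Linear(hidden, code_bits)   # predicts bits, not a full softmax

h = torch.randn(1, hidden)
bit_probs = torch.sigmoid(decoder(h))                 # (1, code_bits)
# Decode: pick the codeword nearest to the predicted bits.
dists = ((bit_probs - codebook) ** 2).sum(dim=-1)     # (vocab_size,)
predicted_token = dists.argmin().item()
print(predicted_token)
```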
no code implementations • 2 Nov 2018 • James O'Neill, Danushka Bollegala
Moreover, we propose an extension of variational dropout to concrete dropout and curriculum dropout with varying schedules.
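A sketch of a curriculum-style dropout schedule, where the dropout rate is annealed over training steps; the exponential schedule and its hyperparameters are illustrative assumptions, not the schedules compared in the paper.

```python
import math
import torch.nn as nn

def curriculum_dropout_rate(step, p_target=0.5, gamma=1e-3):
    """Illustrative curriculum schedule: start with little dropout and anneal
    towards p_target as training progresses."""
    return p_target * (1.0 - math.exp(-gamma * step))

dropout = nn.Dropout(p=0.0)
for step in range(0, 5000, 1000):
    dropout.p = curriculum_dropout_rate(step)   # update the rate in place
    print(step, round(dropout.p, 3))
```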
no code implementations • 16 Sep 2018 • James O'Neill, Danushka Bollegala
At test time, a language model is required to make predictions given past predictions as input, instead of the past targets that are provided during training.
no code implementations • 16 Sep 2018 • James O'Neill, Danushka Bollegala
For intrinsic task evaluation, supervision comes from various labeled word similarity datasets.
no code implementations • 13 Aug 2018 • James O'Neill, Danushka Bollegala
This work compares meta-embeddings trained with different losses, namely loss functions that account for the angular distance between the reconstructed embedding and the target, and those that account for normalized distances based on vector length.
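The two families of losses can be sketched as follows, with angular_loss penalising the angle between the reconstructed and target embeddings and normalized_distance_loss comparing length-normalised vectors; both functions are illustrative, not the paper's exact objectives.

```python
import torch
import torch.nn.functional as F

def angular_loss(pred, target):
    # Penalise the angle between reconstructed and target embeddings,
    # ignoring their lengths.
    return 1.0 - F.cosine_similarity(pred, target, dim=-1).mean()

def normalized_distance_loss(pred, target):
    # Squared error after length-normalising both vectors.
    return ((F.normalize(pred, dim=-1) - F.normalize(target, dim=-1)) ** 2).sum(-1).mean()

pred, target = torch.randn(32, 300), torch.randn(32, 300)
print(angular_loss(pred, target).item(), normalized_distance_loss(pred, target).item())
```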
no code implementations • ICLR 2019 • James O'Neill
Capsule Networks have shown encouraging results on de facto benchmark computer vision datasets such as MNIST, CIFAR and smallNORB.
no code implementations • 23 Apr 2018 • James O'Neill, Danushka Bollegala
We also compare against models that are fully trained on the target task in the standard supervised learning setup.