Search Results for author: Michal Lukasik

Found 24 papers, 3 papers with code

It's an Alignment, Not a Trade-off: Revisiting Bias and Variance in Deep Models

no code implementations13 Oct 2023 Lin Chen, Michal Lukasik, Wittawat Jitkrittum, Chong You, Sanjiv Kumar

Classical wisdom in machine learning holds that the generalization error can be decomposed into bias and variance, and these two terms exhibit a trade-off.
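
For the squared loss, the decomposition the abstract refers to can be estimated empirically from an ensemble of models trained on independent resamples of the data. A minimal sketch (function and array names are illustrative, not from the paper):

    import numpy as np

    def bias_variance(preds, y_true):
        """Estimate the squared-loss bias/variance decomposition.

        preds: (n_models, n_examples) predictions from models trained
               on independent resamples of the training data.
        y_true: (n_examples,) regression targets.
        """
        mean_pred = preds.mean(axis=0)                 # the "average" model
        bias_sq = np.mean((mean_pred - y_true) ** 2)   # squared bias
        variance = np.mean(preds.var(axis=0))          # spread across models
        return bias_sq, variance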

What do larger image classifiers memorise?

no code implementations9 Oct 2023 Michal Lukasik, Vaishnavh Nagarajan, Ankit Singh Rawat, Aditya Krishna Menon, Sanjiv Kumar

The success of modern neural networks has prompted study of the connection between memorisation and generalisation: overparameterised models generalise well, despite being able to perfectly fit (memorise) completely random labels.

Image Classification, Knowledge Distillation +2
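
A common way to make "memorisation" precise is a leave-one-out style score: how much more likely is a model to get an example right when that example was in its training set? A minimal sketch under that definition (the paper's exact estimator may differ):

    import numpy as np

    def memorisation_score(p_correct_with, p_correct_without):
        """Leave-one-out memorisation score for one training example.

        p_correct_with: correct-class probabilities from models whose
                        training set included the example.
        p_correct_without: the same, from models trained without it.
        A score near 1 means the example is predicted correctly only
        when it has been seen (memorised) during training.
        """
        return np.mean(p_correct_with) - np.mean(p_correct_without)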

Large Language Models with Controllable Working Memory

no code implementations9 Nov 2022 Daliang Li, Ankit Singh Rawat, Manzil Zaheer, Xin Wang, Michal Lukasik, Andreas Veit, Felix Yu, Sanjiv Kumar

By contrast, when the context is irrelevant to the task, the model should ignore it and fall back on its internal knowledge.

counterfactual, World Knowledge

Robust Distillation for Worst-class Performance

no code implementations13 Jun 2022 Serena Wang, Harikrishna Narasimhan, Yichen Zhou, Sara Hooker, Michal Lukasik, Aditya Krishna Menon

We show empirically that our robust distillation techniques not only achieve better worst-class performance, but also lead to Pareto improvement in the tradeoff between overall performance and worst-class performance compared to other baseline methods.

Knowledge Distillation
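
The "worst-class" objective can be illustrated by averaging per-example distillation losses within each class and optimising the worst class's average; a hedged sketch (an illustrative simplification, not the paper's exact formulation):

    import torch

    def worst_class_loss(per_example_loss, labels, n_classes):
        """Average the per-example losses within each class present in
        the batch, then return the worst (largest) class average."""
        per_class = torch.stack([
            per_example_loss[labels == c].mean()
            for c in range(n_classes)
            if (labels == c).any()
        ])
        return per_class.max()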

Leveraging redundancy in attention with Reuse Transformers

1 code implementation13 Oct 2021 Srinadh Bhojanapalli, Ayan Chakrabarti, Andreas Veit, Michal Lukasik, Himanshu Jain, Frederick Liu, Yin-Wen Chang, Sanjiv Kumar

Pairwise dot product-based attention allows Transformers to exchange information between tokens in an input-dependent way, and is key to their success across diverse applications in language and vision.
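
The redundancy the title refers to is across layers: attention score matrices in adjacent layers are often similar, so a later layer can reuse earlier scores instead of recomputing the quadratic dot product. A minimal single-head sketch (names are illustrative, not the paper's code):

    import torch

    def attention(v, q=None, k=None, cached_scores=None):
        """Scaled dot-product attention that can optionally reuse a
        score matrix computed by an earlier layer, skipping q @ k^T."""
        if cached_scores is None:
            d = q.shape[-1]
            cached_scores = torch.softmax(
                q @ k.transpose(-2, -1) / d ** 0.5, dim=-1)
        return cached_scores @ v, cached_scores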

Teacher's pet: understanding and mitigating biases in distillation

no code implementations19 Jun 2021 Michal Lukasik, Srinadh Bhojanapalli, Aditya Krishna Menon, Sanjiv Kumar

Knowledge distillation is widely used as a means of improving the performance of a relatively simple student model using the predictions from a complex teacher model.

Image Classification, Knowledge Distillation
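
The standard distillation objective behind entries like this one mixes the usual cross-entropy with a KL term pulling the student toward the teacher's temperature-softened predictions. A generic sketch of Hinton-style distillation (not this paper's specific debiasing method):

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels,
                          alpha=0.5, temperature=2.0):
        """Weighted sum of cross-entropy on true labels and KL to the
        teacher's softened distribution (scaled by temperature^2)."""
        ce = F.cross_entropy(student_logits, labels)
        kd = F.kl_div(
            F.log_softmax(student_logits / temperature, dim=-1),
            F.softmax(teacher_logits / temperature, dim=-1),
            reduction="batchmean",
        ) * temperature ** 2
        return alpha * ce + (1.0 - alpha) * kd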

Eigen Analysis of Self-Attention and its Reconstruction from Partial Computation

no code implementations16 Jun 2021 Srinadh Bhojanapalli, Ayan Chakrabarti, Himanshu Jain, Sanjiv Kumar, Michal Lukasik, Andreas Veit

State-of-the-art transformer models use pairwise dot-product based self-attention, which comes at a computational cost quadratic in the input sequence length.
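
The reconstruction idea rests on attention matrices being approximately low-rank, so the full matrix can be recovered from a few dominant directions. A minimal SVD-based sketch of that premise (illustrative only, not the paper's algorithm):

    import torch

    def low_rank_approx(attn, rank):
        """Reconstruct an attention matrix from its top singular
        directions; small reconstruction error at low rank is what
        makes recovery from partial computation plausible."""
        u, s, vh = torch.linalg.svd(attn)
        return u[:, :rank] @ torch.diag(s[:rank]) @ vh[:rank, :]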

Semantic Label Smoothing for Sequence to Sequence Problems

no code implementations EMNLP 2020 Michal Lukasik, Himanshu Jain, Aditya Krishna Menon, Seungyeon Kim, Srinadh Bhojanapalli, Felix Yu, Sanjiv Kumar

Label smoothing has been shown to be an effective regularization strategy in classification that prevents overfitting and helps with label de-noising.

Machine Translation, Translation

Text Segmentation by Cross Segment Attention

1 code implementation EMNLP 2020 Michal Lukasik, Boris Dadachev, Gonçalo Simões, Kishore Papineni

Document and discourse segmentation are two fundamental NLP tasks pertaining to breaking up text into constituents, which are commonly used to help downstream tasks such as information retrieval or text summarization.

Discourse Segmentation, Information Retrieval +4
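
The cross-segment idea scores each candidate break from the text on both sides of it. A hedged sketch where `encode_pair` stands in for the paper's transformer over the two contexts (both the name and the windowing are assumptions, not the paper's API):

    from typing import Callable, List

    def score_breaks(sentences: List[str],
                     encode_pair: Callable[[str, str], float],
                     window: int = 3) -> List[float]:
        """Score every candidate break between consecutive sentences
        using a window of context from each side of the boundary."""
        scores = []
        for i in range(1, len(sentences)):
            left = " ".join(sentences[max(0, i - window):i])
            right = " ".join(sentences[i:i + window])
            scores.append(encode_pair(left, right))  # boundary score
        return scores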

Does label smoothing mitigate label noise?

no code implementations ICML 2020 Michal Lukasik, Srinadh Bhojanapalli, Aditya Krishna Menon, Sanjiv Kumar

Label smoothing is commonly used in training deep learning models, wherein one-hot training labels are mixed with uniform label vectors.

Learning with noisy labels
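
The mixing the abstract describes is, for smoothing weight alpha and K classes, y_smooth = (1 - alpha) * one_hot(y) + alpha / K. A minimal sketch:

    import numpy as np

    def smooth_labels(labels, n_classes, alpha=0.1):
        """Mix one-hot labels with the uniform distribution over classes."""
        one_hot = np.eye(n_classes)[labels]   # (n, K) one-hot targets
        return (1.0 - alpha) * one_hot + alpha / n_classes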

Discourse-Aware Rumour Stance Classification in Social Media Using Sequential Classifiers

no code implementations6 Dec 2017 Arkaitz Zubiaga, Elena Kochkina, Maria Liakata, Rob Procter, Michal Lukasik, Kalina Bontcheva, Trevor Cohn, Isabelle Augenstein

We show that sequential classifiers that exploit the use of discourse properties in social media conversations while using only local features, outperform non-sequential classifiers.

General Classification, Stance Classification

Stance Classification in Rumours as a Sequential Task Exploiting the Tree Structure of Social Media Conversations

no code implementations COLING 2016 Arkaitz Zubiaga, Elena Kochkina, Maria Liakata, Rob Procter, Michal Lukasik

Rumour stance classification, the task that determines if each tweet in a collection discussing a rumour is supporting, denying, questioning or simply commenting on the rumour, has been attracting substantial interest.

General Classification, Rumour Detection +1
