Search Results for author: Manzil Zaheer

Found 88 papers, 34 papers with code

Incremental Extractive Opinion Summarization Using Cover Trees

1 code implementation 16 Jan 2024 Somnath Basu Roy Chowdhury, Nicholas Monath, Avinava Dubey, Manzil Zaheer, Andrew McCallum, Amr Ahmed, Snigdha Chaturvedi

In this work, we study the task of extractive opinion summarization in an incremental setting, where the underlying review set evolves over time.

Extractive Summarization Opinion Summarization

Functional Interpolation for Relative Positions Improves Long Context Transformers

no code implementations 6 Oct 2023 Shanda Li, Chong You, Guru Guruganesh, Joshua Ainslie, Santiago Ontanon, Manzil Zaheer, Sumit Sanghai, Yiming Yang, Sanjiv Kumar, Srinadh Bhojanapalli

Preventing the performance decay of Transformers on inputs longer than those used for training has been an important challenge in extending the context length of these models.

Language Modelling Position

ExeDec: Execution Decomposition for Compositional Generalization in Neural Program Synthesis

no code implementations 26 Jul 2023 Kensen Shi, Joey Hong, Manzil Zaheer, Pengcheng Yin, Charles Sutton

When writing programs, people have the ability to tackle a new complex task by decomposing it into smaller and more familiar subtasks.

Program Synthesis

Machine Reading Comprehension using Case-based Reasoning

no code implementations 24 May 2023 Dung Thai, Dhruv Agarwal, Mudit Chaudhary, Wenlong Zhao, Rajarshi Das, Manzil Zaheer, Jay-Yoon Lee, Hannaneh Hajishirzi, Andrew McCallum

Given a test question, CBR-MRC first retrieves a set of similar cases from a nonparametric memory and then predicts an answer by selecting the span in the test context that is most similar to the contextualized representations of answers in the retrieved cases.

Attribute Machine Reading Comprehension

Efficient k-NN Search with Cross-Encoders using Adaptive Multi-Round CUR Decomposition

1 code implementation 4 May 2023 Nishant Yadav, Nicholas Monath, Manzil Zaheer, Andrew McCallum

While ANNCUR's one-time selection of anchors tends to approximate the cross-encoder distances on average, doing so forfeits the capacity to accurately estimate distances to items near the query, leading to regret in the crucial end-task: recall of top-k items.

Retrieval

Improving Dual-Encoder Training through Dynamic Indexes for Negative Mining

no code implementations 27 Mar 2023 Nicholas Monath, Manzil Zaheer, Kelsey Allen, Andrew McCallum

First, we introduce an algorithm that uses a tree structure to approximate the softmax with provable bounds and that dynamically maintains the tree.

Retrieval

Multi-Task Off-Policy Learning from Bandit Feedback

no code implementations 9 Dec 2022 Joey Hong, Branislav Kveton, Sumeet Katariya, Manzil Zaheer, Mohammad Ghavamzadeh

We prove per-task bounds on the suboptimality of the learned policies, which show a clear improvement over not using the hierarchical model.

Learning-To-Rank Recommendation Systems

Differentially Private Adaptive Optimization with Delayed Preconditioners

1 code implementation 1 Dec 2022 Tian Li, Manzil Zaheer, Ken Ziyu Liu, Sashank J. Reddi, H. Brendan McMahan, Virginia Smith

Privacy noise may negate the benefits of using adaptive optimizers in differentially private model training.

Large Language Models with Controllable Working Memory

no code implementations 9 Nov 2022 Daliang Li, Ankit Singh Rawat, Manzil Zaheer, Xin Wang, Michal Lukasik, Andreas Veit, Felix Yu, Sanjiv Kumar

By contrast, when the context is irrelevant to the task, the model should ignore it and fall back on its internal knowledge.

counterfactual World Knowledge

Efficient Nearest Neighbor Search for Cross-Encoder Models using Matrix Factorization

1 code implementation 23 Oct 2022 Nishant Yadav, Nicholas Monath, Rico Angell, Manzil Zaheer, Andrew McCallum

When the similarity is measured by dot-product between dual-encoder vectors or $\ell_2$-distance, there already exist many scalable and efficient search methods.

Retrieval

Generalization Properties of Retrieval-based Models

no code implementations 6 Oct 2022 Soumya Basu, Ankit Singh Rawat, Manzil Zaheer

The second class of retrieval-based approaches we explore learns a global model using kernel methods to directly map an input instance and retrieved examples to a prediction, without explicitly solving a local learning task.

Protein Folding Retrieval

A Fourier Approach to Mixture Learning

no code implementations 5 Oct 2022 Mingda Qiao, Guru Guruganesh, Ankit Singh Rawat, Avinava Dubey, Manzil Zaheer

Regev and Vijayaraghavan (2017) showed that with $\Delta = \Omega(\sqrt{\log k})$ separation, the means can be learned using $\mathrm{poly}(k, d)$ samples, whereas super-polynomially many samples are required if $\Delta = o(\sqrt{\log k})$ and $d = \Omega(\log k)$.
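In the uniform-weight spherical-Gaussian setting typically assumed for this kind of result (a sketch of the setup, not a statement taken from the paper), samples are drawn as

$x \sim \frac{1}{k} \sum_{j=1}^{k} \mathcal{N}(\mu_j, I_d), \qquad \min_{i \neq j} \|\mu_i - \mu_j\| \geq \Delta,$

and the goal is to recover the means $\mu_j$ up to small error.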

Teacher Guided Training: An Efficient Framework for Knowledge Transfer

no code implementations 14 Aug 2022 Manzil Zaheer, Ankit Singh Rawat, Seungyeon Kim, Chong You, Himanshu Jain, Andreas Veit, Rob Fergus, Sanjiv Kumar

In this paper, we propose the teacher-guided training (TGT) framework for training a high-quality compact model that leverages the knowledge acquired by pretrained generative models, while obviating the need to go through a large volume of data.

Generalization Bounds Image Classification +4

Questions Are All You Need to Train a Dense Passage Retriever

1 code implementation 21 Jun 2022 Devendra Singh Sachan, Mike Lewis, Dani Yogatama, Luke Zettlemoyer, Joelle Pineau, Manzil Zaheer

We introduce ART, a new corpus-level autoencoding approach for training dense retrieval models that does not require any labeled training data.

Denoising Language Modelling +1

Compositional Generalization and Decomposition in Neural Program Synthesis

no code implementations 7 Apr 2022 Kensen Shi, Joey Hong, Manzil Zaheer, Pengcheng Yin, Charles Sutton

We first characterize several different axes along which program synthesis methods would be desired to generalize, e.g., length generalization, or the ability to combine known subroutines in new ways that do not occur in the training data.

Program Synthesis

Knowledge Base Question Answering by Case-based Reasoning over Subgraphs

1 code implementation 22 Feb 2022 Rajarshi Das, Ameya Godbole, Ankita Naik, Elliot Tower, Robin Jia, Manzil Zaheer, Hannaneh Hajishirzi, Andrew McCallum

Question answering (QA) over knowledge bases (KBs) is challenging because of the diverse, essentially unbounded, types of reasoning patterns needed.

Knowledge Base Question Answering

Private Adaptive Optimization with Side Information

1 code implementation 12 Feb 2022 Tian Li, Manzil Zaheer, Sashank J. Reddi, Virginia Smith

Adaptive optimization methods have become the default solvers for many machine learning tasks.

Deep Hierarchy in Bandits

no code implementations 3 Feb 2022 Joey Hong, Branislav Kveton, Sumeet Katariya, Manzil Zaheer, Mohammad Ghavamzadeh

We use this exact posterior to analyze the Bayes regret of HierTS in Gaussian bandits.

Thompson Sampling

Robust Training of Neural Networks Using Scale Invariant Architectures

no code implementations 2 Feb 2022 Zhiyuan Li, Srinadh Bhojanapalli, Manzil Zaheer, Sashank J. Reddi, Sanjiv Kumar

In contrast to SGD, adaptive gradient methods like Adam allow robust training of modern deep networks, especially large language models.

A Context-Integrated Transformer-Based Neural Network for Auction Design

1 code implementation 29 Jan 2022 Zhijian Duan, Jingwu Tang, Yutong Yin, Zhe Feng, Xiang Yan, Manzil Zaheer, Xiaotie Deng

One of the central problems in auction design is developing an incentive-compatible mechanism that maximizes the auctioneer's expected revenue.

Hierarchical Bayesian Bandits

no code implementations 12 Nov 2021 Joey Hong, Branislav Kveton, Manzil Zaheer, Mohammad Ghavamzadeh

We provide a unified view of all these problems, as learning to act in a hierarchical Bayesian bandit.

Federated Learning Thompson Sampling

When in Doubt, Summon the Titans: Efficient Inference with Large Models

no code implementations 19 Oct 2021 Ankit Singh Rawat, Manzil Zaheer, Aditya Krishna Menon, Amr Ahmed, Sanjiv Kumar

In a nutshell, we use the large teacher models to guide the lightweight student models to only make correct predictions on a subset of "easy" examples; for the "hard" examples, we fall back to the teacher.

Image Classification
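A minimal sketch of the easy/hard cascading idea described in the entry above, assuming that "hard" examples are flagged by low student confidence; the threshold, model interfaces, and routing rule here are illustrative rather than the paper's exact procedure.

    import numpy as np

    def cascaded_predict(student_probs, teacher_predict, inputs, threshold=0.9):
        """Keep the student's answer on confident ("easy") examples and fall back
        to the expensive teacher model on the rest ("hard" examples)."""
        student_preds = student_probs.argmax(axis=1)
        confident = student_probs.max(axis=1) >= threshold   # "easy" examples
        preds = student_preds.copy()
        hard_idx = np.where(~confident)[0]                   # "hard" examples
        if hard_idx.size > 0:
            # inputs is assumed to be indexable by an integer array
            preds[hard_idx] = teacher_predict(inputs[hard_idx])
        return preds

The paper's actual criterion for deciding which examples are "easy" may differ; confidence thresholding is just one common choice.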

When in Doubt, Summon the Titans: A Framework for Efficient Inference with Large Models

no code implementations 29 Sep 2021 Ankit Singh Rawat, Manzil Zaheer, Aditya Krishna Menon, Amr Ahmed, Sanjiv Kumar

In a nutshell, we use the large teacher models to guide the lightweight student models to only make correct predictions on a subset of "easy" examples; for the "hard" examples, we fall back to the teacher.

Image Classification

No Regrets for Learning the Prior in Bandits

no code implementations NeurIPS 2021 Soumya Basu, Branislav Kveton, Manzil Zaheer, Csaba Szepesvári

We propose AdaTS, a Thompson sampling algorithm that adapts sequentially to bandit tasks that it interacts with.

Thompson Sampling

Thompson Sampling with a Mixture Prior

no code implementations 10 Jun 2021 Joey Hong, Branislav Kveton, Manzil Zaheer, Mohammad Ghavamzadeh, Craig Boutilier

We study Thompson sampling (TS) in online decision making, where the uncertain environment is sampled from a mixture distribution.

Decision Making Multi-Task Learning +3
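For reference, a minimal sketch of standard Thompson sampling for Bernoulli bandits with independent Beta priors; the mixture-prior variant studied in the entry above replaces this single conjugate prior with a mixture, which is not shown here.

    import numpy as np

    def thompson_sampling_bernoulli(true_means, horizon, seed=0):
        """Standard Thompson sampling with Beta(1, 1) priors on each arm's mean."""
        rng = np.random.default_rng(seed)
        k = len(true_means)
        alpha, beta = np.ones(k), np.ones(k)      # Beta posterior parameters
        total_reward = 0.0
        for _ in range(horizon):
            theta = rng.beta(alpha, beta)         # sample a plausible mean for each arm
            arm = int(np.argmax(theta))           # act greedily on the sampled means
            reward = float(rng.random() < true_means[arm])
            alpha[arm] += reward                  # conjugate posterior update
            beta[arm] += 1.0 - reward
            total_reward += reward
        return total_reward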

Differentiable Meta-Learning of Bandit Policies

no code implementations NeurIPS 2020 Craig Boutilier, Chih-Wei Hsu, Branislav Kveton, Martin Mladenov, Csaba Szepesvari, Manzil Zaheer

Exploration policies in Bayesian bandits maximize the average reward over problem instances drawn from some distribution P. In this work, we learn such policies for an unknown distribution P using samples from P. Our approach is a form of meta-learning and exploits properties of P without making strong assumptions about its form.

Meta-Learning

Non-Stationary Latent Bandits

no code implementations 1 Dec 2020 Joey Hong, Branislav Kveton, Manzil Zaheer, Yinlam Chow, Amr Ahmed, Mohammad Ghavamzadeh, Craig Boutilier

The key idea is to frame this problem as a latent bandit, where the prototypical models of user behavior are learned offline and the latent state of the user is inferred online from its interactions with the models.

Recommendation Systems Thompson Sampling

Latent Programmer: Discrete Latent Codes for Program Synthesis

no code implementations 1 Dec 2020 Joey Hong, David Dohan, Rishabh Singh, Charles Sutton, Manzil Zaheer

The latent codes are learned using a self-supervised learning principle, in which first a discrete autoencoder is trained on the output sequences, and then the resulting latent codes are used as intermediate targets for the end-to-end sequence prediction task.

Document Summarization Program Synthesis +1

Modifying Memories in Transformer Models

no code implementations 1 Dec 2020 Chen Zhu, Ankit Singh Rawat, Manzil Zaheer, Srinadh Bhojanapalli, Daliang Li, Felix Yu, Sanjiv Kumar

In this paper, we propose a new task of explicitly modifying specific factual knowledge in Transformer models while ensuring the model performance does not degrade on the unmodified facts.

Memorization

PLLay: Efficient Topological Layer based on Persistent Landscapes

1 code implementation NeurIPS 2020 Kwangho Kim, Jisu Kim, Manzil Zaheer, Joon Kim, Frederic Chazal, Larry Wasserman

We propose PLLay, a novel topological layer for general deep learning models based on persistence landscapes, in which we can efficiently exploit the underlying topological features of the input data structure.

Federated Composite Optimization

1 code implementation 17 Nov 2020 Honglin Yuan, Manzil Zaheer, Sashank Reddi

We first show that straightforward extensions of primal algorithms such as FedAvg are not well-suited for FCO since they suffer from the "curse of primal averaging," resulting in poor convergence.

Federated Learning
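Here, "composite" refers to the federated objective with an additional, possibly non-smooth regularizer $\psi$ (for example, an $\ell_1$ penalty for sparsity):

$\min_{w} \; F(w) = \frac{1}{N} \sum_{k=1}^{N} f_k(w) + \psi(w),$

where $f_k$ is the local loss on client $k$. Roughly, the "curse of primal averaging" arises because averaging the clients' local iterates destroys the structure (such as sparsity) that $\psi$ is meant to induce.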

Differentiable Open-Ended Commonsense Reasoning

no code implementations NAACL 2021 Bill Yuchen Lin, Haitian Sun, Bhuwan Dhingra, Manzil Zaheer, Xiang Ren, William W. Cohen

As a step towards making commonsense reasoning research more realistic, we propose to study open-ended commonsense reasoning (OpenCSR), the task of answering a commonsense question without any pre-defined choices, using as a resource only a corpus of commonsense facts written in natural language.

Multiple-choice

Unsupervised Abstractive Dialogue Summarization for Tete-a-Tetes

no code implementations 15 Sep 2020 Xinyuan Zhang, Ruiyi Zhang, Manzil Zaheer, Amr Ahmed

High-quality dialogue-summary paired data is expensive to produce and domain-sensitive, making abstractive dialogue summarization a challenging task.

Abstractive Dialogue Summarization dialogue summary +2

A Simple Approach to Case-Based Reasoning in Knowledge Bases

1 code implementation AKBC 2020 Rajarshi Das, Ameya Godbole, Shehzaad Dhuliawala, Manzil Zaheer, Andrew McCallum

We present a surprisingly simple yet accurate approach to reasoning in knowledge graphs (KGs) that requires no training, and is reminiscent of case-based reasoning in classical artificial intelligence (AI).

Knowledge Graphs Meta-Learning +1

Non-Stationary Off-Policy Optimization

no code implementations 15 Jun 2020 Joey Hong, Branislav Kveton, Manzil Zaheer, Yin-Lam Chow, Amr Ahmed

This approach is practical and analyzable, and we provide guarantees on both the quality of off-policy optimization and the regret during online deployment.

Multi-Armed Bandits

Latent Bandits Revisited

no code implementations NeurIPS 2020 Joey Hong, Branislav Kveton, Manzil Zaheer, Yin-Lam Chow, Amr Ahmed, Craig Boutilier

A latent bandit problem is one in which the learning agent knows the arm reward distributions conditioned on an unknown discrete latent state.

Recommendation Systems Thompson Sampling

Meta-Learning Bandit Policies by Gradient Ascent

no code implementations 9 Jun 2020 Branislav Kveton, Martin Mladenov, Chih-Wei Hsu, Manzil Zaheer, Csaba Szepesvari, Craig Boutilier

Most bandit policies are designed to either minimize regret in any problem instance, making very few assumptions about the underlying environment, or in a Bayesian sense, assuming a prior distribution over environment parameters.

Meta-Learning Multi-Armed Bandits

Robust Large-Margin Learning in Hyperbolic Space

no code implementations NeurIPS 2020 Melanie Weber, Manzil Zaheer, Ankit Singh Rawat, Aditya Menon, Sanjiv Kumar

In this paper, we present, to our knowledge, the first theoretical guarantees for learning a classifier in hyperbolic rather than Euclidean space.

Representation Learning

Anchor & Transform: Learning Sparse Embeddings for Large Vocabularies

no code implementations ICLR 2021 Paul Pu Liang, Manzil Zaheer, Yu-An Wang, Amr Ahmed

In this paper, we design a simple and efficient embedding algorithm that learns a small set of anchor embeddings and a sparse transformation matrix.

Language Modelling Movie Recommendation +2
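A minimal sketch of the factorization described above, where each vocabulary embedding is a sparse combination of a small set of anchor embeddings; the sizes and the sparsification rule are illustrative, not the paper's training procedure.

    import numpy as np

    rng = np.random.default_rng(0)
    vocab_size, num_anchors, dim = 10000, 64, 128
    A = rng.standard_normal((num_anchors, dim))    # anchor embeddings (small, dense)
    # sparse, non-negative mixing weights (thresholded noise as a stand-in)
    T = np.maximum(rng.standard_normal((vocab_size, num_anchors)) - 1.5, 0.0)

    def embed(token_ids):
        """Reconstruct token embeddings as sparse combinations of anchors: E = T @ A."""
        return T[token_ids] @ A

    print(embed(np.array([1, 42, 9999])).shape)    # (3, 128)

Storing A plus a sparse T can be much smaller than a dense vocab_size x dim embedding table, which is the point of the factorization.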

Adaptive Federated Optimization

5 code implementations ICLR 2021 Sashank Reddi, Zachary Charles, Manzil Zaheer, Zachary Garrett, Keith Rush, Jakub Konečný, Sanjiv Kumar, H. Brendan McMahan

Federated learning is a distributed machine learning paradigm in which a large number of clients coordinate with a central server to learn a model without sharing their own training data.

Federated Learning
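A minimal sketch of the server-side adaptive update in the spirit of this paper's FedAdam variant, assuming each selected client returns a model delta (local weights minus the broadcast global weights); client selection, local training, and learning-rate schedules are omitted, and hyperparameters are illustrative.

    import numpy as np

    def server_round(global_w, client_deltas, state, eta=0.1,
                     beta1=0.9, beta2=0.99, tau=1e-3):
        """Treat the average client delta as a pseudo-gradient and apply an
        Adam-style update to the global model on the server."""
        delta = np.mean(client_deltas, axis=0)
        state["m"] = beta1 * state["m"] + (1 - beta1) * delta
        state["v"] = beta2 * state["v"] + (1 - beta2) * delta ** 2
        new_w = global_w + eta * state["m"] / (np.sqrt(state["v"]) + tau)
        return new_w, state

    # state is initialized once as {"m": np.zeros_like(w), "v": np.zeros_like(w)}.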

Towards Modular Algorithm Induction

no code implementations 27 Feb 2020 Daniel A. Abolafia, Rishabh Singh, Manzil Zaheer, Charles Sutton

Main, the proposed architecture, consists of a neural controller that interacts with a variable-length input tape and learns to compose modules together with their corresponding argument choices.

Reinforcement Learning (RL)

Differentiable Reasoning over a Virtual Knowledge Base

1 code implementation ICLR 2020 Bhuwan Dhingra, Manzil Zaheer, Vidhisha Balachandran, Graham Neubig, Ruslan Salakhutdinov, William W. Cohen

In particular, we describe a neural module, DrKIT, that traverses textual data like a KB, softly following paths of relations between mentions of entities in the corpus.

Re-Ranking

Differentiable Bandit Exploration

no code implementations NeurIPS 2020 Craig Boutilier, Chih-Wei Hsu, Branislav Kveton, Martin Mladenov, Csaba Szepesvari, Manzil Zaheer

In this work, we learn such policies for an unknown distribution $\mathcal{P}$ using samples from $\mathcal{P}$.

Meta-Learning

PLLay: Efficient Topological Layer based on Persistence Landscapes

2 code implementations NeurIPS 2020 Kwangho Kim, Jisu Kim, Manzil Zaheer, Joon Sik Kim, Frederic Chazal, Larry Wasserman

We propose PLLay, a novel topological layer for general deep learning models based on persistence landscapes, in which we can efficiently exploit the underlying topological features of the input data structure.

Chains-of-Reasoning at TextGraphs 2019 Shared Task: Reasoning over Chains of Facts for Explainable Multi-hop Inference

no code implementations WS 2019 Rajarshi Das, Ameya Godbole, Manzil Zaheer, Shehzaad Dhuliawala, Andrew McCallum

This paper describes our submission to the shared task on "Multi-hop Inference Explanation Regeneration" in the TextGraphs workshop at EMNLP 2019 (Jansen and Ustalov, 2019).

Anchor & Transform: Learning Sparse Representations of Discrete Objects

no code implementations 25 Sep 2019 Paul Pu Liang, Manzil Zaheer, YuAn Wang, Amr Ahmed

Learning continuous representations of discrete objects such as text, users, and items lies at the heart of many applications including text and user modeling.

Language Modelling text-classification +1

Multi-step Entity-centric Information Retrieval for Multi-Hop Question Answering

no code implementations WS 2019 Ameya Godbole, Dilip Kavarthapu, Rajarshi Das, Zhiyu Gong, Abhishek Singhal, Hamed Zamani, Mo Yu, Tian Gao, Xiaoxiao Guo, Manzil Zaheer, Andrew McCallum

Multi-hop question answering (QA) requires an information retrieval (IR) system that can find the multiple pieces of supporting evidence needed to answer the question, making the retrieval process very challenging.

Information Retrieval Multi-hop Question Answering +2

Developing Creative AI to Generate Sculptural Objects

no code implementations 20 Aug 2019 Songwei Ge, Austin Dill, Eunsu Kang, Chun-Liang Li, Lingyao Zhang, Manzil Zaheer, Barnabas Poczos

We explore the intersection of human and machine creativity by generating sculptural objects through machine learning.

Clustering Generating 3D Point Clouds

The Myths of Our Time: Fake News

1 code implementation 5 Aug 2019 Vít Růžička, Eunsu Kang, David Gordon, Ankita Patel, Jacqui Fashimpaur, Manzil Zaheer

While the purpose of most fake news is misinformation and political propaganda, our team sees it as a new type of myth that is created by people in the age of internet identities and artificial intelligence.

BIG-bench Machine Learning Misinformation +1

Randomized Exploration in Generalized Linear Bandits

no code implementations 21 Jun 2019 Branislav Kveton, Manzil Zaheer, Csaba Szepesvari, Lihong Li, Mohammad Ghavamzadeh, Craig Boutilier

The first, GLM-TSL, samples a generalized linear model (GLM) from the Laplace approximation to the posterior distribution.

Exchangeable Generative Models with Flow Scans

1 code implementation 5 Feb 2019 Christopher Bender, Kevin O'Connor, Yang Li, Juan Jose Garcia, Manzil Zaheer, Junier Oliva

In this work, we develop a new approach to generative density estimation for exchangeable, non-i.i.d. data.

Density Estimation

Federated Optimization in Heterogeneous Networks

19 code implementations 14 Dec 2018 Tian Li, Anit Kumar Sahu, Manzil Zaheer, Maziar Sanjabi, Ameet Talwalkar, Virginia Smith

Theoretically, we provide convergence guarantees for our framework when learning over data from non-identical distributions (statistical heterogeneity), and while adhering to device-level systems constraints by allowing each participating device to perform a variable amount of work (systems heterogeneity).

Distributed Optimization Federated Learning
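A minimal sketch of the proximal local objective used by this framework (FedProx): each device minimizes its own loss plus a term that keeps its local model close to the current global model. The task loss and the value of mu are placeholders.

    import numpy as np

    def fedprox_local_objective(w, global_w, task_loss, mu=0.01):
        """Local objective on a device: F_k(w) + (mu / 2) * ||w - w_global||^2."""
        return task_loss(w) + 0.5 * mu * np.sum((w - global_w) ** 2)

Setting mu = 0 recovers the plain FedAvg local objective; a larger mu limits how far a device with skewed data or extra local work can drift from the global model.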

Adaptive Methods for Nonconvex Optimization

1 code implementation NeurIPS 2018 Manzil Zaheer, Sashank Reddi, Devendra Sachan, Satyen Kale, Sanjiv Kumar

In this work, we provide a new analysis of such methods applied to nonconvex stochastic optimization problems, characterizing the effect of increasing minibatch size.

Stochastic Optimization
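This paper also proposes the Yogi optimizer; a minimal sketch of its update is below, with illustrative hyperparameters. Compared with Adam, the second moment v moves additively toward grad**2 via a sign term rather than by pure exponential averaging.

    import numpy as np

    def yogi_step(w, grad, m, v, lr=1e-2, beta1=0.9, beta2=0.999, eps=1e-3):
        """One Yogi update on parameters w given gradient grad."""
        m = beta1 * m + (1 - beta1) * grad
        v = v - (1 - beta2) * np.sign(v - grad ** 2) * grad ** 2   # controlled change of v
        w = w - lr * m / (np.sqrt(v) + eps)
        return w, m, v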

Point Cloud GAN

1 code implementation 13 Oct 2018 Chun-Liang Li, Manzil Zaheer, Yang Zhang, Barnabas Poczos, Ruslan Salakhutdinov

In this paper, we first show that a straightforward extension of existing GAN algorithms is not applicable to point clouds, because the constraint required for discriminators is undefined for set data.

Object Recognition

Towards Gradient Free and Projection Free Stochastic Optimization

no code implementations 8 Oct 2018 Anit Kumar Sahu, Manzil Zaheer, Soummya Kar

This paper focuses on the problem of constrained stochastic optimization.

Stochastic Optimization

Open Domain Question Answering Using Early Fusion of Knowledge Bases and Text

2 code implementations EMNLP 2018 Haitian Sun, Bhuwan Dhingra, Manzil Zaheer, Kathryn Mazaitis, Ruslan Salakhutdinov, William W. Cohen

In this paper we look at a more practical setting, namely QA over the combination of a KB and entity-linked text, which is appropriate when an incomplete KB is available with a large text corpus.

Graph Representation Learning Open-Domain Question Answering

Nonparametric Density Estimation under Adversarial Losses

no code implementations NeurIPS 2018 Shashank Singh, Ananya Uppal, Boyue Li, Chun-Liang Li, Manzil Zaheer, Barnabás Póczos

We study minimax convergence rates of nonparametric density estimation under a large class of loss functions called "adversarial losses", which, besides classical $\mathcal{L}^p$ losses, includes maximum mean discrepancy (MMD), Wasserstein distance, and total variation distance.

Density Estimation
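The "adversarial losses" referred to above are integral probability metrics: for a class $\mathcal{F}$ of critic functions,

$d_{\mathcal{F}}(p, q) = \sup_{f \in \mathcal{F}} \left| \mathbb{E}_{X \sim p}[f(X)] - \mathbb{E}_{Y \sim q}[f(Y)] \right|,$

so the unit ball of an RKHS gives MMD, the 1-Lipschitz functions give the Wasserstein-1 distance, and functions bounded by 1 give total variation distance up to a constant factor.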

Transformation Autoregressive Networks

no code implementations ICML 2018 Junier B. Oliva, Avinava Dubey, Manzil Zaheer, Barnabás Póczos, Ruslan Salakhutdinov, Eric P. Xing, Jeff Schneider

Further, through a comprehensive study over both real-world and synthetic data, we show that jointly leveraging transformations of variables and autoregressive conditional models results in a considerable improvement in performance.

Density Estimation Outlier Detection

Go for a Walk and Arrive at the Answer: Reasoning Over Paths in Knowledge Bases using Reinforcement Learning

7 code implementations ICLR 2018 Rajarshi Das, Shehzaad Dhuliawala, Manzil Zaheer, Luke Vilnis, Ishan Durugkar, Akshay Krishnamurthy, Alex Smola, Andrew McCallum

Knowledge bases (KBs), both automatically and manually constructed, are often incomplete; many valid facts can be inferred from the KB by synthesizing existing information.

Navigate Relation +1

A Generic Approach for Escaping Saddle points

no code implementations 5 Sep 2017 Sashank J. Reddi, Manzil Zaheer, Suvrit Sra, Barnabas Poczos, Francis Bach, Ruslan Salakhutdinov, Alexander J. Smola

A central challenge to using first-order methods for optimizing nonconvex problems is the presence of saddle points.

Second-order methods

Latent LSTM Allocation: Joint clustering and non-linear dynamic modeling of sequence data

no code implementations ICML 2017 Manzil Zaheer, Amr Ahmed, Alexander J. Smola

Recurrent neural networks, such as long short-term memory (LSTM) networks, are powerful tools for modeling sequential data like user browsing history (Tan et al., 2016; Korpusik et al., 2016) or natural language text (Mikolov et al., 2010).

Clustering

Canopy: Fast Sampling with Cover Trees

no code implementations ICML 2017 Manzil Zaheer, Satwik Kottur, Amr Ahmed, José Moura, Alex Smola

In this work, we propose Canopy, a sampler based on Cover Trees that is exact, has guaranteed runtime logarithmic in the number of atoms, and is provably polynomial in the inherent dimensionality of the underlying parameter space.

Spectral Methods for Nonparametric Models

no code implementations 31 Mar 2017 Hsiao-Yu Fish Tung, Chao-yuan Wu, Manzil Zaheer, Alexander J. Smola

Nonparametric models are a versatile, albeit computationally expensive, tool for modeling mixture models.

Deep Sets

5 code implementations NeurIPS 2017 Manzil Zaheer, Satwik Kottur, Siamak Ravanbakhsh, Barnabas Poczos, Ruslan Salakhutdinov, Alexander Smola

Our main theorem characterizes the permutation invariant functions and provides a family of functions to which any permutation invariant objective function must belong.

Anomaly Detection Outlier Detection +1
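The permutation-invariant family characterized by the main theorem has the sum-decomposition form $f(X) = \rho\big(\sum_{x \in X} \phi(x)\big)$; below is a minimal sketch with placeholder one-layer maps for $\phi$ and $\rho$.

    import numpy as np

    rng = np.random.default_rng(0)
    W_phi = rng.standard_normal((3, 16))    # phi: per-element encoder (placeholder weights)
    W_rho = rng.standard_normal((16, 1))    # rho: decoder applied to the pooled representation

    def deep_set(X):
        """Permutation-invariant set function f(X) = rho(sum_x phi(x))."""
        phi_x = np.tanh(X @ W_phi)          # encode each element independently
        pooled = phi_x.sum(axis=0)          # sum pooling removes any dependence on order
        return np.tanh(pooled @ W_rho)

    X = rng.standard_normal((5, 3))         # a set of 5 elements in R^3
    assert np.allclose(deep_set(X), deep_set(X[::-1]))   # same output for any ordering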
