Search Results for author: Salim Roukos

Found 40 papers, 13 papers with code

Zero-shot Entity Linking with Less Data

2 code implementations • Findings (NAACL) 2022 • G P Shrivatsa Bhargav, Dinesh Khandelwal, Saswati Dana, Dinesh Garg, Pavan Kapanipathi, Salim Roukos, Alexander Gray, L Venkata Subramaniam

Interestingly, we discovered that BLINK exhibits diminishing returns, i. e., it reaches 98% of its performance with just 1% of the training data and the remaining 99% of the data yields only a marginal increase of 2% in the performance.

Entity Linking Multi-Task Learning +2

1,126

Paper
Code

CLAPNQ: Cohesive Long-form Answers from Passages in Natural Questions for RAG systems

1 code implementation • 2 Apr 2024 • Sara Rosenthal, Avirup Sil, Radu Florian, Salim Roukos

We present ClapNQ, a benchmark Long-form Question Answering dataset for the full RAG pipeline.

Long Form Question Answering Natural Questions +1

Paper
Code

Self-Refinement of Language Models from External Proxy Metrics Feedback

no code implementations • 27 Feb 2024 • Keshav Ramji, Young-suk Lee, Ramón Fernandez Astudillo, Md Arafat Sultan, Tahira Naseem, Asim Munawar, Radu Florian, Salim Roukos

It is often desirable for Large Language Models (LLMs) to capture multiple objectives when providing a response.

Question Answering Response Generation

Paper
Add Code

Ensemble-Instruct: Generating Instruction-Tuning Data with a Heterogeneous Mixture of LMs

1 code implementation • 21 Oct 2023 • Young-suk Lee, Md Arafat Sultan, Yousef El-Kurdi, Tahira Naseem Asim Munawar, Radu Florian, Salim Roukos, Ramón Fernandez Astudillo

Using in-context learning (ICL) for data generation, techniques such as Self-Instruct (Wang et al., 2023) or the follow-up Alpaca (Taori et al., 2023) can train strong conversational agents with only a small amount of human supervision.

In-Context Learning

Paper
Code

Formally Specifying the High-Level Behavior of LLM-Based Agents

no code implementations • 12 Oct 2023 • Maxwell Crouse, Ibrahim Abdelaziz, Ramon Astudillo, Kinjal Basu, Soham Dan, Sadhana Kumaravel, Achille Fokoue, Pavan Kapanipathi, Salim Roukos, Luis Lastras

We demonstrate how the proposed framework can be used to implement recent LLM-based agents (e. g., ReACT), and show how the flexibility of our approach can be leveraged to define a new agent with more complex behavior, the Plan-Act-Summarize-Solve (PASS) agent.

Question Answering

Paper
Add Code

MISMATCH: Fine-grained Evaluation of Machine-generated Text with Mismatch Error Types

no code implementations • 18 Jun 2023 • Keerthiram Murugesan, Sarathkrishna Swaminathan, Soham Dan, Subhajit Chaudhury, Chulaka Gunasekara, Maxwell Crouse, Diwakar Mahajan, Ibrahim Abdelaziz, Achille Fokoue, Pavan Kapanipathi, Salim Roukos, Alexander Gray

In this work, we propose a new evaluation scheme to model human judgments in 7 NLP tasks, based on the fine-grained mismatches between a pair of texts.

Sentence

Paper
Add Code

Scalable Learning of Latent Language Structure With Logical Offline Cycle Consistency

no code implementations • 31 May 2023 • Maxwell Crouse, Ramon Astudillo, Tahira Naseem, Subhajit Chaudhury, Pavan Kapanipathi, Salim Roukos, Alexander Gray

We introduce Logical Offline Cycle Consistency Optimization (LOCCO), a scalable, semi-supervised method for training a neural semantic parser.

Self-Learning Text Generation +2

Paper
Add Code

Slide, Constrain, Parse, Repeat: Synchronous SlidingWindows for Document AMR Parsing

no code implementations • 26 May 2023 • Sadhana Kumaravel, Tahira Naseem, Ramon Fernandez Astudillo, Radu Florian, Salim Roukos

We evaluate our oracle and parser using the Abstract Meaning Representation (AMR) parsing 3. 0 corpus.

AMR Parsing Language Modelling +1

Paper
Add Code

AMR Parsing with Instruction Fine-tuned Pre-trained Language Models

no code implementations • 24 Apr 2023 • Young-suk Lee, Ramón Fernandez Astudillo, Radu Florian, Tahira Naseem, Salim Roukos

Instruction fine-tuned language models on a collection of instruction annotated datasets (FLAN) have shown highly effective to improve model performance and generalization to unseen tasks.

AMR Parsing Semantic Role Labeling

Paper
Add Code

UDAPDR: Unsupervised Domain Adaptation via LLM Prompting and Distillation of Rerankers

1 code implementation • 1 Mar 2023 • Jon Saad-Falcon, Omar Khattab, Keshav Santhanam, Radu Florian, Martin Franz, Salim Roukos, Avirup Sil, Md Arafat Sultan, Christopher Potts

Many information retrieval tasks require large labeled datasets for fine-tuning.

Information Retrieval Retrieval +1

697

Paper
Code

PrimeQA: The Prime Repository for State-of-the-Art Multilingual Question Answering Research and Development

1 code implementation • 23 Jan 2023 • Avirup Sil, Jaydeep Sen, Bhavani Iyer, Martin Franz, Kshitij Fadnis, Mihaela Bornea, Sara Rosenthal, Scott McCarley, Rong Zhang, Vishwajeet Kumar, Yulong Li, Md Arafat Sultan, Riyaz Bhat, Radu Florian, Salim Roukos

The field of Question Answering (QA) has made remarkable progress in recent years, thanks to the advent of large pre-trained language models, newer realistic benchmark datasets with leaderboards, and novel algorithms for key components such as retrievers and readers.

Question Answering Reading Comprehension +1

697

Paper
Code

Moving Beyond Downstream Task Accuracy for Information Retrieval Benchmarking

no code implementations • 2 Dec 2022 • Keshav Santhanam, Jon Saad-Falcon, Martin Franz, Omar Khattab, Avirup Sil, Radu Florian, Md Arafat Sultan, Salim Roukos, Matei Zaharia, Christopher Potts

Neural information retrieval (IR) systems have progressed rapidly in recent years, in large part due to the release of publicly available benchmarking tasks.

Benchmarking Information Retrieval +1

Paper
Add Code

A Closer Look at the Calibration of Differentially Private Learners

no code implementations • 15 Oct 2022 • HANLIN ZHANG, Xuechen Li, Prithviraj Sen, Salim Roukos, Tatsunori Hashimoto

Across 7 tasks, temperature scaling and Platt scaling with DP-SGD result in an average 3. 1-fold reduction in the in-domain expected calibration error and only incur at most a minor percent drop in accuracy.

Paper
Add Code

Synthetic Target Domain Supervision for Open Retrieval QA

no code implementations • 20 Apr 2022 • Revanth Gangi Reddy, Bhavani Iyer, Md Arafat Sultan, Rong Zhang, Avirup Sil, Vittorio Castelli, Radu Florian, Salim Roukos

Neural passage retrieval is a new and promising approach in open retrieval question answering.

Passage Retrieval Question Answering +1

Paper
Add Code

A Benchmark for Generalizable and Interpretable Temporal Question Answering over Knowledge Bases

no code implementations • 15 Jan 2022 • Sumit Neelam, Udit Sharma, Hima Karanam, Shajith Ikbal, Pavan Kapanipathi, Ibrahim Abdelaziz, Nandana Mihindukulasooriya, Young-suk Lee, Santosh Srivastava, Cezar Pendus, Saswati Dana, Dinesh Garg, Achille Fokoue, G P Shrivatsa Bhargav, Dinesh Khandelwal, Srinivas Ravishankar, Sairam Gurajada, Maria Chang, Rosario Uceda-Sosa, Salim Roukos, Alexander Gray, Guilherme Lima, Ryan Riegel, Francois Luus, L Venkata Subramaniam

Specifically, our benchmark is a temporal question answering dataset with the following advantages: (a) it is based on Wikidata, which is the most frequently curated, openly available knowledge base, (b) it includes intermediate sparql queries to facilitate the evaluation of semantic parsing based approaches for KBQA, and (c) it generalizes to multiple knowledge bases: Freebase and Wikidata.

Knowledge Base Question Answering Semantic Parsing

Paper
Add Code

DocAMR: Multi-Sentence AMR Representation and Evaluation

1 code implementation • NAACL 2022 • Tahira Naseem, Austin Blodgett, Sadhana Kumaravel, Tim O'Gorman, Young-suk Lee, Jeffrey Flanigan, Ramón Fernandez Astudillo, Radu Florian, Salim Roukos, Nathan Schneider

Despite extensive research on parsing of English sentences into Abstraction Meaning Representation (AMR) graphs, which are compared to gold graphs via the Smatch metric, full-document parsing into a unified graph representation lacks well-defined representation and evaluation.

coreference-resolution Sentence

Paper
Code

Learning to Transpile AMR into SPARQL

no code implementations • 15 Dec 2021 • Mihaela Bornea, Ramon Fernandez Astudillo, Tahira Naseem, Nandana Mihindukulasooriya, Ibrahim Abdelaziz, Pavan Kapanipathi, Radu Florian, Salim Roukos

We propose a transition-based system to transpile Abstract Meaning Representation (AMR) into SPARQL for Knowledge Base Question Answering (KBQA).

Knowledge Base Question Answering Semantic Parsing

Paper
Add Code

Maximum Bayes Smatch Ensemble Distillation for AMR Parsing

2 code implementations • NAACL 2022 • Young-suk Lee, Ramon Fernandez Astudillo, Thanh Lam Hoang, Tahira Naseem, Radu Florian, Salim Roukos

AMR parsing has experienced an unprecendented increase in performance in the last three years, due to a mixture of effects including architecture improvements and transfer learning.

Ranked #1 on AMR Parsing on LDC2020T02 (using extra training data)

AMR Parsing Data Augmentation +3

229

Paper
Code

Structure-aware Fine-tuning of Sequence-to-sequence Transformers for Transition-based AMR Parsing

1 code implementation • EMNLP 2021 • Jiawei Zhou, Tahira Naseem, Ramón Fernandez Astudillo, Young-suk Lee, Radu Florian, Salim Roukos

We provide a detailed comparison with recent progress in AMR parsing and show that the proposed parser retains the desirable properties of previous transition-based approaches, while being simpler and reaching the new parsing state of the art for AMR 2. 0, without the need for graph re-categorization.

Ranked #9 on AMR Parsing on LDC2017T10 (using extra training data)

AMR Parsing Sentence

229

Paper
Code

SYGMA: System for Generalizable Modular Question Answering OverKnowledge Bases

no code implementations • 28 Sep 2021 • Sumit Neelam, Udit Sharma, Hima Karanam, Shajith Ikbal, Pavan Kapanipathi, Ibrahim Abdelaziz, Nandana Mihindukulasooriya, Young-suk Lee, Santosh Srivastava, Cezar Pendus, Saswati Dana, Dinesh Garg, Achille Fokoue, G P Shrivatsa Bhargav, Dinesh Khandelwal, Srinivas Ravishankar, Sairam Gurajada, Maria Chang, Rosario Uceda-Sosa, Salim Roukos, Alexander Gray, Guilherme LimaRyan Riegel, Francois Luus, L Venkata Subramaniam

In addition, to demonstrate extensi-bility to additional reasoning types we evaluate on multi-hopreasoning datasets and a new Temporal KBQA benchmarkdataset on Wikidata, namedTempQA-WD1, introduced in thispaper.

Knowledge Base Question Answering

Paper
Add Code

Combining Rules and Embeddings via Neuro-Symbolic AI for Knowledge Base Completion

no code implementations • 16 Sep 2021 • Prithviraj Sen, Breno W. S. R. Carvalho, Ibrahim Abdelaziz, Pavan Kapanipathi, Francois Luus, Salim Roukos, Alexander Gray

Recent interest in Knowledge Base Completion (KBC) has led to a plethora of approaches based on reinforcement learning, inductive logic programming and graph embeddings.

Inductive logic programming Knowledge Base Completion +1

Paper
Add Code

A Semantics-aware Transformer Model of Relation Linking for Knowledge Base Question Answering

no code implementations • ACL 2021 • Tahira Naseem, Srinivas Ravishankar, Nandana Mihindukulasooriya, Ibrahim Abdelaziz, Young-suk Lee, Pavan Kapanipathi, Salim Roukos, Alfio Gliozzo, Alexander Gray

Relation linking is a crucial component of Knowledge Base Question Answering systems.

Knowledge Base Question Answering Relation +2

Paper
Add Code

Bootstrapping Multilingual AMR with Contextual Word Alignments

no code implementations • EACL 2021 • Janaki Sheth, Young-suk Lee, Ramon Fernandez Astudillo, Tahira Naseem, Radu Florian, Salim Roukos, Todd Ward

We develop high performance multilingualAbstract Meaning Representation (AMR) sys-tems by projecting English AMR annotationsto other languages with weak supervision.

Multilingual Word Embeddings Word Alignment +1

Paper
Add Code

Leveraging Abstract Meaning Representation for Knowledge Base Question Answering

1 code implementation • Findings (ACL) 2021 • Pavan Kapanipathi, Ibrahim Abdelaziz, Srinivas Ravishankar, Salim Roukos, Alexander Gray, Ramon Astudillo, Maria Chang, Cristina Cornelio, Saswati Dana, Achille Fokoue, Dinesh Garg, Alfio Gliozzo, Sairam Gurajada, Hima Karanam, Naweed Khan, Dinesh Khandelwal, Young-suk Lee, Yunyao Li, Francois Luus, Ndivhuwo Makondo, Nandana Mihindukulasooriya, Tahira Naseem, Sumit Neelam, Lucian Popa, Revanth Reddy, Ryan Riegel, Gaetano Rossiello, Udit Sharma, G P Shrivatsa Bhargav, Mo Yu

Knowledge base question answering (KBQA)is an important task in Natural Language Processing.

Entity Linking Knowledge Base Question Answering +1

229

Paper
Code

End-to-End QA on COVID-19: Domain Adaptation with Synthetic Training

no code implementations • 2 Dec 2020 • Revanth Gangi Reddy, Bhavani Iyer, Md Arafat Sultan, Rong Zhang, Avi Sil, Vittorio Castelli, Radu Florian, Salim Roukos

End-to-end question answering (QA) requires both information retrieval (IR) over a large document collection and machine reading comprehension (MRC) on the retrieved passages.

Domain Adaptation Information Retrieval +3

Paper
Add Code

A Multilingual Reading Comprehension System for more than 100 Languages

no code implementations • COLING 2020 • Anthony Ferritto, Sara Rosenthal, Mihaela Bornea, Kazi Hasan, Rishav Chakravarti, Salim Roukos, Radu Florian, Avi Sil

We also show how M-GAAMA can be used in downstream tasks by incorporating it into an END-TO-END-QA system using CFO (Chakravarti et al., 2019).

Machine Reading Comprehension Machine Translation +3

Paper
Add Code

Towards building a Robust Industry-scale Question Answering System

no code implementations • COLING 2020 • Rishav Chakravarti, Anthony Ferritto, Bhavani Iyer, Lin Pan, Radu Florian, Salim Roukos, Avi Sil

Building on top of the powerful BERTQA model, GAAMA provides a ∼2. 0{\%} absolute boost in F1 over the industry-scale state-of-the-art (SOTA) system on NQ.

Data Augmentation Natural Questions +2

Paper
Add Code

Pushing the Limits of AMR Parsing with Self-Learning

1 code implementation • Findings of the Association for Computational Linguistics 2020 • Young-suk Lee, Ramon Fernandez Astudillo, Tahira Naseem, Revanth Gangi Reddy, Radu Florian, Salim Roukos

Abstract Meaning Representation (AMR) parsing has experienced a notable growth in performance in the last two years, due both to the impact of transfer learning and the development of novel architectures specific to AMR.

Ranked #2 on AMR Parsing on LDC2014T12

AMR Parsing Machine Translation +4

229

Paper
Code

Multi-Stage Pre-training for Low-Resource Domain Adaptation

no code implementations • EMNLP 2020 • Rong Zhang, Revanth Gangi Reddy, Md Arafat Sultan, Vittorio Castelli, Anthony Ferritto, Radu Florian, Efsun Sarioglu Kayi, Salim Roukos, Avirup Sil, Todd Ward

Transfer learning techniques are particularly useful in NLP tasks where a sizable amount of high-quality annotated data is difficult to obtain.

Document Ranking Domain Adaptation +3

Paper
Add Code

ARES: A Reading Comprehension Ensembling Service

no code implementations • EMNLP 2020 • Anthony Ferritto, Lin Pan, Rishav Chakravarti, Salim Roukos, Radu Florian, J. William Murdock, Avi Sil

We introduce ARES (A Reading Comprehension Ensembling Service): a novel Machine Reading Comprehension (MRC) demonstration system which utilizes an ensemble of models to increase F1 by 2. 3 points.

Machine Reading Comprehension Natural Questions +1

Paper
Add Code

Leveraging Semantic Parsing for Relation Linking over Knowledge Bases

1 code implementation • 16 Sep 2020 • Nandana Mihindukulasooriya, Gaetano Rossiello, Pavan Kapanipathi, Ibrahim Abdelaziz, Srinivas Ravishankar, Mo Yu, Alfio Gliozzo, Salim Roukos, Alexander Gray

Knowledgebase question answering systems are heavily dependent on relation extraction and linking modules.

Ranked #1 on Relation Linking on QALD-7

Question Answering Relation +3

Paper
Code

GPT-too: A language-model-first approach for AMR-to-text generation

1 code implementation • ACL 2020 • Manuel Mager, Ramon Fernandez Astudillo, Tahira Naseem, Md. Arafat Sultan, Young-suk Lee, Radu Florian, Salim Roukos

Meaning Representations (AMRs) are broad-coverage sentence-level semantic graphs.

Ranked #10 on AMR-to-Text Generation on LDC2017T10

AMR-to-Text Generation Data-to-Text Generation +3

Paper
Code

The TechQA Dataset

2 code implementations • ACL 2020 • Vittorio Castelli, Rishav Chakravarti, Saswati Dana, Anthony Ferritto, Radu Florian, Martin Franz, Dinesh Garg, Dinesh Khandelwal, Scott McCarley, Mike McCawley, Mohamed Nasr, Lin Pan, Cezar Pendus, John Pitrelli, Saurabh Pujar, Salim Roukos, Andrzej Sakrajda, Avirup Sil, Rosario Uceda-Sosa, Todd Ward, Rong Zhang

We introduce TechQA, a domain-adaptation question answering dataset for the technical support domain.

Domain Adaptation Question Answering

Paper
Code

Path-Based Contextualization of Knowledge Graphs for Textual Entailment

no code implementations • 5 Nov 2019 • Kshitij Fadnis, Kartik Talamadupula, Pavan Kapanipathi, Haque Ishfaq, Salim Roukos, Achille Fokoue

In this paper, we introduce the problem of knowledge graph contextualization -- that is, given a specific NLP task, the problem of extracting meaningful and relevant sub-graphs from a given knowledge graph.

Knowledge Graphs Natural Language Inference

Paper
Add Code

Ensembling Strategies for Answering Natural Questions

no code implementations • 30 Oct 2019 • Anthony Ferritto, Lin Pan, Rishav Chakravarti, Salim Roukos, Radu Florian, J. William Murdock, Avirup Sil

Many of the top question answering systems today utilize ensembling to improve their performance on tasks such as the Stanford Question Answering Dataset (SQuAD) and Natural Questions (NQ) challenges.

Natural Questions Question Answering

Paper
Add Code

Frustratingly Easy Natural Question Answering

no code implementations • 11 Sep 2019 • Lin Pan, Rishav Chakravarti, Anthony Ferritto, Michael Glass, Alfio Gliozzo, Salim Roukos, Radu Florian, Avirup Sil

Existing literature on Question Answering (QA) mostly focuses on algorithmic novelty, data augmentation, or increasingly large pre-trained language models like XLNet and RoBERTa.

Ranked #5 on Question Answering on Natural Questions (long)

Data Augmentation Natural Questions +2

Paper
Add Code

CFO: A Framework for Building Production NLP Systems

no code implementations • IJCNLP 2019 • Rishav Chakravarti, Cezar Pendus, Andrzej Sakrajda, Anthony Ferritto, Lin Pan, Michael Glass, Vittorio Castelli, J. William Murdock, Radu Florian, Salim Roukos, Avirup Sil

This paper introduces a novel orchestration framework, called CFO (COMPUTATION FLOW ORCHESTRATOR), for building, experimenting with, and deploying interactive NLP (Natural Language Processing) and IR (Information Retrieval) systems to production environments.

Information Retrieval Machine Reading Comprehension +2

Paper
Add Code

Rewarding Smatch: Transition-Based AMR Parsing with Reinforcement Learning

no code implementations • ACL 2019 • Tahira Naseem, Abhishek Shah, Hui Wan, Radu Florian, Salim Roukos, Miguel Ballesteros

Our work involves enriching the Stack-LSTM transition-based AMR parser (Ballesteros and Al-Onaizan, 2017) by augmenting training with Policy Learning and rewarding the Smatch score of sampled graphs.

Ranked #24 on AMR Parsing on LDC2017T10

AMR Parsing reinforcement-learning +1

Paper
Add Code

Invited Talk: IBM Cognitive Computing - An NLP Renaissance!

no code implementations • EMNLP 2014 • Salim Roukos

Machine Translation Question Answering +2

Paper
Add Code

Adaptive HTER Estimation for Document-Specific MT Post-Editing

no code implementations • ACL 2014 • Fei Huang, Jian-Ming Xu, Abraham Ittycheriah, Salim Roukos

Machine Translation

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.