Search Results for author: Phillip Keung

Found 13 papers, 3 papers with code

Domain Mismatch Doesn’t Always Prevent Cross-lingual Transfer Learning

no code implementations • LREC 2022 • Daniel Edmiston, Phillip Keung, Noah A. Smith

Cross-lingual transfer learning without labeled target language data or parallel text has been surprisingly effective in zero-shot cross-lingual classification, question answering, unsupervised machine translation, etc.

Bilingual Lexicon Induction • Cross-Lingual Transfer • +5

The Engage Corpus: A Social Media Dataset for Text-Based Recommender Systems

no code implementations • LREC 2022 • Daniel Cheng, Kyle Yan, Phillip Keung, Noah A. Smith

Social media platforms play an increasingly important role as forums for public discourse.

Misinformation • Recommendation Systems

ACID: Abstractive, Content-Based IDs for Document Retrieval with Language Models

no code implementations • 14 Nov 2023 • Haoxin Li, Phillip Keung, Daniel Cheng, Jungo Kasai, Noah A. Smith

Our results demonstrate the effectiveness of human-readable, natural-language IDs in generative retrieval with LMs.

Language Modelling • Large Language Model • +2

NarrowBERT: Accelerating Masked Language Model Pretraining and Inference

1 code implementation • 11 Jan 2023 • Haoxin Li, Phillip Keung, Daniel Cheng, Jungo Kasai, Noah A. Smith

We propose NarrowBERT, a modified transformer encoder that increases the throughput for masked language model pretraining by more than $2\times$.

Language Modelling • NER • +2

Domain Mismatch Doesn't Always Prevent Cross-Lingual Transfer Learning

no code implementations • 30 Nov 2022 • Daniel Edmiston, Phillip Keung, Noah A. Smith

Bilingual Lexicon Induction • Cross-Lingual Transfer • +5

Unsupervised Bitext Mining and Translation via Self-trained Contextual Embeddings

no code implementations • 15 Oct 2020 • Phillip Keung, Julian Salazar, Yichao Lu, Noah A. Smith

We then improve an XLM-based unsupervised neural MT system pre-trained on Wikipedia by supplementing it with pseudo-parallel text mined from the same corpus, boosting unsupervised translation performance by up to 3.5 BLEU on the WMT'14 French-English and WMT'16 German-English tasks and outperforming the previous state-of-the-art.

Machine Translation • Sentence • +2

The Multilingual Amazon Reviews Corpus

1 code implementation • EMNLP 2020 • Phillip Keung, Yichao Lu, György Szarvas, Noah A. Smith

We present the Multilingual Amazon Reviews Corpus (MARC), a large-scale collection of Amazon reviews for multilingual text classification.

General Classification • Multilingual text classification • +4

Improving Non-autoregressive Neural Machine Translation with Monolingual Data

no code implementations • ACL 2020 • Jiawei Zhou, Phillip Keung

Non-autoregressive (NAR) neural machine translation is usually done via knowledge distillation from an autoregressive (AR) model.

Data Augmentation • Knowledge Distillation • +2

Don't Use English Dev: On the Zero-Shot Cross-Lingual Evaluation of Contextual Embeddings

no code implementations • EMNLP 2020 • Phillip Keung, Yichao Lu, Julian Salazar, Vikas Bhardwaj

Multilingual contextual embeddings have demonstrated state-of-the-art performance in zero-shot cross-lingual transfer learning, where multilingual BERT is fine-tuned on one source language and evaluated on a different target language.

Model Selection • Transfer Learning • +2

Attentional Speech Recognition Models Misbehave on Out-of-domain Utterances

1 code implementation • 12 Feb 2020 • Phillip Keung, Wei Niu, Yichao Lu, Julian Salazar, Vikas Bhardwaj

We discuss the problem of echographic transcription in autoregressive sequence-to-sequence attentional architectures for automatic speech recognition, where a model produces very long sequences of repetitive outputs when presented with out-of-domain utterances.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) • +1

Adversarial Learning with Contextual Embeddings for Zero-resource Cross-lingual Classification and NER

no code implementations • IJCNLP 2019 • Phillip Keung, Yichao Lu, Vikas Bhardwaj

We report the magnitude of the improvement on the multilingual MLDoc text classification and CoNLL 2002/2003 named entity recognition tasks.

General Classification • named-entity-recognition • +5

A neural interlingua for multilingual machine translation

no code implementations • WS 2018 • Yichao Lu, Phillip Keung, Faisal Ladhak, Vikas Bhardwaj, Shaonan Zhang, Jason Sun

We incorporate an explicit neural interlingua into a multilingual encoder-decoder neural machine translation (NMT) architecture.

Machine Translation • NMT • +3

A practical approach to dialogue response generation in closed domains

no code implementations • 28 Mar 2017 • Yichao Lu, Phillip Keung, Shaonan Zhang, Jason Sun, Vikas Bhardwaj

We describe a prototype dialogue response generation model for the customer service domain at Amazon.

Response Generation
