Search Results for author: Che Zheng

Found 8 papers, 2 papers with code

Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models

no code implementations • 18 Apr 2024 • Aitor Ormazabal, Che Zheng, Cyprien de Masson d'Autume, Dani Yogatama, Deyu Fu, Donovan Ong, Eric Chen, Eugenie Lamprecht, Hai Pham, Isaac Ong, Kaloyan Aleksiev, Lei LI, Matthew Henderson, Max Bain, Mikel Artetxe, Nishant Relan, Piotr Padlewski, Qi Liu, Ren Chen, Samuel Phua, Yazheng Yang, Yi Tay, Yuqi Wang, Zhongkai Zhu, Zhihui Xie

On text benchmarks, Core not only performs competitively with other frontier models on a set of well-established benchmarks (e.g. MMLU, GSM8K) but also outperforms GPT4-0613 on human evaluation.

GSM8K, Question Answering, +2 more

Synthesizer: Rethinking Self-Attention for Transformer Models

no code implementations • 1 Jan 2021 • Yi Tay, Dara Bahri, Donald Metzler, Da-Cheng Juan, Zhe Zhao, Che Zheng

The dot product self-attention is known to be central and indispensable to state-of-the-art Transformer models.

Language Modelling, Machine Translation, +2 more

StructFormer: Joint Unsupervised Induction of Dependency and Constituency Structure from Masked Language Modeling

2 code implementations • ACL 2021 • Yikang Shen, Yi Tay, Che Zheng, Dara Bahri, Donald Metzler, Aaron Courville

There are two major classes of natural language grammar: the dependency grammar, which models one-to-one correspondences between words, and the constituency grammar, which models the assembly of one or several corresponding words.

Constituency Parsing, Language Modelling, +2 more

Surprise: Result List Truncation via Extreme Value Theory

no code implementations • 19 Oct 2020 • Dara Bahri, Che Zheng, Yi Tay, Donald Metzler, Andrew Tomkins

Work in information retrieval has largely been centered around ranking and relevance: given a query, return some number of results ordered by relevance to the user.

Information Retrieval, Retrieval, +1 more

Generative Models are Unsupervised Predictors of Page Quality: A Colossal-Scale Study

no code implementations • 17 Aug 2020 • Dara Bahri, Yi Tay, Che Zheng, Donald Metzler, Cliff Brunk, Andrew Tomkins

Large generative language models such as GPT-2 are well-known for their ability to generate text as well as their utility in supervised downstream tasks via fine-tuning.

Synthesizer: Rethinking Self-Attention in Transformer Models

1 code implementation • 2 May 2020 • Yi Tay, Dara Bahri, Donald Metzler, Da-Cheng Juan, Zhe Zhao, Che Zheng

The dot product self-attention is known to be central and indispensable to state-of-the-art Transformer models.

Ranked #1 on Dialogue Generation on Persona-Chat (BLEU-1 metric, using extra training data)

Abstractive Text Summarization, Dialogue Generation, +6 more

Choppy: Cut Transformer For Ranked List Truncation

no code implementations • 26 Apr 2020 • Dara Bahri, Yi Tay, Che Zheng, Donald Metzler, Andrew Tomkins

Work in information retrieval has traditionally focused on ranking and relevance: given a query, return some number of results ordered by relevance to the user.

Information Retrieval, Retrieval

Reverse Engineering Configurations of Neural Text Generation Models

no code implementations • ACL 2020 • Yi Tay, Dara Bahri, Che Zheng, Clifford Brunk, Donald Metzler, Andrew Tomkins

This paper seeks to develop a deeper understanding of the fundamental properties of neural text generation models.

Text Generation
