no code implementations • 18 Dec 2023 • Amirkeivan Mohtashami, Florian Hartmann, Sian Gooding, Lukas Zilka, Matt Sharifi, Blaise Aguera y Arcas
We present and evaluate two approaches for knowledge transfer between LLMs.
no code implementations • 22 Jun 2023 • Paul K. Rubenstein, Chulayuth Asawaroengchai, Duc Dung Nguyen, Ankur Bapna, Zalán Borsos, Félix de Chaumont Quitry, Peter Chen, Dalia El Badawy, Wei Han, Eugene Kharitonov, Hannah Muckenhirn, Dirk Padfield, James Qin, Danny Rozenberg, Tara Sainath, Johan Schalkwyk, Matt Sharifi, Michelle Tadmor Ramanovich, Marco Tagliasacchi, Alexandru Tudor, Mihajlo Velimirović, Damien Vincent, Jiahui Yu, Yongqiang Wang, Vicky Zayats, Neil Zeghidour, Yu Zhang, Zhishuai Zhang, Lukas Zilka, Christian Frank
AudioPaLM inherits the capability to preserve paralinguistic information such as speaker identity and intonation from AudioLM and the linguistic knowledge present only in text large language models such as PaLM-2.
1 code implementation • 16 May 2023 • Zalán Borsos, Matt Sharifi, Damien Vincent, Eugene Kharitonov, Neil Zeghidour, Marco Tagliasacchi
We present SoundStorm, a model for efficient, non-autoregressive audio generation.
3 code implementations • 26 Jan 2023 • Andrea Agostinelli, Timo I. Denk, Zalán Borsos, Jesse Engel, Mauro Verzetti, Antoine Caillon, Qingqing Huang, Aren Jansen, Adam Roberts, Marco Tagliasacchi, Matt Sharifi, Neil Zeghidour, Christian Frank
We introduce MusicLM, a model generating high-fidelity music from text descriptions such as "a calming violin melody backed by a distorted guitar riff".
Ranked #8 on Text-to-Music Generation on MusicCaps
5 code implementations • 7 Sep 2022 • Zalán Borsos, Raphaël Marinier, Damien Vincent, Eugene Kharitonov, Olivier Pietquin, Matt Sharifi, Dominik Roblek, Olivier Teboul, David Grangier, Marco Tagliasacchi, Neil Zeghidour
We introduce AudioLM, a framework for high-quality audio generation with long-term consistency.
no code implementations • 15 Feb 2022 • Zalán Borsos, Matt Sharifi, Marco Tagliasacchi
We propose SpeechPainter, a model for filling in gaps of up to one second in speech samples by leveraging an auxiliary textual input.
1 code implementation • CoNLL (EMNLP) 2021 • Sian Gooding, Yevgeni Berzak, Tony Mak, Matt Sharifi
Judging the readability of text has many important applications, for instance when performing text simplification or when sourcing reading material for language learners.
no code implementations • 25 Oct 2019 • Beat Gfeller, Christian Frank, Dominik Roblek, Matt Sharifi, Marco Tagliasacchi, Mihajlo Velimirović
We propose a model to estimate the fundamental frequency in monophonic audio, often referred to as pitch estimation.