LREC 2022 • Christopher Song, David Harwath, Tuka Alhanai, James Glass
We present Speak, a toolkit that allows researchers to crowdsource speech audio recordings using Amazon Mechanical Turk (MTurk).
15 Feb 2024 • Yiming Meng, Ruikun Zhou, Amartya Mukherjee, Maxwell Fitzsimmons, Christopher Song, Jun Liu
We provide a theoretical analysis of both algorithms in terms of convergence of neural approximations towards the true optimal solutions in a general setting.
ACL 2021 • Wei-Ning Hsu, David Harwath, Christopher Song, James Glass
In this paper we present the first model for directly synthesizing fluent, natural-sounding spoken audio captions for images that does not require natural language text as an intermediate representation or source of supervision.