Search Results for author: Layne Berry

Found 6 papers, 4 papers with code

M-SpeechCLIP: Leveraging Large-Scale, Pre-Trained Models for Multilingual Speech to Image Retrieval

no code implementations · 2 Nov 2022 · Layne Berry, Yi-Jen Shih, Hsuan-Fu Wang, Heng-Jui Chang, Hung-Yi Lee, David Harwath

This work investigates the use of large-scale, English-only pre-trained models (CLIP and HuBERT) for multilingual image-speech retrieval.

Image Retrieval · Retrieval · +1

Why is Winoground Hard? Investigating Failures in Visuolinguistic Compositionality

1 code implementation · 1 Nov 2022 · Anuj Diwan, Layne Berry, Eunsol Choi, David Harwath, Kyle Mahowald

Recent visuolinguistic pre-trained models show promising progress on various end tasks such as image retrieval and video captioning.

Data Augmentation · Image Retrieval · +2

SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model

1 code implementation · 3 Oct 2022 · Yi-Jen Shih, Hsuan-Fu Wang, Heng-Jui Chang, Layne Berry, Hung-Yi Lee, David Harwath

Data-driven speech processing models usually perform well with a large amount of text supervision, but collecting transcribed speech data is costly.

Language Modelling · Retrieval · +1
