Search Results for author: Andrew Brown

Found 13 papers, 4 papers with code

Emu Video: Factorizing Text-to-Video Generation by Explicit Image Conditioning

No code implementations · 17 Nov 2023 · Rohit Girdhar, Mannat Singh, Andrew Brown, Quentin Duval, Samaneh Azadi, Sai Saketh Rambhatla, Akbar Shah, Xi Yin, Devi Parikh, Ishan Misra

We present Emu Video, a text-to-video generation model that factorizes the generation into two steps: first generating an image conditioned on the text, and then generating a video conditioned on the text and the generated image.

Tasks: Text-to-Video Generation, Video Generation

VoxSRC 2022: The Fourth VoxCeleb Speaker Recognition Challenge

1 code implementation · 20 Feb 2023 · Jaesung Huh, Andrew Brown, Jee-weon Jung, Joon Son Chung, Arsha Nagrani, Daniel Garcia-Romero, Andrew Zisserman

This paper summarises the findings from the VoxCeleb Speaker Recognition Challenge 2022 (VoxSRC-22), which was held in conjunction with INTERSPEECH 2022.

Tasks: Speaker Diarization, Speaker Recognition, +1 more

In search of strong embedding extractors for speaker diarisation

No code implementations · 26 Oct 2022 · Jee-weon Jung, Hee-Soo Heo, Bong-Jin Lee, Jaesung Huh, Andrew Brown, Youngki Kwon, Shinji Watanabe, Joon Son Chung

First, the evaluation is not straightforward because the features required for better performance differ between speaker verification and diarisation.

Tasks: Data Augmentation, Speaker Verification

End-to-End Visual Editing with a Generatively Pre-Trained Artist

No code implementations · 3 May 2022 · Andrew Brown, Cheng-Yang Fu, Omkar Parkhi, Tamara L. Berg, Andrea Vedaldi

We consider the targeted image editing problem: blending a region in a source image with a driver image that specifies the desired change.

Face, Body, Voice: Video Person-Clustering with Multiple Modalities

No code implementations · 20 May 2021 · Andrew Brown, Vicky Kalogeiton, Andrew Zisserman

In this paper we make contributions to address both these deficiencies: first, we introduce a Multi-Modal High-Precision Clustering algorithm for person-clustering in videos using cues from several modalities (face, body, and voice).

Tasks: Clustering, Face Clustering

Automated Video Labelling: Identifying Faces by Corroborative Evidence

No code implementations · 10 Feb 2021 · Andrew Brown, Ernesto Coto, Andrew Zisserman

We present a method for automatically labelling all faces in video archives, such as TV broadcasts, by combining multiple evidence sources and multiple modalities (visual and audio).

Tasks: Domain Adaptation, Image Retrieval

Smooth-AP: Smoothing the Path Towards Large-Scale Image Retrieval

2 code implementations · ECCV 2020 · Andrew Brown, Weidi Xie, Vicky Kalogeiton, Andrew Zisserman

Optimising a ranking-based metric, such as Average Precision (AP), is notoriously challenging due to the fact that it is non-differentiable, and hence cannot be optimised directly using gradient-descent methods.

Tasks: Image Instance Retrieval, Metric Learning, +2 more
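Why AP resists gradient descent can be seen from how it is computed: every rank is a count of step functions of score differences, and a step function has zero gradient almost everywhere. The Python sketch below assumes the formulation described in the Smooth-AP paper: the rank of item i is 1 plus the number of items scored above it, and the smoothed variant replaces the step with a temperature-scaled sigmoid. The function names and the default temperature of 0.01 are illustrative assumptions, not the authors' released code.

```python
import math
from typing import Callable, Iterable, List, Sequence

StepFn = Callable[[float], float]


def heaviside(x: float) -> float:
    """Exact step: 1 if x > 0 else 0 (zero gradient almost everywhere)."""
    return 1.0 if x > 0 else 0.0


def make_sigmoid(tau: float) -> StepFn:
    """Temperature-scaled sigmoid used as a smooth stand-in for the step function."""
    def sigmoid(x: float) -> float:
        z = x / tau
        # Numerically stable form: avoid exp() overflow for large |z|.
        if z >= 0:
            return 1.0 / (1.0 + math.exp(-z))
        e = math.exp(z)
        return e / (1.0 + e)
    return sigmoid


def rank(i: int, scores: Sequence[float], pool: Iterable[int], step: StepFn) -> float:
    """1 + (soft) count of items j in pool, j != i, scored above item i."""
    return 1.0 + sum(step(scores[j] - scores[i]) for j in pool if j != i)


def average_precision(scores: Sequence[float], labels: Sequence[int],
                      step: StepFn = heaviside) -> float:
    """AP for one query: mean over positives of rank-among-positives / rank-among-all."""
    positives: List[int] = [i for i, y in enumerate(labels) if y]
    everything = range(len(scores))
    ratios = [rank(i, scores, positives, step) / rank(i, scores, everything, step)
              for i in positives]
    return sum(ratios) / len(ratios)


def smooth_ap_loss(scores: Sequence[float], labels: Sequence[int], tau: float = 0.01) -> float:
    """Loss = 1 - smoothed AP; tau = 0.01 is an illustrative default (assumption)."""
    return 1.0 - average_precision(scores, labels, make_sigmoid(tau))


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.3, 0.1]  # similarity of each gallery item to the query
    labels = [1, 0, 1, 0]          # 1 = relevant to the query
    print("exact AP:   ", average_precision(scores, labels))
    print("smoothed AP:", average_precision(scores, labels, make_sigmoid(0.01)))
    print("loss:       ", smooth_ap_loss(scores, labels))
```

In this toy query the positives sit at ranks 1 and 3, so the exact AP is (1 + 2/3) / 2 = 5/6. With a small temperature the smoothed value stays close to that, while a larger temperature trades fidelity to the true ranking for smoother gradients.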

4-Connected Shift Residual Networks

1 code implementation · 22 Oct 2019 · Andrew Brown, Pascal Mettes, Marcel Worring

Interestingly, when incorporating shifts to all point-wise convolutions in residual networks, 4-connected shifts outperform 8-connected shifts.
