Search Results for author: Weidi Xie

Found 94 papers, 56 papers with code

Towards Building Multilingual Language Model for Medicine

1 code implementation 21 Feb 2024 Pengcheng Qiu, Chaoyi Wu, Xiaoman Zhang, Weixiong Lin, Haicheng Wang, Ya zhang, Yanfeng Wang, Weidi Xie

In this paper, we aim to develop an open-source, multilingual language model for medicine that benefits a wider, linguistically diverse audience from different regions.

Language Modelling Question Answering

InstaGen: Enhancing Object Detection by Training on Synthetic Dataset

no code implementations 8 Feb 2024 Chengjian Feng, Yujie Zhong, Zequn Jie, Weidi Xie, Lin Ma

In this paper, we introduce a novel paradigm to enhance the ability of an object detector, e.g., expanding categories or improving detection performance, by training on a synthetic dataset generated from diffusion models.

Object object-detection +1

Synchformer: Efficient Synchronization from Sparse Cues

2 code implementations 29 Jan 2024 Vladimir Iashin, Weidi Xie, Esa Rahtu, Andrew Zisserman

Our objective is audio-visual synchronization with a focus on 'in-the-wild' videos, such as those on YouTube, where synchronization cues can be sparse.

Audio-Visual Synchronization

Retrieval-Augmented Egocentric Video Captioning

no code implementations 1 Jan 2024 Jilan Xu, Yifei HUANG, Junlin Hou, Guo Chen, Yuejie Zhang, Rui Feng, Weidi Xie

In this paper, (1) we develop EgoInstructor, a retrieval-augmented multimodal captioning model that automatically retrieves semantically relevant third-person instructional videos to enhance the video captioning of egocentric videos.

Representation Learning Retrieval +1

One Model to Rule them All: Towards Universal Segmentation for Medical Images with Text Prompts

no code implementations 28 Dec 2023 Ziheng Zhao, Yao Zhang, Chaoyi Wu, Xiaoman Zhang, Ya zhang, Yanfeng Wang, Weidi Xie

Our main contributions are threefold: (i) on data construction, we combine multiple knowledge sources to construct a multi-modal medical knowledge tree; we then build a large-scale segmentation dataset for training by collecting over 11K 3D medical image scans from 31 segmentation datasets, with careful standardization of both visual scans and label space; (ii) on model training, we formulate a universal segmentation model that can be prompted by inputting medical terminologies in text form.

Representation Learning Segmentation +1

Amodal Ground Truth and Completion in the Wild

1 code implementation 28 Dec 2023 Guanqi Zhan, Chuanxia Zheng, Weidi Xie, Andrew Zisserman

In contrast, we use 3D data to establish an automatic pipeline to determine authentic ground truth amodal masks for partially occluded objects in real images.

Image Segmentation Segmentation +1

Large-scale Long-tailed Disease Diagnosis on Radiology Images

1 code implementation 26 Dec 2023 Qiaoyu Zheng, Weike Zhao, Chaoyi Wu, Xiaoman Zhang, Ya zhang, Yanfeng Wang, Weidi Xie

In this study, we aim to investigate the problem of large-scale, large-vocabulary disease classification for radiologic images, which can be formulated as a multi-modal, multi-anatomy, multi-label, long-tailed classification.

Anatomy

A Strong Baseline for Temporal Video-Text Alignment

no code implementations 21 Dec 2023 Zeqian Li, Qirui Chen, Tengda Han, Ya zhang, Yanfeng Wang, Weidi Xie

In this paper, we consider the problem of temporally aligning the video and texts from instructional videos. Specifically, given a long-term video and associated text sentences, our goal is to determine their corresponding timestamps in the video.

Descriptive Language Modelling +3

Appearance-based Refinement for Object-Centric Motion Segmentation

no code implementations 18 Dec 2023 Junyu Xie, Weidi Xie, Andrew Zisserman

The goal of this paper is to discover, segment, and track independently moving objects in complex visual scenes.

Motion Segmentation Object +5

Grounded Question-Answering in Long Egocentric Videos

1 code implementation 11 Dec 2023 Shangzhe Di, Weidi Xie

Existing approaches to video understanding, mainly designed for short videos from a third-person perspective, are limited in their applicability in certain fields, such as robotics.

Open-Ended Question Answering Video Question Answering +1

Can GPT-4V(ision) Serve Medical Applications? Case Studies on GPT-4V for Multimodal Medical Diagnosis

1 code implementation 15 Oct 2023 Chaoyi Wu, Jiayu Lei, Qiaoyu Zheng, Weike Zhao, Weixiong Lin, Xiaoman Zhang, Xiao Zhou, Ziheng Zhao, Ya zhang, Yanfeng Wang, Weidi Xie

Driven by the large foundation models, the development of artificial intelligence has witnessed tremendous progress lately, leading to a surge of general interest from the public.

Anatomy Computed Tomography (CT) +2

AutoAD II: The Sequel -- Who, When, and What in Movie Audio Description

no code implementations 10 Oct 2023 Tengda Han, Max Bain, Arsha Nagrani, Gül Varol, Weidi Xie, Andrew Zisserman

Audio Description (AD) is the task of generating descriptions of visual content, at suitable time intervals, for the benefit of visually impaired audiences.

Language Modelling Text Generation

What Does Stable Diffusion Know about the 3D Scene?

1 code implementation 10 Oct 2023 Guanqi Zhan, Chuanxia Zheng, Weidi Xie, Andrew Zisserman

(iii) We find that features from Stable Diffusion are good for discriminative learning of a number of properties, including scene geometry, support relations, shadows and depth, but less performant for occlusion and material.

A Large-scale Dataset for Audio-Language Representation Learning

no code implementations 20 Sep 2023 Luoyi Sun, Xuenan Xu, Mengyue Wu, Weidi Xie

To tackle these challenges, we present an innovative and automatic audio caption generation pipeline based on a series of public tools or APIs, and construct a large-scale, high-quality, audio-language dataset, named Auto-ACD, comprising over 1.9M audio-text pairs.

Audio captioning Representation Learning +1

UniBrain: Universal Brain MRI Diagnosis with Hierarchical Knowledge-enhanced Pre-training

1 code implementation 13 Sep 2023 Jiayu Lei, Lisong Dai, Haoyun Jiang, Chaoyi Wu, Xiaoman Zhang, Yao Zhang, Jiangchao Yao, Weidi Xie, Yanyong Zhang, Yuehua Li, Ya zhang, Yanfeng Wang

Magnetic resonance imaging (MRI) has played a crucial role in brain disease diagnosis, with which a range of computer-aided artificial intelligence methods have been proposed.

The Making and Breaking of Camouflage

no code implementations ICCV 2023 Hala Lamdouar, Weidi Xie, Andrew Zisserman

We also incorporate the proposed camouflage score into a generative model as an auxiliary loss and show that effective camouflage images or videos can be synthesised in a scalable manner.

Diagnosing Human-object Interaction Detectors

1 code implementation 16 Aug 2023 Fangrui Zhu, Yiming Xie, Weidi Xie, Huaizu Jiang

To address this issue, in this paper, we introduce a diagnosis toolbox to provide detailed quantitative break-down analysis of HOI detection models, inspired by the success of object detection diagnosis toolboxes.

Classification Human-Object Interaction Detection +3

Joint-Relation Transformer for Multi-Person Motion Prediction

1 code implementation ICCV 2023 Qingyao Xu, Weibo Mao, Jingze Gong, Chenxin Xu, Siheng Chen, Weidi Xie, Ya zhang, Yanfeng Wang

Multi-person motion prediction is a challenging problem due to the dependency of motion on both individual past movements and interactions with other people.

motion prediction Relation

Boost Video Frame Interpolation via Motion Adaptation

1 code implementation 24 Jun 2023 HaoNing Wu, Xiaoyun Zhang, Weidi Xie, Ya zhang, Yanfeng Wang

Video frame interpolation (VFI) is a challenging task that aims to generate intermediate frames between two consecutive frames in a video.

Motion Estimation Video Frame Interpolation

arXiVeri: Automatic table verification with GPT

1 code implementation 13 Jun 2023 Gyungin Shin, Weidi Xie, Samuel Albanie

In this paper, we propose to meet this challenge through the novel task of automatic table verification (AutoTV), in which the objective is to verify the accuracy of numerical data in tables by cross-referencing cited sources.

Zero-shot Composed Text-Image Retrieval

1 code implementation 12 Jun 2023 Yikun Liu, Jiangchao Yao, Ya zhang, Yanfeng Wang, Weidi Xie

In this paper, we consider the problem of composed image retrieval (CIR), which aims to train a model that can fuse multi-modal information, e.g., text and images, to accurately retrieve images that match the query, extending the user's expression ability.

Image Retrieval Retrieval +1

Multi-Modal Classifiers for Open-Vocabulary Object Detection

no code implementations 8 Jun 2023 Prannay Kaul, Weidi Xie, Andrew Zisserman

The goal of this paper is open-vocabulary object detection (OVOD) – building a model that can detect objects beyond the set of categories seen at training, thus enabling the user to specify categories of interest at inference without the need for model retraining.

Language Modelling Large Language Model +3

Intelligent Grimm -- Open-ended Visual Storytelling via Latent Diffusion Models

1 code implementation 1 Jun 2023 Chang Liu, HaoNing Wu, Yujie Zhong, Xiaoyun Zhang, Yanfeng Wang, Weidi Xie

Generative models have recently exhibited exceptional capabilities in text-to-image generation, but still struggle to generate image sequences coherently.

Story Visualization Style Transfer +2

Annotation-free Audio-Visual Segmentation

no code implementations 18 May 2023 Jinxiang Liu, Yu Wang, Chen Ju, Chaofan Ma, Ya zhang, Weidi Xie

The objective of Audio-Visual Segmentation (AVS) is to localise the sounding objects within visual scenes by accurately predicting pixel-wise segmentation masks.

Image Segmentation Segmentation +1

PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering

2 code implementations 17 May 2023 Xiaoman Zhang, Chaoyi Wu, Ziheng Zhao, Weixiong Lin, Ya zhang, Yanfeng Wang, Weidi Xie

In this paper, we focus on the problem of Medical Visual Question Answering (MedVQA), which is crucial in efficiently interpreting medical images with vital clinic-relevant information.

Generative Visual Question Answering Language Modelling +4

Zero-shot Unsupervised Transfer Instance Segmentation

1 code implementation 27 Apr 2023 Gyungin Shin, Samuel Albanie, Weidi Xie

Segmentation is a core computer vision competency, with applications spanning a broad range of scientifically and economically valuable domains.

Instance Segmentation Segmentation +1

PMC-LLaMA: Towards Building Open-source Language Models for Medicine

1 code implementation 27 Apr 2023 Chaoyi Wu, Weixiong Lin, Xiaoman Zhang, Ya zhang, Yanfeng Wang, Weidi Xie

Our contributions are threefold: (i) we systematically investigate the process of adapting a general-purpose foundation language model towards the medical domain; this involves data-centric knowledge injection through the integration of 4.8M biomedical academic papers and 30K medical textbooks, as well as comprehensive fine-tuning for alignment with domain-specific instructions; (ii) we contribute a large-scale, comprehensive dataset for instruction tuning.

Language Modelling Natural Language Understanding +1

Towards Open-Vocabulary Video Instance Segmentation

1 code implementation ICCV 2023 Haochen Wang, Cilin Yan, Shuai Wang, XiaoLong Jiang, Xu Tang, Yao Hu, Weidi Xie, Efstratios Gavves

Video Instance Segmentation (VIS) aims at segmenting and categorizing objects in videos from a closed set of training categories, lacking the generalization ability to handle novel categories in real-world videos.

Instance Segmentation Segmentation +3

AutoAD: Movie Description in Context

1 code implementation CVPR 2023 Tengda Han, Max Bain, Arsha Nagrani, Gül Varol, Weidi Xie, Andrew Zisserman

The objective of this paper is an automatic Audio Description (AD) model that ingests movies and outputs AD in text form.

Image Captioning Text Generation

Collaboration Helps Camera Overtake LiDAR in 3D Detection

1 code implementation CVPR 2023 Yue Hu, Yifan Lu, Runsheng Xu, Weidi Xie, Siheng Chen, Yanfeng Wang

Camera-only 3D detection provides an economical solution with a simple configuration for localizing objects in 3D space compared to LiDAR-based detection systems.

Depth Estimation

Multi-modal Prompting for Low-Shot Temporal Action Localization

no code implementations 21 Mar 2023 Chen Ju, Zeqian Li, Peisen Zhao, Ya zhang, Xiaopeng Zhang, Qi Tian, Yanfeng Wang, Weidi Xie

In this paper, we consider the problem of temporal action localization under the low-shot (zero-shot & few-shot) scenario, with the goal of detecting and classifying action instances from arbitrary categories within untrimmed videos, even categories not seen at training time.

Action Classification Temporal Action Localization

Knowledge-enhanced Visual-Language Pre-training on Chest Radiology Images

1 code implementation 27 Feb 2023 Xiaoman Zhang, Chaoyi Wu, Ya zhang, Yanfeng Wang, Weidi Xie

While multi-modal foundation models pre-trained on large-scale data have been successful in natural language understanding and vision recognition, their use in medical domains is still limited due to the fine-grained nature of medical tasks and the high demand for domain knowledge.

Natural Language Understanding Representation Learning

OvarNet: Towards Open-vocabulary Object Attribute Recognition

1 code implementation CVPR 2023 Keyan Chen, XiaoLong Jiang, Yao Hu, Xu Tang, Yan Gao, Jianqi Chen, Weidi Xie

In this paper, we consider the problem of simultaneously detecting objects and inferring their visual attributes in an image, even for those with no manual annotations provided at the training stage, resembling an open-vocabulary scenario.

Ranked #1 on Open Vocabulary Attribute Detection on OVAD benchmark (using extra training data)

Attribute Knowledge Distillation +5

Learning Open-vocabulary Semantic Segmentation Models From Natural Language Supervision

1 code implementation CVPR 2023 Jilan Xu, Junlin Hou, Yuejie Zhang, Rui Feng, Yi Wang, Yu Qiao, Weidi Xie

The former aims to infer all masked entities in the caption given the group tokens, which enables the model to learn fine-grained alignment between visual groups and text entities.

Open Vocabulary Semantic Segmentation Semantic Segmentation

Open-vocabulary Object Segmentation with Diffusion Models

1 code implementation ICCV 2023 Ziyi Li, Qinye Zhou, Xiaoyun Zhang, Ya zhang, Yanfeng Wang, Weidi Xie

The goal of this paper is to extract the visual-language correspondence from a pre-trained text-to-image diffusion model, in the form of a segmentation map, i.e., simultaneously generating images and segmentation masks for the corresponding visual entities described in the text prompt.

Image Segmentation Object +3

MedKLIP: Medical Knowledge Enhanced Language-Image Pre-Training in Radiology

no code implementations 5 Jan 2023 Chaoyi Wu, Xiaoman Zhang, Ya zhang, Yanfeng Wang, Weidi Xie

In this paper, we consider enhancing medical visual-language pre-training (VLP) with domain-specific knowledge, by exploiting the paired image-text reports from the radiological daily practice.

Medical Diagnosis Self-Supervised Learning

MedKLIP: Medical Knowledge Enhanced Language-Image Pre-Training for X-ray Diagnosis

no code implementations ICCV 2023 Chaoyi Wu, Xiaoman Zhang, Ya zhang, Yanfeng Wang, Weidi Xie

In this paper, we consider enhancing medical visual-language pre-training (VLP) with domain-specific knowledge, by exploiting the paired image-text reports from the radiological daily practice.

Medical Diagnosis

AutoAD II: The Sequel - Who, When, and What in Movie Audio Description

no code implementations ICCV 2023 Tengda Han, Max Bain, Arsha Nagrani, Gul Varol, Weidi Xie, Andrew Zisserman

Audio Description (AD) is the task of generating descriptions of visual content, at suitable time intervals, for the benefit of visually impaired audiences.

Language Modelling Text Generation

Open-vocabulary Semantic Segmentation with Frozen Vision-Language Models

1 code implementation 27 Oct 2022 Chaofan Ma, Yuhuan Yang, Yanfeng Wang, Ya zhang, Weidi Xie

When trained at a sufficient scale, self-supervised learning has exhibited a notable ability to solve a wide range of visual or language understanding tasks.

Image Segmentation Language Modelling +3

A Tri-Layer Plugin to Improve Occluded Detection

1 code implementation 18 Oct 2022 Guanqi Zhan, Weidi Xie, Andrew Zisserman

To this end we make the following four contributions: (1) We propose a simple 'plugin' module for the detection head of two-stage object detectors to improve the recall of partially occluded objects.

Instance Segmentation Object +3

Sparse in Space and Time: Audio-visual Synchronisation with Trainable Selectors

2 code implementations 13 Oct 2022 Vladimir Iashin, Weidi Xie, Esa Rahtu, Andrew Zisserman

This contrasts with the case of synchronising videos of talking heads, where audio-visual correspondence is dense in both time and space.

Audio-Visual Synchronization

Turbo Training with Token Dropout

no code implementations 10 Oct 2022 Tengda Han, Weidi Xie, Andrew Zisserman

The objective of this paper is an efficient training method for video tasks.

Action Classification Classification +1

A Simple Plugin for Transforming Images to Arbitrary Scales

no code implementations 7 Oct 2022 Qinye Zhou, Ziyi Li, Weidi Xie, Xiaoyun Zhang, Ya zhang, Yanfeng Wang

Existing models for super-resolution are often specialized for one scale, fundamentally limiting their use in practical scenarios.

Super-Resolution

NamedMask: Distilling Segmenters from Complementary Foundation Models

1 code implementation 22 Sep 2022 Gyungin Shin, Weidi Xie, Samuel Albanie

Our method, termed NamedMask, begins by using CLIP to construct category-specific archives of images.

Data Augmentation Object +1

CounTR: Transformer-based Generalised Visual Counting

1 code implementation 29 Aug 2022 Chang Liu, Yujie Zhong, Andrew Zisserman, Weidi Xie

In this paper, we consider the problem of generalised visual object counting, with the goal of developing a computational model for counting the number of objects from arbitrary semantic categories, using an arbitrary number of "exemplars", i.e., zero-shot or few-shot counting.

Object Counting Self-Supervised Learning

Transforming the Interactive Segmentation for Medical Imaging

no code implementations 20 Aug 2022 Wentao Liu, Chaofan Ma, Yuhuan Yang, Weidi Xie, Ya zhang

The goal of this paper is to interactively refine the automatic segmentation on challenging structures that fall behind human performance, either due to the scarcity of available annotations or the difficult nature of the problem itself, for example, on segmenting cancer or small organs.

Interactive Segmentation Segmentation

Aerial Monocular 3D Object Detection

no code implementations 8 Aug 2022 Yue Hu, Shaoheng Fang, Weidi Xie, Siheng Chen

To fill the gap, this work proposes a dual-view detection system named DVDET to achieve aerial monocular object detection in both the 2D image space and the 3D physical space.

Autonomous Driving Monocular 3D Object Detection +2

Segmenting Moving Objects via an Object-Centric Layered Representation

1 code implementation 5 Jul 2022 Junyu Xie, Weidi Xie, Andrew Zisserman

The objective of this paper is a model that is able to discover, track and segment multiple moving objects in a video.

Motion Segmentation Object +4

Exploiting Transformation Invariance and Equivariance for Self-supervised Sound Localisation

no code implementations 26 Jun 2022 Jinxiang Liu, Chen Ju, Weidi Xie, Ya zhang

We present a simple yet effective self-supervised framework for audio-visual representation learning, to localize the sound source in videos.

Cross-Modal Retrieval Representation Learning +1

ReCo: Retrieve and Co-segment for Zero-shot Transfer

2 code implementations 14 Jun 2022 Gyungin Shin, Weidi Xie, Samuel Albanie

Semantic segmentation has a broad range of applications, but its real-world impact has been significantly limited by the prohibitive annotation costs necessary to enable deployment.

Retrieval Segmentation +1

Temporal Alignment Networks for Long-term Video

1 code implementation CVPR 2022 Tengda Han, Weidi Xie, Andrew Zisserman

The objective of this paper is a temporal alignment network that ingests long term video sequences, and associated text sentences, in order to: (1) determine if a sentence is alignable with the video; and (2) if it is alignable, then determine its alignment.

Action Recognition Action Segmentation +4

PromptDet: Towards Open-vocabulary Detection using Uncurated Images

2 code implementations 30 Mar 2022 Chengjian Feng, Yujie Zhong, Zequn Jie, Xiangxiang Chu, Haibing Ren, Xiaolin Wei, Weidi Xie, Lin Ma

The goal of this work is to establish a scalable pipeline for expanding an object detector towards novel/unseen categories, using zero manual annotations.

Language Modelling Object

Unsupervised Salient Object Detection with Spectral Cluster Voting

1 code implementation 23 Mar 2022 Gyungin Shin, Samuel Albanie, Weidi Xie

In this paper, we tackle the challenging task of unsupervised salient object detection (SOD) by leveraging spectral clustering on self-supervised features.

Clustering Object +5

Label, Verify, Correct: A Simple Few Shot Object Detection Method

1 code implementation CVPR 2022 Prannay Kaul, Weidi Xie, Andrew Zisserman

The objective of this paper is few-shot object detection (FSOD) -- the task of expanding an object detector for a new category given only a few instances for training.

Benchmarking Few-Shot Object Detection +1

Audio-Visual Synchronisation in the wild

no code implementations 8 Dec 2021 Honglie Chen, Weidi Xie, Triantafyllos Afouras, Arsha Nagrani, Andrea Vedaldi, Andrew Zisserman

Finally, we set the first benchmark for general audio-visual synchronisation with over 160 diverse classes in the new VGG-Sound Sync video dataset.

Lip Reading

Prompting Visual-Language Models for Efficient Video Understanding

1 code implementation 8 Dec 2021 Chen Ju, Tengda Han, Kunhao Zheng, Ya zhang, Weidi Xie

Image-based visual-language (I-VL) pre-training has shown great success for learning joint visual-textual representations from large-scale web data, revealing remarkable ability for zero-shot generalisation.

Action Recognition Language Modelling +4

It's About Time: Analog Clock Reading in the Wild

no code implementations CVPR 2022 Charig Yang, Weidi Xie, Andrew Zisserman

In this paper, we present a framework for reading analog clocks in natural images or videos.

ImplicitVol: Sensorless 3D Ultrasound Reconstruction with Deep Implicit Representation

no code implementations 24 Sep 2021 Pak-Hei Yeung, Linde Hesse, Moska Aliasi, Monique Haak, the INTERGROWTH-21st Consortium, Weidi Xie, Ana I. L. Namburete

The objective of this work is to achieve sensorless reconstruction of a 3D volume from a set of 2D freehand ultrasound images with deep implicit representation.

SSIM

Sli2Vol: Annotate a 3D Volume from a Single Slice with Self-Supervised Learning

1 code implementation 26 May 2021 Pak-Hei Yeung, Ana I. L. Namburete, Weidi Xie

The objective of this work is to segment arbitrary structures of interest (SOI) in 3D volumes by annotating only a single slice (i.e. semi-automatic 3D segmentation).

Segmentation Self-Supervised Learning

Self-supervised Video Object Segmentation by Motion Grouping

no code implementations ICCV 2021 Charig Yang, Hala Lamdouar, Erika Lu, Andrew Zisserman, Weidi Xie

We additionally evaluate on a challenging camouflage dataset (MoCA), significantly outperforming the other self-supervised approaches, and comparing favourably to the top supervised approach, highlighting the importance of motion cues, and the potential bias towards visual appearance in existing video segmentation models.

Motion Segmentation Object +6

All you need are a few pixels: semantic segmentation with PixelPick

2 code implementations 13 Apr 2021 Gyungin Shin, Weidi Xie, Samuel Albanie

A central challenge for the task of semantic segmentation is the prohibitive cost of obtaining dense pixel-level annotations to supervise model training.

Active Learning Segmentation +1

Quantum Self-Supervised Learning

2 code implementations 26 Mar 2021 Ben Jaderberg, Lewis W. Anderson, Weidi Xie, Samuel Albanie, Martin Kiffner, Dieter Jaksch

The resurgence of self-supervised learning, whereby a deep learning model generates its own supervisory signal from the data, promises a scalable way to tackle the dramatically increasing size of real-world data sets without human annotation.

Self-Supervised Learning

NeRF--: Neural Radiance Fields Without Known Camera Parameters

5 code implementations 14 Feb 2021 ZiRui Wang, Shangzhe Wu, Weidi Xie, Min Chen, Victor Adrian Prisacariu

Considering the problem of novel view synthesis (NVS) from only a set of 2D images, we simplify the training process of Neural Radiance Field (NeRF) on forward-facing scenes by removing the requirement of known or pre-computed camera parameters, including both intrinsics and 6DoF poses.

Novel View Synthesis

Betrayed by Motion: Camouflaged Object Discovery via Motion Segmentation

no code implementations 23 Nov 2020 Hala Lamdouar, Charig Yang, Weidi Xie, Andrew Zisserman

We make the following three contributions: (i) We propose a novel architecture that consists of two essential components for breaking camouflage, namely, a differentiable registration module to align consecutive frames based on the background, which effectively emphasises the object boundary in the difference image, and a motion segmentation module with memory that discovers the moving objects, while maintaining the object permanence even when motion is absent at some point.

Motion Segmentation Object +3

Layered Neural Rendering for Retiming People in Video

1 code implementation 16 Sep 2020 Erika Lu, Forrester Cole, Tali Dekel, Weidi Xie, Andrew Zisserman, David Salesin, William T. Freeman, Michael Rubinstein

We present a method for retiming people in an ordinary, natural video -- manipulating and editing the time in which different motions of individuals in the video occur.

Neural Rendering

Inducing Predictive Uncertainty Estimation for Face Recognition

no code implementations 1 Sep 2020 Weidi Xie, Jeffrey Byrne, Andrew Zisserman

We describe three use cases on the public IJB-C face verification benchmark: (i) to improve 1:1 image-based verification error rates by rejecting low-quality face images; (ii) to improve quality score based fusion performance on the 1:1 set-based verification benchmark; and (iii) its use as a quality measure for selecting high quality (unblurred, good lighting, more frontal) faces from a collection, e.g. for automatic enrolment or display.

Face Recognition Face Verification

Memory-augmented Dense Predictive Coding for Video Representation Learning

1 code implementation ECCV 2020 Tengda Han, Weidi Xie, Andrew Zisserman

The objective of this paper is self-supervised learning from video, in particular for representations for action recognition.

Action Classification Action Recognition +5

Smooth-AP: Smoothing the Path Towards Large-Scale Image Retrieval

2 code implementations ECCV 2020 Andrew Brown, Weidi Xie, Vicky Kalogeiton, Andrew Zisserman

Optimising a ranking-based metric, such as Average Precision (AP), is notoriously challenging due to the fact that it is non-differentiable, and hence cannot be optimised directly using gradient-descent methods.

Image Instance Retrieval Metric Learning +2

Self-supervised Video Object Segmentation

no code implementations 22 Jun 2020 Fangrui Zhu, Li Zhang, Yanwei Fu, Guodong Guo, Weidi Xie

The objective of this paper is self-supervised representation learning, with the goal of solving semi-supervised video object segmentation (a.k.a.

Object One-shot visual object segmentation +4

VGGSound: A Large-scale Audio-Visual Dataset

2 code implementations 29 Apr 2020 Honglie Chen, Weidi Xie, Andrea Vedaldi, Andrew Zisserman

Our goal is to collect a large-scale audio-visual dataset with low label noise from videos in the wild using computer vision techniques.

Image Classification

MAST: A Memory-Augmented Self-supervised Tracker

2 code implementations CVPR 2020 Zihang Lai, Erika Lu, Weidi Xie

Recent interest in self-supervised dense tracking has yielded rapid progress, but performance still remains far from supervised methods.

Semantic Segmentation Semi-Supervised Video Object Segmentation +2

VoxSRC 2019: The first VoxCeleb Speaker Recognition Challenge

no code implementations 5 Dec 2019 Joon Son Chung, Arsha Nagrani, Ernesto Coto, Weidi Xie, Mitchell McLaren, Douglas A. Reynolds, Andrew Zisserman

The VoxCeleb Speaker Recognition Challenge 2019 aimed to assess how well current speaker recognition technology is able to identify speakers in unconstrained or 'in the wild' data.

Speaker Recognition

Video Representation Learning by Dense Predictive Coding

1 code implementation 10 Sep 2019 Tengda Han, Weidi Xie, Andrew Zisserman

The objective of this paper is self-supervised learning of spatio-temporal embeddings from video, suitable for human action recognition.

Representation Learning Self-Supervised Action Recognition +2

AutoCorrect: Deep Inductive Alignment of Noisy Geometric Annotations

no code implementations 14 Aug 2019 Honglie Chen, Weidi Xie, Andrea Vedaldi, Andrew Zisserman

We propose AutoCorrect, a method to automatically learn object-annotation alignments from a dataset with annotations affected by geometric noise.

Object

Self-supervised Learning for Video Correspondence Flow

1 code implementation 2 May 2019 Zihang Lai, Weidi Xie

Fourth, in order to shed light on the potential of self-supervised learning on the task of video correspondence flow, we probe the upper bound by training on additional data, i.e. more diverse videos, further demonstrating significant improvements on video segmentation.

Self-Supervised Learning Semi-Supervised Video Object Segmentation +4

Utterance-level Aggregation For Speaker Recognition In The Wild

9 code implementations 26 Feb 2019 Weidi Xie, Arsha Nagrani, Joon Son Chung, Andrew Zisserman

The objective of this paper is speaker recognition "in the wild", where utterances may be of variable length and also contain irrelevant signals.

Speaker Recognition Text-Independent Speaker Verification

Class-Agnostic Counting

1 code implementation 1 Nov 2018 Erika Lu, Weidi Xie, Andrew Zisserman

The model achieves competitive performance on cell and crowd counting datasets, and surpasses the state-of-the-art on the car dataset using only three training images.

Crowd Counting Few-Shot Learning +2

Comparator Networks

no code implementations ECCV 2018 Weidi Xie, Li Shen, Andrew Zisserman

Our contributions are: (i) We propose a Deep Comparator Network (DCN) that can ingest a pair of sets (each may contain a variable number of images) as inputs, and compute a similarity between the pair; this involves attending to multiple discriminative local regions (landmarks), and comparing local descriptors between pairs of faces; (ii) To encourage high-quality representations for each set, internal competition is introduced for recalibration based on the landmark score; (iii) Inspired by image retrieval, a novel hard sample mining regime is proposed to control the sampling process, such that the DCN is complementary to the standard image classification models.

Face Recognition Image Classification +2

Multicolumn Networks for Face Recognition

1 code implementation 24 Jul 2018 Weidi Xie, Andrew Zisserman

In this paper, we design a neural network architecture that learns to aggregate based on both "visual" quality (resolution, illumination), and "content" quality (relative importance for discriminative classification).

Ranked #5 on Face Verification on IJB-C (TAR @ FAR=1e-2 metric)

Face Recognition General Classification

Ω-Net (Omega-Net): Fully Automatic, Multi-View Cardiac MR Detection, Orientation, and Segmentation with Deep Neural Networks

no code implementations 3 Nov 2017 Davis M. Vigneault, Weidi Xie, Carolyn Y. Ho, David A. Bluemke, J. Alison Noble

Pixelwise segmentation of the left ventricular (LV) myocardium and the four cardiac chambers in 2-D steady state free precession (SSFP) cine sequences is an essential preprocessing step for a wide range of analyses.

Image Segmentation Segmentation +1

VGGFace2: A dataset for recognising faces across pose and age

22 code implementations 23 Oct 2017 Qiong Cao, Li Shen, Weidi Xie, Omkar M. Parkhi, Andrew Zisserman

The dataset was collected with three goals in mind: (i) to have both a large number of identities and also a large number of images for each identity; (ii) to cover a large range of pose, age and ethnicity; and (iii) to minimize the label noise.

Ranked #1 on Face Verification on IJB-C (training dataset metric)

Face Recognition Face Verification +1

Freehand Ultrasound Image Simulation with Spatially-Conditioned Generative Adversarial Networks

no code implementations 17 Jul 2017 Yipeng Hu, Eli Gibson, Li-Lin Lee, Weidi Xie, Dean C. Barratt, Tom Vercauteren, J. Alison Noble

Sonography synthesis has a wide range of applications, including medical procedure simulation, clinical training and multimodality image registration.

Anatomy Image Registration +1

Feature Tracking Cardiac Magnetic Resonance via Deep Learning and Spline Optimization

no code implementations 12 Apr 2017 Davis M. Vigneault, Weidi Xie, David A. Bluemke, J. Alison Noble

Feature tracking Cardiac Magnetic Resonance (CMR) has recently emerged as an area of interest for quantification of regional cardiac function from balanced, steady state free precession (SSFP) cine sequences.
