no code implementations • 11 Apr 2024 • Xavier Alameda-Pineda, Angus Addlesee, Daniel Hernández García, Chris Reinke, Soraya Arias, Federica Arrigoni, Alex Auternaud, Lauriane Blavette, Cigdem Beyan, Luis Gomez Camara, Ohad Cohen, Alessandro Conti, Sébastien Dacunha, Christian Dondrup, Yoav Ellinson, Francesco Ferro, Sharon Gannot, Florian Gras, Nancie Gunson, Radu Horaud, Moreno D'Incà, Imad Kimouche, Séverin Lemaignan, Oliver Lemon, Cyril Liotard, Luca Marchionni, Mordehay Moradi, Tomas Pajdla, Maribel Pino, Michal Polic, Matthieu Py, Ariel Rado, Bin Ren, Elisa Ricci, Anne-Sophie Rigaud, Paolo Rota, Marta Romeo, Nicu Sebe, Weronika Sieińska, Pinchas Tandeitnik, Francesco Tonini, Nicolas Turro, Timothée Wintz, Yanchao Yu
Despite the many recent achievements in developing and deploying social robotics, there are still many underexplored environments and applications for which systematic evaluation of such systems by end-users is necessary.
no code implementations • 13 Dec 2023 • Guénolé Fiche, Simon Leglaive, Xavier Alameda-Pineda, Antonio Agudo, Francesc Moreno-Noguer
Instead of predicting body model parameters or 3D vertex coordinates, our focus is on forecasting the proposed discrete latent representation, which can be decoded into a registered human mesh.
no code implementations • 7 Dec 2023 • Xiaoyu Lin, Laurent Girin, Xavier Alameda-Pineda
In this paper, we propose a latent-variable generative model called mixture of dynamical variational autoencoders (MixDVAE) to model the dynamics of a system composed of multiple moving sources.
1 code implementation • 7 Nov 2023 • Daniel Jost, Basavasagar Patil, Xavier Alameda-Pineda, Chris Reinke
Deep Neural Networks (DNNs) became the standard tool for function approximation with most of the introduced architectures being developed for high-dimensional input data.
1 code implementation • 18 Aug 2023 • Thomas De Min, Massimiliano Mancini, Karteek Alahari, Xavier Alameda-Pineda, Elisa Ricci
State-of-the-art rehearsal-free continual learning methods exploit the peculiarities of Vision Transformers to learn task-specific prompts, drastically reducing catastrophic forgetting.
1 code implementation • 4 Jul 2023 • Louis Airale, Dominique Vaufreydaz, Xavier Alameda-Pineda
Animating still face images with deep generative models using a speech input signal is an active research topic and has seen important recent progress.
1 code implementation • CVPR 2023 • Enrico Fini, Pietro Astolfi, Karteek Alahari, Xavier Alameda-Pineda, Julien Mairal, Moin Nabi, Elisa Ricci
Self-supervised learning models have been shown to learn rich visual representations without requiring human annotations.
no code implementations • 13 Jun 2023 • Xiaoyu Lin, Simon Leglaive, Laurent Girin, Xavier Alameda-Pineda
This work builds on a previous work on unsupervised speech enhancement using a dynamical variational autoencoder (DVAE) as the clean speech model and non-negative matrix factorization (NMF) as the noise model.
no code implementations • 9 Jun 2023 • Guénolé Fiche, Simon Leglaive, Xavier Alameda-Pineda, Renaud Séguier
We introduce Motion-DVAE, a motion prior to capture the short-term dependencies of human motion.
no code implementations • 5 May 2023 • Samir Sadok, Simon Leglaive, Laurent Girin, Xavier Alameda-Pineda, Renaud Séguier
The latent space is structured to dissociate the latent dynamical factors that are shared between the modalities from those that are specific to each modality.
no code implementations • 7 Mar 2023 • Xiaoyu Lin, Xiaoyu Bie, Simon Leglaive, Laurent Girin, Xavier Alameda-Pineda
The dynamical variational autoencoders (DVAEs) are a family of latent-variable deep generative models that extends the VAE to model a sequence of observed data and a corresponding sequence of latent vectors.
1 code implementation • 2 Nov 2022 • Louis Airale, Xavier Alameda-Pineda, Stéphane Lathuilière, Dominique Vaufreydaz
In this work, we address the task of unconditional head motion generation to animate still human faces in a low-dimensional semantic space from a single reference pose.
no code implementations • 2 Nov 2022 • Ali Golmakani, Mostafa Sadeghi, Xavier Alameda-Pineda, Romain Serizel
A zero-mean complex-valued Gaussian distribution is usually assumed for the generative model, where the speech information is encoded in the variance as a function of a latent variable.
1 code implementation • 4 Jul 2022 • Wen Guo, Yuming Du, Xi Shen, Vincent Lepetit, Xavier Alameda-Pineda, Francesc Moreno-Noguer
This paper tackles the problem of human motion prediction, consisting in forecasting future body poses from historically observed sequences.
no code implementations • 7 Jun 2022 • Anand Ballou, Xavier Alameda-Pineda, Chris Reinke
We demonstrate the interest of the RBF layer and the usage of meta-RL for social robotics on four robotic simulation tasks.
1 code implementation • 14 Apr 2022 • Samir Sadok, Simon Leglaive, Laurent Girin, Xavier Alameda-Pineda, Renaud Séguier
Using only a few seconds of labeled speech signals generated with an artificial speech synthesizer, we propose a method to identify the latent subspaces encoding $f_0$ and the first three formant frequencies. We show that these subspaces are orthogonal and, based on this orthogonality, develop a method to accurately and independently control the source-filter speech factors within the latent subspaces.
no code implementations • 6 Apr 2022 • Zhiqi Kang, Mostafa Sadeghi, Radu Horaud, Xavier Alameda-Pineda
The method alternates between the estimation of (i) the rigid transformation (scale, rotation, and translation) and (ii) the non-rigid deformation between an arbitrarily-viewed face and a face model.
no code implementations • 4 Apr 2022 • Xiaoyu Bie, Wen Guo, Simon Leglaive, Laurent Girin, Francesc Moreno-Noguer, Xavier Alameda-Pineda
Studies on the automatic processing of 3D human pose data have flourished in the recent past.
1 code implementation • 26 Mar 2022 • Guanglei Yang, Enrico Fini, Dan Xu, Paolo Rota, Mingli Ding, Moin Nabi, Xavier Alameda-Pineda, Elisa Ricci
This problem has been widely investigated in the research community and several Incremental Learning (IL) approaches have been proposed in the past years.
no code implementations • 18 Feb 2022 • Xiaoyu Lin, Laurent Girin, Xavier Alameda-Pineda
In this paper, we present an unsupervised probabilistic model and associated estimation algorithm for multi-object tracking (MOT) based on a dynamical variational autoencoder (DVAE), called DVAE-UMOT.
1 code implementation • 1 Feb 2022 • Guanglei Yang, Enrico Fini, Dan Xu, Paolo Rota, Mingli Ding, Hao Tang, Xavier Alameda-Pineda, Elisa Ricci
To fill this gap, in this paper we introduce a novel attentive feature distillation approach to mitigate catastrophic forgetting while accounting for semantic spatial- and channel-level dependencies.
no code implementations • 1 Feb 2022 • Zhiqi Kang, Mostafa Sadeghi, Radu Horaud, Xavier Alameda-Pineda, Jacob Donley, Anurag Kumar
This paper investigates the impact of head movements on audio-visual speech enhancement (AVSE).
no code implementations • CVPR 2022 • Hanyu Xuan, Zhiliang Wu, Jian Yang, Yan Yan, Xavier Alameda-Pineda
Humans can easily recognize where and how the sound is produced via watching a scene and listening to corresponding audio cues.
1 code implementation • CVPR 2022 • Enrico Fini, Victor G. Turrisi da Costa, Xavier Alameda-Pineda, Elisa Ricci, Karteek Alahari, Julien Mairal
Self-supervised models have been shown to produce comparable or better visual representations than their supervised counterparts when trained offline on unlabeled data at scale.
no code implementations • 4 Nov 2021 • David Emukpere, Xavier Alameda-Pineda, Chris Reinke
A longstanding goal in reinforcement learning is to build intelligent agents that show fast learning and a flexible transfer of skills akin to humans and animals.
no code implementations • 29 Oct 2021 • Chris Reinke, Xavier Alameda-Pineda
Successor Representations (SR) and their extension Successor Features (SF) are prominent transfer mechanisms in domains where reward functions change between tasks.
1 code implementation • 23 Jun 2021 • Xiaoyu Bie, Simon Leglaive, Xavier Alameda-Pineda, Laurent Girin
We propose an unsupervised speech enhancement algorithm that combines a DVAE speech prior pre-trained on clean speech signals with a noise model based on nonnegative matrix factorization, and we derive a variational expectation-maximization (VEM) algorithm to perform speech enhancement.
1 code implementation • CVPR 2022 • Wen Guo, Xiaoyu Bie, Xavier Alameda-Pineda, Francesc Moreno-Noguer
In this paper, we explore this problem for humans performing collaborative tasks: we seek to predict the future motion of two interacting persons given two sequences of their past skeletons.
2 code implementations • 28 Mar 2021 • Yihong Xu, Yutong Ban, Guillaume Delorme, Chuang Gan, Daniela Rus, Xavier Alameda-Pineda
Methodologically, we propose the use of image-related dense detection queries and efficient sparse tracking queries produced by our carefully designed query learning networks (QLN).
Ranked #13 on Multi-Object Tracking on MOT20 (using extra training data)
no code implementations • 10 Mar 2021 • Louis Airale, Dominique Vaufreydaz, Xavier Alameda-Pineda
In this paper, we focus on a unimodal representation of interactions and propose to tackle interaction generation in a data-driven fashion.
1 code implementation • 5 Mar 2021 • Guanglei Yang, Paolo Rota, Xavier Alameda-Pineda, Dan Xu, Mingli Ding, Elisa Ricci
Specifically, we integrate the estimation and the interaction of the attentions within a probabilistic representation learning framework, leading to Variational STructured Attention networks (VISTA-Net).
no code implementations • 8 Feb 2021 • Mostafa Sadeghi, Xavier Alameda-Pineda
Recently, audio-visual speech enhancement has been tackled in the unsupervised setting based on variational auto-encoders (VAEs), where during training only clean data is used to train a generative model for speech, which at test time is combined with a noise model, e.g., nonnegative matrix factorization (NMF), whose parameters are learned without supervision.
no code implementations • 8 Jan 2021 • Dan Xu, Xavier Alameda-Pineda, Wanli Ouyang, Elisa Ricci, Xiaogang Wang, Nicu Sebe
In contrast to previous works directly considering multi-scale feature maps obtained from the inner layers of a primary CNN architecture, and simply fusing the features with weighted averaging or concatenation, we propose a probabilistic graph attention network structure based on a novel Attention-Gated Conditional Random Fields (AG-CRFs) model for learning and fusing multi-scale representations in a principled manner.
1 code implementation • 1 Jan 2021 • Guanglei Yang, Paolo Rota, Xavier Alameda-Pineda, Dan Xu, Mingli Ding, Elisa Ricci
State-of-the-art performances in dense pixel-wise prediction tasks are obtained with specifically designed convolutional networks.
no code implementations • 11 Oct 2020 • Wen Guo, Enric Corona, Francesc Moreno-Noguer, Xavier Alameda-Pineda
Our pose interacting network, or PI-Net, inputs the initial pose estimates of a variable number of interactees into a recurrent architecture used to refine the pose of the person-of-interest.
3D Multi-Person Pose Estimation (root-relative) • 3D Pose Estimation
1 code implementation • 28 Aug 2020 • Laurent Girin, Simon Leglaive, Xiaoyu Bie, Julien Diard, Thomas Hueber, Xavier Alameda-Pineda
Recently, a series of papers have presented different extensions of the VAE to process sequential data, which model not only the latent space but also the temporal dependencies within a sequence of data vectors and corresponding latent vectors, relying on recurrent neural networks or state-space models.
no code implementations • 17 Aug 2020 • Viet-Nhat Nguyen, Mostafa Sadeghi, Elisa Ricci, Xavier Alameda-Pineda
To better utilize the visual information, the posteriors of the latent variables are inferred from mixed speech (instead of clean speech) as well as the visual data.
1 code implementation • 10 Aug 2020 • Yahui Liu, Marco De Nadai, Deng Cai, Huayang Li, Xavier Alameda-Pineda, Nicu Sebe, Bruno Lepri
Our proposed model disentangles the image content from the visual attributes, and it learns to modify the latter using the textual description, before generating a new image from the content and the modified attribute representation.
no code implementations • 2 Jun 2020 • Xavier Alameda-Pineda, Vincent Drouard, Radu Horaud
In this paper, we propose a variational approximation of piecewise linear dynamical systems.
no code implementations • 14 Apr 2020 • Mostafa Sadeghi, Xavier Alameda-Pineda, Radu Horaud
The results show that the proposed analysis is consistent with supervised metrics and that it can be used to measure the accuracy of both predicted landmarks and automatically annotated 3DFA datasets, to detect errors and to eliminate them.
1 code implementation • 15 Mar 2020 • Yahui Liu, Marco De Nadai, Jian Yao, Nicu Sebe, Bruno Lepri, Xavier Alameda-Pineda
Unsupervised image-to-image translation (UNIT) aims at learning a mapping between several visual domains by using unpaired training images.
no code implementations • 23 Dec 2019 • Mostafa Sadeghi, Xavier Alameda-Pineda
Two encoder networks input, respectively, audio and visual data, and the posterior of the latent variables is modeled as a mixture of two Gaussian distributions output from each encoder network.
no code implementations • 10 Nov 2019 • Mostafa Sadeghi, Xavier Alameda-Pineda
When visual data is clean, speech enhancement with audio-visual VAE shows a better performance than with audio-only VAE, which is trained on audio-only data.
no code implementations • 24 Oct 2019 • Simon Leglaive, Xavier Alameda-Pineda, Laurent Girin, Radu Horaud
This paper presents a generative approach to speech enhancement based on a recurrent variational autoencoder (RVAE).
no code implementations • 7 Aug 2019 • Mostafa Sadeghi, Simon Leglaive, Xavier Alameda-Pineda, Laurent Girin, Radu Horaud
Variational auto-encoders (VAEs) are deep generative latent variable models that can be used for learning the distribution of complex data.
2 code implementations • CVPR 2020 • Yihong Xu, Aljosa Osep, Yutong Ban, Radu Horaud, Laura Leal-Taixe, Xavier Alameda-Pineda
In this paper, we bridge this gap by proposing a differentiable proxy of MOTA and MOTP, which we combine in a loss function suitable for end-to-end training of deep multi-object trackers.
Ranked #4 on Multi-Object Tracking on 2D MOT 2015
no code implementations • 2 Apr 2019 • Guillaume Delorme, Yihong Xu, Stéphane Lathuilière, Radu Horaud, Xavier Alameda-Pineda
Unsupervised person re-ID is the task of identifying people on a target data set for which the ID labels are unavailable during training.
no code implementations • 28 Sep 2018 • Yutong Ban, Xavier Alameda-Pineda, Laurent Girin, Radu Horaud
We propose a variational inference model which amounts to approximating the joint distribution with a factorized distribution.
no code implementations • ECCV 2018 • Stéphane Lathuilière, Pablo Mesejo, Xavier Alameda-Pineda, Radu Horaud
In this paper, we address the problem of how to robustly train a ConvNet for regression, or deep robust regression.
2 code implementations • 22 Mar 2018 • Stéphane Lathuilière, Pablo Mesejo, Xavier Alameda-Pineda, Radu Horaud
Deep learning revolutionized data science, and recently its popularity has grown exponentially, as has the number of papers employing deep networks.
no code implementations • 5 Mar 2018 • Dan Xu, Xavier Alameda-Pineda, Jingkuan Song, Elisa Ricci, Nicu Sebe
In this paper we address the problem of learning robust cross-domain representations for sketch-based image retrieval (SBIR).
no code implementations • CVPR 2018 • Wei Wang, Xavier Alameda-Pineda, Dan Xu, Pascal Fua, Elisa Ricci, Nicu Sebe
Finally, these landmark sequences are translated into face videos.
no code implementations • NeurIPS 2017 • Dan Xu, Wanli Ouyang, Xavier Alameda-Pineda, Elisa Ricci, Xiaogang Wang, Nicu Sebe
Recent works have shown that exploiting multi-scale representations deeply learned via convolutional neural networks (CNN) is of tremendous importance for accurate contour detection.
1 code implementation • 6 Apr 2017 • Aliaksandr Siarohin, Gloria Zen, Cveta Majtanovic, Xavier Alameda-Pineda, Elisa Ricci, Nicu Sebe
In this work, we show that it is possible to automatically retrieve the best style seeds for a given image, thus remarkably reducing the number of human attempts needed to find a good match.
1 code implementation • CVPR 2017 • Xavier Alameda-Pineda, Andrea Pilzer, Dan Xu, Nicu Sebe, Elisa Ricci
In our overly-connected world, the automatic recognition of virality - the quality of an image or video to be rapidly and widely spread in social networks - is of crucial importance, and has recently awakened the interest of the computer vision community.
1 code implementation • CVPR 2016 • Xavier Alameda-Pineda, Elisa Ricci, Yan Yan, Nicu Sebe
A very popular approach for transductive multi-label recognition under linear classification settings is matrix completion.
no code implementations • CVPR 2016 • Sergey Tulyakov, Xavier Alameda-Pineda, Elisa Ricci, Lijun Yin, Jeffrey F. Cohn, Nicu Sebe
Recent studies in computer vision have shown that, while practically invisible to a human observer, skin color changes due to blood flow can be captured on face videos and, surprisingly, be used to estimate the heart rate (HR).
no code implementations • 4 Sep 2015 • Israel D. Gebru, Xavier Alameda-Pineda, Florence Forbes, Radu Horaud
We propose a model selection method based on a minimum message length criterion, provide a weight initialization strategy, and validate the proposed algorithms by comparing them with several state-of-the-art parametric and non-parametric clustering techniques.
no code implementations • 4 Sep 2015 • Sileye Ba, Xavier Alameda-Pineda, Alessio Xompero, Radu Horaud
In this paper, we propose an on-line variational Bayesian model for multi-person tracking from cluttered visual observations provided by person detectors.
no code implementations • 23 Jun 2015 • Xavier Alameda-Pineda, Jacopo Staiano, Ramanathan Subramanian, Ligia Batrinca, Elisa Ricci, Bruno Lepri, Oswald Lanz, Nicu Sebe
Studying free-standing conversational groups (FCGs) in unstructured social settings (e.g., a cocktail party) is gratifying due to the wealth of information available at the group (mining social networks) and individual (recognizing native behavioral and personality traits) levels.
no code implementations • 6 Nov 2013 • Xavier Alameda-Pineda, Radu Horaud
Natural human-robot interaction in complex and unpredictable environments is one of the main research lines in robotics.