no code implementations • 24 Dec 2023 • Christian Simon, Sen He, Juan-Manuel Perez-Rua, Mengmeng Xu, Amine Benhalloum, Tao Xiang
Solving image-to-3D from a single view is an ill-posed problem, and current neural reconstruction methods addressing it through diffusion models still rely on scene-specific optimization, constraining their generalization capability.
no code implementations • 7 Dec 2023 • Shoufa Chen, Mengmeng Xu, Jiawei Ren, Yuren Cong, Sen He, Yanping Xie, Animesh Sinha, Ping Luo, Tao Xiang, Juan-Manuel Perez-Rua
In this study, we explore Transformer-based diffusion models for image and video generation.
no code implementations • 9 Oct 2023 • Yuren Cong, Mengmeng Xu, Christian Simon, Shoufa Chen, Jiawei Ren, Yanping Xie, Juan-Manuel Perez-Rua, Bodo Rosenhahn, Tao Xiang, Sen He
In this paper, for the first time, we introduce optical flow into the attention module in the diffusion model's U-Net to address the inconsistency issue for text-to-video editing.
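As a rough illustration of the idea, the sketch below warps a reference frame's features with an optical-flow field before they enter a cross-attention block, so each frame attends to flow-aligned features. The shapes, module names, and flow convention (pixel offsets) are assumptions, not the paper's exact design.

```python
# Illustrative sketch only: flow-guided cross-frame attention for a U-Net block.
import torch
import torch.nn as nn
import torch.nn.functional as F

def warp_with_flow(feat, flow):
    """Backward-warp feature map feat (B,C,H,W) with flow (B,2,H,W) in pixels."""
    B, _, H, W = feat.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().to(feat.device)   # (2,H,W)
    coords = grid.unsqueeze(0) + flow                             # sample positions
    # normalise to [-1, 1] for grid_sample
    coords_x = 2.0 * coords[:, 0] / max(W - 1, 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / max(H - 1, 1) - 1.0
    grid_n = torch.stack((coords_x, coords_y), dim=-1)            # (B,H,W,2)
    return F.grid_sample(feat, grid_n, align_corners=True)

class FlowGuidedAttention(nn.Module):
    def __init__(self, dim, heads=8):                             # dim divisible by heads
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, cur, ref, flow):
        # cur, ref: (B,C,H,W) U-Net features; flow: (B,2,H,W), ref -> cur
        B, C, H, W = cur.shape
        ref_aligned = warp_with_flow(ref, flow)      # spatially align the reference
        q = cur.flatten(2).transpose(1, 2)           # (B,HW,C)
        kv = ref_aligned.flatten(2).transpose(1, 2)
        out, _ = self.attn(q, kv, kv)                # query current, attend to aligned ref
        return out.transpose(1, 2).view(B, C, H, W)
```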
1 code implementation • 30 Mar 2023 • Aiyu Cui, Sen He, Tao Xiang, Antoine Toisoul
In this work, we propose a robust warping method for virtual try-on based on a learned garment DensePose which has a direct correspondence with the person's DensePose.
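A minimal sketch of the warping step only, assuming both images come with (u, v) DensePose maps: every person pixel looks up the garment pixel with the nearest body coordinate. Brute-force nearest neighbour is used for clarity; the learned-DensePose part of the method is not reproduced here.

```python
# Warp a garment image onto a person via DensePose (u,v) correspondence.
import torch

def densepose_warp(garment, garment_uv, person_uv):
    """garment: (B,3,H,W); garment_uv, person_uv: (B,2,H,W) in [0,1]."""
    B, _, H, W = garment.shape
    g_uv = garment_uv.flatten(2).transpose(1, 2)    # (B, HW, 2)
    p_uv = person_uv.flatten(2).transpose(1, 2)     # (B, HW, 2)
    # index of the garment pixel whose UV is closest to each person pixel's UV
    idx = torch.cdist(p_uv, g_uv).argmin(dim=-1)    # (B, HW)
    g_pix = garment.flatten(2).transpose(1, 2)      # (B, HW, 3)
    warped = torch.gather(g_pix, 1, idx.unsqueeze(-1).expand(-1, -1, 3))
    return warped.transpose(1, 2).view(B, 3, H, W)
```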
no code implementations • 6 Jan 2023 • Michał Stypułkowski, Konstantinos Vougioukas, Sen He, Maciej Zięba, Stavros Petridis, Maja Pantic
Talking face generation has historically struggled to produce head movements and natural facial expressions without guidance from additional reference videos.
no code implementations • 19 Nov 2022 • Sen He, Yi-Zhe Song, Tao Xiang
Key to our model is a parallel flow estimation module that predicts the flow fields for both person and garment images conditioned on the target pose.
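A hedged sketch of what such a parallel module could look like: two sibling heads predict dense flow fields for the person and the garment image, both conditioned on target-pose heatmaps. Layer sizes and the conditioning scheme are illustrative assumptions.

```python
# Two parallel flow heads, each conditioned on the target pose.
import torch
import torch.nn as nn

class ParallelFlowEstimator(nn.Module):
    def __init__(self, feat_ch=64, pose_ch=18):
        super().__init__()
        def encoder(in_ch):
            return nn.Sequential(
                nn.Conv2d(in_ch, feat_ch, 3, padding=1), nn.ReLU(),
                nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.ReLU())
        self.person_enc = encoder(3 + pose_ch)    # person image + target pose maps
        self.garment_enc = encoder(3 + pose_ch)   # garment image + target pose maps
        self.person_flow = nn.Conv2d(feat_ch, 2, 3, padding=1)
        self.garment_flow = nn.Conv2d(feat_ch, 2, 3, padding=1)

    def forward(self, person, garment, pose):
        # person, garment: (B,3,H,W); pose: (B,pose_ch,H,W) keypoint heatmaps
        fp = self.person_enc(torch.cat([person, pose], dim=1))
        fg = self.garment_enc(torch.cat([garment, pose], dim=1))
        return self.person_flow(fp), self.garment_flow(fg)  # two (B,2,H,W) fields
```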
no code implementations • 15 Oct 2022 • Zhihe Lu, Sen He, Da Li, Yi-Zhe Song, Tao Xiang
To ensure that the fused scores are not biased toward either the base or the novel classes, a new Transformer-based calibration module is introduced (a sketch follows below).
Generalized Few-Shot Semantic Segmentation • Semantic Segmentation
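A rough sketch of one such calibration module: base-class and novel-class logits are embedded as one token per class, a small Transformer lets them interact, and a linear head emits recalibrated logits so neither group dominates by scale. All dimensions are assumptions.

```python
# Transformer-based calibration of concatenated base/novel class scores.
import torch
import torch.nn as nn

class ScoreCalibrator(nn.Module):
    def __init__(self, n_base, n_novel, dim=32):
        super().__init__()
        n_cls = n_base + n_novel
        self.embed = nn.Linear(1, dim)                        # scalar logit -> token
        self.cls_emb = nn.Parameter(torch.zeros(n_cls, dim))  # which class a token is
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.out = nn.Linear(dim, 1)

    def forward(self, base_logits, novel_logits):
        # base_logits: (N, n_base), novel_logits: (N, n_novel) per pixel/sample
        logits = torch.cat([base_logits, novel_logits], dim=1)  # (N, C)
        tok = self.embed(logits.unsqueeze(-1)) + self.cls_emb   # (N, C, dim)
        return self.out(self.encoder(tok)).squeeze(-1)          # fused (N, C)
```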
1 code implementation • 6 Apr 2022 • Xiao Han, Sen He, Li Zhang, Yi-Zhe Song, Tao Xiang
In this paper, we propose a Unified Interactive Garment Retrieval (UIGR) framework to unify TGR and VCR.
3 code implementations • CVPR 2022 • Sen He, Yi-Zhe Song, Tao Xiang
To achieve this, a key step is garment warping which spatially aligns the target garment with the corresponding body parts in the person image.
Ranked #1 on Virtual Try-on on VITON
no code implementations • 13 Dec 2021 • Tianyuan Yu, Sen He, Yi-Zhe Song, Tao Xiang
This is because they use an instance GNN as a label propagation/classification module, which is jointly meta-learned with a feature embedding network.
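For context, the sketch below shows generic graph-based label propagation over instance embeddings: pairwise affinities define an adjacency, and support labels diffuse to query nodes. This is the textbook scheme, not the exact meta-learned GNN the entry criticises.

```python
# Label propagation over an instance graph built from embeddings.
import torch

def propagate_labels(feats, support_onehot, n_support, alpha=0.5, steps=10):
    """feats: (N,D) support+query embeddings; support_onehot: (n_support, C)."""
    N, C = feats.shape[0], support_onehot.shape[1]
    sim = torch.exp(-torch.cdist(feats, feats) ** 2)   # Gaussian affinity
    sim.fill_diagonal_(0)
    d = sim.sum(dim=1, keepdim=True).clamp(min=1e-8)
    W = sim / d                                        # row-normalised adjacency
    Y = torch.zeros(N, C)
    Y[:n_support] = support_onehot                     # clamp the support labels
    Z = Y.clone()
    for _ in range(steps):
        Z = alpha * (W @ Z) + (1 - alpha) * Y          # diffuse, keep anchors
    return Z[n_support:].softmax(dim=-1)               # query predictions
```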
1 code implementation • 20 Oct 2021 • Xiao Han, Sen He, Li Zhang, Tao Xiang
Firstly, to fully utilize the existing small-scale benchmarking datasets for more discriminative feature learning, we introduce a cross-modal momentum contrastive learning framework to enrich the training data for a given mini-batch (a sketch follows below).
Ranked #10 on Text based Person Retrieval on CUHK-PEDES (using extra training data)
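A hedged sketch of cross-modal momentum contrast in the MoCo style: an image query is contrasted against its matched text key plus a queue of past text keys produced by a momentum-updated text encoder (the full method would also use the symmetric text-to-image direction, and the queue update is omitted). Encoder architectures are placeholders.

```python
# Cross-modal MoCo-style loss: image queries vs. momentum text keys + queue.
import torch
import torch.nn.functional as F

@torch.no_grad()
def momentum_update(enc_q, enc_k, m=0.999):
    """EMA-update the key (momentum) encoder from the query encoder."""
    for pq, pk in zip(enc_q.parameters(), enc_k.parameters()):
        pk.data.mul_(m).add_(pq.data, alpha=1 - m)

def cross_modal_moco_loss(img_q, txt_k, queue, tau=0.07):
    """img_q: (B,D) image queries; txt_k: (B,D) momentum text keys (no grad);
    queue: (K,D) past text keys acting as negatives."""
    img_q, txt_k = F.normalize(img_q, dim=1), F.normalize(txt_k, dim=1)
    l_pos = (img_q * txt_k).sum(dim=1, keepdim=True)   # (B,1) matched pairs
    l_neg = img_q @ F.normalize(queue, dim=1).t()      # (B,K) queued negatives
    logits = torch.cat([l_pos, l_neg], dim=1) / tau
    labels = torch.zeros(logits.size(0), dtype=torch.long, device=logits.device)
    return F.cross_entropy(logits, labels)             # positive is always index 0
```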
1 code implementation • ICCV 2021 • Zhihe Lu, Sen He, Xiatian Zhu, Li Zhang, Yi-Zhe Song, Tao Xiang
A few-shot semantic segmentation model is typically composed of a CNN encoder, a CNN decoder and a simple classifier (separating foreground and background pixels).
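A sketch of exactly the pipeline this entry describes, with purely illustrative layer choices: a CNN encoder, a CNN decoder, and a 1x1-conv classifier separating foreground from background pixels.

```python
# Typical few-shot segmentation structure: encoder -> decoder -> fg/bg classifier.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FewShotSegModel(nn.Module):
    def __init__(self, feat_ch=256):
        super().__init__()
        self.encoder = nn.Sequential(                  # stand-in for a deep backbone
            nn.Conv2d(3, feat_ch, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(feat_ch, feat_ch, 3, stride=2, padding=1), nn.ReLU())
        self.decoder = nn.Sequential(
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.ReLU())
        self.classifier = nn.Conv2d(feat_ch, 2, 1)     # foreground vs background

    def forward(self, x):
        h, w = x.shape[-2:]
        logits = self.classifier(self.decoder(self.encoder(x)))
        return F.interpolate(logits, size=(h, w), mode="bilinear",
                             align_corners=False)      # back to input resolution
```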
no code implementations • ICCV 2021 • Sen He, Wentong Liao, Michael Ying Yang, Yi-Zhe Song, Bodo Rosenhahn, Tao Xiang
The generated face image given a target age code is expected to be age-sensitive, as reflected by bio-plausible transformations of shape and texture, while being identity-preserving.
1 code implementation • CVPR 2021 • Sen He, Wentong Liao, Michael Ying Yang, Yongxin Yang, Yi-Zhe Song, Bodo Rosenhahn, Tao Xiang
We argue that these are caused by the lack of context-aware object and stuff feature encoding in their generators, and of location-sensitive appearance representation in their discriminators (one generic construction of the latter is sketched below).
Ranked #1 on Layout-to-Image Generation on COCO-Stuff 128x128
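One generic way to make a discriminator's appearance representation location-sensitive is to append normalised coordinate channels to its input, CoordConv-style. This is an illustrative stand-in, not the paper's exact design.

```python
# Patch discriminator with coordinate channels so "where" matters, not just "what".
import torch
import torch.nn as nn

class LocationSensitiveDiscriminator(nn.Module):
    def __init__(self, in_ch=3, ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch + 2, ch, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(ch, ch * 2, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(ch * 2, 1, 4, padding=1))        # patch-level real/fake map

    def forward(self, img):
        B, _, H, W = img.shape
        ys = torch.linspace(-1, 1, H, device=img.device).view(1, 1, H, 1).expand(B, 1, H, W)
        xs = torch.linspace(-1, 1, W, device=img.device).view(1, 1, 1, W).expand(B, 1, H, W)
        return self.net(torch.cat([img, xs, ys], dim=1))
```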
2 code implementations • 29 Apr 2020 • Sen He, Wentong Liao, Hamed R. Tavakoli, Michael Yang, Bodo Rosenhahn, Nicolas Pugeault
Inspired by its successes in text analysis and translation, previous work has proposed the Transformer architecture for image captioning.
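A hedged sketch in the spirit the entry describes: image region/patch features act as the encoder memory and a Transformer decoder generates the caption autoregressively. Vocabulary size and dimensions are placeholders.

```python
# Transformer decoder over image features for captioning.
import torch
import torch.nn as nn

class CaptionTransformer(nn.Module):
    def __init__(self, vocab=10000, dim=512, max_len=50):
        super().__init__()
        self.tok = nn.Embedding(vocab, dim)
        self.pos = nn.Embedding(max_len, dim)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(dim, nhead=8, batch_first=True),
            num_layers=3)
        self.out = nn.Linear(dim, vocab)

    def forward(self, img_feats, tokens):
        # img_feats: (B, R, dim) region/patch features; tokens: (B, T) word ids
        T = tokens.size(1)
        x = self.tok(tokens) + self.pos(torch.arange(T, device=tokens.device))
        causal = torch.triu(torch.full((T, T), float("-inf"),
                                       device=tokens.device), diagonal=1)
        h = self.decoder(x, img_feats, tgt_mask=causal)  # attend to image memory
        return self.out(h)                               # next-word logits (B,T,vocab)
```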
1 code implementation • CVPR 2019 • Sen He, Hamed R. Tavakoli, Ali Borji, Yang Mi, Nicolas Pugeault
Our analyses reveal that: 1) some visual regions (e.g. head, text, symbol, vehicle) are already encoded within various layers of a network pre-trained for object recognition; 2) on modern datasets, fine-tuning pre-trained models for saliency prediction makes them favor some categories (e.g. head) over others (e.g. text); 3) although deep saliency models outperform classical models on natural images, the converse is true for synthetic stimuli (e.g. pop-out search arrays), evidence of a significant difference between human and data-driven saliency models; and 4) we confirm that, after fine-tuning, the change in inner representations is mostly due to the task rather than the domain shift in the data.
no code implementations • ICCV 2019 • Sen He, Hamed R. Tavakoli, Ali Borji, Nicolas Pugeault
In this work, we present a novel dataset consisting of eye movements and verbal descriptions recorded synchronously over images.
no code implementations • 15 Mar 2018 • Sen He, Nicolas Pugeault
Early saliency models were based on low-level hand-crafted features derived from insights gained in neuroscience and psychophysics.
no code implementations • 15 Mar 2018 • Sen He, Dmitry Kangin, Yang Mi, Nicolas Pugeault
In this paper, we apply the attention mechanism to autonomous driving for steering angle prediction.
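A minimal sketch of one way to do this: a saliency-like spatial attention map reweights CNN features before a single steering angle is regressed, so the model can focus on road-relevant regions. The architecture is an illustrative assumption.

```python
# Spatial-attention pooling over CNN features, then steering regression.
import torch
import torch.nn as nn

class AttentionSteering(nn.Module):
    def __init__(self, ch=32):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, ch, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(ch, ch, 5, stride=2, padding=2), nn.ReLU())
        self.attn = nn.Conv2d(ch, 1, 1)        # one scalar weight per location
        self.head = nn.Linear(ch, 1)           # steering angle (e.g. radians)

    def forward(self, frame):
        f = self.backbone(frame)                            # (B,ch,H,W)
        a = torch.softmax(self.attn(f).flatten(2), dim=-1)  # (B,1,HW) attention map
        pooled = (f.flatten(2) * a).sum(dim=-1)             # attention-weighted pool
        return self.head(pooled).squeeze(-1)                # (B,) predicted angle
```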
no code implementations • 15 Mar 2018 • Sen He, Ali Borji, Yang Mi, Nicolas Pugeault
Deep convolutional neural networks have demonstrated high performances for fixation prediction in recent years.
no code implementations • 12 Jan 2018 • Sen He, Nicolas Pugeault
Moreover, we argue that this transformation leads to the emergence of receptive fields conceptually similar to the centre-surround filters hypothesized by early research on visual saliency.
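A small illustration of the centre-surround idea the entry refers to: a difference-of-Gaussians filter responds strongly where a region differs from its surround, the classic precursor of saliency maps. Kernel size and sigmas are arbitrary demo choices.

```python
# Difference-of-Gaussians centre-surround response on a grayscale image.
import torch
import torch.nn.functional as F

def gaussian_kernel(size=15, sigma=2.0):
    ax = torch.arange(size, dtype=torch.float32) - (size - 1) / 2
    g = torch.exp(-(ax ** 2) / (2 * sigma ** 2))
    k = torch.outer(g, g)
    return (k / k.sum()).view(1, 1, size, size)

def centre_surround(gray, sigma_c=1.0, sigma_s=4.0):
    """gray: (B,1,H,W) intensity image -> centre-surround response map."""
    kc = gaussian_kernel(sigma=sigma_c).to(gray.device)
    ks = gaussian_kernel(sigma=sigma_s).to(gray.device)
    centre = F.conv2d(gray, kc, padding=7)      # narrow Gaussian: the centre
    surround = F.conv2d(gray, ks, padding=7)    # wide Gaussian: the surround
    return (centre - surround).abs()            # high where a point pops out
```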