no code implementations • 26 Dec 2023 • Lu Ling, Yichen Sheng, Zhi Tu, Wentian Zhao, Cheng Xin, Kun Wan, Lantao Yu, Qianyu Guo, Zixun Yu, Yawen Lu, Xuanmao Li, Xingpeng Sun, Rohan Ashok, Aniruddha Mukherjee, Hao Kang, Xiangrui Kong, Gang Hua, Tianyi Zhang, Bedrich Benes, Aniket Bera
We have witnessed significant progress in deep learning-based 3D vision, ranging from neural radiance field (NeRF) based 3D representation learning to applications in novel view synthesis (NVS).
no code implementations • 19 Jul 2023 • Xinyang Zhang, Wentian Zhao, Xin Lu, Jeff Chien
To achieve layered image generation, we train an autoencoder that is able to reconstruct layered images and train diffusion models on the latent representation.
no code implementations • NeurIPS 2021 • Wentian Zhao, Xinxiao Wu, Jiebo Luo
To this end, we propose a novel video captioning method that generates a sentence by first constructing a multi-modal dependency tree and then traversing the constructed tree, where the syntactic structure and semantic relationship in the sentence are represented by the tree topology.
no code implementations • 26 Jul 2021 • Wentian Zhao, Yao Hu, HeDa Wang, Xinxiao Wu, Jiebo Luo
Entity-aware image captioning aims to describe named entities and events related to the image by utilizing the background knowledge in the associated article.
no code implementations • 2 Aug 2020 • Wentian Zhao, Seokhwan Kim, Ning Xu, Hailin Jin
This paper presents a new video question answering task on screencast tutorials.
1 code implementation • AAAI 2020 • Wentian Zhao, Xinxiao Wu, Xiaoxun Zhang
Generating stylized captions for images is a challenging task since it requires not only describing the content of the image accurately but also expressing the desired linguistic style appropriately.
Ranked #2 on Image Captioning on FlickrStyle10K
no code implementations • 17 Nov 2019 • Ziyi Kou, Guofeng Cui, Shaojie Wang, Wentian Zhao, Chenliang Xu
In this paper, we propose a confidence segmentation (ConfSeg) module that builds a confidence score for each pixel in the CAM without introducing additional hyper-parameters.
no code implementations • 9 Sep 2019 • Ziyi Kou, Wentian Zhao, Guofeng Cui, Shaojie Wang
Weakly Supervised Object Localization (WSOL) methods usually rely on fully convolutional networks in order to obtain class activation maps (CAMs) of targeted labels.
no code implementations • 4 Jun 2019 • Jingyi Hou, Xinxiao Wu, Yayun Qi, Wentian Zhao, Jiebo Luo, Yunde Jia
Extensive experiments on the MS-COCO image captioning benchmark and the MSVD video captioning benchmark validate the superiority of our method in leveraging prior commonsense knowledge to enhance relational reasoning for visual captioning.
no code implementations • 2 Dec 2018 • Shaojie Wang, Wentian Zhao, Ziyi Kou, Chenliang Xu
Furthermore, we study multiple modalities including description and transcripts for the purpose of boosting video understanding.
no code implementations • 2 Dec 2018 • Wentian Zhao, Shaojie Wang, Zhihuai Xie, Jing Shi, Chenliang Xu
To overcome this limitation, we propose a GAN-based EM learning framework that can maximize the likelihood of images and estimate the latent variables with only the constraint of L-Lipschitz continuity.