Search Results for author: Wentian Zhao

Found 11 papers, 1 paper with code

DL3DV-10K: A Large-Scale Scene Dataset for Deep Learning-based 3D Vision

no code implementations • 26 Dec 2023 • Lu Ling, Yichen Sheng, Zhi Tu, Wentian Zhao, Cheng Xin, Kun Wan, Lantao Yu, Qianyu Guo, Zixun Yu, Yawen Lu, Xuanmao Li, Xingpeng Sun, Rohan Ashok, Aniruddha Mukherjee, Hao Kang, Xiangrui Kong, Gang Hua, Tianyi Zhang, Bedrich Benes, Aniket Bera

We have witnessed significant progress in deep learning-based 3D vision, ranging from neural radiance field (NeRF) based 3D representation learning to applications in novel view synthesis (NVS).

Novel View Synthesis Representation Learning

Text2Layer: Layered Image Generation using Latent Diffusion Model

no code implementations • 19 Jul 2023 • Xinyang Zhang, Wentian Zhao, Xin Lu, Jeff Chien

To achieve layered image generation, we train an autoencoder that is able to reconstruct layered images and train diffusion models on the latent representation.

Image Generation Image Segmentation +1

Multi-modal Dependency Tree for Video Captioning

no code implementations • NeurIPS 2021 • Wentian Zhao, Xinxiao Wu, Jiebo Luo

To this end, we propose a novel video captioning method that generates a sentence by first constructing a multi-modal dependency tree and then traversing the constructed tree, where the syntactic structure and semantic relationship in the sentence are represented by the tree topology.

Caption Generation Dependency Parsing +3

Boosting Entity-aware Image Captioning with Multi-modal Knowledge Graph

no code implementations • 26 Jul 2021 • Wentian Zhao, Yao Hu, HeDa Wang, Xinxiao Wu, Jiebo Luo

Entity-aware image captioning aims to describe named entities and events related to the image by utilizing the background knowledge in the associated article.

Graph Attention Image Captioning +1

Video Question Answering on Screencast Tutorials

no code implementations • 2 Aug 2020 • Wentian Zhao, Seokhwan Kim, Ning Xu, Hailin Jin

This paper presents a new video question answering task on screencast tutorials.

Question Answering Video Question Answering

MemCap: Memorizing Style Knowledge for Image Captioning

1 code implementation • AAAI 2020 • Wentian Zhao, Xinxiao Wu, Xiaoxun Zhang

Generating stylized captions for images is a challenging task since it requires not only describing the content of the image accurately but also expressing the desired linguistic style appropriately.

Ranked #2 on Image Captioning on FlickrStyle10K

Image Captioning Language Modelling +1

Improve CAM with Auto-adapted Segmentation and Co-supervised Augmentation

no code implementations • 17 Nov 2019 • Ziyi Kou, Guofeng Cui, Shaojie Wang, Wentian Zhao, Chenliang Xu

In this paper, we propose a confidence segmentation (ConfSeg) module that builds confidence score for each pixel in CAM without introducing additional hyper-parameters.

Object Weakly-Supervised Object Localization

Weakly Supervised Localization Using Background Images

no code implementations • 9 Sep 2019 • Ziyi Kou, Wentian Zhao, Guofeng Cui, Shaojie Wang

Weakly Supervised Object Localization (WSOL) methods usually rely on fully convolutional networks in order to obtain class activation maps (CAMs) of targeted labels.

Object Weakly-Supervised Object Localization

Relational Reasoning using Prior Knowledge for Visual Captioning

no code implementations • 4 Jun 2019 • Jingyi Hou, Xinxiao Wu, Yayun Qi, Wentian Zhao, Jiebo Luo, Yunde Jia

Extensive experiments on the MS-COCO image captioning benchmark and the MSVD video captioning benchmark validate the superiority of our method on leveraging prior commonsense knowledge to enhance relational reasoning for visual captioning.

Image Captioning object-detection +4

How to Make a BLT Sandwich? Learning to Reason towards Understanding Web Instructional Videos

no code implementations • 2 Dec 2018 • Shaojie Wang, Wentian Zhao, Ziyi Kou, Chenliang Xu

Furthermore, we study multiple modalities including description and transcripts for the purpose of boosting video understanding.

Logical Reasoning Question Answering +1

GAN-EM: GAN based EM learning framework

no code implementations • 2 Dec 2018 • Wentian Zhao, Shaojie Wang, Zhihuai Xie, Jing Shi, Chenliang Xu

To overcome such limitation, we propose a GAN based EM learning framework that can maximize the likelihood of images and estimate the latent variables with only the constraint of L-Lipschitz continuity.

Clustering Dimensionality Reduction +2
