1 code implementation • 14 Feb 2023 • Jianhua Yang, Kun Dai
Designing a real-time framework for the spatio-temporal action detection task is still a challenge.
1 code implementation • 20 Oct 2022 • Jianhua Yang
On the AVA dataset, our optimized YOWO achieves 20.6% frame mAP with 16 frames, also exceeding the official YOWO.
1 code implementation • NeurIPS 2021 • Keji He, Yan Huang, Qi Wu, Jianhua Yang, Dong An, Shuanglin Sima, Liang Wang
In Vision-and-Language Navigation (VLN) task, an agent is asked to navigate inside 3D indoor environments following given instructions.
1 code implementation • 16 Jun 2021 • Jianhua Yang, Yan Huang, Zhanyu Ma, Liang Wang
To solve this problem, we propose a simple yet effective Cascaded Multi-modal Fusion (CMF) module, which stacks multiple atrous convolutional layers in parallel and further introduces a cascaded branch to fuse visual and linguistic features.
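As a rough illustration of the idea (not the authors' implementation), the sketch below mimics the two branches of the CMF module in 1D NumPy: a parallel branch that applies atrous (dilated) convolutions at several rates and sums them, and a cascaded branch that repeatedly fuses a linguistic feature into the visual stream before each convolution. The function names, fusion-by-multiplication, and dilation rates (1, 2, 4) are all assumptions for the sketch.

```python
import numpy as np

def dilated_conv1d(x, w, dilation):
    # 'same'-padded 1D convolution with the given dilation rate.
    k = len(w)
    pad = dilation * (k - 1) // 2
    xp = np.pad(x, pad)
    return np.array([
        sum(w[j] * xp[i + j * dilation] for j in range(k))
        for i in range(len(x))
    ])

def cmf_sketch(visual, linguistic, weights, dilations=(1, 2, 4)):
    # Parallel branch: atrous convolutions at several dilation
    # rates, summed into a multi-scale visual representation.
    parallel = sum(dilated_conv1d(visual, w, d)
                   for w, d in zip(weights, dilations))
    # Cascaded branch: fuse the linguistic feature into the stream
    # before each stage, feeding each output into the next layer.
    fused = visual
    for w, d in zip(weights, dilations):
        fused = dilated_conv1d(fused * linguistic, w, d)
    return parallel + fused
```

With a center-tap (identity) kernel at every stage, each branch passes the visual feature through unchanged, which makes the sketch easy to sanity-check.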
1 code implementation • IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 2020 • Rongbo Fan, Bochuan Hou, Jinbao Liu, Jianhua Yang, Zenglin Hong
The registration of multiresolution optical remote sensing images has been widely used in image fusion, change detection, and image stitching.
no code implementations • 2 Nov 2020 • Jianhua Yang, Yan Huang, Kai Niu, Linjiang Huang, Zhanyu Ma, Liang Wang
Previous methods fail to explicitly align the video content with the textual query in a fine-grained manner according to the actor and its action, due to the problem of "semantic asymmetry".
Ranked #9 on Referring Expression Segmentation on J-HMDB
no code implementations • 22 Aug 2019 • Guoliang Feng, Wei Lu, Witold Pedrycz, Jianhua Yang, Xiaodong Liu
Index Terms: Fuzzy cognitive maps (FCMs), maximum entropy, noisy data, rapid and robust learning.
no code implementations • 28 Apr 2018 • Xiaohuan Cao, Jianhua Yang, Li Wang, Zhong Xue, Qian Wang, Dinggang Shen
In this paper, we propose to train a non-rigid inter-modality image registration network, which can directly predict the transformation field from the input multimodal images, such as CT and MR images.
1 code implementation • 21 Apr 2018 • Jianhua Yang, Kai Liu, Xiangui Kang, Edward K. Wong, Yun-Qing Shi
The architecture contains three component modules: a generator, an embedding simulator, and a discriminator.
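A minimal NumPy sketch of how such a three-module pipeline fits together, purely for illustration: the generator maps a cover image to per-pixel modification probabilities, the embedding simulator converts probabilities into ±1 pixel changes via random thresholds (a common approximation in GAN-based steganography; the paper's exact simulator may differ), and the discriminator scores an image. All function bodies and parameters here are assumptions, not the paper's method.

```python
import numpy as np

def generator(cover, w):
    # Hypothetical generator: squashes a scaled cover image into
    # per-pixel modification probabilities in (0, 1).
    return 1.0 / (1.0 + np.exp(-(cover * w)))

def embedding_simulator(cover, prob, noise):
    # Turns probabilities into +1/0/-1 modifications using uniform
    # random thresholds, so embedding stays stochastic.
    change = ((noise < prob / 2).astype(float)
              - (noise > 1 - prob / 2).astype(float))
    return cover + change

def discriminator(img, v):
    # Hypothetical discriminator: a single linear score; positive
    # suggests "stego", negative suggests "cover".
    return float(img.ravel() @ v)

rng = np.random.default_rng(0)
cover = rng.standard_normal((4, 4))
prob = generator(cover, 0.5)
stego = embedding_simulator(cover, prob, rng.random((4, 4)))
score = discriminator(stego, rng.standard_normal(16))
```

In the adversarial setup, the generator would be trained so the discriminator cannot separate stego images from covers; that training loop is omitted here.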