1 code implementation • Findings (EMNLP) 2021 • Junjie Wang, Yatai Ji, Jiaqi Sun, Yujiu Yang, Tetsuya Sakai
On the other hand, trilinear models such as the CTI model efficiently exploit inter-modality information among answers, questions, and images, while ignoring intra-modality information.
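To make the inter- vs. intra-modality distinction concrete, here is a minimal, hypothetical sketch of a rank-factorized trilinear interaction over question, image, and answer features; the dimensions, layer names, and factorization are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class TrilinearFusion(nn.Module):
    """Rank-factorized trilinear interaction (CTI-style sketch)."""
    def __init__(self, d_q=512, d_v=512, d_a=512, rank=32, d_out=256):
        super().__init__()
        self.Wq = nn.Linear(d_q, rank, bias=False)  # question projection
        self.Wv = nn.Linear(d_v, rank, bias=False)  # image projection
        self.Wa = nn.Linear(d_a, rank, bias=False)  # answer projection
        self.out = nn.Linear(rank, d_out)

    def forward(self, q, v, a):
        # The elementwise product couples all three modalities at once
        # (inter-modality); note there is no q-q, v-v, or a-a
        # (intra-modality) interaction term anywhere in this model.
        return self.out(self.Wq(q) * self.Wv(v) * self.Wa(a))

fusion = TrilinearFusion()
q, v, a = torch.randn(4, 512), torch.randn(4, 512), torch.randn(4, 512)
print(fusion(q, v, a).shape)  # torch.Size([4, 256])
```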
1 code implementation • 28 Mar 2024 • Sidi Yang, Binxiao Huang, Mingdeng Cao, Yatai Ji, Hanzhong Guo, Ngai Wong, Yujiu Yang
Existing enhancement models often optimize for high performance while falling short of reducing hardware inference time and power consumption, especially on edge devices with constrained computing and storage resources.
1 code implementation • 28 Nov 2023 • Yicheng Xiao, Zhuoyan Luo, Yong Liu, Yue Ma, Hengwei Bian, Yatai Ji, Yujiu Yang, Xiu Li
Video Moment Retrieval (MR) and Highlight Detection (HD) have attracted significant attention due to the growing demand for video analysis.
Ranked #1 on Highlight Detection on YouTube Highlights
1 code implementation • 12 Jun 2023 • Rong-Cheng Tu, Yatai Ji, Jie Jiang, Weijie Kong, Chengfei Cai, Wenzhe Zhao, Hongfa Wang, Yujiu Yang, Wei Liu
MGSC (masked global semantic completion) promotes learning more representative global features, which strongly influence downstream-task performance, while MLTC (masked local token completion) reconstructs modal-fusion local tokens, further enhancing accurate comprehension of multimodal data.
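A minimal sketch of what these two masked-completion objectives could look like, assuming fused multimodal tokens and a shared encoder; the masking scheme, mean pooling, and loss choices are illustrative assumptions, not the paper's code.

```python
import torch
import torch.nn.functional as F

def global_semantic_completion_loss(encoder, tokens, mask):
    # tokens: (B, L, D) fused multimodal tokens; mask: (B, L) bool.
    masked = tokens.masked_fill(mask.unsqueeze(-1), 0.0)
    g_masked = encoder(masked).mean(dim=1)    # global feature, masked view
    with torch.no_grad():
        g_full = encoder(tokens).mean(dim=1)  # target: unmasked global feature
    return 1 - F.cosine_similarity(g_masked, g_full, dim=-1).mean()

def local_token_completion_loss(encoder, tokens, mask):
    masked = tokens.masked_fill(mask.unsqueeze(-1), 0.0)
    recon = encoder(masked)                       # (B, L, D) reconstruction
    return F.mse_loss(recon[mask], tokens[mask])  # only masked positions

enc = torch.nn.TransformerEncoder(
    torch.nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True),
    num_layers=2)
toks = torch.randn(2, 16, 64)
msk = (torch.arange(16) % 4 == 0).expand(2, 16)  # mask every 4th token
loss = global_semantic_completion_loss(enc, toks, msk) \
     + local_token_completion_loss(enc, toks, msk)
```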
1 code implementation • 23 May 2023 • Weifeng Chen, Yatai Ji, Jie Wu, Hefeng Wu, Pan Xie, Jiashi Li, Xin Xia, Xuefeng Xiao, Liang Lin
Based on a pre-trained conditional text-to-image (T2I) diffusion model, our model aims to generate videos conditioned on a sequence of control signals, such as edge or depth maps.
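To illustrate the conditioning pathway, below is a toy DDIM-style sampler in which every denoising step sees the text embedding and one control map per output frame; the `unet` stand-in, the noise schedule, and all shapes are assumptions, not the paper's pipeline.

```python
import torch

@torch.no_grad()
def sample_controlled_video(unet, text_emb, control_maps, steps=50):
    # control_maps: (F, C, H, W), one edge/depth map per output frame.
    n_frames, _, H, W = control_maps.shape
    x = torch.randn(n_frames, 4, H // 8, W // 8)     # per-frame latent noise
    alpha_bar = torch.linspace(0.02, 0.9999, steps)  # toy noise schedule
    for i in range(steps):
        a, a_next = alpha_bar[i], alpha_bar[min(i + 1, steps - 1)]
        # The predictor is conditioned on text AND per-frame control maps,
        # so each frame is steered by its own edge/depth signal.
        eps = unet(x, i, text_emb, control_maps)
        x0 = (x - (1 - a).sqrt() * eps) / a.sqrt()   # predicted clean latent
        x = a_next.sqrt() * x0 + (1 - a_next).sqrt() * eps
    return x

unet = lambda x, t, txt, ctrl: torch.zeros_like(x)   # dummy noise predictor
frames = sample_controlled_video(unet, torch.randn(77, 768),
                                 torch.randn(8, 3, 64, 64), steps=10)
print(frames.shape)  # torch.Size([8, 4, 8, 8])
```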
no code implementations • 9 Dec 2022 • Xinzhe Ni, Yong Liu, Hao Wen, Yatai Ji, Jing Xiao, Yujiu Yang
Then, in the visual flow, visual prototypes are computed by, for example, a Temporal-Relational CrossTransformer (TRX) module.
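The sketch below shows the general shape of query-conditioned class prototypes in the spirit of TRX; the real module attends over ordered frame tuples, which this toy version omits, and all names and shapes are illustrative.

```python
import torch
import torch.nn.functional as F

def class_prototypes(query, support, support_labels, n_way):
    # query: (D,) query-video feature; support: (N, D) support features.
    protos = []
    for c in range(n_way):
        s = support[support_labels == c]          # (K, D) class-c supports
        # Attention of class-c supports against the query yields a
        # query-specific prototype for that class.
        attn = F.softmax(s @ query / s.shape[-1] ** 0.5, dim=0)
        protos.append(attn @ s)
    return torch.stack(protos)                    # (n_way, D)

q = torch.randn(64)
sup = torch.randn(10, 64)
lbl = torch.arange(10) % 5                        # 2 supports per class
print(class_prototypes(q, sup, lbl, 5).shape)     # torch.Size([5, 64])
```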
1 code implementation • CVPR 2023 • Yatai Ji, Rong-Cheng Tu, Jie Jiang, Weijie Kong, Chengfei Cai, Wenzhe Zhao, Hongfa Wang, Yujiu Yang, Wei Liu
Cross-modal alignment is essential for vision-language pre-training (VLP) models to learn the correct corresponding information across different modalities; a generic alignment objective is sketched after this entry.
Ranked #8 on Zero-Shot Video Retrieval on LSMDC
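Cross-modal alignment of this kind is commonly trained with a bidirectional InfoNCE objective; the sketch below is the standard CLIP-style formulation, not necessarily the loss used in this paper, and the temperature and embedding dimensions are assumptions.

```python
import torch
import torch.nn.functional as F

def alignment_loss(img_emb, txt_emb, temperature=0.07):
    # Normalize so the dot product is cosine similarity.
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.t() / temperature     # (B, B) similarity matrix
    targets = torch.arange(img.shape[0])     # matched pairs on the diagonal
    # Symmetric cross-entropy: image-to-text and text-to-image retrieval.
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))

loss = alignment_loss(torch.randn(8, 256), torch.randn(8, 256))
```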
1 code implementation • CVPR 2023 • Yatai Ji, Junjie Wang, Yuan Gong, Lin Zhang, Yanru Zhu, Hongfa Wang, Jiaxing Zhang, Tetsuya Sakai, Yujiu Yang
Multimodal semantic understanding often has to deal with uncertainty: an observed message may refer to multiple plausible targets.
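One common way to model such ambiguity is to embed an input as a distribution rather than a point, so that a single message can cover several plausible targets; the layer names and reparameterized sampling below are a hedged sketch, not the paper's architecture.

```python
import torch
import torch.nn as nn

class ProbabilisticEmbedding(nn.Module):
    """Map a feature to a Gaussian in embedding space and draw samples."""
    def __init__(self, d_in=512, d_emb=256):
        super().__init__()
        self.mu = nn.Linear(d_in, d_emb)      # distribution mean
        self.logvar = nn.Linear(d_in, d_emb)  # log-variance (uncertainty)

    def forward(self, x, n_samples=5):
        mu, logvar = self.mu(x), self.logvar(x)
        std = (0.5 * logvar).exp()
        # Each reparameterized sample is one candidate interpretation
        # of the ambiguous input.
        eps = torch.randn(n_samples, *mu.shape)
        return mu + std * eps                 # (n_samples, B, d_emb)

pe = ProbabilisticEmbedding()
print(pe(torch.randn(4, 512)).shape)          # torch.Size([5, 4, 256])
```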