no code implementations • 17 Jan 2024 • Ludan Ruan, Lei Tian, Chuanwei Huang, Xu Zhang, Xinyan Xiao
This cannot fully meet the needs of real-world applications, as users are likely to input image and text conditions flexibly, either individually or in combination.
1 code implementation • 12 Mar 2023 • Ludan Ruan, Anwen Hu, Yuqing Song, Liang Zhang, Sipeng Zheng, Qin Jin
In this paper, we extend the state-of-the-art Vision-Language model CLIP to accommodate the audio modality for Vision-Language-Audio multimodal processing.
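As a rough illustration of this kind of extension, the PyTorch sketch below adds a hypothetical audio branch that projects spectrogram features into a shared CLIP-style embedding space and aligns it contrastively against a frozen CLIP anchor. The encoder architecture, embedding size, and loss are illustrative assumptions, not the paper's actual design.

```python
# Minimal sketch, assuming a CLIP-style contrastive setup; the AudioEncoder
# architecture and 512-d embedding size are illustrative, not the paper's.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AudioEncoder(nn.Module):
    """Hypothetical audio branch mapping log-mel spectrograms
    into the shared CLIP embedding space."""
    def __init__(self, n_mels=80, embed_dim=512):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_mels, 256, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),  # pool over the time axis
        )
        self.proj = nn.Linear(256, embed_dim)

    def forward(self, mel):             # mel: (B, n_mels, T)
        h = self.conv(mel).squeeze(-1)  # (B, 256)
        return F.normalize(self.proj(h), dim=-1)

def contrastive_loss(audio_emb, anchor_emb, temperature=0.07):
    """Symmetric InfoNCE between audio embeddings and frozen CLIP
    (vision or text) anchor embeddings of the same batch."""
    logits = audio_emb @ anchor_emb.t() / temperature
    labels = torch.arange(len(logits), device=logits.device)
    return (F.cross_entropy(logits, labels) +
            F.cross_entropy(logits.t(), labels)) / 2
```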
1 code implementation • 14 Jan 2023 • Hongpeng Lin, Ludan Ruan, Wenke Xia, Peiyu Liu, Jingyuan Wen, Yixin Xu, Di Hu, Ruihua Song, Wayne Xin Zhao, Qin Jin, Zhiwu Lu
Experimental results indicate that models incorporating large language models (LLMs) generate more diverse responses, while the model that uses knowledge graphs to introduce external knowledge performs best overall.
1 code implementation • CVPR 2023 • Ludan Ruan, Yiyang Ma, Huan Yang, Huiguo He, Bei Liu, Jianlong Fu, Nicholas Jing Yuan, Qin Jin, Baining Guo
To generate joint audio-video pairs, we propose a novel Multi-Modal Diffusion model (i.e., MM-Diffusion), with two coupled denoising autoencoders.
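The PyTorch sketch below illustrates the coupled-denoiser idea from the abstract: each modality branch denoises its own input while cross-attending to the other. The stand-in modules and cross-attention wiring are assumptions for illustration, not MM-Diffusion's actual architecture.

```python
# Minimal sketch of two coupled denoisers exchanging cross-modal features;
# the Linear "networks" stand in for full per-modality U-Nets.
import torch
import torch.nn as nn

class CoupledDenoiser(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.video_net = nn.Linear(dim, dim)   # stand-in for a video denoiser
        self.audio_net = nn.Linear(dim, dim)   # stand-in for an audio denoiser
        self.a2v = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.v2a = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, video_tokens, audio_tokens):
        # video_tokens: (B, Nv, dim), audio_tokens: (B, Na, dim)
        v = self.video_net(video_tokens)
        a = self.audio_net(audio_tokens)
        # each modality queries the other before predicting its noise
        v = v + self.a2v(v, a, a)[0]  # video attends to audio
        a = a + self.v2a(a, v, v)[0]  # audio attends to video
        return v, a  # predicted noise for each modality
```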
no code implementations • 21 Sep 2021 • Ludan Ruan, Qin Jin
Inspired by the success of transformer-based pre-training methods on natural language tasks and, subsequently, on computer vision tasks, researchers have begun to apply transformers to video processing.
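A minimal sketch of the common first step in such methods: tokenizing a video clip into spatio-temporal patch ("tubelet") embeddings that a transformer can consume. The tubelet size and embedding dimension below are illustrative choices, not drawn from the survey.

```python
# Minimal sketch: video -> token sequence via 3D patch embedding.
import torch
import torch.nn as nn

class TubeletEmbed(nn.Module):
    """Embed a video (B, C, T, H, W) as a sequence of
    spatio-temporal patch ("tubelet") tokens."""
    def __init__(self, in_ch=3, dim=768, tubelet=(2, 16, 16)):
        super().__init__()
        self.proj = nn.Conv3d(in_ch, dim, kernel_size=tubelet, stride=tubelet)

    def forward(self, video):
        x = self.proj(video)                 # (B, dim, T', H', W')
        return x.flatten(2).transpose(1, 2)  # (B, num_tokens, dim)

video = torch.randn(1, 3, 16, 224, 224)
tokens = TubeletEmbed()(video)  # (1, 8*14*14, 768) = (1, 1568, 768)
```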
1 code implementation • 11 Jun 2021 • Ludan Ruan, Jieting Chen, Yuqing Song, ShiZhe Chen, Qin Jin
For object grounding, we fine-tune the state-of-the-art detection model MDETR and design a post-processing method to make the grounding results more faithful.
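The abstract does not specify the post-processing method; as a hedged example of what such a step might look like, the sketch below applies generic score thresholding and non-maximum suppression to detector outputs. The thresholds are arbitrary placeholders.

```python
# Minimal sketch of grounding post-processing: confidence thresholding
# plus NMS. Generic assumption, not the paper's actual method.
import torch
from torchvision.ops import nms

def postprocess(boxes, scores, score_thr=0.5, iou_thr=0.5):
    """boxes: (N, 4) in xyxy format, scores: (N,) detector confidences."""
    keep = scores > score_thr            # drop low-confidence detections
    boxes, scores = boxes[keep], scores[keep]
    kept = nms(boxes, scores, iou_thr)   # suppress overlapping duplicates
    return boxes[kept], scores[kept]
```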
1 code implementation • 12 Apr 2020 • Shizhe Chen, Weiying Wang, Ludan Ruan, Linli Yao, Qin Jin
The goal of the YouMakeup VQA Challenge 2020 is to provide a common benchmark for fine-grained action understanding in domain-specific videos, e.g., makeup instructional videos.