no code implementations • 29 Nov 2023 • Xuekun Jiang, Anyi Rao, Jingbo Wang, Dahua Lin, Bo Dai
In the evolving landscape of digital media and video production, the precise manipulation and reproduction of visual elements like camera movements and character actions are highly desired.
1 code implementation • 28 Nov 2023 • Yuwei Guo, Ceyuan Yang, Anyi Rao, Maneesh Agrawala, Dahua Lin, Bo Dai
The development of text-to-video (T2V), i.e., generating videos with a given text prompt, has been significantly advanced in recent years.
no code implementations • 28 Aug 2023 • Jiaju Ma, Anyi Rao, Li-Yi Wei, Rubaiat Habib Kazi, Hijung Valentina Shin, Maneesh Agrawala
Musicians and fans often produce lyric videos, a form of music videos that showcase the song's lyrics, for their favorite songs.
1 code implementation • 7 Aug 2023 • Yujie Zhou, Wenwen Qiang, Anyi Rao, Ning Lin, Bing Su, Jiaqi Wang
Specifically, 1) we maximize the MI between visual and semantic space for distribution alignment; 2) we leverage the temporal information for estimating the MI by encouraging MI to increase as more frames are observed.
4 code implementations • 10 Jul 2023 • Yuwei Guo, Ceyuan Yang, Anyi Rao, Zhengyang Liang, Yaohui Wang, Yu Qiao, Maneesh Agrawala, Dahua Lin, Bo Dai
Once trained, the motion module can be inserted into a personalized T2I model to form a personalized animation generator.
no code implementations • 5 Jun 2023 • Zikai Wei, Anyi Rao, Bo Dai, Dahua Lin
The factor model is a fundamental tool in quantitative investment, which can be empowered by deep learning to become more flexible and efficient in complicated practical investing situations.
1 code implementation • 27 May 2023 • Dachuan Shi, Chaofan Tao, Anyi Rao, Zhendong Yang, Chun Yuan, Jiaqi Wang
Although extensively studied for unimodal models, the acceleration for multimodal models, especially the vision-language Transformers, is relatively under-explored.
1 code implementation • 17 Feb 2023 • Yujie Zhou, Haodong Duan, Anyi Rao, Bing Su, Jiaqi Wang
Specifically, we construct a negative-sample-free triplet stream structure that is composed of an anchor stream without any masking, a spatial masking stream with Central Spatial Masking (CSM), and a temporal masking stream with Motion Attention Temporal Masking (MATM).
5 code implementations • ICCV 2023 • Lvmin Zhang, Anyi Rao, Maneesh Agrawala
ControlNet locks the production-ready large diffusion models, and reuses their deep and robust encoding layers pretrained with billions of images as a strong backbone to learn a diverse set of conditional controls.
no code implementations • 30 Jan 2023 • Anyi Rao, Xuekun Jiang, Yuwei Guo, Linning Xu, Lei Yang, Libiao Jin, Dahua Lin, Bo Dai
Amateurs working on mini-films and short-form videos usually spend lots of time and effort on the multi-round complicated process of setting and adjusting scenes, plots, and cameras to deliver satisfying video shots.
no code implementations • 17 Oct 2022 • Anyi Rao, Xuekun Jiang, Sichen Wang, Yuwei Guo, Zihao Liu, Bo Dai, Long Pang, Xiaoyu Wu, Dahua Lin, Libiao Jin
The ability to choose an appropriate camera view among multiple cameras plays a vital role in the delivery of TV shows.
4 code implementations • 12 Sep 2022 • Bing Su, Dazhao Du, Zhao Yang, Yujie Zhou, Jiangmeng Li, Anyi Rao, Hao Sun, Zhiwu Lu, Ji-Rong Wen
Although artificial intelligence (AI) has made significant progress in understanding molecules in a wide range of fields, existing models generally acquire a single cognitive ability from a single molecular modality.
Ranked #7 on Molecule Captioning on ChEBI-20
1 code implementation • CVPR 2022 • Xueyi Liu, Xiaomeng Xu, Anyi Rao, Chuang Gan, Li Yi
To solve the above issues, we propose AutoGPart, a generic method enabling training generalizable 3D part segmentation networks with the task prior considered.
no code implementations • 10 Dec 2021 • Yuanbo Xiangli, Linning Xu, Xingang Pan, Nanxuan Zhao, Anyi Rao, Christian Theobalt, Bo Dai, Dahua Lin
The wide span of viewing positions within these scenes yields multi-scale renderings with very different levels of detail, which poses great challenges to neural radiance fields and biases them towards compromised results.
no code implementations • ICCV 2021 • Linning Xu, Yuanbo Xiangli, Anyi Rao, Nanxuan Zhao, Bo Dai, Ziwei Liu, Dahua Lin
City modeling is the foundation for computational urban planning, navigation, and entertainment.
no code implementations • ECCV 2020 • Anyi Rao, Jiaze Wang, Linning Xu, Xuekun Jiang, Qingqiu Huang, Bolei Zhou, Dahua Lin
Shots are key narrative elements of various videos, e.g., movies, TV series, and user-generated videos that are thriving over the Internet.
no code implementations • ECCV 2020 • Jiangyue Xia, Anyi Rao, Qingqiu Huang, Linning Xu, Jiangtao Wen, Dahua Lin
The task of searching for certain people in videos has seen increasing potential in real-world applications, such as video organization and editing.
no code implementations • ECCV 2020 • Qingqiu Huang, Yu Xiong, Anyi Rao, Jiaze Wang, Dahua Lin
We believe that such a holistic dataset would promote research on story-based long video understanding and beyond.
4 code implementations • CVPR 2020 • Anyi Rao, Linning Xu, Yu Xiong, Guodong Xu, Qingqiu Huang, Bolei Zhou, Dahua Lin
Scene, as the crucial unit of storytelling in movies, contains complex activities of actors and their interactions in a physical environment.
1 code implementation • 24 Mar 2018 • Anyi Rao, Francis Lau
The computer musician is able to produce musical accompaniment that relates musically to the human performance.
Sound • Multimedia • Audio and Speech Processing
2 code implementations • ACL 2018 • Javid Ebrahimi, Anyi Rao, Daniel Lowd, Dejing Dou
We propose an efficient method to generate white-box adversarial examples to trick a character-level neural classifier.