1 code implementation • 22 Nov 2023 • Qifan Yu, Juncheng Li, Longhui Wei, Liang Pang, Wentao Ye, Bosheng Qin, Siliang Tang, Qi Tian, Yueting Zhuang
Multi-modal Large Language Models (MLLMs) tuned on machine-generated instruction-following data have demonstrated remarkable performance in various multi-modal understanding and generation tasks.
no code implementations • 15 Aug 2023 • Bosheng Qin, Wentao Ye, Qifan Yu, Siliang Tang, Yueting Zhuang
Our approach employs a pretrained T2I diffusion model to generate each video frame in an autoregressive fashion.
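The autoregressive rollout described above can be sketched as a simple loop in which each frame is generated conditioned on the previous one. This is an illustrative stand-in only: `next_frame` below is a hypothetical placeholder for the pretrained T2I diffusion sampler, whose actual conditioning and sampling steps are not shown.

```python
import numpy as np

def generate_video(first_frame, next_frame, num_frames):
    """Autoregressive rollout: each new frame is produced conditioned
    on the previously generated frame. `next_frame` is a hypothetical
    stand-in for a T2I diffusion sampling step."""
    frames = [first_frame]
    for _ in range(num_frames - 1):
        frames.append(next_frame(frames[-1]))
    return np.stack(frames)  # shape: (num_frames, H, W, C)

# Toy usage: a "generator" that just brightens the previous frame.
rng = np.random.default_rng(0)
f0 = rng.random((8, 8, 3))
video = generate_video(f0, lambda prev: np.clip(prev * 1.05, 0.0, 1.0), 4)
```

The loop structure is the point here: every frame after the first depends only on its predecessor, which is what makes the generation autoregressive.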
no code implementations • 21 May 2023 • Bosheng Qin, Juncheng Li, Siliang Tang, Tat-Seng Chua, Yueting Zhuang
To improve the consistency between adjacent frames of generated videos, we propose the Frame Difference Loss, which is incorporated during the training process.
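One plausible reading of such a consistency loss (the paper's exact formulation may differ) penalizes the mismatch between adjacent-frame differences of the generated video and those of a reference video, so that generated motion tracks reference motion:

```python
import numpy as np

def frame_difference_loss(generated, reference):
    """Hypothetical sketch of a frame-difference objective: compare
    temporal differences (frame-to-frame change) of generated vs.
    reference video. Both arrays have shape (T, H, W, C)."""
    gen_diff = np.diff(generated, axis=0)  # change between adjacent frames
    ref_diff = np.diff(reference, axis=0)
    return float(np.mean((gen_diff - ref_diff) ** 2))

rng = np.random.default_rng(0)
ref = rng.random((5, 4, 4, 3))
# A video with identical motion (constant brightness offset) incurs ~zero loss,
# because the constant offset cancels in the frame differences.
shifted = ref + 0.1
```

Note the design choice this sketch illustrates: the loss is invariant to per-video constant offsets and only constrains how frames change over time.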
no code implementations • 24 Nov 2022 • Bosheng Qin, Juncheng Li, Siliang Tang, Yueting Zhuang
Furthermore, we show that the hidden state dimension can be approximated by extending the Johnson-Lindenstrauss lemma, optimizing the attention in bilinear form.
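The Johnson-Lindenstrauss lemma states that n points can be mapped into roughly O(log n / eps^2) dimensions while approximately preserving pairwise distances. A quick random-projection check of that guarantee (illustrative only, not the paper's construction for attention):

```python
import numpy as np
from itertools import combinations

def jl_project(points, k, rng):
    """Random Gaussian projection to k dimensions, scaled by 1/sqrt(k)
    so squared distances are preserved in expectation."""
    d = points.shape[1]
    proj = rng.standard_normal((d, k)) / np.sqrt(k)
    return points @ proj

rng = np.random.default_rng(0)
x = rng.standard_normal((10, 1000))  # 10 points in 1000-D
y = jl_project(x, 2048, rng)

# Maximum relative distortion of pairwise squared distances after projection.
max_distortion = max(
    abs(np.sum((y[i] - y[j]) ** 2) / np.sum((x[i] - x[j]) ** 2) - 1.0)
    for i, j in combinations(range(10), 2)
)
```

With k this large relative to log n, the distortion stays small; shrinking k trades accuracy for dimensionality, which is the lever the abstract alludes to for bounding the hidden state dimension.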