Search Results for author: Jiji Tang

Found 6 papers, 3 papers with code

Let Storytelling Tell Vivid Stories: An Expressive and Fluent Multimodal Storyteller

no code implementations • 12 Mar 2024 • Chuanqi Zang, Jiji Tang, Rongsheng Zhang, Zeng Zhao, Tangjie Lv, Mingtao Pei, Wei Liang

Storytelling aims to generate reasonable and vivid narratives based on an ordered image stream.

Story Generation

Towards Efficient Diffusion-Based Image Editing with Instant Attention Masks

1 code implementation • 15 Jan 2024 • Siyu Zou, Jiji Tang, Yiyi Zhou, Jing He, Chaoyi Zhao, Rongsheng Zhang, Zhipeng Hu, Xiaoshuai Sun

InstDiffEdit aims to exploit the cross-modal attention of existing diffusion models to provide instant mask guidance during the diffusion steps.

Beyond First Impressions: Integrating Joint Multi-modal Cues for Comprehensive 3D Representation

1 code implementation • 6 Aug 2023 • Haowei Wang, Jiji Tang, Jiayi Ji, Xiaoshuai Sun, Rongsheng Zhang, Yiwei Ma, Minda Zhao, Lincheng Li, Zeng Zhao, Tangjie Lv, Rongrong Ji

Insufficient synergy neglects the idea that a robust 3D representation should align with the joint vision-language space rather than aligning with each modality independently.

Ranked #1 on Zero-shot 3D Point Cloud Classification on ModelNet40

3D Classification, 3D Part Segmentation, +5 more

Structure-CLIP: Towards Scene Graph Knowledge to Enhance Multi-modal Structured Representations

2 code implementations • 6 May 2023 • Yufeng Huang, Jiji Tang, Zhuo Chen, Rongsheng Zhang, Xinfeng Zhang, WeiJie Chen, Zeng Zhao, Zhou Zhao, Tangjie Lv, Zhipeng Hu, Wen Zhang

In this paper, we present Structure-CLIP, an end-to-end framework that integrates Scene Graph Knowledge (SGK) to enhance multi-modal structured representations.

Image-text Matching, Text Matching

Alpha at SemEval-2021 Task 6: Transformer Based Propaganda Classification

no code implementations • SemEval 2021 • Zhida Feng, Jiji Tang, Jiaxiang Liu, Weichong Yin, Shikun Feng, Yu Sun, Li Chen

This paper describes our system for Task 6 of SemEval-2021, which focuses on multimodal propaganda technique classification: given an image and its accompanying text, the goal is to classify them into 22 classes.

Classification

ERNIE-ViL: Knowledge Enhanced Vision-Language Representations Through Scene Graph

no code implementations • 30 Jun 2020 • Fei Yu, Jiji Tang, Weichong Yin, Yu Sun, Hao Tian, Hua Wu, Haifeng Wang

ERNIE-ViL can thus learn joint representations that characterize the alignment of detailed semantics across vision and language.

Ranked #2 on Visual Question Answering (VQA) on VCR (QA-R) test

Attribute, Referring Expression Comprehension, +2 more
