Search Results for author: Xinsong Zhang

Found 12 papers, 7 papers with code

X$^2$-VLM: All-In-One Pre-trained Model For Vision-Language Tasks

2 code implementations • 22 Nov 2022 • Yan Zeng, Xinsong Zhang, Hang Li, Jiawei Wang, Jipeng Zhang, Wangchunshu Zhou

Vision language pre-training aims to learn alignments between vision and language from a large amount of data.

Ranked #1 on Cross-Modal Retrieval on Flickr30k (using extra training data)

Cross-Modal Retrieval · Image Captioning · +7

Write and Paint: Generative Vision-Language Models are Unified Modal Learners

1 code implementation • 15 Jun 2022 • Shizhe Diao, Wangchunshu Zhou, Xinsong Zhang, Jiawei Wang

In this work, we disclose the potential of symmetric generative vision-language pre-training in learning to write and paint concurrently, and propose a new unified modal model, named DaVinci, trained with prefix language modeling and prefix image modeling, a simple generative self-supervised objective on image-text pairs.

Language Modelling · Text Generation · +1
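The prefix language modeling objective described above trains the model to generate a suffix conditioned on a bidirectionally encoded prefix. A minimal sketch of the attention mask such an objective implies (the function name and plain-list representation are illustrative, not taken from the DaVinci codebase):

```python
def prefix_lm_attention_mask(prefix_len: int, seq_len: int) -> list[list[int]]:
    """Build an attention mask for prefix language modeling.

    Tokens in the prefix (the conditioning image/text) attend to the whole
    prefix bidirectionally; tokens in the suffix attend to the prefix plus
    earlier suffix tokens (causal), so the suffix can be generated
    autoregressively. mask[i][j] == 1 means position i may attend to j.
    """
    mask = [[0] * seq_len for _ in range(seq_len)]
    for i in range(seq_len):
        for j in range(seq_len):
            if j < prefix_len:       # every position sees the full prefix
                mask[i][j] = 1
            elif j <= i:             # suffix positions: causal visibility
                mask[i][j] = 1
    return mask
```

With a prefix of length 2 in a sequence of length 4, the first two rows attend only to the prefix, while later rows extend causally over the suffix.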

Cross-View Language Modeling: Towards Unified Cross-Lingual Cross-Modal Pre-training

1 code implementation • 1 Jun 2022 • Yan Zeng, Wangchunshu Zhou, Ao Luo, Ziming Cheng, Xinsong Zhang

To this end, the cross-view language modeling framework considers both multi-modal data (i.e., image-caption pairs) and multi-lingual data (i.e., parallel sentence pairs) as two different views of the same object, and trains the model to align the two views by maximizing the mutual information between them with conditional masked language modeling and contrastive learning.

Contrastive Learning · Language Modelling · +9
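Maximizing mutual information between two views via contrastive learning is commonly instantiated with an InfoNCE-style loss over matched pairs in a batch. A minimal sketch under that assumption (pure Python; the function name and similarity-matrix input are illustrative, not from the paper's code):

```python
import math

def info_nce(sim: list[list[float]], temperature: float = 0.07) -> float:
    """InfoNCE-style contrastive loss over a batch of paired views.

    sim[i][j] is the similarity between view A of example i and view B of
    example j (e.g. an image and captions, or parallel sentences).
    Matched pairs sit on the diagonal; the negative of this loss
    lower-bounds the mutual information between the two views.
    """
    n = len(sim)
    total = 0.0
    for i in range(n):
        logits = [sim[i][j] / temperature for j in range(n)]
        log_denom = math.log(sum(math.exp(l) for l in logits))
        total += -(logits[i] - log_denom)  # cross-entropy with target i
    return total / n
```

A similarity matrix whose diagonal dominates (matched views most similar) yields a lower loss than one where the pairing is scrambled.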

VLUE: A Multi-Task Benchmark for Evaluating Vision-Language Models

1 code implementation • 30 May 2022 • Wangchunshu Zhou, Yan Zeng, Shizhe Diao, Xinsong Zhang

We release the VLUE benchmark to promote research on building vision-language models that generalize well to more diverse images and concepts unseen during pre-training, and are practical in terms of efficiency-performance trade-off.

Vietnamese Language Models · Vietnamese Natural Language Understanding · +1

Multi-Grained Vision Language Pre-Training: Aligning Texts with Visual Concepts

1 code implementation • 16 Nov 2021 • Yan Zeng, Xinsong Zhang, Hang Li

Most existing methods in vision language pre-training rely on object-centric features extracted through object detection and make fine-grained alignments between the extracted features and texts.

Ranked #1 on Image Retrieval on Flickr30K 1K test (using extra training data)

Cross-Modal Retrieval · Image Captioning · +9

AMBERT: A Pre-trained Language Model with Multi-Grained Tokenization

no code implementations • Findings (ACL) 2021 • Xinsong Zhang, Pengshuai Li, Hang Li

In fact, both fine-grained and coarse-grained tokenizations have advantages and disadvantages for learning of pre-trained language models.

Language Modelling · Natural Language Understanding
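Multi-grained tokenization as described gives the model both a fine-grained stream (single tokens) and a coarse-grained stream (multi-token units such as phrases). A rough sketch of how the two streams might be produced (the greedy longest-match strategy and the phrase lexicon are illustrative assumptions, not AMBERT's actual tokenizer):

```python
def multi_grained_tokenize(text: str, phrase_lexicon: set[str]):
    """Return (fine, coarse) token streams for a sentence.

    Fine-grained: individual lowercased words. Coarse-grained: greedy
    longest-match phrases (up to 3 words) against a hypothetical phrase
    lexicon, falling back to single words when no phrase matches.
    """
    words = text.lower().split()
    fine = list(words)
    coarse, i = [], 0
    while i < len(words):
        for span in range(min(3, len(words) - i), 0, -1):
            cand = " ".join(words[i:i + span])
            if span > 1 and cand in phrase_lexicon:
                coarse.append(cand)   # multi-word unit found
                i += span
                break
        else:
            coarse.append(words[i])   # no phrase match: keep the word
            i += 1
    return fine, coarse
```

For example, with "new york" in the lexicon, "New York is big" yields fine tokens ["new", "york", "is", "big"] but a coarse stream that keeps "new york" as one unit.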

Neural Relation Extraction via Inner-Sentence Noise Reduction and Transfer Learning

no code implementations • EMNLP 2018 • Tianyi Liu, Xinsong Zhang, Wanhao Zhou, Weijia Jia

Extracting relations is critical for knowledge base completion and construction, in which distantly supervised methods are widely used to extract relational facts automatically with existing knowledge bases.

Knowledge Base Completion · Relation · +3
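Distant supervision, as mentioned, labels training sentences by aligning them with facts from an existing knowledge base; the noise this heuristic introduces is what the paper's inner-sentence noise reduction targets. A minimal sketch of the labeling heuristic (the function name and plain substring matching are illustrative simplifications):

```python
def distant_supervision_label(sentences, kb_triples):
    """Label sentences with KB relations via distant supervision.

    Heuristic: if a sentence mentions both the head and tail entity of a
    (head, relation, tail) triple, assume it expresses that relation.
    This yields noisy labels, since a sentence can mention both entities
    without stating the relation.
    """
    labeled = []
    for sent in sentences:
        for head, relation, tail in kb_triples:
            if head in sent and tail in sent:
                labeled.append((sent, head, relation, tail))
    return labeled
```

A sentence mentioning only one entity of a triple is left unlabeled, while any sentence containing both entities is (possibly wrongly) tagged with the relation.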
