Search Results for author: Xinsong Zhang

Found 12 papers, 7 papers with code

X$^2$-VLM: All-In-One Pre-trained Model For Vision-Language Tasks

2 code implementations • 22 Nov 2022 • Yan Zeng, Xinsong Zhang, Hang Li, Jiawei Wang, Jipeng Zhang, Wangchunshu Zhou

Vision language pre-training aims to learn alignments between vision and language from a large amount of data.

Ranked #1 on Cross-Modal Retrieval on Flickr30k (using extra training data)

Cross-Modal Retrieval · Image Captioning · +7

Write and Paint: Generative Vision-Language Models are Unified Modal Learners

1 code implementation • 15 Jun 2022 • Shizhe Diao, Wangchunshu Zhou, Xinsong Zhang, Jiawei Wang

In this work, we disclose the potential of symmetric generative vision-language pre-training in learning to write and paint concurrently, and propose a new unified modal model, named DaVinci, trained with prefix language modeling and prefix image modeling, a simple generative self-supervised objective on image-text pairs.

Language Modelling · Text Generation · +1
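The prefix language modeling objective described above trains the model to generate a suffix conditioned on a bidirectionally encoded prefix. A minimal sketch of the attention mask such an objective implies (the function name and plain-list representation are illustrative, not taken from the DaVinci codebase):

```python
def prefix_lm_attention_mask(prefix_len: int, seq_len: int) -> list[list[int]]:
    """Build an attention mask for prefix language modeling.

    Tokens in the prefix (the conditioning image/text) attend to the whole
    prefix bidirectionally; tokens in the suffix attend to the prefix plus
    earlier suffix tokens (causal), so the suffix can be generated
    autoregressively. mask[i][j] == 1 means position i may attend to j.
    """
    mask = [[0] * seq_len for _ in range(seq_len)]
    for i in range(seq_len):
        for j in range(seq_len):
            if j < prefix_len:       # every position sees the full prefix
                mask[i][j] = 1
            elif j <= i:             # suffix positions: causal visibility
                mask[i][j] = 1
    return mask
```

With a prefix of length 2 in a sequence of length 4, the first two rows attend only to the prefix, while later rows extend causally over the suffix.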

Cross-View Language Modeling: Towards Unified Cross-Lingual Cross-Modal Pre-training

1 code implementation • 1 Jun 2022 • Yan Zeng, Wangchunshu Zhou, Ao Luo, Ziming Cheng, Xinsong Zhang

To this end, the cross-view language modeling framework considers both multi-modal data (i.e., image-caption pairs) and multi-lingual data (i.e., parallel sentence pairs) as two different views of the same object, and trains the model to align the two views by maximizing the mutual information between them with conditional masked language modeling and contrastive learning.

Contrastive Learning · Language Modelling · +9
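Maximizing mutual information between two views via contrastive learning is commonly instantiated with an InfoNCE-style loss over matched pairs in a batch. A minimal sketch under that assumption (pure Python; the function name and similarity-matrix input are illustrative, not from the paper's code):

```python
import math

def info_nce(sim: list[list[float]], temperature: float = 0.07) -> float:
    """InfoNCE-style contrastive loss over a batch of paired views.

    sim[i][j] is the similarity between view A of example i and view B of
    example j (e.g. an image and captions, or parallel sentences).
    Matched pairs sit on the diagonal; the negative of this loss
    lower-bounds the mutual information between the two views.
    """
    n = len(sim)
    total = 0.0
    for i in range(n):
        logits = [sim[i][j] / temperature for j in range(n)]
        log_denom = math.log(sum(math.exp(l) for l in logits))
        total += -(logits[i] - log_denom)  # cross-entropy with target i
    return total / n
```

A similarity matrix whose diagonal dominates (matched views most similar) yields a lower loss than one where the pairing is scrambled.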

VLUE: A Multi-Task Benchmark for Evaluating Vision-Language Models

1 code implementation • 30 May 2022 • Wangchunshu Zhou, Yan Zeng, Shizhe Diao, Xinsong Zhang

We release the VLUE benchmark to promote research on building vision-language models that generalize well to more diverse images and concepts unseen during pre-training, and are practical in terms of efficiency-performance trade-off.

Vietnamese Language Models · Vietnamese Natural Language Understanding · +1

Multi-Grained Vision Language Pre-Training: Aligning Texts with Visual Concepts

1 code implementation • 16 Nov 2021 • Yan Zeng, Xinsong Zhang, Hang Li

Most existing methods in vision language pre-training rely on object-centric features extracted through object detection and make fine-grained alignments between the extracted features and texts.

Ranked #1 on Image Retrieval on Flickr30K 1K test (using extra training data)

Cross-Modal Retrieval · Image Captioning · +9

AMBERT: A Pre-trained Language Model with Multi-Grained Tokenization

no code implementations • Findings (ACL) 2021 • Xinsong Zhang, Pengshuai Li, Hang Li

In fact, both fine-grained and coarse-grained tokenizations have advantages and disadvantages for learning of pre-trained language models.

Language Modelling · Natural Language Understanding
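Multi-grained tokenization as described gives the model both a fine-grained stream (single tokens) and a coarse-grained stream (multi-token units such as phrases). A rough sketch of how the two streams might be produced (the greedy longest-match strategy and the phrase lexicon are illustrative assumptions, not AMBERT's actual tokenizer):

```python
def multi_grained_tokenize(text: str, phrase_lexicon: set[str]):
    """Return (fine, coarse) token streams for a sentence.

    Fine-grained: individual lowercased words. Coarse-grained: greedy
    longest-match phrases (up to 3 words) against a hypothetical phrase
    lexicon, falling back to single words when no phrase matches.
    """
    words = text.lower().split()
    fine = list(words)
    coarse, i = [], 0
    while i < len(words):
        for span in range(min(3, len(words) - i), 0, -1):
            cand = " ".join(words[i:i + span])
            if span > 1 and cand in phrase_lexicon:
                coarse.append(cand)   # multi-word unit found
                i += span
                break
        else:
            coarse.append(words[i])   # no phrase match: keep the word
            i += 1
    return fine, coarse
```

For example, with "new york" in the lexicon, "New York is big" yields fine tokens ["new", "york", "is", "big"] but a coarse stream that keeps "new york" as one unit.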

Neural Relation Extraction via Inner-Sentence Noise Reduction and Transfer Learning

no code implementations • EMNLP 2018 • Tianyi Liu, Xinsong Zhang, Wanhao Zhou, Weijia Jia

Extracting relations is critical for knowledge base completion and construction, in which distantly supervised methods are widely used to extract relational facts automatically with existing knowledge bases.

Knowledge Base Completion · Relation · +3
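Distant supervision, as mentioned, labels training sentences by aligning them with facts from an existing knowledge base; the noise this heuristic introduces is what the paper's inner-sentence noise reduction targets. A minimal sketch of the labeling heuristic (the function name and plain substring matching are illustrative simplifications):

```python
def distant_supervision_label(sentences, kb_triples):
    """Label sentences with KB relations via distant supervision.

    Heuristic: if a sentence mentions both the head and tail entity of a
    (head, relation, tail) triple, assume it expresses that relation.
    This yields noisy labels, since a sentence can mention both entities
    without stating the relation.
    """
    labeled = []
    for sent in sentences:
        for head, relation, tail in kb_triples:
            if head in sent and tail in sent:
                labeled.append((sent, head, relation, tail))
    return labeled
```

A sentence mentioning only one entity of a triple is left unlabeled, while any sentence containing both entities is (possibly wrongly) tagged with the relation.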
