Search Results for author: Bei Liu

Found 35 papers, 15 papers with code

Debiasing Event Understanding for Visual Commonsense Tasks

no code implementations · Findings (ACL) 2022 · Minji Seo, YeonJoon Jung, Seungtaek Choi, Seung-won Hwang, Bei Liu

We study event understanding as a critical step towards visual commonsense tasks. We argue that current object-based event understanding is purely likelihood-based, leading to incorrect event prediction due to biased correlations between events and objects. We propose to mitigate such biases with do-calculus from causality research, and to overcome its limited robustness through an optimized aggregation with association-based prediction. We show the effectiveness of our approach, intrinsically by comparing our generated events with ground-truth event annotations, and extrinsically on downstream commonsense tasks.
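The aggregation step described in this snippet can be sketched as a convex combination of an interventional (do-calculus) event distribution and a plain association-based one. This is a minimal illustration, not the paper's implementation; the function name and the mixing weight `alpha` are assumptions for the sketch.

```python
def aggregate_predictions(p_do, p_assoc, alpha=0.5):
    """Mix an interventional and an associational event distribution.

    p_do, p_assoc: same-length lists of non-negative event probabilities.
    alpha: hypothetical weight on the do-calculus prediction.
    """
    assert len(p_do) == len(p_assoc)
    mixed = [alpha * d + (1.0 - alpha) * a for d, a in zip(p_do, p_assoc)]
    total = sum(mixed)
    return [m / total for m in mixed]  # renormalize to a distribution
```

In practice the weight would be tuned (the paper optimizes the aggregation); here it is just a fixed scalar.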

One-Shot Sensitivity-Aware Mixed Sparsity Pruning for Large Language Models

1 code implementation · 14 Oct 2023 · Hang Shao, Bei Liu, Bo Xiao, Ke Zeng, Guanglu Wan, Yanmin Qian

Various Large Language Models (LLMs) from the Generative Pre-trained Transformer (GPT) family have achieved outstanding performance in a wide range of text generation tasks.

Quantization, Text Generation
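The title's "sensitivity-aware mixed sparsity" idea can be sketched as follows: give sensitive layers a lower sparsity budget, then magnitude-prune each layer to its budget. This is an illustrative sketch only, with assumed inputs (per-layer sensitivity scores and 2-D weight matrices), not the paper's algorithm.

```python
import numpy as np

def mixed_sparsity_prune(weights, sensitivities, overall_sparsity=0.5):
    """Allocate per-layer sparsity inversely to sensitivity, then magnitude-prune.

    weights: list of 2-D numpy arrays (one per layer).
    sensitivities: one positive score per layer (higher = more sensitive).
    """
    inv = 1.0 / np.asarray(sensitivities, dtype=float)
    # Per-layer sparsity proportional to inverse sensitivity, scaled so the
    # mean matches the overall target; capped to avoid pruning a layer away.
    per_layer = overall_sparsity * inv * len(inv) / inv.sum()
    per_layer = np.clip(per_layer, 0.0, 0.95)
    pruned = []
    for w, s in zip(weights, per_layer):
        k = int(round(s * w.size))  # number of weights to zero in this layer
        if k == 0:
            pruned.append(w.copy())
            continue
        # k-th smallest magnitude is the pruning threshold
        thresh = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
        pruned.append(w * (np.abs(w) > thresh))
    return pruned, per_layer
```

A one-shot method like the paper's would derive the sensitivity scores from the model itself (e.g., from activations or gradients) rather than taking them as given.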

ViCo: Engaging Video Comment Generation with Human Preference Rewards

no code implementations · 22 Aug 2023 · Yuchong Sun, Bei Liu, Xu Chen, Ruihua Song, Jianlong Fu

Experiments on ViCo-20k show that the comments generated by our ViCo model exhibit the best performance in terms of both quantitative and qualitative results, particularly when engagement is considered.

Caption Generation, Comment Generation +1

Improving Diversity in Zero-Shot GAN Adaptation with Semantic Variations

no code implementations · ICCV 2023 · Seogkyu Jeon, Bei Liu, Pilhyeon Lee, Kibeom Hong, Jianlong Fu, Hyeran Byun

Due to the absence of target-domain data, the textual description of the target domain and vision-language models, e.g., CLIP, are utilized to effectively guide the generator.

Revisiting Latent Space of GAN Inversion for Real Image Editing

no code implementations · 18 Jul 2023 · Kai Katsumata, Duc Minh Vo, Bei Liu, Hideki Nakayama

The exploration of the latent space in StyleGANs and GAN inversion exemplify impressive real-world image editing, yet the trade-off between reconstruction quality and editing quality remains an open problem.

SINC: Self-Supervised In-Context Learning for Vision-Language Tasks

no code implementations · ICCV 2023 · Yi-Syuan Chen, Yun-Zhu Song, Cheng Yu Yeo, Bei Liu, Jianlong Fu, Hong-Han Shuai

To this end, we raise a question: "How can we enable in-context learning without relying on the intrinsic in-context ability of large language models?"

Hallucination, In-Context Learning

Transferring Foundation Models for Generalizable Robotic Manipulation

no code implementations · 9 Jun 2023 · Jiange Yang, Wenhui Tan, Chuhao Jin, Keling Yao, Bei Liu, Jianlong Fu, Ruihua Song, Gangshan Wu, LiMin Wang

In this paper, we propose a novel paradigm that effectively leverages language-reasoning segmentation masks generated by internet-scale foundation models to condition robot manipulation tasks.

Imitation Learning, Object +1

Balancing Reconstruction and Editing Quality of GAN Inversion for Real Image Editing with StyleGAN Prior Latent Space

no code implementations · 31 May 2023 · Kai Katsumata, Duc Minh Vo, Bei Liu, Hideki Nakayama

The exploration of the latent space in StyleGANs and GAN inversion exemplify impressive real-world image editing, yet the trade-off between reconstruction quality and editing quality remains an open problem.

AlphaBlock: Embodied Finetuning for Vision-Language Reasoning in Robot Manipulation

no code implementations · 30 May 2023 · Chuhao Jin, Wenhui Tan, Jiange Yang, Bei Liu, Ruihua Song, LiMin Wang, Jianlong Fu

We propose a novel framework for learning high-level cognitive capabilities in robot manipulation tasks, such as making a smiley face using building blocks.

Robot Manipulation

Whisper-KDQ: A Lightweight Whisper via Guided Knowledge Distillation and Quantization for Efficient ASR

no code implementations · 18 May 2023 · Hang Shao, Wei Wang, Bei Liu, Xun Gong, Haoyu Wang, Yanmin Qian

Due to the rapid development of computing hardware resources and the dramatic growth of data, pre-trained models in speech recognition, such as Whisper, have significantly improved the performance of speech recognition tasks.

Knowledge Distillation, Quantization +2
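The two compression techniques named in this entry's title can be sketched in a few lines: a temperature-softened KL divergence for knowledge distillation, and symmetric per-tensor int8 quantization. These are generic textbook formulations, not Whisper-KDQ's specific guided scheme; all names and defaults are assumptions.

```python
import numpy as np

def kd_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between temperature-softened teacher and student outputs."""
    def softmax(x, t):
        z = np.exp((np.asarray(x, float) - np.max(x)) / t)
        return z / z.sum()
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return float(np.sum(p * (np.log(p) - np.log(q))))

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization with a single float scale."""
    w = np.asarray(w, float)
    max_abs = np.abs(w).max()
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale  # dequantize with q * scale
```

A distilled-and-quantized student would be trained against `kd_loss` and then have its weight tensors passed through `quantize_int8` (or a finer-grained, e.g. per-channel, variant).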

MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation

1 code implementation · CVPR 2023 · Ludan Ruan, Yiyang Ma, Huan Yang, Huiguo He, Bei Liu, Jianlong Fu, Nicholas Jing Yuan, Qin Jin, Baining Guo

To generate joint audio-video pairs, we propose a novel Multi-Modal Diffusion model (i.e., MM-Diffusion) with two coupled denoising autoencoders.

Denoising, FAD +1

Long-Form Video-Language Pre-Training with Multimodal Temporal Contrastive Learning

1 code implementation · 12 Oct 2022 · Yuchong Sun, Hongwei Xue, Ruihua Song, Bei Liu, Huan Yang, Jianlong Fu

Large-scale video-language pre-training has shown significant improvement in video-language understanding tasks.

Ranked #2 on Video Retrieval on QuerYD (using extra training data)

Contrastive Learning, Question Answering +3
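The contrastive objective this entry's title refers to is typically a symmetric InfoNCE loss over paired video and text embeddings: each video should score highest against its own caption within the batch, and vice versa. The sketch below is the generic loss, not the paper's multimodal temporal variant; the temperature value is an assumption.

```python
import numpy as np

def info_nce(video_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired video/text embeddings.

    video_emb, text_emb: (n, d) arrays; row i of each is a positive pair.
    """
    v = video_emb / np.linalg.norm(video_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = v @ t.T / temperature  # (n, n) similarity matrix
    n = logits.shape[0]

    def xent_diag(l):
        # Cross-entropy where the correct class for row i is column i.
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[np.arange(n), np.arange(n)].mean()

    # Average the video-to-text and text-to-video directions.
    return (xent_diag(logits) + xent_diag(logits.T)) / 2.0
```

With perfectly matched pairs the loss approaches zero; shuffling the text rows against the videos drives it up, which is what makes the objective usable for retrieval.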

AI Illustrator: Translating Raw Descriptions into Images by Prompt-based Cross-Modal Generation

1 code implementation · 7 Sep 2022 · Yiyang Ma, Huan Yang, Bei Liu, Jianlong Fu, Jiaying Liu

To address this issue, we propose a Prompt-based Cross-Modal Generation Framework (PCM-Frame) to leverage two powerful pre-trained models, including CLIP and StyleGAN.

Image Generation

Language-Guided Face Animation by Recurrent StyleGAN-based Generator

1 code implementation · 11 Aug 2022 · Tiankai Hang, Huan Yang, Bei Liu, Jianlong Fu, Xin Geng, Baining Guo

Specifically, we propose a recurrent motion generator to extract a series of semantic and motion information from the language and feed it along with visual information to a pre-trained StyleGAN to generate high-quality frames.

Image Manipulation

Exploring Anchor-based Detection for Ego4D Natural Language Query

no code implementations · 10 Aug 2022 · Sipeng Zheng, Qi Zhang, Bei Liu, Qin Jin, Jianlong Fu

In this paper, we provide the technical report for the Ego4D natural language query challenge at CVPR 2022.

Video Understanding

Searching the Search Space of Vision Transformer

2 code implementations · NeurIPS 2021 · Minghao Chen, Kan Wu, Bolin Ni, Houwen Peng, Bei Liu, Jianlong Fu, Hongyang Chao, Haibin Ling

Vision Transformers have shown great visual representation power in substantial vision tasks such as recognition and detection, and have thus attracted fast-growing efforts in manually designing more effective architectures.

Neural Architecture Search, Object Detection +4

Advancing High-Resolution Video-Language Representation with Large-Scale Video Transcriptions

1 code implementation · CVPR 2022 · Hongwei Xue, Tiankai Hang, Yanhong Zeng, Yuchong Sun, Bei Liu, Huan Yang, Jianlong Fu, Baining Guo

To enable VL pre-training, we jointly optimize the HD-VILA model by a hybrid Transformer that learns rich spatiotemporal features, and a multimodal Transformer that enforces interactions of the learned video features with diversified texts.

Retrieval, Super-Resolution +4

Unifying Multimodal Transformer for Bi-directional Image and Text Generation

1 code implementation · 19 Oct 2021 · Yupan Huang, Hongwei Xue, Bei Liu, Yutong Lu

We adopt Transformer as our unified architecture for its strong performance and task-agnostic design.

Text Generation, Text-to-Image Generation

A Picture is Worth a Thousand Words: A Unified System for Diverse Captions and Rich Images Generation

1 code implementation · 19 Oct 2021 · Yupan Huang, Bei Liu, Jianlong Fu, Yutong Lu

In this work, we demonstrate such an AI creation system to produce both diverse captions and rich images.

Learning Fine-Grained Motion Embedding for Landscape Animation

no code implementations · 6 Sep 2021 · Hongwei Xue, Bei Liu, Huan Yang, Jianlong Fu, Houqiang Li, Jiebo Luo

To tackle this problem, we propose a model named FGLA to generate high-quality and realistic videos by learning Fine-Grained motion embedding for Landscape Animation.

Reference-based Defect Detection Network

no code implementations · 10 Aug 2021 · Zhaoyang Zeng, Bei Liu, Jianlong Fu, Hongyang Chao

To solve the partial visual confusion issue, we propose to leverage the context information carried by the context reference, i.e., the concentric bigger box of each region proposal, to perform more accurate region classification and regression.

Defect Detection, Object Detection +2

Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning

3 code implementations · CVPR 2021 · Zhicheng Huang, Zhaoyang Zeng, Yupan Huang, Bei Liu, Dongmei Fu, Jianlong Fu

As region-based visual features usually represent parts of an image, it is challenging for existing vision-language models to fully understand the semantics from paired natural languages.

Representation Learning, Retrieval +3

Pixel-BERT: Aligning Image Pixels with Text by Deep Multi-Modal Transformers

1 code implementation · 2 Apr 2020 · Zhicheng Huang, Zhaoyang Zeng, Bei Liu, Dongmei Fu, Jianlong Fu

We aim to build a more accurate and thorough connection between image pixels and language semantics directly from image and sentence pairs instead of using region-based image features as the most recent vision and language tasks.

Image-Text Matching, Language Modelling +7

Neural Storyboard Artist: Visualizing Stories with Coherent Image Sequences

no code implementations · 24 Nov 2019 · Shizhe Chen, Bei Liu, Jianlong Fu, Ruihua Song, Qin Jin, Pingping Lin, Xiaoyu Qi, Chunting Wang, Jin Zhou

A storyboard is a sequence of images to illustrate a story containing multiple sentences, which has been a key process to create different story products.

Learning Rich Image Region Representation for Visual Question Answering

no code implementations · 29 Oct 2019 · Bei Liu, Zhicheng Huang, Zhaoyang Zeng, Zheyu Chen, Jianlong Fu

We propose to boost VQA with more powerful feature extractors, by improving the representation ability of both visual and text features and by ensembling models.

Language Modelling, Question Answering +1

Gastroscopic Panoramic View: Application to Automatic Polyps Detection under Gastroscopy

no code implementations · 19 Oct 2019 · Shi Chenfei, Yan Xue, Chuan Jiang, Hui Tian, Bei Liu

The main contributions of this paper are as follows: first, a gastroscopic panorama reconstruction method is developed.

Object Detection

SMP Challenge: An Overview of Social Media Prediction Challenge 2019

no code implementations · 4 Oct 2019 · Bo Wu, Wen-Huang Cheng, Peiye Liu, Bei Liu, Zhaoyang Zeng, Jiebo Luo

In the SMP Challenge at ACM Multimedia 2019, we introduce a novel prediction task, Temporal Popularity Prediction, which focuses on predicting future interaction or attractiveness (in terms of clicks, views, likes, etc.)

Multimedia recommendation

WSOD^2: Learning Bottom-up and Top-down Objectness Distillation for Weakly-supervised Object Detection

1 code implementation · 11 Sep 2019 · Zhaoyang Zeng, Bei Liu, Jianlong Fu, Hongyang Chao, Lei Zhang

We study weakly-supervised object detection (WSOD), which plays a vital role in relieving humans from object-level annotation work.

Object, Object Detection +3

Activitynet 2019 Task 3: Exploring Contexts for Dense Captioning Events in Videos

no code implementations · 11 Jul 2019 · Shizhe Chen, Yuqing Song, Yida Zhao, Qin Jin, Zhaoyang Zeng, Bei Liu, Jianlong Fu, Alexander Hauptmann

The overall system achieves the state-of-the-art performance on the dense-captioning events in video task with a 9.91 METEOR score on the challenge testing set.

Dense Captioning, Dense Video Captioning

Beyond Narrative Description: Generating Poetry from Images by Multi-Adversarial Training

3 code implementations · 23 Apr 2018 · Bei Liu, Jianlong Fu, Makoto P. Kato, Masatoshi Yoshikawa

Extensive experiments are conducted with 8K images, among which 1.5K images are randomly picked for evaluation.

8k
