Search Results for author: Bei Liu

Found 35 papers, 15 papers with code

Debiasing Event Understanding for Visual Commonsense Tasks

no code implementations · Findings (ACL) 2022 · Minji Seo, YeonJoon Jung, Seungtaek Choi, Seung-won Hwang, Bei Liu

We study event understanding as a critical step towards visual commonsense tasks. We argue that current object-based event understanding is purely likelihood-based, leading to incorrect event prediction due to biased correlations between events and objects. We propose to mitigate such biases with do-calculus from causality research, and to overcome its limited robustness through an optimized aggregation with association-based prediction. We show the effectiveness of our approach, intrinsically by comparing our generated events with ground-truth event annotations, and extrinsically on downstream commonsense tasks.
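The aggregation step described in this snippet can be sketched as a convex combination of an interventional (do-calculus) event distribution and a plain association-based one. This is a minimal illustration, not the paper's implementation; the function name and the mixing weight `alpha` are assumptions for the sketch.

```python
def aggregate_predictions(p_do, p_assoc, alpha=0.5):
    """Mix an interventional and an associational event distribution.

    p_do, p_assoc: same-length lists of non-negative event probabilities.
    alpha: hypothetical weight on the do-calculus prediction.
    """
    assert len(p_do) == len(p_assoc)
    mixed = [alpha * d + (1.0 - alpha) * a for d, a in zip(p_do, p_assoc)]
    total = sum(mixed)
    return [m / total for m in mixed]  # renormalize to a distribution
```

In practice the weight would be tuned (the paper optimizes the aggregation); here it is just a fixed scalar.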

One-Shot Sensitivity-Aware Mixed Sparsity Pruning for Large Language Models

1 code implementation · 14 Oct 2023 · Hang Shao, Bei Liu, Bo Xiao, Ke Zeng, Guanglu Wan, Yanmin Qian

Various Large Language Models (LLMs) from the Generative Pre-trained Transformer (GPT) family have achieved outstanding performance in a wide range of text generation tasks.

Quantization, Text Generation
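The title's "sensitivity-aware mixed sparsity" idea can be sketched as follows: give sensitive layers a lower sparsity budget, then magnitude-prune each layer to its budget. This is an illustrative sketch only, with assumed inputs (per-layer sensitivity scores and 2-D weight matrices), not the paper's algorithm.

```python
import numpy as np

def mixed_sparsity_prune(weights, sensitivities, overall_sparsity=0.5):
    """Allocate per-layer sparsity inversely to sensitivity, then magnitude-prune.

    weights: list of 2-D numpy arrays (one per layer).
    sensitivities: one positive score per layer (higher = more sensitive).
    """
    inv = 1.0 / np.asarray(sensitivities, dtype=float)
    # Per-layer sparsity proportional to inverse sensitivity, scaled so the
    # mean matches the overall target; capped to avoid pruning a layer away.
    per_layer = overall_sparsity * inv * len(inv) / inv.sum()
    per_layer = np.clip(per_layer, 0.0, 0.95)
    pruned = []
    for w, s in zip(weights, per_layer):
        k = int(round(s * w.size))  # number of weights to zero in this layer
        if k == 0:
            pruned.append(w.copy())
            continue
        # k-th smallest magnitude is the pruning threshold
        thresh = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
        pruned.append(w * (np.abs(w) > thresh))
    return pruned, per_layer
```

A one-shot method like the paper's would derive the sensitivity scores from the model itself (e.g., from activations or gradients) rather than taking them as given.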

ViCo: Engaging Video Comment Generation with Human Preference Rewards

no code implementations · 22 Aug 2023 · Yuchong Sun, Bei Liu, Xu Chen, Ruihua Song, Jianlong Fu

Experiments on ViCo-20k show that the comments generated by our ViCo model exhibit the best performance in terms of both quantitative and qualitative results, particularly when engagement is considered.

Caption Generation, Comment Generation +1

Improving Diversity in Zero-Shot GAN Adaptation with Semantic Variations

no code implementations · ICCV 2023 · Seogkyu Jeon, Bei Liu, Pilhyeon Lee, Kibeom Hong, Jianlong Fu, Hyeran Byun

Due to the absence of target-domain data, the textual description of the target domain and vision-language models, e.g., CLIP, are utilized to effectively guide the generator.

Revisiting Latent Space of GAN Inversion for Real Image Editing

no code implementations · 18 Jul 2023 · Kai Katsumata, Duc Minh Vo, Bei Liu, Hideki Nakayama

The exploration of the latent space in StyleGANs and GAN inversion exemplify impressive real-world image editing, yet the trade-off between reconstruction quality and editing quality remains an open problem.

SINC: Self-Supervised In-Context Learning for Vision-Language Tasks

no code implementations · ICCV 2023 · Yi-Syuan Chen, Yun-Zhu Song, Cheng Yu Yeo, Bei Liu, Jianlong Fu, Hong-Han Shuai

To this end, we raise a question: "How can we enable in-context learning without relying on the intrinsic in-context ability of large language models?"

Hallucination, In-Context Learning

Transferring Foundation Models for Generalizable Robotic Manipulation

no code implementations · 9 Jun 2023 · Jiange Yang, Wenhui Tan, Chuhao Jin, Keling Yao, Bei Liu, Jianlong Fu, Ruihua Song, Gangshan Wu, LiMin Wang

In this paper, we propose a novel paradigm that effectively leverages language-reasoning segmentation masks generated by internet-scale foundation models to condition robot manipulation tasks.

Imitation Learning, Object +1

Balancing Reconstruction and Editing Quality of GAN Inversion for Real Image Editing with StyleGAN Prior Latent Space

no code implementations · 31 May 2023 · Kai Katsumata, Duc Minh Vo, Bei Liu, Hideki Nakayama

The exploration of the latent space in StyleGANs and GAN inversion exemplify impressive real-world image editing, yet the trade-off between reconstruction quality and editing quality remains an open problem.

AlphaBlock: Embodied Finetuning for Vision-Language Reasoning in Robot Manipulation

no code implementations · 30 May 2023 · Chuhao Jin, Wenhui Tan, Jiange Yang, Bei Liu, Ruihua Song, LiMin Wang, Jianlong Fu

We propose a novel framework for learning high-level cognitive capabilities in robot manipulation tasks, such as making a smiley face using building blocks.

Robot Manipulation

Whisper-KDQ: A Lightweight Whisper via Guided Knowledge Distillation and Quantization for Efficient ASR

no code implementations · 18 May 2023 · Hang Shao, Wei Wang, Bei Liu, Xun Gong, Haoyu Wang, Yanmin Qian

Due to the rapid development of computing hardware resources and the dramatic growth of data, pre-trained models in speech recognition, such as Whisper, have significantly improved the performance of speech recognition tasks.

Knowledge Distillation, Quantization +2
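The two compression techniques named in this entry's title can be sketched in a few lines: a temperature-softened KL divergence for knowledge distillation, and symmetric per-tensor int8 quantization. These are generic textbook formulations, not Whisper-KDQ's specific guided scheme; all names and defaults are assumptions.

```python
import numpy as np

def kd_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between temperature-softened teacher and student outputs."""
    def softmax(x, t):
        z = np.exp((np.asarray(x, float) - np.max(x)) / t)
        return z / z.sum()
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return float(np.sum(p * (np.log(p) - np.log(q))))

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization with a single float scale."""
    w = np.asarray(w, float)
    max_abs = np.abs(w).max()
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale  # dequantize with q * scale
```

A distilled-and-quantized student would be trained against `kd_loss` and then have its weight tensors passed through `quantize_int8` (or a finer-grained, e.g. per-channel, variant).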

MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation

1 code implementation · CVPR 2023 · Ludan Ruan, Yiyang Ma, Huan Yang, Huiguo He, Bei Liu, Jianlong Fu, Nicholas Jing Yuan, Qin Jin, Baining Guo

To generate joint audio-video pairs, we propose a novel Multi-Modal Diffusion model (i.e., MM-Diffusion) with two coupled denoising autoencoders.

Denoising, FAD +1

Long-Form Video-Language Pre-Training with Multimodal Temporal Contrastive Learning

1 code implementation · 12 Oct 2022 · Yuchong Sun, Hongwei Xue, Ruihua Song, Bei Liu, Huan Yang, Jianlong Fu

Large-scale video-language pre-training has shown significant improvement in video-language understanding tasks.

Ranked #2 on Video Retrieval on QuerYD (using extra training data)

Contrastive Learning, Question Answering +3
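The contrastive objective this entry's title refers to is typically a symmetric InfoNCE loss over paired video and text embeddings: each video should score highest against its own caption within the batch, and vice versa. The sketch below is the generic loss, not the paper's multimodal temporal variant; the temperature value is an assumption.

```python
import numpy as np

def info_nce(video_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired video/text embeddings.

    video_emb, text_emb: (n, d) arrays; row i of each is a positive pair.
    """
    v = video_emb / np.linalg.norm(video_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = v @ t.T / temperature  # (n, n) similarity matrix
    n = logits.shape[0]

    def xent_diag(l):
        # Cross-entropy where the correct class for row i is column i.
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[np.arange(n), np.arange(n)].mean()

    # Average the video-to-text and text-to-video directions.
    return (xent_diag(logits) + xent_diag(logits.T)) / 2.0
```

With perfectly matched pairs the loss approaches zero; shuffling the text rows against the videos drives it up, which is what makes the objective usable for retrieval.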

AI Illustrator: Translating Raw Descriptions into Images by Prompt-based Cross-Modal Generation

1 code implementation · 7 Sep 2022 · Yiyang Ma, Huan Yang, Bei Liu, Jianlong Fu, Jiaying Liu

To address this issue, we propose a Prompt-based Cross-Modal Generation Framework (PCM-Frame) to leverage two powerful pre-trained models, including CLIP and StyleGAN.

Image Generation

Language-Guided Face Animation by Recurrent StyleGAN-based Generator

1 code implementation · 11 Aug 2022 · Tiankai Hang, Huan Yang, Bei Liu, Jianlong Fu, Xin Geng, Baining Guo

Specifically, we propose a recurrent motion generator to extract a series of semantic and motion information from the language and feed it along with visual information to a pre-trained StyleGAN to generate high-quality frames.

Image Manipulation

Exploring Anchor-based Detection for Ego4D Natural Language Query

no code implementations · 10 Aug 2022 · Sipeng Zheng, Qi Zhang, Bei Liu, Qin Jin, Jianlong Fu

In this paper, we provide the technical report for the Ego4D natural language query challenge at CVPR 2022.

Video Understanding

Searching the Search Space of Vision Transformer

2 code implementations · NeurIPS 2021 · Minghao Chen, Kan Wu, Bolin Ni, Houwen Peng, Bei Liu, Jianlong Fu, Hongyang Chao, Haibin Ling

Vision Transformers have shown great visual representation power in substantial vision tasks such as recognition and detection, and have thus attracted fast-growing efforts in manually designing more effective architectures.

Neural Architecture Search, Object Detection +4

Advancing High-Resolution Video-Language Representation with Large-Scale Video Transcriptions

1 code implementation · CVPR 2022 · Hongwei Xue, Tiankai Hang, Yanhong Zeng, Yuchong Sun, Bei Liu, Huan Yang, Jianlong Fu, Baining Guo

To enable VL pre-training, we jointly optimize the HD-VILA model by a hybrid Transformer that learns rich spatiotemporal features, and a multimodal Transformer that enforces interactions of the learned video features with diversified texts.

Retrieval, Super-Resolution +4

Unifying Multimodal Transformer for Bi-directional Image and Text Generation

1 code implementation · 19 Oct 2021 · Yupan Huang, Hongwei Xue, Bei Liu, Yutong Lu

We adopt Transformer as our unified architecture for its strong performance and task-agnostic design.

Text Generation, Text-to-Image Generation

A Picture is Worth a Thousand Words: A Unified System for Diverse Captions and Rich Images Generation

1 code implementation · 19 Oct 2021 · Yupan Huang, Bei Liu, Jianlong Fu, Yutong Lu

In this work, we demonstrate such an AI creation system to produce both diverse captions and rich images.

Learning Fine-Grained Motion Embedding for Landscape Animation

no code implementations · 6 Sep 2021 · Hongwei Xue, Bei Liu, Huan Yang, Jianlong Fu, Houqiang Li, Jiebo Luo

To tackle this problem, we propose a model named FGLA to generate high-quality and realistic videos by learning Fine-Grained motion embedding for Landscape Animation.

Reference-based Defect Detection Network

no code implementations · 10 Aug 2021 · Zhaoyang Zeng, Bei Liu, Jianlong Fu, Hongyang Chao

To solve the partial visual confusion issue, we propose to leverage the context information carried by the context reference, i.e., the concentric bigger box of each region proposal, to perform more accurate region classification and regression.

Defect Detection, Object Detection +2

Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning

3 code implementations · CVPR 2021 · Zhicheng Huang, Zhaoyang Zeng, Yupan Huang, Bei Liu, Dongmei Fu, Jianlong Fu

As region-based visual features usually represent parts of an image, it is challenging for existing vision-language models to fully understand the semantics from paired natural languages.

Representation Learning, Retrieval +3

Pixel-BERT: Aligning Image Pixels with Text by Deep Multi-Modal Transformers

1 code implementation · 2 Apr 2020 · Zhicheng Huang, Zhaoyang Zeng, Bei Liu, Dongmei Fu, Jianlong Fu

We aim to build a more accurate and thorough connection between image pixels and language semantics directly from image and sentence pairs instead of using region-based image features as the most recent vision and language tasks.

Image-Text Matching, Language Modelling +7

Neural Storyboard Artist: Visualizing Stories with Coherent Image Sequences

no code implementations · 24 Nov 2019 · Shizhe Chen, Bei Liu, Jianlong Fu, Ruihua Song, Qin Jin, Pingping Lin, Xiaoyu Qi, Chunting Wang, Jin Zhou

A storyboard is a sequence of images to illustrate a story containing multiple sentences, which has been a key process to create different story products.

Learning Rich Image Region Representation for Visual Question Answering

no code implementations · 29 Oct 2019 · Bei Liu, Zhicheng Huang, Zhaoyang Zeng, Zheyu Chen, Jianlong Fu

We propose to boost VQA with more powerful feature extractors, by improving the representation ability of both visual and text features and by ensembling models.

Language Modelling, Question Answering +1

Gastroscopic Panoramic View: Application to Automatic Polyps Detection under Gastroscopy

no code implementations · 19 Oct 2019 · Shi Chenfei, Yan Xue, Chuan Jiang, Hui Tian, Bei Liu

The main contributions of this paper are as follows: first, a gastroscopic panorama reconstruction method is developed.

Object Detection

SMP Challenge: An Overview of Social Media Prediction Challenge 2019

no code implementations · 4 Oct 2019 · Bo Wu, Wen-Huang Cheng, Peiye Liu, Bei Liu, Zhaoyang Zeng, Jiebo Luo

In the SMP Challenge at ACM Multimedia 2019, we introduce a novel prediction task, Temporal Popularity Prediction, which focuses on predicting future interaction or attractiveness (in terms of clicks, views, likes, etc.)

Multimedia recommendation

WSOD^2: Learning Bottom-up and Top-down Objectness Distillation for Weakly-supervised Object Detection

1 code implementation · 11 Sep 2019 · Zhaoyang Zeng, Bei Liu, Jianlong Fu, Hongyang Chao, Lei Zhang

We study weakly-supervised object detection (WSOD), which plays a vital role in relieving humans from object-level annotation work.

Object, Object Detection +3

Activitynet 2019 Task 3: Exploring Contexts for Dense Captioning Events in Videos

no code implementations · 11 Jul 2019 · Shizhe Chen, Yuqing Song, Yida Zhao, Qin Jin, Zhaoyang Zeng, Bei Liu, Jianlong Fu, Alexander Hauptmann

The overall system achieves the state-of-the-art performance on the dense-captioning events in video task with a 9.91 METEOR score on the challenge testing set.

Dense Captioning, Dense Video Captioning

Beyond Narrative Description: Generating Poetry from Images by Multi-Adversarial Training

3 code implementations · 23 Apr 2018 · Bei Liu, Jianlong Fu, Makoto P. Kato, Masatoshi Yoshikawa

Extensive experiments are conducted with 8K images, among which 1.5K images are randomly picked for evaluation.

8k
