Search Results for author: Bo Ren

Found 45 papers, 20 papers with code

CoCGAN: Contrastive Learning for Adversarial Category Text Generation

no code implementations • COLING 2022 • Xin Sheng, Linli Xu, Yinlong Xu, Changcun Bao, Huang Chen, Bo Ren

The discriminator of CoCGAN discriminates the authenticity of given samples and optimizes a contrastive learning objective to capture both more flexible data-to-class relations and data-to-data relations among training samples.

Contrastive Learning Text Generation

Semantic-Preserving Abstractive Text Summarization with Siamese Generative Adversarial Net

no code implementations • Findings (NAACL) 2022 • Xin Sheng, Linli Xu, Yinlong Xu, Deqiang Jiang, Bo Ren

We propose a novel siamese generative adversarial net for abstractive text summarization (SSPGAN), which can preserve the main semantics of the source text.

Abstractive Text Summarization

Hierarchical Multi-label Text Classification with Horizontal and Vertical Category Correlations

no code implementations • EMNLP 2021 • Linli Xu, Sijie Teng, Ruoyu Zhao, Junliang Guo, Chi Xiao, Deqiang Jiang, Bo Ren

Hierarchical multi-label text classification (HMTC) deals with the challenging task where an instance can be assigned to multiple hierarchically structured categories at the same time.

Multi Label Text Classification Multi-Label Text Classification +1

On decoder-only architecture for speech-to-text and large language model integration

no code implementations • 8 Jul 2023 • Jian Wu, Yashesh Gaur, Zhuo Chen, Long Zhou, Yimeng Zhu, Tianrui Wang, Jinyu Li, Shujie Liu, Bo Ren, Linquan Liu, Yu Wu

Large language models (LLMs) have achieved remarkable success in the field of natural language processing, enabling better human-computer interaction using natural language.

Language Modelling Large Language Model +1

Visual Information Extraction in the Wild: Practical Dataset and End-to-end Solution

1 code implementation • 12 May 2023 • Jianfeng Kuang, Wei Hua, Dingkang Liang, Mingkun Yang, Deqiang Jiang, Bo Ren, Xiang Bai

We evaluate the existing end-to-end methods for VIE on the proposed dataset and observe that the performance of these methods has a distinguishable drop from SROIE (a widely used English dataset) to our proposed dataset due to the larger variance of layout and entities.

Contrastive Learning Optical Character Recognition (OCR)

Multi-Space Neural Radiance Fields

no code implementations • CVPR 2023 • Ze-Xin Yin, Jiaxiong Qiu, Ming-Ming Cheng, Bo Ren

Existing Neural Radiance Fields (NeRF) methods suffer from the existence of reflective objects, often resulting in blurry or distorted rendering.

Looking Through the Glass: Neural Surface Reconstruction Against High Specular Reflections

1 code implementation • CVPR 2023 • Jiaxiong Qiu, Peng-Tao Jiang, Yifan Zhu, Ze-Xin Yin, Ming-Ming Cheng, Bo Ren

To remedy this issue, we present a novel surface reconstruction framework, NeuS-HSR, based on implicit neural rendering.

Neural Rendering Object +2

Collaborative Noisy Label Cleaner: Learning Scene-aware Trailers for Multi-modal Highlight Detection in Movies

1 code implementation • CVPR 2023 • Bei Gan, Xiujun Shu, Ruizhi Qiao, Haoqian Wu, Keyu Chen, Hanjun Li, Bo Ren

Based on existing efforts, this work has two observations: (1) For different annotators, labeling highlight has uncertainty, which leads to inaccurate and time-consuming annotations.

Highlight Detection Learning with noisy labels +1

Turning a CLIP Model into a Scene Text Detector

1 code implementation • CVPR 2023 • Wenwen Yu, Yuliang Liu, Wei Hua, Deqiang Jiang, Bo Ren, Xiang Bai

Recently, pretraining approaches based on vision language models have made effective progress in the field of text detection.

Domain Adaptation Scene Text Detection +1

OSAN: A One-Stage Alignment Network To Unify Multimodal Alignment and Unsupervised Domain Adaptation

no code implementations • CVPR 2023 • Ye Liu, Lingfeng Qiao, Changchong Lu, Di Yin, Chen Lin, Haoyuan Peng, Bo Ren

An intuitive way to handle these two problems is to fulfill these tasks in two separate stages: aligning modalities followed by domain adaptation, or vice versa.

Unsupervised Domain Adaptation

Consistent Depth Prediction for Transparent Object Reconstruction from RGB-D Camera

no code implementations • ICCV 2023 • Yuxiang Cai, Yifan Zhu, Haiwei Zhang, Bo Ren

We compare the metrics on our dataset and SLAM reconstruction results in both synthetic scenes and real scenes with the previous methods.

Depth Estimation Depth Prediction +2

NewsNet: A Novel Dataset for Hierarchical Temporal Segmentation

no code implementations • CVPR 2023 • Haoqian Wu, Keyu Chen, Haozhe Liu, Mingchen Zhuge, Bing Li, Ruizhi Qiao, Xiujun Shu, Bei Gan, Liangsheng Xu, Bo Ren, Mengmeng Xu, Wentian Zhang, Raghavendra Ramachandra, Chia-Wen Lin, Bernard Ghanem

Temporal video segmentation is the get-to-go automatic video analysis, which decomposes a long-form video into smaller components for follow-up understanding tasks.

Video Segmentation Video Semantic Segmentation

FoPro: Few-Shot Guided Robust Webly-Supervised Prototypical Learning

1 code implementation • 1 Dec 2022 • Yulei Qin, Xingyu Chen, Chao Chen, Yunhang Shen, Bo Ren, Yun Gu, Jie Yang, Chunhua Shen

Most existing methods focus on learning noise-robust models from web images while neglecting the performance drop caused by the differences between web domain and real-world domain.

Contrastive Learning Representation Learning

SLAN: Self-Locator Aided Network for Cross-Modal Understanding

no code implementations • 28 Nov 2022 • Jiang-Tian Zhai, Qi Zhang, Tong Wu, Xing-Yu Chen, Jiang-Jiang Liu, Bo Ren, Ming-Ming Cheng

By aggregating cross-modal information, the region filter selects key regions and the region adaptor updates their coordinates with text guidance.

Image Retrieval Retrieval

Grafting Pre-trained Models for Multimodal Headline Generation

no code implementations • 14 Nov 2022 • Lingfeng Qiao, Chen Wu, Ye Liu, Haoyuan Peng, Di Yin, Bo Ren

In this paper, we propose a novel approach to graft the video encoder from the pre-trained video-language model on the generative pre-trained language model.

Headline Generation Language Modelling +1

Leveraging Key Information Modeling to Improve Less-Data Constrained News Headline Generation via Duality Fine-Tuning

no code implementations • 10 Oct 2022 • Zhuoxuan Jiang, Lingfeng Qiao, Di Yin, Shanshan Feng, Bo Ren

Recent language generative models are mostly trained on large-scale datasets, while in some real scenarios, the training datasets are often expensive to obtain and would be small-scale.

Headline Generation Informativeness +1

TaCo: Textual Attribute Recognition via Contrastive Learning

no code implementations • 22 Aug 2022 • Chang Nie, Yiqing Hu, Yanqiu Qu, Hao Liu, Deqiang Jiang, Bo Ren

To realize this goal, we design the learning paradigm from three perspectives: 1) generating attribute views, 2) extracting subtle but crucial details, and 3) exploiting valued view pairs for learning, to fully unlock the pre-training potential.

Attribute Contrastive Learning

VLMAE: Vision-Language Masked Autoencoder

no code implementations • 19 Aug 2022 • Sunan He, Taian Guo, Tao Dai, Ruizhi Qiao, Chen Wu, Xiujun Shu, Bo Ren

Image and language modeling is of crucial importance for vision-language pre-training (VLP), which aims to learn multi-modal representations from large-scale paired image-text data.

Language Modelling Question Answering +4

See Finer, See More: Implicit Modality Alignment for Text-based Person Retrieval

1 code implementation • 18 Aug 2022 • Xiujun Shu, Wei Wen, Haoqian Wu, Keyu Chen, Yiran Song, Ruizhi Qiao, Bo Ren, Xiao Wang

To explore the fine-grained alignment, we further propose two implicit semantic alignment paradigms: multi-level alignment (MLA) and bidirectional mask modeling (BMM).

Person Retrieval Retrieval +3

GMN: Generative Multi-modal Network for Practical Document Information Extraction

no code implementations • NAACL 2022 • Haoyu Cao, Jiefeng Ma, Antai Guo, Yiqing Hu, Hao Liu, Deqiang Jiang, Yinsong Liu, Bo Ren

Document Information Extraction (DIE) has attracted increasing attention due to its various advanced applications in the real world.

Optical Character Recognition (OCR)

Open-Vocabulary Multi-Label Classification via Multi-Modal Knowledge Transfer

1 code implementation • 5 Jul 2022 • Sunan He, Taian Guo, Tao Dai, Ruizhi Qiao, Bo Ren, Shu-Tao Xia

Specifically, our method exploits multi-modal knowledge of image-text pairs based on a vision and language pre-training (VLP) model.

Ranked #1 on Multi-label zero-shot learning on Open Images V4

Image-text matching Knowledge Distillation +7

OS-MSL: One Stage Multimodal Sequential Link Framework for Scene Segmentation and Classification

no code implementations • 4 Jul 2022 • Ye Liu, Lingfeng Qiao, Di Yin, Zhuoxuan Jiang, Xinghua Jiang, Deqiang Jiang, Bo Ren

In this paper, from an alternate perspective to overcome the above challenges, we unite these two tasks into one task by a new form of predicting shots link: a link connects two adjacent shots, indicating that they belong to the same scene or category.

Scene Segmentation

Open Vocabulary Object Detection with Proposal Mining and Prediction Equalization

2 code implementations • 22 Jun 2022 • Peixian Chen, Kekai Sheng, Mengdan Zhang, Mingbao Lin, Yunhang Shen, Shaohui Lin, Bo Ren, Ke Li

Open-vocabulary object detection (OVD) aims to scale up vocabulary size to detect objects of novel categories beyond the training vocabulary.

Ranked #12 on Open Vocabulary Object Detection on LVIS v1.0

Causal Inference object-detection +1

RAAT: Relation-Augmented Attention Transformer for Relation Modeling in Document-Level Event Extraction

1 code implementation • NAACL 2022 • Yuan Liang, Zhuoxuan Jiang, Di Yin, Bo Ren

To further leverage relation information, we introduce a separate event relation prediction task and adopt a multi-task learning method to explicitly enhance event extraction performance.

Ranked #1 on Document-level Event Extraction on ChFinAnn

Document-level Event Extraction Event Extraction +3

Contrastive Graph Multimodal Model for Text Classification in Videos

no code implementations • 6 Jun 2022 • Ye Liu, Changchong Lu, Chen Lin, Di Yin, Bo Ren

However, to our knowledge, there is no existing work focused on the second step of video text classification, which will limit the guidance to downstream tasks such as video indexing and browsing.

Contrastive Learning Optical Character Recognition (OCR) +2

Sequence-to-Action: Grammatical Error Correction with Action Guided Sequence Generation

no code implementations • 22 May 2022 • Jiquan Li, Junliang Guo, Yongxin Zhu, Xin Sheng, Deqiang Jiang, Bo Ren, Linli Xu

The task of Grammatical Error Correction (GEC) has received remarkable attention with wide applications in Natural Language Processing (NLP) in recent years.

Grammatical Error Correction Sentence

Scene Consistency Representation Learning for Video Scene Segmentation

1 code implementation • CVPR 2022 • Haoqian Wu, Keyu Chen, Yanan Luo, Ruizhi Qiao, Bo Ren, Haozhe Liu, Weicheng Xie, Linlin Shen

Additionally, we suggest a more fair and reasonable benchmark to evaluate the performance of Video Scene Segmentation methods.

Data Augmentation Inductive Bias +3

Relational Representation Learning in Visually-Rich Documents

no code implementations • 5 May 2022 • Xin Li, Yan Zheng, Yiqing Hu, Haoyu Cao, Yunfei Wu, Deqiang Jiang, Yinsong Liu, Bo Ren

To deal with the unpredictable definition of relations, we propose a novel contrastive learning task named Relational Consistency Modeling (RCM), which harnesses the fact that existing relations should be consistent in differently augmented positive views.

Contrastive Learning Key Information Extraction +3

The Devil is in the Frequency: Geminated Gestalt Autoencoder for Self-Supervised Visual Pre-Training

no code implementations • 18 Apr 2022 • Hao Liu, Xinghua Jiang, Xin Li, Antai Guo, Deqiang Jiang, Bo Ren

The self-supervised Masked Image Modeling (MIM) schema, which follows a "mask-and-reconstruct" pipeline of recovering content from a masked image, has recently attracted increasing interest in the multimedia community, owing to its excellent ability to learn visual representations from unlabeled data.

Knowledge Mining with Scene Text for Fine-Grained Recognition

1 code implementation • CVPR 2022 • Hao Wang, Junchao Liao, Tianheng Cheng, Zewen Gao, Hao Liu, Bo Ren, Xiang Bai, Wenyu Liu

Recently, the semantics of scene text has been proven to be essential in fine-grained image classification.

Activity Recognition Classification +1

Interactive Style Transfer: All is Your Palette

no code implementations • 25 Mar 2022 • Zheng Lin, Zhao Zhang, Kang-Rui Zhang, Bo Ren, Ming-Ming Cheng

Our IST method can serve as a brush, dip style from anywhere, and then paint to any region of the target content image.

Style Transfer

Spatio-Temporal Tuples Transformer for Skeleton-Based Action Recognition

1 code implementation • 8 Jan 2022 • Helei Qiu, Biao Hou, Bo Ren, Xiaohua Zhang

And then a spatio-temporal tuples self-attention module is proposed to capture the relationship of different joints in consecutive frames.

Action Recognition Skeleton Based Action Recognition

HybridCR: Weakly-Supervised 3D Point Cloud Semantic Segmentation via Hybrid Contrastive Regularization

1 code implementation • CVPR 2022 • Mengtian Li, Yuan Xie, Yunhang Shen, Bo Ke, Ruizhi Qiao, Bo Ren, Shaohui Lin, Lizhuang Ma

To address the huge labeling cost in large-scale point cloud semantic segmentation, we propose a novel hybrid contrastive regularization (HybridCR) framework in a weakly-supervised setting, which obtains competitive performance compared to its fully-supervised counterpart.

Semantic Segmentation Semantic Similarity +1

Head and Body: Unified Detector and Graph Network for Person Search in Media

no code implementations • 27 Nov 2021 • Xiujun Shu, Yusheng Tao, Ruizhi Qiao, Bo Ke, Wei Wen, Bo Ren

It is by far the largest dataset for person search in media.

Person Search

Neural Collaborative Graph Machines for Table Structure Recognition

no code implementations • CVPR 2022 • Hao Liu, Xin Li, Bing Liu, Deqiang Jiang, Yinsong Liu, Bo Ren

We also show that the proposed NCGM can modulate collaborative pattern of different modalities conditioned on the context of intra-modality cues, which is vital for diversified table cases.

Ranked #6 on Table Recognition on PubTabNet

Table Recognition

NomMer: Nominate Synergistic Context in Vision Transformer for Visual Recognition

1 code implementation • CVPR 2022 • Hao Liu, Xinghua Jiang, Xin Li, Zhimin Bao, Deqiang Jiang, Bo Ren

For the sake of trade-off between efficiency and performance, a group of works merely perform SA operation within local patches, whereas the global contextual information is abandoned, which would be indispensable for visual recognition tasks.

object-detection Object Detection +1

FCOSR: A Simple Anchor-free Rotated Detector for Aerial Object Detection

2 code implementations • 21 Nov 2021 • Zhonghua Li, Biao Hou, Zitong Wu, Licheng Jiao, Bo Ren, Chen Yang

We convert a lightweight FCOSR model to TensorRT format, which achieves 73.93 mAP on DOTA1.0 at a speed of 10.68 FPS on Jetson Xavier NX with single scale.

object-detection Object Detection +1

Interpreting BERT architecture predictions for peptide presentation by MHC class I proteins

1 code implementation • 13 Nov 2021 • Hans-Christof Gasser, Georges Bedran, Bo Ren, David Goodlett, Javier Alfaro, Ajitha Rajan

In particular, we find that amino acids close to the peptides' N- and C-terminals are highly relevant.

MHC presentation prediction

Transfusion: A Novel SLAM Method Focused on Transparent Objects

no code implementations • ICCV 2021 • Yifan Zhu, Jiaxiong Qiu, Bo Ren

In this paper, we propose a novel SLAM approach called transfusion that allows transparent object existence and recovery in the video input.

Transparent objects

EDN: Salient Object Detection via Extremely-Downsampled Network

1 code implementation • 24 Dec 2020 • Yu-Huan Wu, Yun Liu, Le Zhang, Ming-Ming Cheng, Bo Ren

In this paper, we tap into this gap and show that enhancing high-level features is essential for SOD as well.

Object object-detection +3

PuzzleNet: Scene Text Detection by Segment Context Graph Learning

no code implementations • 26 Feb 2020 • Hao Liu, Antai Guo, Deqiang Jiang, Yiqing Hu, Bo Ren

Recently, a series of decomposition-based scene text detection methods has achieved impressive progress by decomposing challenging text regions into pieces and linking them in a bottom-up manner.

Graph Learning Scene Text Detection +1

Scoot: A Perceptual Metric for Facial Sketches

1 code implementation • ICCV 2019 • Deng-Ping Fan, Shengchuan Zhang, Yu-Huan Wu, Yun Liu, Ming-Ming Cheng, Bo Ren, Paul L. Rosin, Rongrong Ji

In this paper, we design a perceptual metric, called Structure Co-Occurrence Texture (Scoot), which simultaneously considers the block-level spatial structure and co-occurrence texture statistics.

Face Sketch Synthesis SSIM

Sequence-based Person Attribute Recognition with Joint CTC-Attention Model

no code implementations • 20 Nov 2018 • Hao Liu, Jingjing Wu, Jianguo Jiang, Meibin Qi, Bo Ren

Attribute recognition has become crucial because of its wide applications in many computer vision tasks, such as person re-identification.

Attribute Object Recognition +1

Enhanced-alignment Measure for Binary Foreground Map Evaluation

2 code implementations • 26 May 2018 • Deng-Ping Fan, Cheng Gong, Yang Cao, Bo Ren, Ming-Ming Cheng, Ali Borji

Existing binary foreground map (FM) measures address various types of errors in either pixel-wise or structural ways.

Face Sketch Synthesis Style Similarity: A New Structure Co-occurrence Texture Measure

1 code implementation • 9 Apr 2018 • Deng-Ping Fan, Shengchuan Zhang, Yu-Huan Wu, Ming-Ming Cheng, Bo Ren, Rongrong Ji, Paul L. Rosin

However, human perception of the similarity of two sketches will consider both structure and texture as essential factors and is not sensitive to slight ("pixel-level") mismatches.

Face Sketch Synthesis
