Search Results for author: Bo Ren

Found 45 papers, 20 papers with code

CoCGAN: Contrastive Learning for Adversarial Category Text Generation

no code implementations COLING 2022 Xin Sheng, Linli Xu, Yinlong Xu, Changcun Bao, Huang Chen, Bo Ren

The discriminator of CoCGAN discriminates the authenticity of given samples and optimizes a contrastive learning objective to capture both more flexible data-to-class relations and data-to-data relations among training samples.

Contrastive Learning Text Generation

Semantic-Preserving Abstractive Text Summarization with Siamese Generative Adversarial Net

no code implementations Findings (NAACL) 2022 Xin Sheng, Linli Xu, Yinlong Xu, Deqiang Jiang, Bo Ren

We propose a novel siamese generative adversarial net for abstractive text summarization (SSPGAN), which can preserve the main semantics of the source text.

Abstractive Text Summarization

Hierarchical Multi-label Text Classification with Horizontal and Vertical Category Correlations

no code implementations EMNLP 2021 Linli Xu, Sijie Teng, Ruoyu Zhao, Junliang Guo, Chi Xiao, Deqiang Jiang, Bo Ren

Hierarchical multi-label text classification (HMTC) deals with the challenging task where an instance can be assigned to multiple hierarchically structured categories at the same time.

Multi Label Text Classification Multi-Label Text Classification +1
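To make "assigned to multiple hierarchically structured categories" concrete: a common consistency requirement in hierarchical multi-label setups is that any assigned category implies all of its ancestors. The sketch below is a minimal illustration under that assumption, not the HMTC paper's method; the taxonomy and the helper name `expand_with_ancestors` are hypothetical.

```python
from typing import Dict, Iterable, Optional, Set


def expand_with_ancestors(labels: Iterable[str],
                          parent: Dict[str, Optional[str]]) -> Set[str]:
    """Return the given labels plus every ancestor implied by the hierarchy.

    `parent` maps each category to its parent (None for a root). This is a
    hypothetical helper for illustration only.
    """
    closed: Set[str] = set()
    for label in labels:
        node: Optional[str] = label
        # Walk upward until a root is passed or a node whose ancestors
        # were already added on an earlier walk is reached.
        while node is not None and node not in closed:
            closed.add(node)
            node = parent.get(node)
    return closed


# Hypothetical two-level taxonomy.
taxonomy = {
    "sports": None, "football": "sports", "tennis": "sports",
    "science": None, "physics": "science",
}
print(sorted(expand_with_ancestors(["football", "physics"], taxonomy)))
```

An instance tagged with "football" and "physics" thus ends up with four labels across two levels, which is the multi-path, multi-level output that HMTC models must predict.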

On decoder-only architecture for speech-to-text and large language model integration

no code implementations 8 Jul 2023 Jian Wu, Yashesh Gaur, Zhuo Chen, Long Zhou, Yimeng Zhu, Tianrui Wang, Jinyu Li, Shujie Liu, Bo Ren, Linquan Liu, Yu Wu

Large language models (LLMs) have achieved remarkable success in the field of natural language processing, enabling better human-computer interaction using natural language.

Language Modelling Large Language Model +1

Visual Information Extraction in the Wild: Practical Dataset and End-to-end Solution

1 code implementation 12 May 2023 Jianfeng Kuang, Wei Hua, Dingkang Liang, Mingkun Yang, Deqiang Jiang, Bo Ren, Xiang Bai

We evaluate the existing end-to-end methods for VIE on the proposed dataset and observe that their performance drops noticeably from SROIE (a widely used English dataset) to our proposed dataset due to the larger variance in layouts and entities.

Contrastive Learning Optical Character Recognition (OCR)

Multi-Space Neural Radiance Fields

no code implementations CVPR 2023 Ze-Xin Yin, Jiaxiong Qiu, Ming-Ming Cheng, Bo Ren

Existing Neural Radiance Fields (NeRF) methods struggle with reflective objects, often producing blurry or distorted renderings.

Collaborative Noisy Label Cleaner: Learning Scene-aware Trailers for Multi-modal Highlight Detection in Movies

1 code implementation CVPR 2023 Bei Gan, Xiujun Shu, Ruizhi Qiao, Haoqian Wu, Keyu Chen, Hanjun Li, Bo Ren

Building on existing efforts, this work makes two observations: (1) Highlight labels vary across annotators, which leads to inaccurate and time-consuming annotations.

Highlight Detection Learning with noisy labels +1

Turning a CLIP Model into a Scene Text Detector

1 code implementation CVPR 2023 Wenwen Yu, Yuliang Liu, Wei Hua, Deqiang Jiang, Bo Ren, Xiang Bai

Recently, pretraining approaches based on vision language models have made effective progress in the field of text detection.

Domain Adaptation Scene Text Detection +1

OSAN: A One-Stage Alignment Network To Unify Multimodal Alignment and Unsupervised Domain Adaptation

no code implementations CVPR 2023 Ye Liu, Lingfeng Qiao, Changchong Lu, Di Yin, Chen Lin, Haoyuan Peng, Bo Ren

An intuitive way to handle these two problems is to fulfill these tasks in two separate stages: aligning modalities followed by domain adaptation, or vice versa.

Unsupervised Domain Adaptation

Consistent Depth Prediction for Transparent Object Reconstruction from RGB-D Camera

no code implementations ICCV 2023 Yuxiang Cai, Yifan Zhu, Haiwei Zhang, Bo Ren

We compare against previous methods using metrics on our dataset and SLAM reconstruction results in both synthetic and real scenes.

Depth Estimation Depth Prediction +2

FoPro: Few-Shot Guided Robust Webly-Supervised Prototypical Learning

1 code implementation 1 Dec 2022 Yulei Qin, Xingyu Chen, Chao Chen, Yunhang Shen, Bo Ren, Yun Gu, Jie Yang, Chunhua Shen

Most existing methods focus on learning noise-robust models from web images while neglecting the performance drop caused by the differences between web domain and real-world domain.

Contrastive Learning Representation Learning

SLAN: Self-Locator Aided Network for Cross-Modal Understanding

no code implementations 28 Nov 2022 Jiang-Tian Zhai, Qi Zhang, Tong Wu, Xing-Yu Chen, Jiang-Jiang Liu, Bo Ren, Ming-Ming Cheng

By aggregating cross-modal information, the region filter selects key regions and the region adaptor updates their coordinates with text guidance.

Image Retrieval Retrieval

Grafting Pre-trained Models for Multimodal Headline Generation

no code implementations 14 Nov 2022 Lingfeng Qiao, Chen Wu, Ye Liu, Haoyuan Peng, Di Yin, Bo Ren

In this paper, we propose a novel approach to graft the video encoder from the pre-trained video-language model onto the generative pre-trained language model.

Headline Generation Language Modelling +1

Leveraging Key Information Modeling to Improve Less-Data Constrained News Headline Generation via Duality Fine-Tuning

no code implementations 10 Oct 2022 Zhuoxuan Jiang, Lingfeng Qiao, Di Yin, Shanshan Feng, Bo Ren

Recent generative language models are mostly trained on large-scale datasets, but in some real scenarios training data are expensive to obtain and therefore small-scale.

Headline Generation Informativeness +1

TaCo: Textual Attribute Recognition via Contrastive Learning

no code implementations 22 Aug 2022 Chang Nie, Yiqing Hu, Yanqiu Qu, Hao Liu, Deqiang Jiang, Bo Ren

To realize this goal, we design the learning paradigm from three perspectives: 1) generating attribute views, 2) extracting subtle but crucial details, and 3) exploiting valuable view pairs for learning, to fully unlock the pre-training potential.

Attribute Contrastive Learning

VLMAE: Vision-Language Masked Autoencoder

no code implementations 19 Aug 2022 Sunan He, Taian Guo, Tao Dai, Ruizhi Qiao, Chen Wu, Xiujun Shu, Bo Ren

Image and language modeling is of crucial importance for vision-language pre-training (VLP), which aims to learn multi-modal representations from large-scale paired image-text data.

Language Modelling Question Answering +4

See Finer, See More: Implicit Modality Alignment for Text-based Person Retrieval

1 code implementation 18 Aug 2022 Xiujun Shu, Wei Wen, Haoqian Wu, Keyu Chen, Yiran Song, Ruizhi Qiao, Bo Ren, Xiao Wang

To explore the fine-grained alignment, we further propose two implicit semantic alignment paradigms: multi-level alignment (MLA) and bidirectional mask modeling (BMM).

Person Retrieval Retrieval +3

Open-Vocabulary Multi-Label Classification via Multi-Modal Knowledge Transfer

1 code implementation 5 Jul 2022 Sunan He, Taian Guo, Tao Dai, Ruizhi Qiao, Bo Ren, Shu-Tao Xia

Specifically, our method exploits multi-modal knowledge of image-text pairs based on a vision and language pre-training (VLP) model.

Image-text matching Knowledge Distillation +7

OS-MSL: One Stage Multimodal Sequential Link Framework for Scene Segmentation and Classification

no code implementations 4 Jul 2022 Ye Liu, Lingfeng Qiao, Di Yin, Zhuoxuan Jiang, Xinghua Jiang, Deqiang Jiang, Bo Ren

In this paper, from an alternate perspective to overcome the above challenges, we unite these two tasks into one task by a new form of predicting shots link: a link connects two adjacent shots, indicating that they belong to the same scene or category.

Scene Segmentation
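The shot-link formulation above can be illustrated with a small decoding step: if a model outputs one binary link per pair of adjacent shots, scenes are simply maximal runs of linked shots. This is a minimal sketch assuming boolean link predictions (for example, thresholded probabilities); the function name `links_to_scenes` is hypothetical, and the sketch covers only scene grouping, not the category side of OS-MSL.

```python
from typing import List, Sequence


def links_to_scenes(links: Sequence[bool]) -> List[List[int]]:
    """Group shots 0..len(links) into scenes.

    links[i] is True when shot i and shot i + 1 belong to the same scene.
    Hypothetical decoding helper for illustration only.
    """
    scenes: List[List[int]] = [[0]]
    for i, linked in enumerate(links):
        if linked:
            scenes[-1].append(i + 1)  # link present: continue the current scene
        else:
            scenes.append([i + 1])    # link absent: shot i + 1 starts a new scene
    return scenes


print(links_to_scenes([True, False, True, True]))  # five shots, two scenes
```

Because every boundary decision is local to two adjacent shots, segmentation reduces to a sequence of binary predictions, which is what lets the two tasks share one prediction head in this formulation.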

RAAT: Relation-Augmented Attention Transformer for Relation Modeling in Document-Level Event Extraction

1 code implementation NAACL 2022 Yuan Liang, Zhuoxuan Jiang, Di Yin, Bo Ren

To further leverage relation information, we introduce a separate event relation prediction task and adopt a multi-task learning method to explicitly enhance event extraction performance.

Document-level Event Extraction Event Extraction +3

Contrastive Graph Multimodal Model for Text Classification in Videos

no code implementations 6 Jun 2022 Ye Liu, Changchong Lu, Chen Lin, Di Yin, Bo Ren

However, to our knowledge, no existing work focuses on the second step of video text classification, which limits guidance for downstream tasks such as video indexing and browsing.

Contrastive Learning Optical Character Recognition (OCR) +2

Sequence-to-Action: Grammatical Error Correction with Action Guided Sequence Generation

no code implementations 22 May 2022 Jiquan Li, Junliang Guo, Yongxin Zhu, Xin Sheng, Deqiang Jiang, Bo Ren, Linli Xu

The task of Grammatical Error Correction (GEC) has received remarkable attention with wide applications in Natural Language Processing (NLP) in recent years.

Grammatical Error Correction Sentence

Relational Representation Learning in Visually-Rich Documents

no code implementations 5 May 2022 Xin Li, Yan Zheng, Yiqing Hu, Haoyu Cao, Yunfei Wu, Deqiang Jiang, Yinsong Liu, Bo Ren

To deal with the unpredictable definition of relations, we propose a novel contrastive learning task named Relational Consistency Modeling (RCM), which harnesses the fact that existing relations should be consistent in differently augmented positive views.

Contrastive Learning Key Information Extraction +3

The Devil is in the Frequency: Geminated Gestalt Autoencoder for Self-Supervised Visual Pre-Training

no code implementations 18 Apr 2022 Hao Liu, Xinghua Jiang, Xin Li, Antai Guo, Deqiang Jiang, Bo Ren

The self-supervised Masked Image Modeling (MIM) schema, which follows the "mask-and-reconstruct" pipeline of recovering contents from a masked image, has recently attracted increasing interest in the multimedia community, owing to its excellent ability to learn visual representations from unlabeled data.

Interactive Style Transfer: All is Your Palette

no code implementations 25 Mar 2022 Zheng Lin, Zhao Zhang, Kang-Rui Zhang, Bo Ren, Ming-Ming Cheng

Our IST method can serve as a brush that dips style from anywhere and then paints it onto any region of the target content image.

Style Transfer

Spatio-Temporal Tuples Transformer for Skeleton-Based Action Recognition

1 code implementation 8 Jan 2022 Helei Qiu, Biao Hou, Bo Ren, Xiaohua Zhang

Then, a spatio-temporal tuples self-attention module is proposed to capture the relationships among different joints in consecutive frames.

Action Recognition Skeleton Based Action Recognition

HybridCR: Weakly-Supervised 3D Point Cloud Semantic Segmentation via Hybrid Contrastive Regularization

1 code implementation CVPR 2022 Mengtian Li, Yuan Xie, Yunhang Shen, Bo Ke, Ruizhi Qiao, Bo Ren, Shaohui Lin, Lizhuang Ma

To address the huge labeling cost in large-scale point cloud semantic segmentation, we propose a novel hybrid contrastive regularization (HybridCR) framework in weakly-supervised setting, which obtains competitive performance compared to its fully-supervised counterpart.

Semantic Segmentation Semantic Similarity +1

Neural Collaborative Graph Machines for Table Structure Recognition

no code implementations CVPR 2022 Hao Liu, Xin Li, Bing Liu, Deqiang Jiang, Yinsong Liu, Bo Ren

We also show that the proposed NCGM can modulate collaborative patterns of different modalities conditioned on the context of intra-modality cues, which is vital for diversified table cases.

Table Recognition

NomMer: Nominate Synergistic Context in Vision Transformer for Visual Recognition

1 code implementation CVPR 2022 Hao Liu, Xinghua Jiang, Xin Li, Zhimin Bao, Deqiang Jiang, Bo Ren

To trade off efficiency against performance, a group of works performs the self-attention (SA) operation only within local patches and discards global contextual information, which is indispensable for visual recognition tasks.

object-detection Object Detection +1

FCOSR: A Simple Anchor-free Rotated Detector for Aerial Object Detection

2 code implementations 21 Nov 2021 Zhonghua Li, Biao Hou, Zitong Wu, Licheng Jiao, Bo Ren, Chen Yang

We convert a lightweight FCOSR model to TensorRT format, which achieves 73.93 mAP on DOTA1.0 at a speed of 10.68 FPS on Jetson Xavier NX with single-scale inference.

object-detection Object Detection +1

Transfusion: A Novel SLAM Method Focused on Transparent Objects

no code implementations ICCV 2021 Yifan Zhu, Jiaxiong Qiu, Bo Ren

In this paper, we propose a novel SLAM approach called Transfusion that tolerates transparent objects in the video input and recovers them.

Transparent objects

EDN: Salient Object Detection via Extremely-Downsampled Network

1 code implementation 24 Dec 2020 Yu-Huan Wu, Yun Liu, Le Zhang, Ming-Ming Cheng, Bo Ren

In this paper, we address this gap and show that enhancing high-level features is essential for SOD as well.

Object object-detection +3

PuzzleNet: Scene Text Detection by Segment Context Graph Learning

no code implementations 26 Feb 2020 Hao Liu, Antai Guo, Deqiang Jiang, Yiqing Hu, Bo Ren

Recently, a series of decomposition-based scene text detection methods has achieved impressive progress by decomposing challenging text regions into pieces and linking them in a bottom-up manner.

Graph Learning Scene Text Detection +1

Scoot: A Perceptual Metric for Facial Sketches

1 code implementation ICCV 2019 Deng-Ping Fan, Shengchuan Zhang, Yu-Huan Wu, Yun Liu, Ming-Ming Cheng, Bo Ren, Paul L. Rosin, Rongrong Ji

In this paper, we design a perceptual metric, called Structure Co-Occurrence Texture (Scoot), which simultaneously considers the block-level spatial structure and co-occurrence texture statistics.

Face Sketch Synthesis SSIM
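"Co-occurrence texture statistics" usually refers to a gray-level co-occurrence matrix (GLCM): counts of how often gray level i appears next to gray level j at a fixed pixel offset. The sketch below computes a normalized GLCM in plain Python and compares two images block by block with an L1 distance. It is only a toy in the spirit of combining block-level structure with co-occurrence statistics; it is not the Scoot metric, and the function names, offset, and block size are assumptions.

```python
from typing import List, Tuple

Image = List[List[int]]  # integer gray levels in [0, levels)


def cooccurrence_matrix(image: Image, levels: int,
                        offset: Tuple[int, int] = (0, 1)) -> List[List[float]]:
    """Normalized co-occurrence matrix for one (dy, dx) pixel offset."""
    dy, dx = offset
    counts = [[0] * levels for _ in range(levels)]
    h, w = len(image), len(image[0])
    total = 0
    for y in range(h):
        for x in range(w):
            yy, xx = y + dy, x + dx
            if 0 <= yy < h and 0 <= xx < w:  # skip pairs that leave the image
                counts[image[y][x]][image[yy][xx]] += 1
                total += 1
    return [[c / total if total else 0.0 for c in row] for row in counts]


def block_texture_distance(a: Image, b: Image, levels: int, block: int) -> float:
    """Mean L1 distance between per-block GLCMs of two same-size images."""
    h, w = len(a), len(a[0])
    dists: List[float] = []
    for y0 in range(0, h, block):
        for x0 in range(0, w, block):
            pa = [row[x0:x0 + block] for row in a[y0:y0 + block]]
            pb = [row[x0:x0 + block] for row in b[y0:y0 + block]]
            ma = cooccurrence_matrix(pa, levels)
            mb = cooccurrence_matrix(pb, levels)
            dists.append(sum(abs(p - q)
                             for ra, rb in zip(ma, mb)
                             for p, q in zip(ra, rb)))
    return sum(dists) / len(dists)


img = [[0, 0, 1], [1, 1, 0]]
print(cooccurrence_matrix(img, levels=2))
```

Computing statistics per block rather than over the whole image keeps some spatial layout in the comparison, which a single global histogram would lose.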

Sequence-based Person Attribute Recognition with Joint CTC-Attention Model

no code implementations 20 Nov 2018 Hao Liu, Jingjing Wu, Jianguo Jiang, Meibin Qi, Bo Ren

Attribute recognition has become crucial because of its wide applications in many computer vision tasks, such as person re-identification.

Attribute Object Recognition +1

Enhanced-alignment Measure for Binary Foreground Map Evaluation

2 code implementations 26 May 2018 Deng-Ping Fan, Cheng Gong, Yang Cao, Bo Ren, Ming-Ming Cheng, Ali Borji

Existing binary foreground map (FM) measures address various types of errors in either pixel-wise or structural ways.
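A pixel-wise measure treats every pixel independently, so it cannot tell where errors occur. The sketch below computes a standard pixel-wise F-measure between a binary foreground map and its ground truth, then shows two predictions that miss the same number of foreground pixels (one scattered, one as a contiguous chunk) and receive identical scores. This illustrates the limitation that structure-aware measures target; it is not the Enhanced-alignment measure itself. `beta2` is the usual F-beta weight (salient-object-detection work often uses 0.3); the default of 1.0 here is an assumption.

```python
from typing import List

Mask = List[List[int]]  # 1 = foreground, 0 = background


def pixelwise_f_measure(pred: Mask, gt: Mask, beta2: float = 1.0) -> float:
    """F-beta score computed independently per pixel (spatial layout ignored)."""
    tp = fp = fn = 0
    for prow, grow in zip(pred, gt):
        for p, g in zip(prow, grow):
            if p and g:
                tp += 1
            elif p:
                fp += 1
            elif g:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return (1 + beta2) * precision * recall / (beta2 * precision + recall)


gt = [[1, 1, 0, 0], [1, 1, 0, 0], [1, 1, 0, 0], [1, 1, 0, 0]]
scattered = [[1, 0, 0, 0], [1, 1, 0, 0], [1, 1, 0, 0], [0, 1, 0, 0]]
chunk = [[1, 1, 0, 0], [1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0]]
print(pixelwise_f_measure(scattered, gt), pixelwise_f_measure(chunk, gt))
```

Both predictions have 6 true positives, 0 false positives, and 2 false negatives, so any purely pixel-wise score ranks them equally even though their error patterns differ spatially.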

Face Sketch Synthesis Style Similarity: A New Structure Co-occurrence Texture Measure

1 code implementation 9 Apr 2018 Deng-Ping Fan, Shengchuan Zhang, Yu-Huan Wu, Ming-Ming Cheng, Bo Ren, Rongrong Ji, Paul L. Rosin

However, human perception of the similarity of two sketches will consider both structure and texture as essential factors and is not sensitive to slight ("pixel-level") mismatches.

Face Sketch Synthesis
