no code implementations • COLING 2022 • Xin Sheng, Linli Xu, Yinlong Xu, Changcun Bao, Huang Chen, Bo Ren
The discriminator of CoCGAN discriminates the authenticity of given samples and optimizes a contrastive learning objective to capture both more flexible data-to-class relations and data-to-data relations among training samples.
no code implementations • EMNLP 2021 • Linli Xu, Sijie Teng, Ruoyu Zhao, Junliang Guo, Chi Xiao, Deqiang Jiang, Bo Ren
Hierarchical multi-label text classification (HMTC) deals with the challenging task where an instance can be assigned to multiple hierarchically structured categories at the same time.
Multi Label Text Classification Multi-Label Text Classification +1
no code implementations • Findings (NAACL) 2022 • Xin Sheng, Linli Xu, Yinlong Xu, Deqiang Jiang, Bo Ren
We propose a novel siamese generative adversarial net for abstractive text summarization (SSPGAN), which can preserve the main semantics of the source text.
no code implementations • 8 Jul 2023 • Jian Wu, Yashesh Gaur, Zhuo Chen, Long Zhou, Yimeng Zhu, Tianrui Wang, Jinyu Li, Shujie Liu, Bo Ren, Linquan Liu, Yu Wu
Large language models (LLMs) have achieved remarkable success in the field of natural language processing, enabling better human-computer interaction using natural language.
1 code implementation • 12 May 2023 • Jianfeng Kuang, Wei Hua, Dingkang Liang, Mingkun Yang, Deqiang Jiang, Bo Ren, Xiang Bai
We evaluate the existing end-to-end methods for VIE on the proposed dataset and observe that the performance of these methods has a distinguishable drop from SROIE (a widely used English dataset) to our proposed dataset due to the larger variance of layout and entities.
no code implementations • CVPR 2023 • Ze-Xin Yin, Jiaxiong Qiu, Ming-Ming Cheng, Bo Ren
Existing Neural Radiance Fields (NeRF) methods suffer from the existence of reflective objects, often resulting in blurry or distorted rendering.
1 code implementation • CVPR 2023 • Jiaxiong Qiu, Peng-Tao Jiang, Yifan Zhu, Ze-Xin Yin, Ming-Ming Cheng, Bo Ren
To remedy this issue, we present a novel surface reconstruction framework, NeuS-HSR, based on implicit neural rendering.
1 code implementation • CVPR 2023 • Bei Gan, Xiujun Shu, Ruizhi Qiao, Haoqian Wu, Keyu Chen, Hanjun Li, Bo Ren
Based on existing efforts, this work has two observations: (1) For different annotators, labeling highlight has uncertainty, which leads to inaccurate and time-consuming annotations.
1 code implementation • CVPR 2023 • Wenwen Yu, Yuliang Liu, Wei Hua, Deqiang Jiang, Bo Ren, Xiang Bai
Recently, pretraining approaches based on vision language models have made effective progresses in the field of text detection.
no code implementations • CVPR 2023 • Haoqian Wu, Keyu Chen, Haozhe Liu, Mingchen Zhuge, Bing Li, Ruizhi Qiao, Xiujun Shu, Bei Gan, Liangsheng Xu, Bo Ren, Mengmeng Xu, Wentian Zhang, Raghavendra Ramachandra, Chia-Wen Lin, Bernard Ghanem
Temporal video segmentation is the get-to-go automatic video analysis, which decomposes a long-form video into smaller components for the following-up understanding tasks.
no code implementations • ICCV 2023 • Yuxiang Cai, Yifan Zhu, Haiwei Zhang, Bo Ren
We compare the metrics on our dataset and SLAM reconstruction results in both synthetic scenes and real scenes with the previous methods.
no code implementations • CVPR 2023 • Ye Liu, Lingfeng Qiao, Changchong Lu, Di Yin, Chen Lin, Haoyuan Peng, Bo Ren
An intuitive way to handle these two problems is to fulfill these tasks in two separate stages: aligning modalities followed by domain adaptation, or vice versa.
1 code implementation • 1 Dec 2022 • Yulei Qin, Xingyu Chen, Chao Chen, Yunhang Shen, Bo Ren, Yun Gu, Jie Yang, Chunhua Shen
Most existing methods focus on learning noise-robust models from web images while neglecting the performance drop caused by the differences between web domain and real-world domain.
no code implementations • 28 Nov 2022 • Jiang-Tian Zhai, Qi Zhang, Tong Wu, Xing-Yu Chen, Jiang-Jiang Liu, Bo Ren, Ming-Ming Cheng
By aggregating cross-modal information, the region filter selects key regions and the region adaptor updates their coordinates with text guidance.
no code implementations • 14 Nov 2022 • Lingfeng Qiao, Chen Wu, Ye Liu, Haoyuan Peng, Di Yin, Bo Ren
In this paper, we propose a novel approach to graft the video encoder from the pre-trained video-language model on the generative pre-trained language model.
no code implementations • 10 Oct 2022 • Zhuoxuan Jiang, Lingfeng Qiao, Di Yin, Shanshan Feng, Bo Ren
Recent language generative models are mostly trained on large-scale datasets, while in some real scenarios, the training datasets are often expensive to obtain and would be small-scale.
no code implementations • 22 Aug 2022 • Chang Nie, Yiqing Hu, Yanqiu Qu, Hao liu, Deqiang Jiang, Bo Ren
To realize this goal, we design the learning paradigm from three perspectives: 1) generating attribute views, 2) extracting subtle but crucial details, and 3) exploiting valued view pairs for learning, to fully unlock the pre-training potential.
no code implementations • 19 Aug 2022 • Sunan He, Taian Guo, Tao Dai, Ruizhi Qiao, Chen Wu, Xiujun Shu, Bo Ren
Image and language modeling is of crucial importance for vision-language pre-training (VLP), which aims to learn multi-modal representations from large-scale paired image-text data.
1 code implementation • 18 Aug 2022 • Xiujun Shu, Wei Wen, Haoqian Wu, Keyu Chen, Yiran Song, Ruizhi Qiao, Bo Ren, Xiao Wang
To explore the fine-grained alignment, we further propose two implicit semantic alignment paradigms: multi-level alignment (MLA) and bidirectional mask modeling (BMM).
no code implementations • NAACL 2022 • Haoyu Cao, Jiefeng Ma, Antai Guo, Yiqing Hu, Hao liu, Deqiang Jiang, Yinsong Liu, Bo Ren
Document Information Extraction (DIE) has attracted increasing attention due to its various advanced applications in the real world.
1 code implementation • 5 Jul 2022 • Sunan He, Taian Guo, Tao Dai, Ruizhi Qiao, Bo Ren, Shu-Tao Xia
Specifically, our method exploits multi-modal knowledge of image-text pairs based on a vision and language pre-training (VLP) model.
Ranked #1 on Multi-label zero-shot learning on Open Images V4
no code implementations • 4 Jul 2022 • Ye Liu, Lingfeng Qiao, Di Yin, Zhuoxuan Jiang, Xinghua Jiang, Deqiang Jiang, Bo Ren
In this paper, from an alternate perspective to overcome the above challenges, we unite these two tasks into one task by a new form of predicting shots link: a link connects two adjacent shots, indicating that they belong to the same scene or category.
2 code implementations • 22 Jun 2022 • Peixian Chen, Kekai Sheng, Mengdan Zhang, Mingbao Lin, Yunhang Shen, Shaohui Lin, Bo Ren, Ke Li
Open-vocabulary object detection (OVD) aims to scale up vocabulary size to detect objects of novel categories beyond the training vocabulary.
Ranked #12 on Open Vocabulary Object Detection on LVIS v1.0
1 code implementation • NAACL 2022 • Yuan Liang, Zhuoxuan Jiang, Di Yin, Bo Ren
To further leverage relation information, we introduce a separate event relation prediction task and adopt multi-task learning method to explicitly enhance event extraction performance.
Ranked #1 on Document-level Event Extraction on ChFinAnn
no code implementations • 6 Jun 2022 • Ye Liu, Changchong Lu, Chen Lin, Di Yin, Bo Ren
However, to our knowledge, there is no existing work focused on the second step of video text classification, which will limit the guidance to downstream tasks such as video indexing and browsing.
no code implementations • 22 May 2022 • Jiquan Li, Junliang Guo, Yongxin Zhu, Xin Sheng, Deqiang Jiang, Bo Ren, Linli Xu
The task of Grammatical Error Correction (GEC) has received remarkable attention with wide applications in Natural Language Processing (NLP) in recent years.
1 code implementation • CVPR 2022 • Haoqian Wu, Keyu Chen, Yanan Luo, Ruizhi Qiao, Bo Ren, Haozhe Liu, Weicheng Xie, Linlin Shen
Additionally, we suggest a more fair and reasonable benchmark to evaluate the performance of Video Scene Segmentation methods.
no code implementations • 5 May 2022 • Xin Li, Yan Zheng, Yiqing Hu, Haoyu Cao, Yunfei Wu, Deqiang Jiang, Yinsong Liu, Bo Ren
To deal with the unpredictable definition of relations, we propose a novel contrastive learning task named Relational Consistency Modeling (RCM), which harnesses the fact that existing relations should be consistent in differently augmented positive views.
no code implementations • 18 Apr 2022 • Hao liu, Xinghua Jiang, Xin Li, Antai Guo, Deqiang Jiang, Bo Ren
The self-supervised Masked Image Modeling (MIM) schema, following "mask-and-reconstruct" pipeline of recovering contents from masked image, has recently captured the increasing interest in the multimedia community, owing to the excellent ability of learning visual representation from unlabeled data.
1 code implementation • CVPR 2022 • Hao Wang, Junchao Liao, Tianheng Cheng, Zewen Gao, Hao liu, Bo Ren, Xiang Bai, Wenyu Liu
Recently, the semantics of scene text has been proven to be essential in fine-grained image classification.
no code implementations • 25 Mar 2022 • Zheng Lin, Zhao Zhang, Kang-Rui Zhang, Bo Ren, Ming-Ming Cheng
Our IST method can serve as a brush, dip style from anywhere, and then paint to any region of the target content image.
1 code implementation • 8 Jan 2022 • Helei Qiu, Biao Hou, Bo Ren, Xiaohua Zhang
And then a spatio-temporal tuples self-attention module is proposed to capture the relationship of different joints in consecutive frames.
1 code implementation • CVPR 2022 • Mengtian Li, Yuan Xie, Yunhang Shen, Bo Ke, Ruizhi Qiao, Bo Ren, Shaohui Lin, Lizhuang Ma
To address the huge labeling cost in large-scale point cloud semantic segmentation, we propose a novel hybrid contrastive regularization (HybridCR) framework in weakly-supervised setting, which obtains competitive performance compared to its fully-supervised counterpart.
no code implementations • 27 Nov 2021 • Xiujun Shu, Yusheng Tao, Ruizhi Qiao, Bo Ke, Wei Wen, Bo Ren
It is by far the largest dataset for person search in media.
no code implementations • CVPR 2022 • Hao liu, Xin Li, Bing Liu, Deqiang Jiang, Yinsong Liu, Bo Ren
We also show that the proposed NCGM can modulate collaborative pattern of different modalities conditioned on the context of intra-modality cues, which is vital for diversified table cases.
Ranked #7 on Table Recognition on PubTabNet
1 code implementation • CVPR 2022 • Hao liu, Xinghua Jiang, Xin Li, Zhimin Bao, Deqiang Jiang, Bo Ren
For the sake of trade-off between efficiency and performance, a group of works merely perform SA operation within local patches, whereas the global contextual information is abandoned, which would be indispensable for visual recognition tasks.
2 code implementations • 21 Nov 2021 • Zhonghua Li, Biao Hou, Zitong Wu, Licheng Jiao, Bo Ren, Chen Yang
We convert a lightweight FCOSR model to TensorRT format, which achieves 73. 93 mAP on DOTA1. 0 at a speed of 10. 68 FPS on Jetson Xavier NX with single scale.
1 code implementation • 13 Nov 2021 • Hans-Christof Gasser, Georges Bedran, Bo Ren, David Goodlett, Javier Alfaro, Ajitha Rajan
In particular, we find that amino acids close to the peptides' N- and C-terminals are highly relevant.
no code implementations • ICCV 2021 • Yifan Zhu, Jiaxiong Qiu, Bo Ren
In this paper, we propose a novel SLAM approach called transfusion that allows transparent object existence and recovery in the video input.
1 code implementation • 24 Dec 2020 • Yu-Huan Wu, Yun Liu, Le Zhang, Ming-Ming Cheng, Bo Ren
In this paper, we tap into this gap and show that enhancing high- level features is essential for SOD as well.
no code implementations • 26 Feb 2020 • Hao Liu, Antai Guo, Deqiang Jiang, Yiqing Hu, Bo Ren
Recently, a series of decomposition-based scene text detection methods has achieved impressive progress by decomposing challenging text regions into pieces and linking them in a bottom-up manner.
1 code implementation • ICCV 2019 • Deng-Ping Fan, Shengchuan Zhang, Yu-Huan Wu, Yun Liu, Ming-Ming Cheng, Bo Ren, Paul L. Rosin, Rongrong Ji
In this paper, we design a perceptual metric, called Structure Co-Occurrence Texture (Scoot), which simultaneously considers the block-level spatial structure and co-occurrence texture statistics.
no code implementations • 20 Nov 2018 • Hao Liu, Jingjing Wu, Jianguo Jiang, Meibin Qi, Bo Ren
Attribute recognition has become crucial because of its wide applications in many computer vision tasks, such as person re-identification.
2 code implementations • 26 May 2018 • Deng-Ping Fan, Cheng Gong, Yang Cao, Bo Ren, Ming-Ming Cheng, Ali Borji
The existing binary foreground map (FM) measures to address various types of errors in either pixel-wise or structural ways.
1 code implementation • 9 Apr 2018 • Deng-Ping Fan, Shengchuan Zhang, Yu-Huan Wu, Ming-Ming Cheng, Bo Ren, Rongrong Ji, Paul L. Rosin
However, human perception of the similarity of two sketches will consider both structure and texture as essential factors and is not sensitive to slight ("pixel-level") mismatches.