1 code implementation • 22 Jan 2024 • Jiawei Wang, Kai Hu, Zhuoyao Zhong, Lei Sun, Qiang Huo
Our end-to-end system achieves state-of-the-art performance on two large-scale document layout analysis datasets (PubLayNet and DocLayNet), a high-quality hierarchical document structure reconstruction dataset (HRDoc), and our Comp-HRDoc benchmark.
no code implementations • 17 Jan 2024 • Kai Hu, Jiawei Wang, WeiHong Lin, Zhuoyao Zhong, Lei Sun, Qiang Huo
This unified approach allows for the definition of various relation types and effectively tackles hierarchical relationships in form-like documents.
no code implementations • 17 Jan 2024 • Jiawei Wang, Shunchi Zhang, Kai Hu, Chixiang Ma, Zhuoyao Zhong, Lei Sun, Qiang Huo
Contextual Text Block Detection (CTBD) is the task of identifying coherent text blocks within the complexity of natural scenes.
no code implementations • 13 Oct 2023 • Ravi Mangal, Klas Leino, Zifan Wang, Kai Hu, Weicheng Yu, Corina Pasareanu, Anupam Datta, Matt Fredrikson
There are three layers to this inquiry, which we address in this paper: (1) why do we care about robustness research?
1 code implementation • 7 Oct 2023 • JiaMing Wang, Zhihao Du, Qian Chen, Yunfei Chu, Zhifu Gao, Zerui Li, Kai Hu, Xiaohuan Zhou, Jin Xu, Ziyang Ma, Wen Wang, Siqi Zheng, Chang Zhou, Zhijie Yan, Shiliang Zhang
In this paper, we propose LauraGPT, a unified GPT model for audio recognition, understanding, and generation.
1 code implementation • 4 Oct 2023 • Kai Hu, Klas Leino, Zifan Wang, Matt Fredrikson
A key challenge, supported both theoretically and empirically, is that robustness demands greater network capacity and more data than standard training.
no code implementations • 1 Oct 2023 • Xiang Li, Yinpeng Chen, Chung-Ching Lin, Hao Chen, Kai Hu, Rita Singh, Bhiksha Raj, Lijuan Wang, Zicheng Liu
This paper presents a novel approach to object completion, with the primary goal of reconstructing a complete object from its partially visible components.
1 code implementation • 14 Sep 2023 • Zhihao Du, Shiliang Zhang, Kai Hu, Siqi Zheng
We also demonstrate that the pre-trained models are suitable for downstream tasks, including automatic speech recognition and personalized text-to-speech synthesis.
no code implementations • 17 Apr 2023 • Kai Hu, Zhuoyuan Wu, Zhuoyao Zhong, WeiHong Lin, Lei Sun, Qiang Huo
In this paper, we present a new question-answering (QA) based key-value pair extraction approach, called KVPFormer, to robustly extract key-value relationships between entities from form-like document images.
2 code implementations • NeurIPS 2023 • Kai Hu, Andy Zou, Zifan Wang, Klas Leino, Matt Fredrikson
We show that fast ways of bounding the Lipschitz constant for conventional ResNets are loose, and show how to address this by designing a new residual block, leading to the Linear ResNet (LiResNet) architecture.
2 code implementations • CVPR 2023 • Fangyi Chen, Han Zhang, Kai Hu, Yu-Kai Huang, Chenchen Zhu, Marios Savvides
This paper investigates a phenomenon where query-based object detectors mispredict at the last decoding stage while predicting correctly at an intermediate stage.
Ranked #10 on Object Detection on COCO 2017 val
no code implementations • 26 Nov 2022 • Jianhong Tu, Zeyu Cui, Xiaohuan Zhou, Siqi Zheng, Kai Hu, Ju Fan, Chang Zhou
To achieve this task, we construct a synthetic dataset and develop an effective framework.
no code implementations • 20 Oct 2022 • Xian Qian, Kai Hu, Jiaqiang Wang, Yifeng Liu, Xingyuan Pan, Jun Cao, Mingxuan Wang
This report describes our VolcTrans system for the WMT22 shared task on large-scale multilingual machine translation.
no code implementations • 6 Jul 2022 • Yansong Li, Kai Hu, Kohei Nakajima, Yongping Pan
An echo state network (ESN), a kind of recurrent neural network, consists of a fixed reservoir in which neurons are connected randomly and recursively, and obtains the desired output by training only the output connection weights.
no code implementations • 28 May 2022 • Kai Hu, Yu Liu, Renhe Liu, Wei Lu, Gang Yu, Bin Fu
In the asymmetric codec, we adopt a mixed multi-path residual block (MMRB) to gradually extract weak texture features of input images, which can better preserve the original facial features and avoid excessive fantasy.
no code implementations • 25 May 2021 • WeiHong Lin, Qifang Gao, Lei Sun, Zhuoyao Zhong, Kai Hu, Qin Ren, Qiang Huo
In this paper, we propose a new multi-modal backbone network by concatenating a BERTgrid to an intermediate layer of a CNN model, where the input of CNN is a document image and the BERTgrid is a grid of word embeddings, to generate a more powerful grid-based document representation, named ViBERTgrid.
1 code implementation • CVPR 2022 • Kai Hu, Wentong Liao, Michael Ying Yang, Bodo Rosenhahn
Text-to-image synthesis (T2I) aims to generate photo-realistic images which are semantically consistent with the text descriptions.
no code implementations • ICCV 2021 • Kai Hu, Jie Shao, YuAn Liu, Bhiksha Raj, Marios Savvides, Zhiqiang Shen
To address this, we present a contrast-and-order representation (CORP) framework for learning self-supervised video representations that can automatically capture both the appearance information within each frame and temporal information across different frames.
1 code implementation • NeurIPS 2020 • Jie Shao, Kai Hu, Changhu Wang, Xiangyang Xue, Bhiksha Raj
In this paper, we study what would happen when normalization layers are removed from the network, and show how to train deep neural networks without normalization layers and without performance degradation.
no code implementations • 7 Apr 2020 • Shuo Tian, Lianhua Qu, Kai Hu, Nan Li, Lei Wang, Weixia Xu
By exploring the design space in network architectures and parameters, recent works have demonstrated great potential for improving the accuracy of the LSM model with low complexity.
no code implementations • 18 Nov 2019 • Kai Hu, Barnabas Poczos
We further use a noise analysis method to interpret the difference between RotationOut and Dropout in co-adaptation reduction.
no code implementations • 19 Nov 2018 • Kai Hu, Bhiksha Raj
Capturing spatiotemporal dynamics is an essential topic in video recognition.
no code implementations • 4 Nov 2018 • Kai Hu, Zhijian Ou, Min Hu, Junlan Feng
Conditional random fields (CRFs) have been shown to be one of the most successful approaches to sequence labeling.
no code implementations • 12 Jun 2018 • Xiaoteng Zhang, Yixin Bao, Feiyun Zhang, Kai Hu, Yicheng Wang, Liang Zhu, Qinzhu He, Yining Lin, Jie Shao, Yao Peng
We also propose new non-local-based models for further improvement on the recognition accuracy.
no code implementations • 12 Jan 2018 • Fen Xiao, Wenzheng Deng, Liangchan Peng, Chunhong Cao, Kai Hu, Xieping Gao
Salient object detection is a fundamental problem and has received a great deal of attention in computer vision.