Search Results for author: Kai Hu

Found 25 papers, 8 papers with code

Detect-Order-Construct: A Tree Construction based Approach for Hierarchical Document Structure Analysis

1 code implementation 22 Jan 2024 Jiawei Wang, Kai Hu, Zhuoyao Zhong, Lei Sun, Qiang Huo

Our end-to-end system achieves state-of-the-art performance on two large-scale document layout analysis datasets (PubLayNet and DocLayNet), a high-quality hierarchical document structure reconstruction dataset (HRDoc), and our Comp-HRDoc benchmark.

Document Layout Analysis Document Summarization +4

UniVIE: A Unified Label Space Approach to Visual Information Extraction from Form-like Documents

no code implementations 17 Jan 2024 Kai Hu, Jiawei Wang, WeiHong Lin, Zhuoyao Zhong, Lei Sun, Qiang Huo

This unified approach allows for the definition of various relation types and effectively tackles hierarchical relationships in form-like documents.

Key Information Extraction Relation

Dynamic Relation Transformer for Contextual Text Block Detection

no code implementations 17 Jan 2024 Jiawei Wang, Shunchi Zhang, Kai Hu, Chixiang Ma, Zhuoyao Zhong, Lei Sun, Qiang Huo

Contextual Text Block Detection (CTBD) is the task of identifying coherent text blocks within the complexity of natural scenes.

Graph Generation Relation +1

Is Certifying $\ell_p$ Robustness Still Worthwhile?

no code implementations 13 Oct 2023 Ravi Mangal, Klas Leino, Zifan Wang, Kai Hu, Weicheng Yu, Corina Pasareanu, Anupam Datta, Matt Fredrikson

There are three layers to this inquiry, which we address in this paper: (1) why do we care about robustness research?

A Recipe for Improved Certifiable Robustness: Capacity and Data

1 code implementation 4 Oct 2023 Kai Hu, Klas Leino, Zifan Wang, Matt Fredrikson

A key challenge, supported both theoretically and empirically, is that robustness demands greater network capacity and more data than standard training.

Data Augmentation

Completing Visual Objects via Bridging Generation and Segmentation

no code implementations 1 Oct 2023 Xiang Li, Yinpeng Chen, Chung-Ching Lin, Hao Chen, Kai Hu, Rita Singh, Bhiksha Raj, Lijuan Wang, Zicheng Liu

This paper presents a novel approach to object completion, with the primary goal of reconstructing a complete object from its partially visible components.

Image Generation Object +1

FunCodec: A Fundamental, Reproducible and Integrable Open-source Toolkit for Neural Speech Codec

1 code implementation 14 Sep 2023 Zhihao Du, Shiliang Zhang, Kai Hu, Siqi Zheng

We also demonstrate that the pre-trained models are suitable for downstream tasks, including automatic speech recognition and personalized text-to-speech synthesis.

Automatic Speech Recognition speech-recognition +3

A Question-Answering Approach to Key Value Pair Extraction from Form-like Document Images

no code implementations 17 Apr 2023 Kai Hu, Zhuoyuan Wu, Zhuoyao Zhong, WeiHong Lin, Lei Sun, Qiang Huo

In this paper, we present a new question-answering (QA) based key-value pair extraction approach, called KVPFormer, to robustly extract key-value relationships between entities from form-like document images.

Question Answering

Unlocking Deterministic Robustness Certification on ImageNet

2 code implementations NeurIPS 2023 Kai Hu, Andy Zou, Zifan Wang, Klas Leino, Matt Fredrikson

We show that fast ways of bounding the Lipschitz constant for conventional ResNets are loose, and show how to address this by designing a new residual block, leading to the Linear ResNet (LiResNet) architecture.

Enhanced Training of Query-Based Object Detection via Selective Query Recollection

2 code implementations CVPR 2023 Fangyi Chen, Han Zhang, Kai Hu, Yu-Kai Huang, Chenchen Zhu, Marios Savvides

This paper investigates a phenomenon where query-based object detectors mispredict at the last decoding stage while predicting correctly at an intermediate stage.

Attribute Object +2

Contextual Expressive Text-to-Speech

no code implementations 26 Nov 2022 Jianhong Tu, Zeyu Cui, Xiaohuan Zhou, Siqi Zheng, Kai Hu, Ju Fan, Chang Zhou

To achieve this task, we construct a synthetic dataset and develop an effective framework.

Speech Synthesis

The VolcTrans System for WMT22 Multilingual Machine Translation Task

no code implementations 20 Oct 2022 Xian Qian, Kai Hu, Jiaqiang Wang, Yifeng Liu, Xingyuan Pan, Jun Cao, Mingxuan Wang

This report describes our VolcTrans system for the WMT22 shared task on large-scale multilingual machine translation.

Machine Translation Translation

Composite FORCE learning of chaotic echo state networks for time-series prediction

no code implementations 6 Jul 2022 Yansong Li, Kai Hu, Kohei Nakajima, Yongping Pan

An echo state network (ESN), a kind of recurrent neural network, consists of a fixed reservoir in which neurons are connected randomly and recursively, and it obtains the desired output by training only the output connection weights.

Time Series Time Series Prediction

Enhancing Quality of Pose-varied Face Restoration with Local Weak Feature Sensing and GAN Prior

no code implementations 28 May 2022 Kai Hu, Yu Liu, Renhe Liu, Wei Lu, Gang Yu, Bin Fu

In the asymmetric codec, we adopt a mixed multi-path residual block (MMRB) to gradually extract weak texture features of input images, which can better preserve the original facial features and avoid excessive fantasy.

Blind Face Restoration Super-Resolution

ViBERTgrid: A Jointly Trained Multi-Modal 2D Document Representation for Key Information Extraction from Documents

no code implementations 25 May 2021 WeiHong Lin, Qifang Gao, Lei Sun, Zhuoyao Zhong, Kai Hu, Qin Ren, Qiang Huo

In this paper, we propose a new multi-modal backbone network by concatenating a BERTgrid to an intermediate layer of a CNN model, where the input of CNN is a document image and the BERTgrid is a grid of word embeddings, to generate a more powerful grid-based document representation, named ViBERTgrid.

Image Segmentation Key Information Extraction +4

Text to Image Generation with Semantic-Spatial Aware GAN

1 code implementation CVPR 2022 Kai Hu, Wentong Liao, Michael Ying Yang, Bodo Rosenhahn

Text-to-image synthesis (T2I) aims to generate photo-realistic images which are semantically consistent with the text descriptions.

Sentence Sentence Embedding +2

Contrast and Order Representations for Video Self-Supervised Learning

no code implementations ICCV 2021 Kai Hu, Jie Shao, YuAn Liu, Bhiksha Raj, Marios Savvides, Zhiqiang Shen

To address this, we present a contrast-and-order representation (CORP) framework for learning self-supervised video representations that can automatically capture both the appearance information within each frame and temporal information across different frames.

Action Recognition Self-Supervised Learning

Is normalization indispensable for training deep neural network?

1 code implementation NeurIPS 2020 Jie Shao, Kai Hu, Changhu Wang, Xiangyang Xue, Bhiksha Raj

In this paper, we study what would happen when normalization layers are removed from the network, and show how to train deep neural networks without normalization layers and without performance degradation.

General Classification Image Classification +5

A Neural Architecture Search based Framework for Liquid State Machine Design

no code implementations 7 Apr 2020 Shuo Tian, Lianhua Qu, Kai Hu, Nan Li, Lei Wang, Weixia Xu

By exploring the design space of network architectures and parameters, recent works have demonstrated great potential for improving the accuracy of LSM models with low complexity.

Neural Architecture Search

RotationOut as a Regularization Method for Neural Network

no code implementations 18 Nov 2019 Kai Hu, Barnabas Poczos

We further use a noise analysis method to interpret the difference between RotationOut and Dropout in co-adaptation reduction.

Higher-order Network for Action Recognition

no code implementations 19 Nov 2018 Kai Hu, Bhiksha Raj

Capturing spatiotemporal dynamics is an essential topic in video recognition.

Action Recognition General Classification +2

Neural CRF transducers for sequence labeling

no code implementations 4 Nov 2018 Kai Hu, Zhijian Ou, Min Hu, Junlan Feng

Conditional random fields (CRFs) have been shown to be one of the most successful approaches to sequence labeling.

Chunking NER +2

MSDNN: Multi-Scale Deep Neural Network for Salient Object Detection

no code implementations 12 Jan 2018 Fen Xiao, Wenzheng Deng, Liangchan Peng, Chunhong Cao, Kai Hu, Xieping Gao

Salient object detection is a fundamental problem and has received a great deal of attention in computer vision.

Object object-detection +2
