Search Results for author: Kai Hu

Found 25 papers, 8 papers with code

Detect-Order-Construct: A Tree Construction based Approach for Hierarchical Document Structure Analysis

1 code implementation • 22 Jan 2024 • Jiawei Wang, Kai Hu, Zhuoyao Zhong, Lei Sun, Qiang Huo

Our end-to-end system achieves state-of-the-art performance on two large-scale document layout analysis datasets (PubLayNet and DocLayNet), a high-quality hierarchical document structure reconstruction dataset (HRDoc), and our Comp-HRDoc benchmark.

Document Layout Analysis Document Summarization +4

UniVIE: A Unified Label Space Approach to Visual Information Extraction from Form-like Documents

no code implementations • 17 Jan 2024 • Kai Hu, Jiawei Wang, WeiHong Lin, Zhuoyao Zhong, Lei Sun, Qiang Huo

This unified approach allows for the definition of various relation types and effectively tackles hierarchical relationships in form-like documents.

Key Information Extraction Relation

Dynamic Relation Transformer for Contextual Text Block Detection

no code implementations • 17 Jan 2024 • Jiawei Wang, Shunchi Zhang, Kai Hu, Chixiang Ma, Zhuoyao Zhong, Lei Sun, Qiang Huo

Contextual Text Block Detection (CTBD) is the task of identifying coherent text blocks within the complexity of natural scenes.

Graph Generation Relation +1

Is Certifying $\ell_p$ Robustness Still Worthwhile?

no code implementations • 13 Oct 2023 • Ravi Mangal, Klas Leino, Zifan Wang, Kai Hu, Weicheng Yu, Corina Pasareanu, Anupam Datta, Matt Fredrikson

There are three layers to this inquiry, which we address in this paper: (1) why do we care about robustness research?

LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT

1 code implementation • 7 Oct 2023 • JiaMing Wang, Zhihao Du, Qian Chen, Yunfei Chu, Zhifu Gao, Zerui Li, Kai Hu, Xiaohuan Zhou, Jin Xu, Ziyang Ma, Wen Wang, Siqi Zheng, Chang Zhou, Zhijie Yan, Shiliang Zhang

In this paper, we propose LauraGPT, a unified GPT model for audio recognition, understanding, and generation.

Audio captioning Automatic Speech Recognition +11

A Recipe for Improved Certifiable Robustness: Capacity and Data

1 code implementation • 4 Oct 2023 • Kai Hu, Klas Leino, Zifan Wang, Matt Fredrikson

A key challenge, supported both theoretically and empirically, is that robustness demands greater network capacity and more data than standard training.

Data Augmentation

Completing Visual Objects via Bridging Generation and Segmentation

no code implementations • 1 Oct 2023 • Xiang Li, Yinpeng Chen, Chung-Ching Lin, Hao Chen, Kai Hu, Rita Singh, Bhiksha Raj, Lijuan Wang, Zicheng Liu

This paper presents a novel approach to object completion, with the primary goal of reconstructing a complete object from its partially visible components.

Image Generation Object +1

FunCodec: A Fundamental, Reproducible and Integrable Open-source Toolkit for Neural Speech Codec

1 code implementation • 14 Sep 2023 • Zhihao Du, Shiliang Zhang, Kai Hu, Siqi Zheng

We also demonstrate that the pre-trained models are suitable for downstream tasks, including automatic speech recognition and personalized text-to-speech synthesis.

Automatic Speech Recognition speech-recognition +3

A Question-Answering Approach to Key Value Pair Extraction from Form-like Document Images

no code implementations • 17 Apr 2023 • Kai Hu, Zhuoyuan Wu, Zhuoyao Zhong, WeiHong Lin, Lei Sun, Qiang Huo

In this paper, we present a new question-answering (QA) based key-value pair extraction approach, called KVPFormer, to robustly extract key-value relationships between entities from form-like document images.

Question Answering

Unlocking Deterministic Robustness Certification on ImageNet

2 code implementations • NeurIPS 2023 • Kai Hu, Andy Zou, Zifan Wang, Klas Leino, Matt Fredrikson

We show that fast ways of bounding the Lipschitz constant for conventional ResNets are loose, and show how to address this by designing a new residual block, leading to the Linear ResNet (LiResNet) architecture.

Enhanced Training of Query-Based Object Detection via Selective Query Recollection

2 code implementations • CVPR 2023 • Fangyi Chen, Han Zhang, Kai Hu, Yu-Kai Huang, Chenchen Zhu, Marios Savvides

This paper investigates a phenomenon where query-based object detectors mispredict at the last decoding stage while predicting correctly at an intermediate stage.

Ranked #10 on Object Detection on COCO 2017 val

Attribute Object +2

Contextual Expressive Text-to-Speech

no code implementations • 26 Nov 2022 • Jianhong Tu, Zeyu Cui, Xiaohuan Zhou, Siqi Zheng, Kai Hu, Ju Fan, Chang Zhou

To achieve this task, we construct a synthetic dataset and develop an effective framework.

Speech Synthesis

The VolcTrans System for WMT22 Multilingual Machine Translation Task

no code implementations • 20 Oct 2022 • Xian Qian, Kai Hu, Jiaqiang Wang, Yifeng Liu, Xingyuan Pan, Jun Cao, Mingxuan Wang

This report describes our VolcTrans system for the WMT22 shared task on large-scale multilingual machine translation.

Machine Translation Translation

Composite FORCE learning of chaotic echo state networks for time-series prediction

no code implementations • 6 Jul 2022 • Yansong Li, Kai Hu, Kohei Nakajima, Yongping Pan

An echo state network (ESN), a kind of recurrent neural network, consists of a fixed reservoir in which neurons are connected randomly and recursively, and it obtains the desired output by training only the output connection weights.

Time Series Time Series Prediction

Enhancing Quality of Pose-varied Face Restoration with Local Weak Feature Sensing and GAN Prior

no code implementations • 28 May 2022 • Kai Hu, Yu Liu, Renhe Liu, Wei Lu, Gang Yu, Bin Fu

In the asymmetric codec, we adopt a mixed multi-path residual block (MMRB) to gradually extract weak texture features of input images, which can better preserve the original facial features and avoid excessive fantasy.

Blind Face Restoration Super-Resolution

ViBERTgrid: A Jointly Trained Multi-Modal 2D Document Representation for Key Information Extraction from Documents

no code implementations • 25 May 2021 • WeiHong Lin, Qifang Gao, Lei Sun, Zhuoyao Zhong, Kai Hu, Qin Ren, Qiang Huo

In this paper, we propose a new multi-modal backbone network by concatenating a BERTgrid to an intermediate layer of a CNN model, where the input of CNN is a document image and the BERTgrid is a grid of word embeddings, to generate a more powerful grid-based document representation, named ViBERTgrid.

Image Segmentation Key Information Extraction +4

Text to Image Generation with Semantic-Spatial Aware GAN

1 code implementation • CVPR 2022 • Kai Hu, Wentong Liao, Michael Ying Yang, Bodo Rosenhahn

Text-to-image synthesis (T2I) aims to generate photo-realistic images which are semantically consistent with the text descriptions.

Sentence Sentence Embedding +2

Contrast and Order Representations for Video Self-Supervised Learning

no code implementations • ICCV 2021 • Kai Hu, Jie Shao, YuAn Liu, Bhiksha Raj, Marios Savvides, Zhiqiang Shen

To address this, we present a contrast-and-order representation (CORP) framework for learning self-supervised video representations that can automatically capture both the appearance information within each frame and temporal information across different frames.

Action Recognition Self-Supervised Learning

Is normalization indispensable for training deep neural network?

1 code implementation • NeurIPS 2020 • Jie Shao, Kai Hu, Changhu Wang, Xiangyang Xue, Bhiksha Raj

In this paper, we study what would happen when normalization layers are removed from the network, and show how to train deep neural networks without normalization layers and without performance degradation.

General Classification Image Classification +5

A Neural Architecture Search based Framework for Liquid State Machine Design

no code implementations • 7 Apr 2020 • Shuo Tian, Lianhua Qu, Kai Hu, Nan Li, Lei Wang, Weixia Xu

By exploring the design space in network architectures and parameters, recent works have demonstrated great potential for improving the accuracy of LSM model with low complexity.

Neural Architecture Search

RotationOut as a Regularization Method for Neural Network

no code implementations • 18 Nov 2019 • Kai Hu, Barnabas Poczos

We further use a noise analysis method to interpret the difference between RotationOut and Dropout in co-adaptation reduction.

Higher-order Network for Action Recognition

no code implementations • 19 Nov 2018 • Kai Hu, Bhiksha Raj

Capturing spatiotemporal dynamics is an essential topic in video recognition.

Action Recognition General Classification +2

Neural CRF transducers for sequence labeling

no code implementations • 4 Nov 2018 • Kai Hu, Zhijian Ou, Min Hu, Junlan Feng

Conditional random fields (CRFs) have been shown to be one of the most successful approaches to sequence labeling.

Chunking NER +2

Qiniu Submission to ActivityNet Challenge 2018

no code implementations • 12 Jun 2018 • Xiaoteng Zhang, Yixin Bao, Feiyun Zhang, Kai Hu, Yicheng Wang, Liang Zhu, Qinzhu He, Yining Lin, Jie Shao, Yao Peng

We also propose new non-local-based models for further improvement on the recognition accuracy.

Activity Recognition Optical Flow Estimation

MSDNN: Multi-Scale Deep Neural Network for Salient Object Detection

no code implementations • 12 Jan 2018 • Fen Xiao, Wenzheng Deng, Liangchan Peng, Chunhong Cao, Kai Hu, Xieping Gao

Salient object detection is a fundamental problem and has received a great deal of attention in computer vision.

Object object-detection +2
