Search Results for author: Yuhao Zhu

Found 32 papers, 17 papers with code

Characterizing Soft-Error Resiliency in Arm's Ethos-U55 Embedded Machine Learning Accelerator

no code implementations 14 Apr 2024 Abhishek Tyagi, Reiley Jeyapaul, Chuteng Zhu, Paul Whatmough, Yuhao Zhu

As Neural Processing Units (NPUs) or accelerators are increasingly deployed in a variety of applications, including safety-critical applications such as autonomous vehicles and medical imaging, it is critical to understand the fault-tolerance nature of the NPUs.

Autonomous Vehicles Navigate

Autonomy 2.0: The Quest for Economies of Scale

no code implementations 8 Jul 2023 Shuang Wu, Bo Yu, Shaoshan Liu, Yuhao Zhu

With the advancement of robotics and AI technologies in the past decade, we have now entered the age of autonomous machines.

Autonomous Vehicles

Thales: Formulating and Estimating Architectural Vulnerability Factors for DNN Accelerators

no code implementations 5 Dec 2022 Abhishek Tyagi, Yiming Gan, Shaoshan Liu, Bo Yu, Paul Whatmough, Yuhao Zhu

As Deep Neural Networks (DNNs) are increasingly deployed in safety-critical and privacy-sensitive applications such as autonomous driving and biometric authentication, it is critical to understand the fault-tolerance nature of DNNs.

Autonomous Driving

ANT: Exploiting Adaptive Numerical Data Type for Low-bit Deep Neural Network Quantization

1 code implementation 30 Aug 2022 Cong Guo, Chen Zhang, Jingwen Leng, Zihan Liu, Fan Yang, Yunxin Liu, Minyi Guo, Yuhao Zhu

In this work, we propose a fixed-length adaptive numerical data type called ANT to achieve low-bit quantization with tiny hardware overheads.

Quantization

Perturbation Inactivation Based Adversarial Defense for Face Recognition

1 code implementation 13 Jul 2022 Min Ren, Yuhao Zhu, Yunlong Wang, Zhenan Sun

A straightforward approach is to inactivate the adversarial perturbations so that they can be easily handled as general perturbations.

Adversarial Attack Adversarial Defense +1

SQuant: On-the-Fly Data-Free Quantization via Diagonal Hessian Approximation

1 code implementation ICLR 2022 Cong Guo, Yuxian Qiu, Jingwen Leng, Xiaotian Gao, Chen Zhang, Yunxin Liu, Fan Yang, Yuhao Zhu, Minyi Guo

This paper proposes an on-the-fly DFQ framework with sub-second quantization time, called SQuant, which can quantize networks on inference-only devices with low computation and memory requirements.

Data Free Quantization

Block-Skim: Efficient Question Answering for Transformer

1 code implementation 16 Dec 2021 Yue Guan, Zhengyi Li, Jingwen Leng, Zhouhan Lin, Minyi Guo, Yuhao Zhu

We further prune the hidden states corresponding to the unnecessary positions early in lower layers, achieving significant inference-time speedup.

Extractive Question-Answering Question Answering

Dataflow Accelerator Architecture for Autonomous Machine Computing

no code implementations 15 Sep 2021 Shaoshan Liu, Yuhao Zhu, Bo Yu, Jean-Luc Gaudiot, Guang R. Gao

Commercial autonomous machines are a thriving sector, one that is likely to be the next ubiquitous computing platform after Personal Computers (PCs), cloud computing, and mobile computing.

Cloud Computing

S2TA: Exploiting Structured Sparsity for Energy-Efficient Mobile CNN Acceleration

no code implementations 16 Jul 2021 Zhi-Gang Liu, Paul N. Whatmough, Yuhao Zhu, Matthew Mattina

We propose to exploit structured sparsity, more specifically, Density Bound Block (DBB) sparsity for both weights and activations.

One Shot Face Swapping on Megapixels

1 code implementation CVPR 2021 Yuhao Zhu, Qi Li, Jian Wang, Chengzhong Xu, Zhenan Sun

Extensive experiments demonstrate the superiority of MegaFS, and the first megapixel-level face swapping database is released for research on DeepFake detection and face image editing in the public domain.

DeepFake Detection Disentanglement +2

Fast and Accurate: Video Enhancement using Sparse Depth

no code implementations 15 Mar 2021 Yu Feng, Patrick Hansen, Paul N. Whatmough, Guoyu Lu, Yuhao Zhu

This paper presents a general framework to build fast and accurate algorithms for video enhancement tasks such as super-resolution, deblurring, and denoising.

Deblurring Denoising +4

Block Skim Transformer for Efficient Question Answering

no code implementations 1 Jan 2021 Yue Guan, Jingwen Leng, Yuhao Zhu, Minyi Guo

Following this idea, we proposed Block Skim Transformer (BST) to improve and accelerate the processing of transformer QA models.

Language Modelling Model Compression +1

Eudoxus: Characterizing and Accelerating Localization in Autonomous Machines

no code implementations 2 Dec 2020 Yiming Gan, Yu Bo, Boyuan Tian, Leimeng Xu, Wei Hu, Shaoshan Liu, Qiang Liu, Yanjun Zhang, Jie Tang, Yuhao Zhu

We develop and commercialize autonomous machines, such as logistic robots and self-driving cars, around the globe.

Self-Driving Cars Hardware Architecture

End-to-End Framework for Efficient Deep Learning Using Metasurfaces Optics

1 code implementation 23 Nov 2020 Carlos Mauricio Villegas Burgos, Tianqi Yang, Nick Vamivakas, Yuhao Zhu

Deep learning using Convolutional Neural Networks (CNNs) has been shown to significantly outperform many conventional vision algorithms.

A Survey of FPGA-Based Robotic Computing

no code implementations 13 Sep 2020 Zishen Wan, Bo Yu, Thomas Yuang Li, Jie Tang, Yuhao Zhu, Yu Wang, Arijit Raychowdhury, Shaoshan Liu

On the other hand, FPGA-based robotic accelerators are becoming increasingly competitive alternatives, especially in latency-critical and power-limited scenarios.

Autonomous Vehicles

Mesorasi: Architecture Support for Point Cloud Analytics via Delayed-Aggregation

1 code implementation 16 Aug 2020 Yu Feng, Boyuan Tian, Tiancheng Xu, Paul Whatmough, Yuhao Zhu

Point cloud analytics is poised to become a key workload on battery-powered embedded and mobile platforms in a wide range of emerging application domains, such as autonomous driving, robotics, and augmented reality, where efficiency is paramount.

Autonomous Driving

Balancing Efficiency and Flexibility for DNN Acceleration via Temporal GPU-Systolic Array Integration

no code implementations 18 Feb 2020 Cong Guo, Yangjie Zhou, Jingwen Leng, Yuhao Zhu, Zidong Du, Quan Chen, Chao Li, Bin Yao, Minyi Guo

We propose Simultaneous Multi-mode Architecture (SMA), a novel architecture design and execution model that offers general-purpose programmability on DNN accelerators in order to accelerate end-to-end applications.

Tigris: Architecture and Algorithms for 3D Perception in Point Clouds

1 code implementation 16 Nov 2019 Tiancheng Xu, Boyuan Tian, Yuhao Zhu

While KD-tree search is inherently sequential, we propose an acceleration-amenable data structure and search algorithm that exposes different forms of parallelism of KD-tree search in the context of point cloud registration.

3D Reconstruction Point Cloud Registration +1

ASV: Accelerated Stereo Vision System

2 code implementations 15 Nov 2019 Yu Feng, Paul Whatmough, Yuhao Zhu

The key to ASV is to exploit unique characteristics inherent to stereo vision, and apply stereo-specific optimizations, both algorithmically and computationally.

Stereo Matching

Automatic Neural Network Compression by Sparsity-Quantization Joint Learning: A Constrained Optimization-based Approach

1 code implementation CVPR 2020 Haichuan Yang, Shupeng Gui, Yuhao Zhu, Ji Liu

A key parameter that all existing compression techniques are sensitive to is the compression ratio (e.g., pruning sparsity, quantization bitwidth) of each layer.

Neural Network Compression Quantization

Adversarial Defense Through Network Profiling Based Path Extraction

no code implementations CVPR 2019 Yuxian Qiu, Jingwen Leng, Cong Guo, Quan Chen, Chao Li, Minyi Guo, Yuhao Zhu

Recently, researchers have started decomposing deep neural network models according to their semantics or functions.

Adversarial Defense

Joint Iris Segmentation and Localization Using Deep Multi-task Learning Framework

1 code implementation 31 Jan 2019 Caiyong Wang, Yuhao Zhu, Yunfan Liu, Ran He, Zhenan Sun

In this paper, we propose a deep multi-task learning framework, named IrisParseNet, that exploits the inherent correlations between pupil, iris, and sclera to boost the performance of iris segmentation and localization in a unified model.

Iris Segmentation Multi-Task Learning +1

ECC: Platform-Independent Energy-Constrained Deep Neural Network Compression via a Bilinear Regression Model

2 code implementations CVPR 2019 Haichuan Yang, Yuhao Zhu, Ji Liu

The energy estimate model allows us to formulate DNN compression as a constrained optimization that minimizes the DNN loss function over the energy constraint.

Neural Network Compression regression

Recognizing Partial Biometric Patterns

1 code implementation 17 Oct 2018 Lingxiao He, Zhenan Sun, Yuhao Zhu, Yunbo Wang

Biometric recognition on partially captured targets is challenging, since only a few partial observations of objects are available for matching.

Dictionary Learning Face Recognition +1

SCALE-Sim: Systolic CNN Accelerator

8 code implementations 16 Oct 2018 Ananda Samajdar, Yuhao Zhu, Paul Whatmough, Matthew Mattina, Tushar Krishna

Systolic Arrays are one of the most popular compute substrates within Deep Learning accelerators today, as they provide extremely high efficiency for running dense matrix multiplications.

Distributed, Parallel, and Cluster Computing Hardware Architecture

Effective Path: Know the Unknowns of Neural Network

no code implementations 27 Sep 2018 Yuxian Qiu, Jingwen Leng, Yuhao Zhu, Quan Chen, Chao Li, Minyi Guo

Despite their enormous success, there is still no solid understanding of deep neural networks' working mechanism.

Energy-Constrained Compression for Deep Neural Networks via Weighted Sparse Projection and Layer Input Masking

1 code implementation ICLR 2019 Haichuan Yang, Yuhao Zhu, Ji Liu

Deep Neural Networks (DNNs) are increasingly deployed in highly energy-constrained environments, such as autonomous drones and wearable devices, where they must also operate in real time.

Euphrates: Algorithm-SoC Co-Design for Low-Power Mobile Continuous Vision

no code implementations 29 Mar 2018 Yuhao Zhu, Anand Samajdar, Matthew Mattina, Paul Whatmough

Specifically, we propose to expose the motion data that is naturally generated by the Image Signal Processor (ISP) early in the vision pipeline to the CNN engine.

Cloud No Longer a Silver Bullet, Edge to the Rescue

no code implementations 15 Feb 2018 Yuhao Zhu, Gu-Yeon Wei, David Brooks

This paper takes the position that, while cognitive computing today relies heavily on the cloud, we will soon see a paradigm shift where cognitive computing primarily happens on network edges.

Position

Mobile Machine Learning Hardware at ARM: A Systems-on-Chip (SoC) Perspective

no code implementations 19 Jan 2018 Yuhao Zhu, Matthew Mattina, Paul Whatmough

Machine learning is playing an increasingly significant role in emerging mobile application domains such as AR/VR, ADAS, etc.

BIG-bench Machine Learning
