Search Results for author: Cong Hao

Found 36 papers, 21 papers with code

INR-Arch: A Dataflow Architecture and Compiler for Arbitrary-Order Gradient Computations in Implicit Neural Representation Processing

1 code implementation • 11 Aug 2023 • Stefan Abi-Karam, Rishov Sarkar, Dejia Xu, Zhiwen Fan, Zhangyang Wang, Cong Hao

In this work, we introduce INR-Arch, a framework that transforms the computation graph of an nth-order gradient into a hardware-optimized dataflow architecture.

Meta-Learning
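To make "arbitrary-order gradient computation" concrete: an nth-order gradient is differentiation applied n times to the INR network's output. The minimal Python sketch below nests numerical central differences to the same effect; INR-Arch instead transforms the autodiff computation graph into a dataflow design, so this only illustrates the quantity being computed, and all names are ours.

```python
import numpy as np

def nth_grad(f, n, h=1e-3):
    """n-th derivative of a scalar function f, built by recursively
    applying a central difference (illustrative, not INR-Arch's method)."""
    if n == 0:
        return f
    g = nth_grad(f, n - 1, h)
    return lambda x: (g(x + h) - g(x - h)) / (2 * h)

# An INR (e.g. an MLP mapping coordinates to a signal) would sit here;
# we use sin(x) so the exact derivatives are known for checking.
f = np.sin
print(nth_grad(f, 1)(0.0))  # ~ cos(0) = 1
print(nth_grad(f, 2)(0.0))  # ~ -sin(0) = 0
```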

Rapid-INR: Storage Efficient CPU-free DNN Training Using Implicit Neural Representation

1 code implementation • 29 Jun 2023 • Hanqiu Chen, Hang Yang, Stephen Fitzmeyer, Cong Hao

Our methodology involves storing the whole dataset directly in INR format on a GPU, mitigating the significant data communication overhead between the CPU and GPU during training.

Image Classification Image Compression +1
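For a rough picture of what "storing the dataset in INR format" means, the sketch below decodes an image from a tiny coordinate-to-RGB MLP, which is all that needs to live in GPU memory. The weights here are random stand-ins (Rapid-INR would store weights fit to each image), and every name is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tiny INR: (x, y) coordinates -> RGB. In Rapid-INR these
# weights would be trained per image; random values stand in here.
W1, b1 = rng.normal(size=(2, 64)), np.zeros(64)
W2, b2 = rng.normal(size=(64, 3)), np.zeros(3)

def decode(coords):
    h = np.sin(coords @ W1 + b1)   # SIREN-style periodic activation
    return h @ W2 + b2             # one RGB value per coordinate

# Reconstruct a 32x32 image from the compact weight set alone.
ys, xs = np.meshgrid(np.linspace(-1, 1, 32), np.linspace(-1, 1, 32))
coords = np.stack([xs.ravel(), ys.ravel()], axis=1)
image = decode(coords).reshape(32, 32, 3)
print(image.shape)  # (32, 32, 3)
```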

Edge-MoE: Memory-Efficient Multi-Task Vision Transformer Architecture with Task-level Sparsity via Mixture-of-Experts

1 code implementation • 30 May 2023 • Rishov Sarkar, Hanxue Liang, Zhiwen Fan, Zhangyang Wang, Cong Hao

Computer vision researchers are embracing two promising paradigms: Vision Transformers (ViTs) and Multi-task Learning (MTL), which both show great performance but are computation-intensive, given the quadratic complexity of self-attention in ViT and the need to activate an entire large MTL model for one task.

Multi-Task Learning
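The task-level sparsity idea reduces to a router that activates only a few experts per input, so the remaining expert weights never have to be fetched from memory. A generic mixture-of-experts forward pass in Python (invented shapes and router, not Edge-MoE's hardware datapath):

```python
import numpy as np

rng = np.random.default_rng(0)
n_experts, d = 4, 16
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]
gate = rng.normal(size=(d, n_experts))     # hypothetical router weights

def moe_forward(x, top_k=1):
    """Run only the top_k experts chosen by the router; the other
    experts' weights are never touched (the memory saving)."""
    scores = x @ gate
    chosen = np.argsort(scores)[-top_k:]
    w = np.exp(scores[chosen])
    w /= w.sum()                           # softmax over chosen experts
    return sum(wi * (x @ experts[i]) for wi, i in zip(w, chosen))

y = moe_forward(rng.normal(size=d), top_k=1)
print(y.shape)  # (16,)
```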

DGNN-Booster: A Generic FPGA Accelerator Framework For Dynamic Graph Neural Network Inference

1 code implementation • 13 Apr 2023 • Hanqiu Chen, Cong Hao

The experimental results demonstrate that DGNN-Booster can achieve a speedup of up to 5.6x compared to the CPU baseline (6226R), 8.4x compared to the GPU baseline (A6000), and 2.1x compared to the FPGA baseline without the optimizations proposed in this paper.

GNNBuilder: An Automated Framework for Generic Graph Neural Network Accelerator Generation, Simulation, and Optimization

1 code implementation • 29 Mar 2023 • Stefan Abi-Karam, Cong Hao

Therefore, in this work, we propose GNNBuilder, the first automated, generic, end-to-end GNN accelerator generation framework.

Code Generation

Extensible and Efficient Proxy for Neural Architecture Search

no code implementations • ICCV 2023 • Yuhong Li, Jiajie Li, Cong Hao, Pan Li, JinJun Xiong, Deming Chen

We further propose a Discrete Proxy Search (DPS) method to find the optimized training settings for Eproxy with only a handful of benchmarked architectures on the target tasks.

Neural Architecture Search

M³ViT: Mixture-of-Experts Vision Transformer for Efficient Multi-task Learning with Model-Accelerator Co-design

1 code implementation • 26 Oct 2022 • Hanxue Liang, Zhiwen Fan, Rishov Sarkar, Ziyu Jiang, Tianlong Chen, Kai Zou, Yu Cheng, Cong Hao, Zhangyang Wang

However, when deploying MTL onto those real-world systems that are often resource-constrained or latency-sensitive, two prominent challenges arise: (i) during training, simultaneously optimizing all tasks is often difficult due to gradient conflicts across tasks; (ii) at inference, current MTL regimes have to activate nearly the entire model even to just execute a single task.

Multi-Task Learning
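Challenge (i), gradient conflict, has a simple operational test: two task gradients conflict when their cosine similarity is negative, i.e., a step that helps one task hurts the other. A toy numpy check (invented gradients, not the paper's measurement):

```python
import numpy as np

def grads_conflict(g1, g2):
    """True when the task gradients point in opposing directions."""
    cos = g1 @ g2 / (np.linalg.norm(g1) * np.linalg.norm(g2))
    return cos < 0

g_task_a = np.array([1.0, -0.5])
g_task_b = np.array([-0.8, 1.0])
print(grads_conflict(g_task_a, g_task_b))  # True: the tasks pull apart
```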

Data-Model-Circuit Tri-Design for Ultra-Light Video Intelligence on Edge Devices

no code implementations • 16 Oct 2022 • Yimeng Zhang, Akshay Karkal Kamath, Qiucheng Wu, Zhiwen Fan, Wuyang Chen, Zhangyang Wang, Shiyu Chang, Sijia Liu, Cong Hao

In this paper, we propose a data-model-hardware tri-design framework for high-throughput, low-cost, and high-accuracy multi-object tracking (MOT) on High-Definition (HD) video streams.

Model Compression Multi-Object Tracking

Bottleneck Analysis of Dynamic Graph Neural Network Inference on CPU and GPU

no code implementations • 8 Oct 2022 • Hanqiu Chen, Yahya Alhinai, Yihan Jiang, Eunjee Na, Cong Hao

A variety of dynamic graph neural networks designed from algorithmic perspectives have succeeded in incorporating temporal information into graph processing.

Unsupervised Learning for Combinatorial Optimization with Principled Objective Relaxation

1 code implementation • 13 Jul 2022 • Haoyu Wang, Nan Wu, Hang Yang, Cong Hao, Pan Li

Using machine learning to solve combinatorial optimization (CO) problems is challenging, especially when the data is unlabeled.

Combinatorial Optimization
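A common shape for such methods is relax-then-round: relax binary decision variables to probabilities, ascend a differentiable surrogate of the objective, and round at the end. The sketch below applies that generic recipe to a toy max-cut instance; it illustrates the relaxation idea only, not the paper's principled objective construction.

```python
import numpy as np

# Toy max-cut: p[i] is the probability that node i is on side 1, so the
# expected cut size is differentiable in p (relax-then-round sketch).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)           # adjacency matrix

rng = np.random.default_rng(0)
p = np.clip(0.5 + 0.01 * rng.normal(size=4), 0, 1)  # break symmetry
for _ in range(100):
    grad = A @ (1 - 2 * p)                          # d(expected cut)/dp
    p = np.clip(p + 0.1 * grad, 0.0, 1.0)           # projected ascent

cut = (p > 0.5).astype(int)                         # rounding step
print(cut)
```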

RT-DNAS: Real-time Constrained Differentiable Neural Architecture Search for 3D Cardiac Cine MRI Segmentation

no code implementations • 8 Jun 2022 • Qing Lu, Xiaowei Xu, Shunjie Dong, Cong Hao, Lei Yang, Cheng Zhuo, Yiyu Shi

Accurately segmenting temporal frames of cine magnetic resonance imaging (MRI) is a crucial step in various real-time MRI-guided cardiac interventions.

MRI segmentation Neural Architecture Search

Compilation and Optimizations for Efficient Machine Learning on Embedded Systems

no code implementations • 6 Jun 2022 • Xiaofan Zhang, Yao Chen, Cong Hao, Sitao Huang, Yuhong Li, Deming Chen

Deep Neural Networks (DNNs) have achieved great success in a variety of machine learning (ML) applications, delivering high-quality inference solutions in computer vision, natural language processing, virtual reality, and beyond.

BIG-bench Machine Learning

H2H: Heterogeneous Model to Heterogeneous System Mapping with Computation and Communication Awareness

1 code implementation • 29 Apr 2022 • Xinyi Zhang, Cong Hao, Peipei Zhou, Alex Jones, Jingtong Hu

The heterogeneity in ML models comes from multi-sensor perception and multi-task learning, i.e., multi-modality multi-task (MMMT) workloads, resulting in diverse deep neural network (DNN) layers and computation patterns.

Multi-Task Learning

FlowGNN: A Dataflow Architecture for Real-Time Workload-Agnostic Graph Neural Network Inference

1 code implementation • 27 Apr 2022 • Rishov Sarkar, Stefan Abi-Karam, Yuqi He, Lakshmi Sathidevi, Cong Hao

First, we propose a novel and scalable dataflow architecture that generally supports a wide range of GNN models with a message-passing mechanism.

Drug Discovery
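The message-passing mechanism that the architecture generalizes over is three steps per layer: gather features along edges, aggregate them per destination node, and update node states. A minimal numpy rendering (toy graph, illustrative update rule):

```python
import numpy as np

rng = np.random.default_rng(0)
num_nodes, d = 5, 8
edges = np.array([[0, 1], [1, 2], [2, 0], [3, 4]])  # (src, dst) pairs
x = rng.normal(size=(num_nodes, d))                 # node features
W = rng.normal(size=(d, d))

msgs = x[edges[:, 0]]                 # gather: message = source feature
agg = np.zeros_like(x)
np.add.at(agg, edges[:, 1], msgs)     # aggregate: sum at destinations
x_new = np.tanh(agg @ W)              # update node states
print(x_new.shape)  # (5, 8)
```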

LOSTIN: Logic Optimization via Spatio-Temporal Information with Hybrid Graph Models

1 code implementation • 20 Jan 2022 • Nan Wu, Jiwon Lee, Yuan Xie, Cong Hao

Despite the strides made by machine learning (ML)-based performance modeling, two major concerns that may impede production-ready ML applications in EDA are stringent accuracy requirements and generalization capability.

GenGNN: A Generic FPGA Framework for Graph Neural Network Acceleration

1 code implementation • 20 Jan 2022 • Stefan Abi-Karam, Yuqi He, Rishov Sarkar, Lakshmi Sathidevi, Zihang Qiao, Cong Hao

Second, we aim to support a diverse set of GNN models with the extensibility to flexibly adapt to new models.

Drug Discovery

Program-to-Circuit: Exploiting GNNs for Program Representation and Circuit Translation

no code implementations • 13 Sep 2021 • Nan Wu, Huake He, Yuan Xie, Pan Li, Cong Hao

As a pioneering effort in this direction, we expect more GNN endeavors to revolutionize this high-demand Program-to-Circuit problem and to enrich the expressiveness of GNNs on programs.

Transfer Learning Translation

Generic Neural Architecture Search via Regression

2 code implementations • NeurIPS 2021 • Yuhong Li, Cong Hao, Pan Li, JinJun Xiong, Deming Chen

Such a self-supervised regression task can effectively evaluate the intrinsic power of an architecture to capture and transform the input signal patterns, and allows more efficient use of training samples.

 Ranked #1 on Neural Architecture Search on NAS-Bench-101 (Spearman Correlation metric)

Image Classification Neural Architecture Search +1
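A hedged sketch of the regression-proxy idea: briefly train each candidate architecture to regress a synthetic signal and rank candidates by the resulting loss (lower is better). The two-layer candidates, targets, and hyperparameters below are invented; GenNAS's actual proxy tasks differ.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(128, 16))
target = np.sin(X @ rng.normal(size=(16, 1)))   # synthetic signal to fit

def proxy_score(width, steps=200, lr=0.05):
    """Train a toy candidate briefly; the final MSE is the proxy score."""
    W1 = rng.normal(size=(16, width)) * 0.1
    W2 = rng.normal(size=(width, 1)) * 0.1
    for _ in range(steps):
        h = np.maximum(X @ W1, 0)               # ReLU hidden layer
        err = h @ W2 - target
        gW2 = h.T @ err / len(X)
        gW1 = X.T @ ((err @ W2.T) * (h > 0)) / len(X)
        W1 -= lr * gW1
        W2 -= lr * gW2
    return float(np.mean(err ** 2))

print(proxy_score(4), proxy_score(64))  # wider candidate usually fits better
```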

WinoCNN: Kernel Sharing Winograd Systolic Array for Efficient Convolutional Neural Network Acceleration on FPGAs

1 code implementation • 9 Jul 2021 • Xinheng Liu, Yao Chen, Cong Hao, Ashutosh Dhar, Deming Chen

We implement our proposed accelerator on multiple FPGAs, which outperforms the state-of-the-art designs in terms of both throughput and DSP efficiency.
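The Winograd trick behind such accelerators computes a small convolution tile with fewer multiplies by moving into a transform domain. Below is the standard 1D F(2,3) case (two outputs of a 3-tap filter using 4 multiplies instead of 6), checked against direct correlation; WinoCNN builds on the 2D form, with kernel sharing across kernel sizes layered on top.

```python
import numpy as np

# Standard Lavin-Gray F(2,3) transform matrices.
Bt = np.array([[1, 0, -1, 0],
               [0, 1,  1, 0],
               [0, -1, 1, 0],
               [0, 1,  0, -1]], dtype=float)
G = np.array([[1.0, 0.0, 0.0],
              [0.5, 0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0, 0.0, 1.0]])
At = np.array([[1, 1, 1, 0],
               [0, 1, -1, -1]], dtype=float)

d = np.array([1.0, 2.0, 3.0, 4.0])   # input tile (4 samples)
g = np.array([0.5, 1.0, -1.0])       # filter taps

y = At @ ((G @ g) * (Bt @ d))        # 4 elementwise multiplies
ref = np.array([d[0:3] @ g, d[1:4] @ g])  # direct 3-tap correlation
print(np.allclose(y, ref))           # True
```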

Adversarial Graph Augmentation to Improve Graph Contrastive Learning

1 code implementation • NeurIPS 2021 • Susheel Suresh, Pan Li, Cong Hao, Jennifer Neville

Self-supervised learning of graph neural networks (GNNs) is in great demand due to the widespread label scarcity in real-world graph/network data.

Contrastive Learning Self-Supervised Learning

3U-EdgeAI: Ultra-Low Memory Training, Ultra-Low Bitwidth Quantization, and Ultra-Low Latency Acceleration

no code implementations • 11 May 2021 • Yao Chen, Cole Hawkins, Kaiqi Zhang, Zheng Zhang, Cong Hao

This paper emphasizes the importance and efficacy of training, quantization and accelerator design, and calls for more research breakthroughs in the area for AI on the edge.

Model Compression Quantization

Software/Hardware Co-design for Multi-modal Multi-task Learning in Autonomous Systems

no code implementations • 8 Apr 2021 • Cong Hao, Deming Chen

We formulate the MMMT model and heterogeneous hardware implementation co-design as a differentiable optimization problem, with the objective of improving the solution quality and reducing the overall power consumption and critical path latency.

Multi-Task Learning Sensor Fusion

Enabling Design Methodologies and Future Trends for Edge AI: Specialization and Co-design

no code implementations • 25 Mar 2021 • Cong Hao, Jordan Dotzel, JinJun Xiong, Luca Benini, Zhiru Zhang, Deming Chen

Artificial intelligence (AI) technologies have dramatically advanced in recent years, resulting in revolutionary changes in people's lives.

Benchmarking Edge-computing

IronMan: GNN-assisted Design Space Exploration in High-Level Synthesis via Reinforcement Learning

no code implementations • 16 Feb 2021 • Nan Wu, Yuan Xie, Cong Hao

Despite the great success of High-Level Synthesis (HLS) tools, we observe several unresolved challenges: 1) the high-level abstraction of programming styles in HLS sometimes conceals optimization opportunities; 2) existing HLS tools do not provide flexible trade-off (Pareto) solutions among different objectives and constraints; 3) the actual quality of the resulting RTL designs is hard to predict.

Reinforcement Learning (RL)

HW-NAS-Bench: Hardware-Aware Neural Architecture Search Benchmark

no code implementations • ICLR 2021 • Chaojian Li, Zhongzhi Yu, Yonggan Fu, Yongan Zhang, Yang Zhao, Haoran You, Qixuan Yu, Yue Wang, Cong Hao, Yingyan Lin

To design HW-NAS-Bench, we carefully collected the measured/estimated hardware performance (e.g., energy cost and latency) of all the networks in the search space of both NAS-Bench-201 and FBNet, considering six hardware devices that fall into three categories (i.e., commercial edge devices, FPGA, and ASIC).

Hardware Aware Neural Architecture Search Neural Architecture Search

Improving Random-Sampling Neural Architecture Search by Evolving the Proxy Search Space

1 code implementation • 1 Jan 2021 • Yuhong Li, Cong Hao, Xiaofan Zhang, JinJun Xiong, Wen-mei Hwu, Deming Chen

This raises the question of whether we can find an effective proxy search space (PS) that is only a small subset of GS to dramatically improve RandomNAS’s search efficiency while at the same time keeping a good correlation for the top-performing architectures.

Image Classification Neural Architecture Search

Effective Algorithm-Accelerator Co-design for AI Solutions on Edge Devices

no code implementations • 14 Oct 2020 • Cong Hao, Yao Chen, Xiaofan Zhang, Yuhong Li, JinJun Xiong, Wen-mei Hwu, Deming Chen

High quality AI solutions require joint optimization of AI algorithms, such as deep neural networks (DNNs), and their hardware accelerators.

VecQ: Minimal Loss DNN Model Compression With Vectorized Weight Quantization

1 code implementation • 18 May 2020 • Cheng Gong, Yao Chen, Ye Lu, Tao Li, Cong Hao, Deming Chen

Quantization has been proven to be an effective method for reducing the computing and/or storage cost of DNNs.

Model Compression Object Detection +2
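For intuition, here is a generic loss-minimizing uniform quantizer: it sweeps candidate scales and keeps the one with the smallest L2 error between the weights and their quantized reconstruction. VecQ instead derives its solution from a vectorized loss model, so treat this purely as a baseline sketch; the bit width and scale range are invented.

```python
import numpy as np

def quantize(w, bits=4):
    """Uniform quantization with a brute-force scale search that
    minimizes L2 reconstruction error (illustrative baseline)."""
    qmax = 2 ** (bits - 1) - 1
    best_s, best_err, best_q = None, np.inf, None
    for s in np.linspace(w.std() / qmax, 4 * w.std() / qmax, 64):
        q = np.clip(np.round(w / s), -qmax - 1, qmax)
        err = np.sum((w - q * s) ** 2)
        if err < best_err:
            best_s, best_err, best_q = s, err, q
    return best_q, best_s

w = np.random.default_rng(0).normal(size=1024)
q, s = quantize(w)
print("reconstruction MSE:", np.mean((w - q * s) ** 2))
```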

EDD: Efficient Differentiable DNN Architecture and Implementation Co-search for Embedded AI Solutions

no code implementations • 6 May 2020 • Yuhong Li, Cong Hao, Xiaofan Zhang, Xinheng Liu, Yao Chen, JinJun Xiong, Wen-mei Hwu, Deming Chen

We formulate the co-search problem by fusing DNN search variables and hardware implementation variables into one solution space, and maximize both algorithm accuracy and hardware implementation quality.

Neural Architecture Search
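One way to picture "fusing search variables into one solution space": keep a continuous distribution over discrete design candidates and ascend a combined accuracy-minus-latency objective, so model and implementation quality are traded off in a single differentiable step. All numbers, including the 0.05 latency weight, are invented; this toy softmax relaxation is not EDD's exact formulation.

```python
import numpy as np

acc = np.array([0.90, 0.93, 0.95])   # per-candidate accuracy surrogate
lat = np.array([1.0, 1.8, 3.5])      # per-candidate latency estimate (ms)
reward = acc - 0.05 * lat            # combined differentiable objective
alpha = np.zeros(3)                  # logits over discrete candidates

for _ in range(200):
    p = np.exp(alpha) / np.exp(alpha).sum()
    alpha += 0.1 * p * (reward - p @ reward)  # gradient of E_p[reward]
print(np.argmax(alpha))              # 0: best trade-off under this weight
```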

AutoDNNchip: An Automated DNN Chip Predictor and Builder for Both FPGAs and ASICs

1 code implementation • 6 Jan 2020 • Pengfei Xu, Xiaofan Zhang, Cong Hao, Yang Zhao, Yongan Zhang, Yue Wang, Chaojian Li, Zetong Guan, Deming Chen, Yingyan Lin

Specifically, AutoDNNchip consists of two integrated enablers: (1) a Chip Predictor, built on top of a graph-based accelerator representation, which can accurately and efficiently predict a DNN accelerator's energy, throughput, and area based on the DNN model parameters, hardware configuration, technology-based IPs, and platform constraints; and (2) a Chip Builder, which can automatically explore the design space of DNN chips (including IP selection, block configuration, resource balancing, etc.).

NAIS: Neural Architecture and Implementation Search and its Applications in Autonomous Driving

no code implementations • 18 Nov 2019 • Cong Hao, Yao Chen, Xinheng Liu, Atif Sarwari, Daryl Sew, Ashutosh Dhar, Bryan Wu, Dongdong Fu, JinJun Xiong, Wen-mei Hwu, Junli Gu, Deming Chen

The rapidly growing demands for powerful AI algorithms in many application domains have motivated massive investment in both high-quality deep neural network (DNN) models and high-efficiency implementations.

Autonomous Driving

SkyNet: A Champion Model for DAC-SDC on Low Power Object Detection

1 code implementation • 25 Jun 2019 • Xiaofan Zhang, Cong Hao, Haoming Lu, Jiachen Li, Yuhong Li, Yuchen Fan, Kyle Rupnow, JinJun Xiong, Thomas Huang, Honghui Shi, Wen-mei Hwu, Deming Chen

Developing artificial intelligence (AI) at the edge is always challenging, since edge devices have limited computation capability and memory resources but need to meet demanding requirements, such as real-time processing, high throughput performance, and high inference accuracy.

Object Detection

A Bi-Directional Co-Design Approach to Enable Deep Learning on IoT Devices

2 code implementations • 20 May 2019 • Xiaofan Zhang, Cong Hao, Yuhong Li, Yao Chen, JinJun Xiong, Wen-mei Hwu, Deming Chen

Developing deep learning models for resource-constrained Internet-of-Things (IoT) devices is challenging, as it is difficult to achieve both good quality of results (QoR), such as DNN model inference accuracy, and quality of service (QoS), such as inference latency, throughput, and power consumption.

Object Detection

FPGA/DNN Co-Design: An Efficient Design Methodology for IoT Intelligence on the Edge

2 code implementations • 9 Apr 2019 • Cong Hao, Xiaofan Zhang, Yuhong Li, Sitao Huang, JinJun Xiong, Kyle Rupnow, Wen-mei Hwu, Deming Chen

While embedded FPGAs are attractive platforms for DNN acceleration on edge devices due to their low latency and high energy efficiency, the scarcity of resources on edge-scale FPGA devices also makes DNN deployment challenging.

C++ code Object Detection +1
