Search Results for author: Hongwu Peng

Found 18 papers, 6 papers with code

Learning from Teaching Regularization: Generalizable Correlations Should be Easy to Imitate

1 code implementation • 5 Feb 2024 • Can Jin, Tong Che, Hongwu Peng, Yiyuan Li, Marco Pavone

The student learners are trained by the main model and, by providing feedback, help the main model capture more generalizable and teachable correlations.

Image Classification • Language Modelling +1
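As a rough, purely illustrative sketch of the teacher-student feedback idea described in the snippet above (this is not the authors' Learning-from-Teaching implementation; the models, loss weighting, and variable names are invented here), the code below trains a student to imitate a main model and feeds the student's imitation gap back into the main model's loss as a regularizer.

```python
# Hypothetical sketch of a teacher-student "learning from teaching" style
# regularizer; names and loss weighting are illustrative, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 5))
student = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 5))
opt_t = torch.optim.Adam(teacher.parameters(), lr=1e-3)
opt_s = torch.optim.Adam(student.parameters(), lr=1e-3)
lam = 0.1  # weight of the imitation-based regularizer (assumed value)

for step in range(100):
    x = torch.randn(32, 20)                # toy batch
    y = torch.randint(0, 5, (32,))         # toy labels

    # Student is trained to imitate the (detached) teacher predictions.
    t_logits = teacher(x)
    s_logits = student(x)
    imitate = F.kl_div(F.log_softmax(s_logits, -1),
                       F.softmax(t_logits.detach(), -1),
                       reduction="batchmean")
    opt_s.zero_grad(); imitate.backward(); opt_s.step()

    # Teacher minimizes the task loss plus a penalty that is small when the
    # student imitates it easily -- the "feedback" mentioned in the snippet.
    t_logits = teacher(x)
    with torch.no_grad():
        s_probs = F.softmax(student(x), -1)
    feedback = F.kl_div(F.log_softmax(t_logits, -1), s_probs,
                        reduction="batchmean")
    loss = F.cross_entropy(t_logits, y) + lam * feedback
    opt_t.zero_grad(); loss.backward(); opt_t.step()
```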

Zero-Space Cost Fault Tolerance for Transformer-based Language Models on ReRAM

no code implementations • 22 Jan 2024 • Bingbing Li, Geng Yuan, Zigeng Wang, Shaoyi Huang, Hongwu Peng, Payman Behnam, Wujie Wen, Hang Liu, Caiwen Ding

Resistive Random Access Memory (ReRAM) has emerged as a promising platform for deep neural networks (DNNs) due to its support for parallel in-situ matrix-vector multiplication.
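For readers unfamiliar with the in-situ computation mentioned above, the short NumPy sketch below mimics how a ReRAM crossbar computes a matrix-vector product in one analog step and how a single stuck-at fault in a weight cell perturbs the result. The fault model and all values are illustrative assumptions, not the paper's setup.

```python
# Illustrative NumPy model of a ReRAM crossbar doing matrix-vector
# multiplication, with one simulated stuck-at fault; not the paper's code.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))      # weights programmed as cell conductances
x = rng.normal(size=16)           # input voltages applied to the crossbar

y_ideal = W @ x                   # the crossbar produces all dot products "in situ"

W_faulty = W.copy()
W_faulty[3, 5] = 0.0              # one stuck-at-zero cell (hypothetical fault)
y_faulty = W_faulty @ x

print("output error caused by one faulty cell:",
      np.abs(y_ideal - y_faulty).max())
```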

Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads

1 code implementation • 19 Jan 2024 • Tianle Cai, Yuhong Li, Zhengyang Geng, Hongwu Peng, Jason D. Lee, Deming Chen, Tri Dao

We present two levels of fine-tuning procedures for Medusa to meet the needs of different use cases: Medusa-1: Medusa is directly fine-tuned on top of a frozen backbone LLM, enabling lossless inference acceleration.
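As a loose illustration of the "multiple decoding heads on a frozen backbone" idea (this is not the Medusa repository's API; the toy backbone, head count, and module names are assumptions), the sketch below attaches lightweight heads to a frozen feature extractor so that each head proposes a candidate token for a different future position.

```python
# Hypothetical sketch of Medusa-style extra decoding heads on a frozen
# backbone; module names, sizes, and the toy backbone are invented here.
import torch
import torch.nn as nn

hidden, vocab, num_heads = 256, 1000, 4

backbone = nn.Embedding(vocab, hidden)          # stand-in for a frozen LLM trunk
for p in backbone.parameters():
    p.requires_grad = False                     # Medusa-1 style: backbone stays frozen

# One small head per speculated future position (t+1, t+2, ...).
medusa_heads = nn.ModuleList(
    nn.Sequential(nn.Linear(hidden, hidden), nn.SiLU(), nn.Linear(hidden, vocab))
    for _ in range(num_heads)
)

tokens = torch.randint(0, vocab, (2, 10))       # toy batch of token ids
h = backbone(tokens)                            # (batch, seq, hidden)

# Each head reads the hidden state of the last position and proposes a
# candidate token for a different offset into the future.
last = h[:, -1, :]
proposals = [head(last).argmax(-1) for head in medusa_heads]
print("candidate tokens per future position:", [p.tolist() for p in proposals])
```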

MaxK-GNN: Extremely Fast GPU Kernel Design for Accelerating Graph Neural Networks Training

1 code implementation • 14 Dec 2023 • Hongwu Peng, Xi Xie, Kaustubh Shivdikar, MD Amit Hasan, Jiahui Zhao, Shaoyi Huang, Omer Khan, David Kaeli, Caiwen Ding

In this paper, we present MaxK-GNN, an advanced high-performance GPU training system integrating algorithm and system innovation.
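The title suggests a "MaxK" style nonlinearity; assuming this means keeping only the k largest entries of each node's feature vector (an assumption on our part, not a statement of the paper's kernel design), a minimal sketch of such a sparsifying activation might look like this.

```python
# Minimal sketch of a top-k ("MaxK"-style) activation that zeroes all but the
# k largest entries per row, yielding sparse features for GNN aggregation.
# This is an illustrative guess at the idea, not the paper's GPU kernel.
import torch

def maxk_activation(x: torch.Tensor, k: int) -> torch.Tensor:
    # x: (num_nodes, feature_dim); keep the k largest values in each row.
    topk_vals, topk_idx = x.topk(k, dim=-1)
    out = torch.zeros_like(x)
    return out.scatter(-1, topk_idx, topk_vals)

feats = torch.randn(5, 16)
sparse_feats = maxk_activation(feats, k=4)
print("nonzeros per row:", (sparse_feats != 0).sum(-1).tolist())
```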

Advanced Large Language Model (LLM)-Driven Verilog Development: Enhancing Power, Performance, and Area Optimization in Code Synthesis

no code implementations • 2 Dec 2023 • Kiran Thorat, Jiahui Zhao, Yaotian Liu, Hongwu Peng, Xi Xie, Bin Lei, Jeff Zhang, Caiwen Ding

The increasing use of Advanced Language Models (ALMs) in diverse sectors, particularly due to their impressive capability to generate top-tier content following linguistic instructions, forms the core of this investigation.

Language Modelling • Large Language Model

Evaluating Emerging AI/ML Accelerators: IPU, RDU, and NVIDIA/AMD GPUs

no code implementations • 8 Nov 2023 • Hongwu Peng, Caiwen Ding, Tong Geng, Sutanay Choudhury, Kevin Barker, Ang Li

The relentless advancement of artificial intelligence (AI) and machine learning (ML) applications necessitates the development of specialized hardware accelerators capable of handling the increasing complexity and computational demands.

Accel-GCN: High-Performance GPU Accelerator Design for Graph Convolution Networks

1 code implementation • 22 Aug 2023 • Xi Xie, Hongwu Peng, Amit Hasan, Shaoyi Huang, Jiahui Zhao, Haowen Fang, Wei Zhang, Tong Geng, Omer Khan, Caiwen Ding

Utilizing these principles, we formulated a kernel for sparse matrix multiplication (SpMM) in GCNs that employs block-level partitioning and a combined warp strategy.

Computational Efficiency
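To make the SpMM terminology in the Accel-GCN snippet concrete, here is a plain Python/SciPy illustration of row-block partitioning for A @ X in a GCN layer. It is only a sketch of the partitioning idea, not the CUDA kernel the paper describes.

```python
# Simple CSR SpMM processed in row blocks -- an illustration of block-level
# partitioning for A @ X in a GCN layer, not the paper's GPU implementation.
import numpy as np
from scipy.sparse import random as sparse_random

num_nodes, feat_dim, block = 1000, 64, 128
A = sparse_random(num_nodes, num_nodes, density=0.01, format="csr", random_state=0)
X = np.random.default_rng(0).normal(size=(num_nodes, feat_dim))

Y = np.zeros((num_nodes, feat_dim))
for start in range(0, num_nodes, block):
    stop = min(start + block, num_nodes)
    # Each block of adjacency rows is handled independently, which is what a
    # GPU thread block would do in a block-partitioned SpMM kernel.
    Y[start:stop] = A[start:stop] @ X

assert np.allclose(Y, A @ X)
```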

Dynamic Sparse Training via Balancing the Exploration-Exploitation Trade-off

no code implementations • 30 Nov 2022 • Shaoyi Huang, Bowen Lei, Dongkuan Xu, Hongwu Peng, Yue Sun, Mimi Xie, Caiwen Ding

We further design an acquisition function, provide theoretical guarantees for the proposed method, and clarify its convergence property.
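The snippet mentions an acquisition function guiding sparse training. As a generic, purely illustrative sketch of the prune-and-grow loop used in dynamic sparse training (the scoring rule here is made up and is not the paper's acquisition function), the code below drops the smallest-magnitude active weights and regrows the inactive connections with the highest score.

```python
# Generic prune-and-grow step from dynamic sparse training, with a made-up
# "acquisition" score for regrowth; illustrative only, not the paper's method.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64))
mask = rng.random(W.shape) < 0.2          # start at ~20% density
grad = rng.normal(size=W.shape)           # stand-in for the gradient

def prune_and_grow(W, mask, grad, update_frac=0.1):
    active = np.flatnonzero(mask)
    n_update = max(1, int(update_frac * active.size))

    # Prune: drop the active weights with the smallest magnitude (exploitation).
    drop = active[np.argsort(np.abs(W.ravel()[active]))[:n_update]]
    mask.ravel()[drop] = False

    # Grow: reactivate inactive weights with the largest score (exploration);
    # the score is just |gradient| plus noise, standing in for an acquisition function.
    inactive = np.flatnonzero(~mask)
    score = np.abs(grad.ravel()[inactive]) + 0.01 * rng.random(inactive.size)
    grow = inactive[np.argsort(score)[-n_update:]]
    mask.ravel()[grow] = True
    return mask

mask = prune_and_grow(W, mask.copy(), grad)
print("density after update:", mask.mean())
```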

Towards Sparsification of Graph Neural Networks

1 code implementation • 11 Sep 2022 • Hongwu Peng, Deniz Gurevin, Shaoyi Huang, Tong Geng, Weiwen Jiang, Omer Khan, Caiwen Ding

In this paper, we utilize two state-of-the-art model compression methods (1) train and prune and (2) sparse training for the sparsification of weight layers in GNNs.

Image Classification • Link Prediction +4
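Of the two compression routes named in the snippet above, "train and prune" is the simpler to picture. Below is a small magnitude-pruning sketch applied to a weight matrix of the kind used inside a GNN layer; it illustrates the general technique, not the paper's pipeline, and the sparsity target is an assumed value.

```python
# One-shot magnitude pruning of a weight matrix -- the "train and prune"
# route mentioned in the snippet, shown on a toy tensor rather than a real GNN.
import torch

weight = torch.randn(128, 64)              # stand-in for a GNN layer's weight
sparsity = 0.8                             # prune 80% of the entries (assumed)

threshold = weight.abs().flatten().kthvalue(
    int(sparsity * weight.numel())).values
mask = weight.abs() > threshold            # keep only the largest-magnitude weights
pruned = weight * mask

print("achieved sparsity:", 1.0 - mask.float().mean().item())
```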

Binary Complex Neural Network Acceleration on FPGA

no code implementations • 10 Aug 2021 • Hongwu Peng, Shanglin Zhou, Scott Weitze, Jiaxin Li, Sahidul Islam, Tong Geng, Ang Li, Wei Zhang, Minghu Song, Mimi Xie, Hang Liu, Caiwen Ding

Deep complex networks (DCN), in contrast, can learn from complex data, but have high computational costs; therefore, they cannot satisfy the instant decision-making requirements of many deployable systems dealing with short observations or short signal bursts.

Decision Making
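To give a feel for what a binary complex operation might involve (the binarization scheme below is an assumption for illustration, not the paper's FPGA design), this sketch binarizes the real and imaginary parts of complex weights to ±1 and applies the standard four-product complex multiplication, which is what keeps the hardware mapping cheap.

```python
# Toy binarized complex multiply: real and imaginary weight parts are reduced
# to +-1, so the four partial products need only sign flips and additions.
# The binarization scheme is illustrative; it is not the paper's exact design.
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=8) + 1j * rng.normal(size=8)   # complex weights
x = rng.normal(size=8) + 1j * rng.normal(size=8)   # complex inputs

wb = np.sign(w.real) + 1j * np.sign(w.imag)        # binarize both parts to +-1

# Complex multiply-accumulate: (a+bi)(c+di) = (ac - bd) + (ad + bc)i.
out_real = wb.real @ x.real - wb.imag @ x.imag
out_imag = wb.real @ x.imag + wb.imag @ x.real
print("binary-complex dot product:", complex(out_real, out_imag))
```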
