Search Results for author: Ke Hong

Found 4 papers, 1 papers with code

A Survey on Efficient Inference for Large Language Models

no code implementations • 22 Apr 2024 • Zixuan Zhou, Xuefei Ning, Ke Hong, Tianyu Fu, Jiaming Xu, Shiyao Li, Yuming Lou, Luning Wang, Zhihang Yuan, Xiuhong Li, Shengen Yan, Guohao Dai, Xiao-Ping Zhang, Yuhan Dong, Yu Wang

This paper presents a comprehensive survey of the existing literature on efficient LLM inference.

Paper
Add Code

FlashDecoding++: Faster Large Language Model Inference on GPUs

no code implementations • 2 Nov 2023 • Ke Hong, Guohao Dai, Jiaming Xu, Qiuli Mao, Xiuhong Li, Jun Liu, Kangdi Chen, Yuhan Dong, Yu Wang

A single and static dataflow may lead to a 50. 25% performance loss for GEMMs of different shapes in LLM inference.

Language Modelling Large Language Model

Paper
Add Code

TorchSparse++: Efficient Training and Inference Framework for Sparse Convolution on GPUs

1 code implementation • 25 Oct 2023 • Haotian Tang, Shang Yang, Zhijian Liu, Ke Hong, Zhongming Yu, Xiuyu Li, Guohao Dai, Yu Wang, Song Han

On top of this, we design the Sparse Autotuner, which extends the design space of existing sparse convolution libraries and searches for the best dataflow configurations for training and inference workloads.

Autonomous Driving Recommendation Systems

1,114

Paper
Code

Ada3D : Exploiting the Spatial Redundancy with Adaptive Inference for Efficient 3D Object Detection

no code implementations • ICCV 2023 • Tianchen Zhao, Xuefei Ning, Ke Hong, Zhongyuan Qiu, Pu Lu, Yali Zhao, Linfeng Zhang, Lipu Zhou, Guohao Dai, Huazhong Yang, Yu Wang

One reason for this high resource consumption is the presence of a large number of redundant background points in Lidar point clouds, resulting in spatial redundancy in both 3D voxel and dense BEV map representations.

3D Object Detection Autonomous Driving +1

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.