no code implementations • 22 Apr 2024 • Zixuan Zhou, Xuefei Ning, Ke Hong, Tianyu Fu, Jiaming Xu, Shiyao Li, Yuming Lou, Luning Wang, Zhihang Yuan, Xiuhong Li, Shengen Yan, Guohao Dai, Xiao-Ping Zhang, Yuhan Dong, Yu Wang
This paper presents a comprehensive survey of the existing literature on efficient LLM inference.
1 code implementation • 2 Apr 2024 • Enshu Liu, Junyi Zhu, Zinan Lin, Xuefei Ning, Matthew B. Blaschko, Sergey Yekhanin, Shengen Yan, Guohao Dai, Huazhong Yang, Yu Wang
For example, LCSC achieves better performance using one function evaluation (NFE) than the base model does with two NFEs on consistency distillation, and decreases the NFE of diffusion models from 15 to 9 while maintaining generation quality on CIFAR-10.
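At its core, LCSC combines saved training checkpoints linearly to obtain a stronger model. The sketch below illustrates only that core idea; the toy checkpoints and fixed coefficients are placeholders, and the paper searches the coefficients (e.g., with an evolutionary method) rather than fixing them by hand.

```python
# Minimal sketch of linearly combining saved checkpoints (the core idea
# behind LCSC). Checkpoints and coefficients here are illustrative only.
import torch

def combine_checkpoints(state_dicts, coeffs):
    """Return sum_i coeffs[i] * state_dicts[i], key by key."""
    return {k: sum(c * sd[k] for c, sd in zip(coeffs, state_dicts))
            for k in state_dicts[0]}

ckpt_a = {"w": torch.ones(2, 2)}   # pretend checkpoint from one step
ckpt_b = {"w": torch.zeros(2, 2)}  # pretend checkpoint from another step
merged = combine_checkpoints([ckpt_a, ckpt_b], coeffs=[0.7, 0.3])
print(merged["w"])  # every entry is 0.7
```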
no code implementations • 25 Mar 2024 • Lin Zhao, Tianchen Zhao, Zinan Lin, Xuefei Ning, Guohao Dai, Huazhong Yang, Yu Wang
In recent years, there has been significant progress in the development of text-to-image generative models.
1 code implementation • 28 Feb 2024 • Shiyao Li, Xuefei Ning, Luning Wang, Tengxuan Liu, Xiangsheng Shi, Shengen Yan, Guohao Dai, Huazhong Yang, Yu Wang
Post-training quantization (PTQ) has emerged as a promising technique to reduce the cost of large language models (LLMs).
1 code implementation • 6 Feb 2024 • Tao Yuan, Xuefei Ning, Dong Zhou, Zhijie Yang, Shiyao Li, Minghui Zhuang, Zheyue Tan, Zhuyu Yao, Dahua Lin, Boxun Li, Guohao Dai, Shengen Yan, Yu Wang
In contrast, the average context lengths of mainstream benchmarks are insufficient (5k-21k tokens), and they suffer from potential knowledge leakage and inaccurate metrics, resulting in biased evaluation.
no code implementations • 8 Jan 2024 • Shulin Zeng, Jun Liu, Guohao Dai, Xinhao Yang, Tianyu Fu, Hongyi Wang, Wenheng Ma, Hanbo Sun, Shiyao Li, Zixiao Huang, Yadong Dai, Jintao Li, Zehao Wang, Ruoyu Zhang, Kairui Wen, Xuefei Ning, Yu Wang
However, existing GPU and transformer-based accelerators cannot efficiently process compressed LLMs, due to the following unresolved challenges: low computational efficiency, underutilized memory bandwidth, and large compilation overheads.
no code implementations • 12 Dec 2023 • Enshu Liu, Xuefei Ning, Huazhong Yang, Yu Wang
In this paper, we propose a unified sampling framework (USF) to systematically study the optional strategies available to solvers.
no code implementations • 28 Nov 2023 • Lin Zhao, Hongxuan Li, Xuefei Ning, Xinru Jiang
Cross-modal steganography is the practice of unobtrusively concealing secret signals in publicly available cover signals of a different modality than the secret signals.
1 code implementation • 28 Jul 2023 • Xuefei Ning, Zinan Lin, Zixuan Zhou, Zifu Wang, Huazhong Yang, Yu Wang
This work aims at decreasing the end-to-end generation latency of large language models (LLMs).
no code implementations • ICCV 2023 • Tianchen Zhao, Xuefei Ning, Ke Hong, Zhongyuan Qiu, Pu Lu, Yali Zhao, Linfeng Zhang, Lipu Zhou, Guohao Dai, Huazhong Yang, Yu Wang
One reason for this high resource consumption is the presence of a large number of redundant background points in LiDAR point clouds, resulting in spatial redundancy in both 3D voxel and dense BEV map representations.
1 code implementation • 15 Jun 2023 • Enshu Liu, Xuefei Ning, Zinan Lin, Huazhong Yang, Yu Wang
Diffusion probabilistic models (DPMs) are a new class of generative models that have achieved state-of-the-art generation quality in various domains.
2 code implementations • NeurIPS 2023 • Zifu Wang, Xuefei Ning, Matthew B. Blaschko
To address this, we introduce Jaccard Metric Losses (JMLs), which are identical to the soft Jaccard loss in standard settings with hard labels but are fully compatible with soft labels.
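For reference, the soft Jaccard loss that JMLs coincide with under hard labels can be written compactly. The sketch below shows one common soft relaxation; the specific JML variants that remain well-behaved under soft labels are defined in the paper itself.

```python
import torch

def soft_jaccard_loss(pred, target, eps=1e-6):
    """One common soft relaxation of the Jaccard index (IoU) as a loss.
    pred: predicted probabilities in [0, 1]; target: hard (0/1) or soft labels."""
    intersection = (pred * target).sum()
    union = pred.sum() + target.sum() - intersection
    return 1.0 - (intersection + eps) / (union + eps)

pred = torch.tensor([0.9, 0.2, 0.7])
hard = torch.tensor([1.0, 0.0, 1.0])
soft = torch.tensor([0.95, 0.05, 0.80])  # e.g., labels after smoothing
print(soft_jaccard_loss(pred, hard).item())
print(soft_jaccard_loss(pred, soft).item())
```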
1 code implementation • 2 Feb 2023 • Junbo Zhao, Xuefei Ning, Enshu Liu, Binxin Ru, Zixuan Zhou, Tianchen Zhao, Chen Chen, Jiajin Zhang, Qingmin Liao, Yu Wang
In the first step, we train different sub-predictors on different types of available low-fidelity information to extract beneficial knowledge as low-fidelity experts.
1 code implementation • 16 Jul 2022 • Zixuan Zhou, Xuefei Ning, Yi Cai, Jiashu Han, Yiping Deng, Yuhan Dong, Huazhong Yang, Yu Wang
Specifically, we train the supernet with a large sharing extent (an easier curriculum) at the beginning and gradually decrease the sharing extent of the supernet (a harder curriculum).
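Such a sharing-extent curriculum can be pictured as a schedule that starts with one shared copy of the parameters and gradually splits them into more groups. The milestones and group counts below are purely illustrative assumptions, not the paper's settings.

```python
# Hypothetical sharing-extent schedule for a supernet: few parameter
# groups (heavy sharing, easy curriculum) early, more groups (less
# sharing, harder curriculum) later. All numbers are illustrative.
def num_parameter_groups(epoch, milestones=(0, 100, 200, 300),
                         groups=(1, 2, 4, 8)):
    """More groups means a smaller sharing extent."""
    n = groups[0]
    for m, g in zip(milestones, groups):
        if epoch >= m:
            n = g
    return n

for e in (0, 150, 350):
    print(e, num_parameter_groups(e))  # -> 1, 2, 8 groups
```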
no code implementations • 5 Apr 2022 • Cheng Liu, Zhen Gao, Siting Liu, Xuefei Ning, Huawei Li, Xiaowei Li
With the rapid advancements of deep learning over the past decade, it is foreseeable that deep learning will be deployed in ever more safety-critical applications such as autonomous driving and robotics.
no code implementations • CVPR 2022 • Tianchen Zhao, Niansong Zhang, Xuefei Ning, He Wang, Li Yi, Yu Wang
We propose CodedVTR (Codebook-based Voxel TRansformer), which improves data efficiency and generalization ability for 3D sparse voxel transformers.
no code implementations • 12 Dec 2021 • Weilin Liu, Ye Mu, Chao Yu, Xuefei Ning, Zhong Cao, Yi Wu, Shuang Liang, Huazhong Yang, Yu Wang
These scenarios correspond to vulnerabilities of the driving policies under test, and are therefore meaningful for further improving those policies.
no code implementations • 29 Sep 2021 • Nianhui Guo, Joseph Bethge, Haojin Yang, Kai Zhong, Xuefei Ning, Christoph Meinel, Yu Wang
Recent works on Binary Neural Networks (BNNs) have made promising progress in narrowing the accuracy gap between BNNs and their 32-bit counterparts, often by relying on specialized model designs with additional 32-bit components.
1 code implementation • 13 Jun 2021 • Nianhui Guo, Joseph Bethge, Haojin Yang, Kai Zhong, Xuefei Ning, Christoph Meinel, Yu Wang
Recent works on Binary Neural Networks (BNNs) have made promising progress in narrowing the accuracy gap between BNNs and their 32-bit counterparts.
no code implementations • AAAI Workshop AdvML 2022 • Yi Cai, Xuefei Ning, Huazhong Yang, Yu Wang
It provides high scalability because the number of paths within an EIO network grows exponentially with the network depth.
no code implementations • CVPR 2022 • Minxue Tang, Xuefei Ning, Yitu Wang, Jingwei Sun, Yu Wang, Hai Li, Yiran Chen
In this work, we propose FedCor -- an FL framework built on a correlation-based client selection strategy, to boost the convergence rate of FL.
1 code implementation • 10 Jan 2021 • Guyue Huang, Jingbo Hu, Yifan He, Jialong Liu, Mingyuan Ma, Zhaoyang Shen, Juejian Wu, Yuanfan Xu, Hengrui Zhang, Kai Zhong, Xuefei Ning, Yuzhe ma, HaoYu Yang, Bei Yu, Huazhong Yang, Yu Wang
With the down-scaling of CMOS technology, the design complexity of very large-scale integration (VLSI) is increasing.
no code implementations • 1 Jan 2021 • Kai Zhong, Xuefei Ning, Tianchen Zhao, Zhenhua Zhu, Shulin Zeng, Guohao Dai, Yu Wang, Huazhong Yang
Through this dynamic precision framework, we can reduce the bit-width of convolution, which accounts for most of the computational cost, while keeping the training process close to full-precision floating-point training.
1 code implementation • 22 Dec 2020 • Xuefei Ning, Junbo Zhao, Wenshuo Li, Tianchen Zhao, Yin Zheng, Huazhong Yang, Yu Wang
In this paper, considering scenarios with capacity budgets, we aim to discover adversarially robust architectures at targeted capacities.
1 code implementation • 25 Nov 2020 • Xuefei Ning, Changcheng Tang, Wenshuo Li, Songyi Yang, Tianchen Zhao, Niansong Zhang, Tianyi Lu, Shuang Liang, Huazhong Yang, Yu Wang
Neural Architecture Search (NAS) has received extensive attention due to its capability to discover neural network architectures in an automated manner.
no code implementations • 21 Nov 2020 • Tianchen Zhao, Xuefei Ning, Xiangsheng Shi, Songyi Yang, Shuang Liang, Peng Lei, Jianfei Chen, Huazhong Yang, Yu Wang
We also design the micro-level search space to strengthen the information flow in BNNs.
no code implementations • 28 Sep 2020 • Xuefei Ning, Wenshuo Li, Zixuan Zhou, Tianchen Zhao, Shuang Liang, Yin Zheng, Huazhong Yang, Yu Wang
A major challenge in NAS is to conduct a fast and accurate evaluation of neural architectures.
1 code implementation • NeurIPS 2021 • Xuefei Ning, Changcheng Tang, Wenshuo Li, Zixuan Zhou, Shuang Liang, Huazhong Yang, Yu Wang
Conducting efficient performance estimations of neural architectures is a major challenge in neural architecture search (NAS).
no code implementations • 31 Jul 2020 • Tong Wu, Xuefei Ning, Wenshuo Li, Ranran Huang, Huazhong Yang, Yu Wang
In this paper, we tackle the issue of physical adversarial examples for object detectors in the wild.
no code implementations • 4 Jun 2020 • Kai Zhong, Xuefei Ning, Guohao Dai, Zhenhua Zhu, Tianchen Zhao, Shulin Zeng, Yu Wang, Huazhong Yang
For training a variety of models on CIFAR-10, using a 1-bit mantissa and a 2-bit exponent is adequate to keep the accuracy loss within 1%.
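To make the format concrete, the sketch below simulates rounding a tensor to a tiny floating-point representation with configurable mantissa and exponent widths; the exponent range, bias, and rounding mode are simplifying assumptions rather than the paper's exact scheme.

```python
import torch

def quantize_tiny_float(x, mant_bits=1, exp_bits=2):
    """Simulate a low-bit float with `mant_bits` mantissa bits and
    `exp_bits` exponent bits (sign kept separately). Exponent bias and
    rounding are simplifying assumptions, not the paper's exact format."""
    sign = torch.sign(x)
    mag = x.abs().clamp(min=1e-12)
    exp = torch.floor(torch.log2(mag))
    exp = exp.clamp(-(2 ** (exp_bits - 1)), 2 ** (exp_bits - 1) - 1)
    mant = mag / torch.pow(2.0, exp)        # mantissa in [1, 2)
    step = 2.0 ** mant_bits
    mant = torch.round(mant * step) / step  # round mantissa to mant_bits
    return sign * mant * torch.pow(2.0, exp)

x = torch.tensor([0.3, -1.7, 2.9])
print(quantize_tiny_float(x))  # tensor([ 0.2500, -1.5000,  3.0000])
```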
1 code implementation • ECCV 2020 • Xuefei Ning, Tianchen Zhao, Wenshuo Li, Peng Lei, Yu Wang, Huazhong Yang
In budgeted pruning, how to distribute the resources across layers (i.e., sparsity allocation) is the key problem.
1 code implementation • ECCV 2020 • Xuefei Ning, Yin Zheng, Tianchen Zhao, Yu Wang, Huazhong Yang
Experimental results on various search spaces confirm GATES's effectiveness in improving the performance predictor.
no code implementations • 20 Mar 2020 • Xuefei Ning, Guangjun Ge, Wenshuo Li, Zhenhua Zhu, Yin Zheng, Xiaoming Chen, Zhen Gao, Yu Wang, Huazhong Yang
By inspecting the discovered architectures, we find that the operation primitives, the weight quantization range, the model capacity, and the connection pattern all influence the fault resilience of NN models.
no code implementations • 18 Jun 2018 • Xuefei Ning, Yin Zheng, Zhuxi Jiang, Yu Wang, Huazhong Yang, Junzhou Huang
Moreover, we also propose HiTM-VAE, where the document-specific topic distributions are generated in a hierarchical manner.
no code implementations • 14 May 2018 • Wenshuo Li, Jincheng Yu, Xuefei Ning, Pengjun Wang, Qi Wei, Yu Wang, Huazhong Yang
In this paper, we propose a hardware-software collaborative attack framework that injects hidden neural network Trojans; it acts as a back-door without requiring manipulation of input images and is flexible across different scenarios.
no code implementations • ICLR 2018 • Xuefei Ning, Yin Zheng, Zhuxi Jiang, Yu Wang, Huazhong Yang, Junzhou Huang
On the other hand, unlike other BNP topic models, the inference of iTM-VAE is modeled by neural networks, which have rich representation capacity and can be computed in a simple feed-forward manner.