Search Results for author: Junchen Jiang

Found 13 papers, 2 papers with code

Large Language Model Adaptation for Networking

no code implementations • 4 Feb 2024 • Duo Wu, Xianda Wang, Yaqi Qiao, Zhi Wang, Junchen Jiang, Shuguang Cui, Fangxin Wang

In this paper, we present NetLLM, the first LLM adaptation framework that efficiently adapts LLMs to solve networking problems.

Answer Generation Language Modelling +3

Chatterbox: Robust Transport for LLM Token Streaming under Unstable Network

no code implementations • 23 Jan 2024 • Hanchen Li, YuHan Liu, Yihua Cheng, Siddhant Ray, Kuntai Du, Junchen Jiang

To render each generated token in real time, the LLM server generates response tokens one by one and streams each generated token (or group of a few tokens) through the network to the user right after it is generated, which we refer to as LLM token streaming.

Chatbot

CacheGen: KV Cache Compression and Streaming for Fast Language Model Serving

1 code implementation • 11 Oct 2023 • YuHan Liu, Hanchen Li, Yihua Cheng, Siddhant Ray, YuYang Huang, Qizheng Zhang, Kuntai Du, Jiayi Yao, Shan Lu, Ganesh Ananthanarayanan, Michael Maire, Henry Hoffmann, Ari Holtzman, Junchen Jiang

Compared to the recent systems that reuse the KV cache, CacheGen reduces the KV cache size by 3.7-4.3x and the total delay in fetching and processing contexts by 2.7-3.2x while having negligible impact on the LLM response quality in accuracy or perplexity.

Language Modelling Quantization

Automatic and Efficient Customization of Neural Networks for ML Applications

no code implementations • 7 Oct 2023 • YuHan Liu, Chengcheng Wan, Kuntai Du, Henry Hoffmann, Junchen Jiang, Shan Lu, Michael Maire

ML APIs have greatly relieved application developers of the burden to design and train their own neural network models -- classifying objects in an image can now be as simple as one line of Python code to call an API.

OneAdapt: Fast Adaptation for Deep Learning Applications via Backpropagation

no code implementations • 3 Oct 2023 • Kuntai Du, YuHan Liu, Yitian Hao, Qizheng Zhang, Haodong Wang, YuYang Huang, Ganesh Ananthanarayanan, Junchen Jiang

While the high demand for network bandwidth and GPU resources could be substantially reduced by optimally adapting the configuration knobs, such as video resolution and frame rate, current adaptation techniques fail to meet three requirements simultaneously: adapt configurations (i) with minimum extra GPU or bandwidth overhead; (ii) to reach near-optimal decisions based on how the data affects the final DNN's accuracy, and (iii) do so for a range of configuration knobs.

Object Detection

GRACE: Loss-Resilient Real-Time Video through Neural Codecs

no code implementations • 21 May 2023 • Yihua Cheng, Ziyi Zhang, Hanchen Li, Anton Arapin, Yue Zhang, Qizheng Zhang, YuHan Liu, Xu Zhang, Francis Y. Yan, Amrita Mazumdar, Nick Feamster, Junchen Jiang

In real-time video communication, retransmitting lost packets over high-latency networks is not viable due to strict latency requirements.

AccMPEG: Optimizing Video Encoding for Video Analytics

no code implementations • 26 Apr 2022 • Kuntai Du, Qizheng Zhang, Anton Arapin, Haodong Wang, Zhengxu Xia, Junchen Jiang

This paper presents AccMPEG, a new video encoding and streaming system that meets all the three requirements.

Object Detection +1

Sayer: Using Implicit Feedback to Optimize System Policies

no code implementations • 28 Oct 2021 • Mathias Lécuyer, Sang Hoon Kim, Mihir Nanavati, Junchen Jiang, Siddhartha Sen, Amit Sharma, Aleksandrs Slivkins

We develop a methodology, called Sayer, that leverages implicit feedback to evaluate and train new system policies.

counterfactual Data Augmentation

Ekya: Continuous Learning of Video Analytics Models on Edge Compute Servers

no code implementations • 19 Dec 2020 • Romil Bhardwaj, Zhengxu Xia, Ganesh Ananthanarayanan, Junchen Jiang, Nikolaos Karianakis, Yuanchao Shu, Kevin Hsieh, Victor Bahl, Ion Stoica

Compressed models that are deployed on the edge servers for inference suffer from data drift, where the live video data diverges from the training data.

Domain-specific Communication Optimization for Distributed DNN Training

no code implementations • 16 Aug 2020 • Hao Wang, Jingrong Chen, Xinchen Wan, Han Tian, Jiacheng Xia, Gaoxiong Zeng, Weiyan Wang, Kai Chen, Wei Bai, Junchen Jiang

Communication overhead poses an important obstacle to distributed DNN training and has drawn increasing attention in recent years.

Scheduling

ReXCam: Resource-Efficient, Cross-Camera Video Analytics at Scale

1 code implementation • 3 Nov 2018 • Samvit Jain, Xun Zhang, Yuhao Zhou, Ganesh Ananthanarayanan, Junchen Jiang, Yuanchao Shu, Joseph Gonzalez

Enterprises are increasingly deploying large camera networks for video analytics.

Addressing Training Bias via Automated Image Annotation

no code implementations • 22 Sep 2018 • Zhujun Xiao, Yanzi Zhu, Yuxin Chen, Ben Y. Zhao, Junchen Jiang, Hai-Tao Zheng

Building accurate DNN models requires training on large, labeled, context-specific datasets, especially those matching the target scenario.

Scaling Video Analytics Systems to Large Camera Deployments

no code implementations • 7 Sep 2018 • Samvit Jain, Ganesh Ananthanarayanan, Junchen Jiang, Yuanchao Shu, Joseph E. Gonzalez

Driven by advances in computer vision and the falling costs of camera hardware, organizations are deploying video cameras en masse for the spatial monitoring of their physical premises.
