Search Results for author: Yunseong Kim

PARIS and ELSA: An Elastic Scheduling Algorithm for Reconfigurable Multi-GPU Inference Servers

However, maximizing server utilization and system throughput is also crucial for ML service providers as it helps lower the total-cost-of-ownership.

In cloud ML inference systems, batching is an essential technique for increasing throughput, which helps optimize total-cost-of-ownership.