no code implementations • 6 Feb 2024 • Shinan Liu, Ted Shaowang, Gerry Wan, Jeewon Chae, Jonatas Marques, Sanjay Krishnan, Nick Feamster
We identify that on the same task, inference time across models can differ by 2.7x-136.3x, while the median inter-packet waiting time is often 6-8 orders of magnitude higher than the inference time!