no code implementations • 27 Feb 2024 • Yuting Yang, Andrea Merlina, Weijia Song, Tiancheng Yuan, Ken Birman, Roman Vitenberg
We consider ML query processing in distributed systems where GPU-enabled workers coordinate to execute complex queries: a computing style often seen in applications that interact with users in support of image processing and natural language processing.
no code implementations • 17 Dec 2023 • Daniel Gerbi Duguma, Juliana Zhang, Meysam Aboutalebi, Shiliang Zhang, Catherine Banet, Cato Bjørkli, Chinmayi Baramashetru, Frank Eliassen, HUI ZHANG, Jonathan Muringani, Josef Noll, Knut Inge Fostervold, Lars Böcker, Lee Andrew Bygrave, Matin Bagherpour, Maunya Doroudi Moghadam, Olaf Owe, Poushali Sengupta, Roman Vitenberg, Sabita Maharjan, Thiago Garrett, Yushuai Li, Zhengyu Shan
This manuscript aims to formalize and conclude the discussions initiated during the PriTEM workshop 22-23 March 2023.
no code implementations • 30 Nov 2023 • Thiago Garrett, Weijia Song, Roman Vitenberg, Ken Birman
ML inference workflows often require low latency and high throughput, yet we lack good options for addressing this need.
1 code implementation • 29 Nov 2023 • Weijia Song, Thiago Garrett, Yuting Yang, Mingzhao Liu, Edward Tremel, Lorenzo Rosa, Andrea Merlina, Roman Vitenberg, Ken Birman
Interactive intelligent computing applications are increasingly prevalent, creating a need for AI/ML platforms optimized to reduce per-event latency while maintaining high throughput and efficient resource management.