Pay-Per-Request Deployment of Neural Network Models Using Serverless Architectures

NAACL 2018 · Zhucheng Tu, Mengping Li, Jimmy Lin

We demonstrate the serverless deployment of neural networks for model inferencing in NLP applications using Amazon's Lambda service for feedforward evaluation and DynamoDB for storing word embeddings. Our architecture realizes a pay-per-request pricing model, requiring zero ongoing costs for maintaining server instances. All virtual machine management is handled behind the scenes by the cloud provider without any direct developer intervention. We describe a number of techniques that allow efficient use of serverless resources, and evaluations confirm that our design is both scalable and inexpensive.
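A minimal sketch of the request flow the abstract describes: a Lambda-style handler looks up word embeddings for the input tokens and runs a small feedforward pass. This is an illustrative assumption, not the paper's actual code; the in-memory `EMBEDDINGS` dict stands in for DynamoDB (in the real architecture each lookup would be a DynamoDB `GetItem` call), and the function names, layer sizes, and weights are all hypothetical.

```python
import numpy as np

# Stand-in for DynamoDB: token -> embedding vector (hypothetical data).
EMBEDDINGS = {w: np.random.default_rng(i).normal(size=50)
              for i, w in enumerate(["serverless", "inference", "is", "cheap"])}

# Toy feedforward weights; a real deployment would load trained weights
# once at cold start and reuse them across invocations.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(50, 16)), np.zeros(16)
W2, b2 = rng.normal(size=(16, 2)), np.zeros(2)

def lookup_embeddings(tokens):
    """Fetch an embedding per token (DynamoDB stand-in), average-pooled."""
    vecs = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    return np.mean(vecs, axis=0)

def handler(event, context=None):
    """Lambda-style entry point: tokens in, class probabilities out."""
    x = lookup_embeddings(event["tokens"])
    h = np.maximum(0, x @ W1 + b1)       # ReLU hidden layer
    logits = h @ W2 + b2
    probs = np.exp(logits - logits.max())  # stable softmax
    probs /= probs.sum()
    return {"probs": probs.tolist()}

result = handler({"tokens": ["serverless", "inference"]})
```

Because the function is stateless and billed per invocation, this pattern incurs no cost while idle, which is the pay-per-request property the paper evaluates.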

