ML Model Deployment

ML in production vs. ML in research

These concerns matter more for ML in production than in research:

Model Serving

Deployment procedures

Deployment approaches

Batch Prediction Versus Online Prediction

What are they?

| | Batch prediction (asynchronous) | Online prediction (synchronous) |
|---|---|---|
| Frequency | Periodic, such as every four hours | As soon as requests come |
| Useful for | Processing accumulated data when you don't need immediate results (such as recommender systems) | When predictions are needed as soon as a data sample is generated (such as fraud detection) |
| Optimized for | High throughput | Low latency |
| Cost | Pay for resources only for the duration of the batch job | Pay for resources while the endpoint is running |
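The difference can be sketched in a few lines of Python (the `predict` function is a hypothetical stand-in for a trained model, not any specific framework's API):

```python
def predict(x):
    """Stand-in for a trained model: doubles its input."""
    return 2 * x

# Online prediction: score each request the moment it arrives (low latency).
def handle_request(x):
    return predict(x)

# Batch prediction: score accumulated samples on a schedule (high throughput).
def run_batch_job(accumulated):
    return [predict(x) for x in accumulated]

print(handle_request(3))         # → 6 (one request, one immediate result)
print(run_batch_job([1, 2, 3]))  # → [2, 4, 6] (periodic job over accumulated data)
```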

Unifying Batch Pipeline and Streaming Pipeline

Traditionally there are two separate pipelines maintained by two teams: the ML team maintains the batch pipeline for training while the deployment team maintains the stream pipeline for inference. This split is a common source of training–serving inconsistencies, so nowadays the two pipelines are increasingly unified, e.g. by sharing the same feature-transformation logic across both
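One simple way to unify the two pipelines is to keep the feature logic in a single function imported by both; a minimal sketch (the `featurize` function and the raw-record shape are illustrative assumptions):

```python
def featurize(record):
    """Single source of truth for feature logic, imported by both the
    batch training pipeline and the streaming inference pipeline."""
    return {
        "amount_bucket": min(int(record["amount"]) // 100, 10),
        "is_weekend": int(record["day_of_week"] in ("Sat", "Sun")),
    }

# Batch pipeline (training): applied over a historical dataset.
training_rows = [featurize(r) for r in [{"amount": 250, "day_of_week": "Sat"}]]

# Streaming pipeline (inference): applied to one live event at a time.
live_features = featurize({"amount": 250, "day_of_week": "Sat"})

# Identical inputs yield identical features on both paths.
assert training_rows[0] == live_features
```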

How to accelerate ML model inference

There are three main approaches to reducing a model's inference latency: make the model smaller (model compression), make inference faster (inference optimization), and make the hardware faster or more specialized.

Example Workflow for Deploying a Production Machine Learning Model

  1. Set Up a Secure AWS Environment
    • Tools: AWS Identity and Access Management (IAM), Amazon SageMaker
    • Define and manage security using IAM policies.
  2. Data Analysis and Preparation
  3. Model Training and Hyperparameter Tuning
    • Tools: Amazon SageMaker, GPU instances
    • Set up hyperparameter tuning and perform multi-GPU instance training.
  4. Model Evaluation
  5. Deploy Model to AWS
    • Tools: AWS API Gateway, AWS Lambda (AWS for Data Science#^94aba2)
    • Deploy the trained model using AWS API Gateway and Lambda functions, ensuring access via REST API.
  6. Test the API
    • Tools: Postman
    • Verify API functionality by sending test requests.
  7. Secure and Optimize Deployment
    • Tools: AWS API Gateway, Lambda, AutoScaling
    • Secure API endpoints (IP whitelisting) and set up auto-scaling to handle traffic efficiently.
  8. Build and Deploy Web Application
    • Tools: React.js, Node.js, Express.js, MongoDB
    • Develop a MERN (MongoDB, Express, React, Node.js) web app that interacts with the AWS API.
  9. Host the Application
    • Tools: DigitalOcean
    • Deploy the web app to a cloud service like DigitalOcean.
  10. Monitor and Maintain
    • Tools: Amazon CloudWatch
    • Monitor performance with CloudWatch logs and manage Lambda concurrency for optimization.
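Step 5 of the workflow above can be sketched as a Lambda handler sitting behind API Gateway (proxy integration). This is a hypothetical handler: the `score` function is a placeholder for the trained model, not a real library call:

```python
import json

def score(features):
    """Placeholder for the trained model; returns a fake risk score."""
    return round(0.1 * len(features), 2)

def lambda_handler(event, context):
    """Entry point invoked by API Gateway for each REST request."""
    try:
        body = json.loads(event.get("body") or "{}")
        features = body["features"]
    except (KeyError, json.JSONDecodeError):
        return {
            "statusCode": 400,
            "body": json.dumps({"error": "expected JSON body with 'features'"}),
        }
    return {
        "statusCode": 200,
        "body": json.dumps({"prediction": score(features)}),
    }
```

Postman (step 6) would then POST `{"features": [...]}` to the API Gateway URL and expect a `200` response with a `prediction` field.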