Machine Learning Systems Design

Note

Most of the notes are from the book Machine Learning Systems Design

Overview of Machine Learning Systems Design

[Image: Pasted image 20230622120707.png]

MLOps

MLOps = Machine Learning Operations

ML in research vs. ML in production
These are more important for ML in production than in research:

Requirements for ML Systems

Reliability

Scalability

Maintainability

Adaptability

Iterative Process of Developing an ML system

  1. Project scoping
  2. Data engineering
  3. ML model development
  4. ML model deployment
  5. ML system monitoring and continual learning
  6. Business analysis

Infrastructure and Tooling for MLOps

Infrastructure

[Image: Pasted image 20230714122416.png]

Storage & Compute

Development Environment

Resource Management

Pipeline Orchestration

Orchestration provides end-to-end traceability of a pipeline: automation captures the specific inputs, outputs, and artifacts of each task.
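To make the traceability idea concrete, here is a minimal sketch of a pipeline runner that records each task's input, output, and an artifact hash. The names `Pipeline`, `TaskRecord`, `clean`, and `scale` are hypothetical and made up for these notes; a real orchestrator handles much more (scheduling, retries, storage), so treat this only as an illustration.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class TaskRecord:
    """What the runner captured for one task execution."""
    name: str
    inputs: dict
    output: Any
    artifact_hash: str


@dataclass
class Pipeline:
    """Hypothetical runner: executes tasks in order and logs every step."""
    steps: list = field(default_factory=list)
    records: list = field(default_factory=list)

    def add(self, name: str, fn: Callable[[Any], Any]) -> "Pipeline":
        self.steps.append((name, fn))
        return self

    def run(self, data: Any) -> Any:
        for name, fn in self.steps:
            result = fn(data)
            # Hash the JSON form of the output so the artifact can be identified later.
            digest = hashlib.sha256(
                json.dumps(result, sort_keys=True).encode()
            ).hexdigest()
            self.records.append(TaskRecord(name, {"data": data}, result, digest))
            data = result  # the output of one task is the input of the next
        return data


def clean(rows):
    """Drop missing values."""
    return [r for r in rows if r is not None]


def scale(rows):
    """Scale values by the maximum."""
    top = max(rows)
    return [r / top for r in rows]


pipeline = Pipeline().add("clean", clean).add("scale", scale)
final = pipeline.run([2, None, 4])
for rec in pipeline.records:
    print(rec.name, rec.inputs, "->", rec.output, rec.artifact_hash[:8])
```

After the run, `pipeline.records` holds one entry per task, so any output can be traced back to the exact input that produced it.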

Coping with ML training challenges

Checkpointing
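A minimal sketch of what checkpointing looks like in practice, assuming a toy training loop whose whole state fits in a small JSON file. The helper names (`save_checkpoint`, `load_checkpoint`, `train`), the `crash_at` parameter, and the checkpoint interval are all hypothetical; ML frameworks ship their own checkpoint utilities, which this does not try to reproduce.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write to a temp file, then rename, so a crash never leaves a half-written file."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists yet."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "weight": 0.0}


def train(path, total_steps, every=5, crash_at=None):
    """Toy loop: minimise (w - 3)^2 by gradient descent, saving every `every` steps."""
    state = load_checkpoint(path)
    while state["step"] < total_steps:
        if crash_at is not None and state["step"] == crash_at:
            raise RuntimeError("simulated failure")
        grad = 2 * (state["weight"] - 3.0)
        state["weight"] -= 0.1 * grad
        state["step"] += 1
        if state["step"] % every == 0:
            save_checkpoint(path, state)
    return state


ckpt = os.path.join(tempfile.mkdtemp(), "ckpt.json")
try:
    train(ckpt, total_steps=20, crash_at=12)  # fails partway through
except RuntimeError:
    pass
resumed_from = load_checkpoint(ckpt)["step"]  # last step that was saved
final_state = train(ckpt, total_steps=20)     # resumes instead of restarting
print(resumed_from, final_state["step"])
```

The second call to `train` picks up from the last saved step, so only the work done since that checkpoint is repeated.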

Distributed training strategies

-> address scale challenges: growing training data volume, or growing model size and complexity

Model Integration

= integrating models with ML applications

Human-in-the-Loop Pipelines

Human review of model predictions
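One common way to set up human review is to route low-confidence predictions to a review queue. The sketch below assumes that pattern; the notes do not say how predictions are selected for review, so the confidence threshold, the `Prediction` class, and the functions `route` and `apply_review` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    item_id: str
    label: str
    confidence: float


def route(predictions, threshold=0.8):
    """Split predictions into auto-accepted ones and ones queued for human review."""
    accepted, review_queue = [], []
    for p in predictions:
        if p.confidence >= threshold:
            accepted.append(p)
        else:
            review_queue.append(p)
    return accepted, review_queue


def apply_review(review_queue, human_labels):
    """Replace the model label with the reviewer's label wherever one was given."""
    reviewed = []
    for p in review_queue:
        label = human_labels.get(p.item_id, p.label)
        reviewed.append(Prediction(p.item_id, label, p.confidence))
    return reviewed


preds = [
    Prediction("a", "cat", 0.95),
    Prediction("b", "dog", 0.55),
    Prediction("c", "cat", 0.40),
]
accepted, queue = route(preds)
reviewed = apply_review(queue, {"b": "dog", "c": "fox"})
print([p.item_id for p in accepted], [(p.item_id, p.label) for p in reviewed])
```

The reviewed labels can also be fed back into the training data, which is one reason these pipelines are often paired with continual learning.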

[Image: Pasted image 20231004110235.png]