Designing Machine Learning Systems: An Iterative Process for Production-Ready Applications

Many tutorials show you how to develop ML systems from ideation to deployed models. But with constant changes in tooling, those systems can quickly become outdated. Without an intentional design to hold the components together, these systems will become a technical liability, prone to errors and be...


Bibliographic Details
Main Author: Huyen, Chip
Format: eBook
Language: English
Published: O'Reilly Media, Inc., 2022
Edition: 1st edition
Collection: O'Reilly (for collection details, see MPG.ReNa)
Table of Contents:
  • Cover
  • Copyright
  • Table of Contents
  • Preface
  • Who This Book Is For
  • What This Book Is Not
  • Navigating This Book
  • GitHub Repository and Community
  • Conventions Used in This Book
  • Using Code Examples
  • O'Reilly Online Learning
  • How to Contact Us
  • Acknowledgments
  • Chapter 1. Overview of Machine Learning Systems
  • When to Use Machine Learning
  • Machine Learning Use Cases
  • Understanding Machine Learning Systems
  • Machine Learning in Research Versus in Production
  • Machine Learning Systems Versus Traditional Software
  • Summary
  • Chapter 2. Introduction to Machine Learning Systems Design
  • Business and ML Objectives
  • Requirements for ML Systems
  • Reliability
  • Scalability
  • Maintainability
  • Adaptability
  • Iterative Process
  • Framing ML Problems
  • Types of ML Tasks
  • Objective Functions
  • Mind Versus Data
  • Summary
  • Chapter 3. Data Engineering Fundamentals
  • Data Sources
  • Data Formats
  • JSON
  • Row-Major Versus Column-Major Format
  • Text Versus Binary Format
  • Data Models
  • Relational Model
  • NoSQL
  • Structured Versus Unstructured Data
  • Data Storage Engines and Processing
  • Transactional and Analytical Processing
  • ETL: Extract, Transform, and Load
  • Modes of Dataflow
  • Data Passing Through Databases
  • Data Passing Through Services
  • Data Passing Through Real-Time Transport
  • Batch Processing Versus Stream Processing
  • Summary
  • Chapter 4. Training Data
  • Sampling
  • Nonprobability Sampling
  • Simple Random Sampling
  • Stratified Sampling
  • Weighted Sampling
  • Reservoir Sampling
  • Importance Sampling
  • Labeling
  • Hand Labels
  • Natural Labels
  • Handling the Lack of Labels
  • Class Imbalance
  • Challenges of Class Imbalance
  • Handling Class Imbalance
  • Data Augmentation
  • Simple Label-Preserving Transformations
  • Perturbation
  • Data Synthesis
  • Summary
  • Chapter 5. Feature Engineering
  • Learned Features Versus Engineered Features
  • Common Feature Engineering Operations
  • Handling Missing Values
  • Scaling
  • Discretization
  • Encoding Categorical Features
  • Feature Crossing
  • Discrete and Continuous Positional Embeddings
  • Data Leakage
  • Common Causes for Data Leakage
  • Detecting Data Leakage
  • Engineering Good Features
  • Feature Importance
  • Feature Generalization
  • Summary
  • Chapter 6. Model Development and Offline Evaluation
  • Model Development and Training
  • Evaluating ML Models
  • Ensembles
  • Experiment Tracking and Versioning
  • Distributed Training
  • AutoML
  • Model Offline Evaluation
  • Baselines
  • Evaluation Methods
  • Summary
  • Chapter 7. Model Deployment and Prediction Service
  • Machine Learning Deployment Myths
  • Myth 1: You Only Deploy One or Two ML Models at a Time
  • Myth 2: If We Don't Do Anything, Model Performance Remains the Same
  • Myth 3: You Won't Need to Update Your Models as Much