As of 2020, getting machine learning models from research notebooks into production has emerged as a distinct engineering discipline. MLOps applies DevOps principles to the ML lifecycle: training, evaluation, deployment, monitoring, and retraining.

The training pipeline

Production ML starts with a reproducible training pipeline: data versioning (DVC, Delta Lake), experiment tracking (MLflow, Weights & Biases), and automated retraining triggers. The research notebook is not a training pipeline. A reproducible pipeline allows regenerating any model version from a specific data version and code version, which is essential for debugging production model behaviour and for regulatory audits.
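The "regenerate any model from data version plus code version" idea can be sketched with a deterministic version ID. This is a minimal, stdlib-only illustration (the DVC hash and git SHA shown are hypothetical placeholders, not real tool output):

```python
import hashlib

def model_version_id(data_version: str, code_version: str, params: dict) -> str:
    """Derive a deterministic model version ID from the exact data
    version, code revision, and hyperparameters used for training."""
    canonical = "|".join([
        data_version,
        code_version,
        ",".join(f"{k}={params[k]}" for k in sorted(params)),
    ])
    return hashlib.sha256(canonical.encode()).hexdigest()[:12]

# The same inputs always yield the same ID, so any model artifact can be
# traced back to, and regenerated from, its data + code + config.
vid = model_version_id(
    data_version="dvc:4f2a9c",        # hypothetical data snapshot hash
    code_version="git:a1b2c3d",       # hypothetical commit SHA
    params={"lr": 0.01, "epochs": 20},
)
print(vid)
```

Tools like MLflow record the same triple (data, code, parameters) per run; the point is that the mapping from inputs to model must be deterministic and recorded.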

Model serving patterns

ML models can be served in three main patterns: as REST APIs (Flask, FastAPI, TorchServe, TensorFlow Serving), embedded in the application process (for example via ONNX Runtime), or as batch inference pipelines. Real-time serving (REST API) provides low latency for online inference; batch serving processes large volumes of records offline. The serving pattern should match the latency and throughput requirements: real-time for user-facing features, batch for large-scale scoring pipelines.
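The batch pattern is essentially chunked streaming: records flow through the model in fixed-size batches so memory stays bounded regardless of volume. A minimal sketch (the toy model and feature names are invented for illustration):

```python
from typing import Callable, Iterable, Iterator, List

def batch_score(records: Iterable[dict],
                predict: Callable[[List[dict]], List[float]],
                batch_size: int = 1000) -> Iterator[float]:
    """Offline batch serving: stream records through the model in
    fixed-size chunks so memory use is bounded by batch_size."""
    batch: List[dict] = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield from predict(batch)
            batch = []
    if batch:                      # flush the final partial batch
        yield from predict(batch)

# Stand-in model: score = weighted sum of two hypothetical features.
def toy_predict(batch):
    return [0.7 * r["age"] + 0.3 * r["income"] for r in batch]

scores = list(batch_score(
    ({"age": a, "income": a * 2} for a in range(5)),
    toy_predict,
    batch_size=2,
))
print(scores)  # one score per input record
```

A real-time endpoint would instead wrap `predict` behind an HTTP route and score one request at a time; the trade-off is per-record latency versus aggregate throughput.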

Model monitoring

Models degrade in production due to data drift (the statistical distribution of input data changes from the training distribution) and concept drift (the relationship between inputs and outputs changes). Production ML requires monitoring: input feature distributions versus training distributions, model prediction distributions (are predictions shifting over time?), and business metrics tied to model outputs. Detecting drift early enables retraining before model performance degrades below acceptable levels.
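One common way to compare input feature distributions against the training distribution is the Population Stability Index (PSI). A stdlib-only sketch, using a widely cited rule of thumb (below 0.1 stable, 0.1 to 0.25 moderate drift, above 0.25 major drift); the sample data is synthetic:

```python
import math
from typing import List, Sequence

def psi(expected: Sequence[float], actual: Sequence[float],
        bins: int = 10) -> float:
    """Population Stability Index between a training-time feature
    sample ('expected') and a production sample ('actual')."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0

    def histogram(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)   # clamp out-of-range values
            counts[idx] += 1
        # Small floor avoids log(0) for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = histogram(expected), histogram(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

train = [i / 100 for i in range(100)]               # uniform on [0, 1)
live_same = [i / 100 for i in range(100)]           # identical distribution
live_shifted = [0.5 + i / 200 for i in range(100)]  # mass shifted to [0.5, 1)

print(psi(train, live_same))     # ~0: no drift
print(psi(train, live_shifted))  # well above 0.25: drift detected
```

In production this comparison would run on a schedule per feature, with an alert (and possibly a retraining trigger) when PSI crosses the chosen threshold.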

Azure Machine Learning for MLOps

Azure Machine Learning provides: experiment tracking, dataset versioning, model registry, managed endpoints for model serving, and ML pipelines for training automation. The ML pipeline component creates DAG-based training workflows that can be triggered manually or on a schedule. AML's managed endpoints provide autoscaling inference with blue/green deployment for model updates.
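As a rough sketch of what a scheduled AML training pipeline looks like, here is a YAML fragment in the style of an Azure ML (v2 CLI) pipeline job definition. All names, scripts, and asset references are hypothetical, and required fields such as compute and environment are elided:

```yaml
# Illustrative Azure ML v2 pipeline job; names and paths are hypothetical.
$schema: https://azuremlschemas.azureedge.net/latest/pipelineJob.schema.json
type: pipeline
display_name: churn-model-training

jobs:
  prep:
    command: python prep.py --raw ${{inputs.raw_data}} --out ${{outputs.prepared}}
    # compute, code, and environment fields elided for brevity
  train:
    command: python train.py --data ${{parent.jobs.prep.outputs.prepared}}
```

Each entry under `jobs` is a node in the DAG; AML infers edges from the input/output references between steps. The resulting pipeline can be submitted manually or attached to a schedule, and the trained model registered to the model registry for deployment to a managed endpoint.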