
MLOps

How to Detect Data Drift with Hypothesis Testing

Hint: forget about the p-values

Michał Oleszak
Published in TDS Archive
18 min read · May 17, 2023

Data drift is a concern to anyone with a machine learning model serving live predictions. The world changes, and as the consumers’ tastes or demographics shift, the model starts receiving feature values different from what it has seen in training, which may result in unexpected outputs. Detecting feature drift appears to be simple: we just need to decide whether the training and serving distributions of the feature in question are the same or not. There are statistical tests for this, right? Well, there are, but are you sure you are using them correctly?
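To make the "training versus serving distribution" comparison concrete, here is a minimal sketch of the naive approach the paragraph above questions: run a two-sample Kolmogorov-Smirnov test on a single numeric feature and flag drift when the p-value falls below a threshold. The function name `detect_univariate_drift`, the 0.05 threshold, and the synthetic data are assumptions made for illustration; they are not taken from the article.

```python
import numpy as np
from scipy.stats import ks_2samp


def detect_univariate_drift(train_values, serving_values, alpha=0.05):
    """Compare one numeric feature between training and serving data.

    Hypothetical helper: runs a two-sample Kolmogorov-Smirnov test and
    flags drift when the p-value is below `alpha` (an assumed threshold).
    """
    result = ks_2samp(train_values, serving_values)
    return {
        "statistic": float(result.statistic),
        "p_value": float(result.pvalue),
        "drift_flagged": bool(result.pvalue < alpha),
    }


# Synthetic data, for illustration only.
rng = np.random.default_rng(seed=0)
train = rng.normal(loc=0.0, scale=1.0, size=5_000)
serving_shifted = rng.normal(loc=0.5, scale=1.0, size=5_000)

# Serving data whose mean moved by half a standard deviation.
shifted_report = detect_univariate_drift(train, serving_shifted)

# Sanity check: a sample compared with itself shows no difference.
same_report = detect_univariate_drift(train, train)
```

Note that the decision here hinges entirely on the p-value, which is exactly the habit the subtitle asks readers to reconsider.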

Univariate drift detection

Monitoring the post-deployment performance of a machine learning model is a crucial part of its life cycle. As the world changes and the data drifts, many models tend to show diminishing performance over time. The best approach to staying alert is to calculate the performance metrics in real time or to estimate them when the ground truth is not available.
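As a rough illustration of tracking performance over time, the sketch below computes accuracy over consecutive, non-overlapping windows of predictions. It covers only the case where ground-truth labels are available; estimating performance without labels is not shown. The helper name `windowed_accuracy` and the toy labels are hypothetical.

```python
import numpy as np


def windowed_accuracy(y_true, y_pred, window_size):
    """Accuracy per consecutive, non-overlapping window of predictions.

    Hypothetical helper: trailing observations that do not fill a
    complete window are dropped.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    correct = (y_true == y_pred).astype(float)
    n_windows = len(correct) // window_size
    trimmed = correct[: n_windows * window_size]
    return trimmed.reshape(n_windows, window_size).mean(axis=1)


# Toy labels, for illustration only: the second window is less accurate.
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_pred = np.array([1, 0, 1, 1, 1, 1, 1, 0])
accuracy_per_window = windowed_accuracy(y_true, y_pred, window_size=4)
```

A drop from one window to the next is a prompt to investigate, not proof of drift on its own.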

A likely cause of degraded performance is data drift: a change in the distribution of the model’s inputs between training and production data. Detecting and analyzing the nature of data drift can help to bring a degraded model…

