
MLOps

How to Detect Data Drift with Hypothesis Testing

Hint: forget about the p-values

Michał Oleszak

Published in TDS Archive

18 min read, May 17, 2023

Data drift is a concern for anyone with a machine learning model serving live predictions. The world changes, and as consumers' tastes or demographics shift, the model starts receiving feature values that differ from those it saw in training, which may result in unexpected outputs. Detecting feature drift appears simple: we just need to decide whether the training and serving distributions of the feature in question are the same. There are statistical tests for this, right? Well, there are, but are you sure you are using them correctly?
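Deciding whether the training and serving distributions of a numeric feature are the same is often framed as a two-sample hypothesis test. Below is a minimal sketch of one such test, the two-sample Kolmogorov-Smirnov (KS) test, in plain Python so it runs without dependencies. It is not presented as the author's method; `scipy.stats.ks_2samp` is the usual production implementation. The helper names `ks_statistic` and `ks_pvalue` are hypothetical, the p-value uses the large-sample Kolmogorov approximation rather than an exact computation, and the demo data are simulated.

```python
import bisect
import math
import random


def ks_statistic(reference, current):
    """Two-sample Kolmogorov-Smirnov statistic D: the largest vertical gap
    between the empirical CDFs of the two samples."""
    ref, cur = sorted(reference), sorted(current)
    n, m = len(ref), len(cur)
    if n == 0 or m == 0:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    # ECDFs only jump at observed values, so checking every pooled
    # observation is enough to find the supremum.
    for x in ref + cur:
        f_ref = bisect.bisect_right(ref, x) / n  # share of ref values <= x
        f_cur = bisect.bisect_right(cur, x) / m  # share of cur values <= x
        d = max(d, abs(f_ref - f_cur))
    return d


def ks_pvalue(d, n, m):
    """Approximate p-value from the asymptotic Kolmogorov distribution."""
    en = math.sqrt(n * m / (n + m))  # square root of the effective sample size
    lam = (en + 0.12 + 0.11 / en) * d  # small-sample correction (Stephens)
    if lam < 1e-8:
        return 1.0
    if lam < 1.18:
        # For small lam, sum the fast-converging series for the CDF P(K <= lam).
        cdf = math.sqrt(2 * math.pi) / lam * sum(
            math.exp(-((2 * j - 1) ** 2) * math.pi ** 2 / (8 * lam ** 2))
            for j in range(1, 20)
        )
        return min(1.0, max(0.0, 1.0 - cdf))
    # Otherwise use the alternating series for the survival function P(K > lam).
    q = 2 * sum((-1) ** (j - 1) * math.exp(-2 * j * j * lam * lam)
                for j in range(1, 20))
    return min(1.0, max(0.0, q))


if __name__ == "__main__":
    rng = random.Random(0)  # simulated data, fixed seed for repeatability
    for size in (100, 1_000, 10_000):
        train = [rng.gauss(0.0, 1.0) for _ in range(size)]
        serve = [rng.gauss(0.1, 1.0) for _ in range(size)]  # small mean shift
        d = ks_statistic(train, serve)
        print(f"n={size:>6}  D={d:.3f}  p={ks_pvalue(d, size, size):.2e}")
```

With the mean shift held fixed at 0.1, D tends to settle near the true gap between the two distributions as the sample grows, while the p-value tends to collapse toward zero. This is a general property of significance tests: with enough production data, even a practically negligible shift will eventually look "significant", so a small p-value alone says little about whether the drift matters.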

Univariate drift detection

Monitoring the post-deployment performance of a machine learning model is a crucial part of its life cycle. As the world changes and the data drifts, many models tend to show diminishing performance over time. The best approach to staying alert is to calculate the performance metrics in real time or to estimate them when the ground truth is not available.
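A minimal sketch of the "calculate the performance metrics in real time" case, assuming ground-truth labels eventually arrive for production data and that accuracy is the metric of interest. The function names, the fixed-size chunking, and the tolerance rule are hypothetical choices for illustration; estimating performance when ground truth is unavailable is not shown here.

```python
def chunked_accuracy(y_true, y_pred, chunk_size):
    """Accuracy on consecutive, non-overlapping chunks of production data."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    scores = []
    for start in range(0, len(y_true), chunk_size):
        truth = y_true[start:start + chunk_size]
        preds = y_pred[start:start + chunk_size]
        correct = sum(t == p for t, p in zip(truth, preds))
        scores.append(correct / len(truth))  # last chunk may be shorter
    return scores


def degraded_chunks(scores, baseline, tolerance=0.05):
    """Indices of chunks whose accuracy falls more than `tolerance`
    below a reference accuracy (e.g. measured on validation data)."""
    return [i for i, s in enumerate(scores) if s < baseline - tolerance]


y_true = [1, 1, 0, 0, 1, 0, 1, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
scores = chunked_accuracy(y_true, y_pred, chunk_size=4)
print(scores)                                 # [1.0, 0.25]
print(degraded_chunks(scores, baseline=0.9))  # [1]
```

Chunking by a fixed number of observations keeps every score based on the same amount of evidence; chunking by time period is an equally common alternative when traffic volume varies.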

A likely cause of observed performance degradation is data drift: a change in the distribution of the model's inputs between training and production data. Detecting and analyzing the nature of data drift can help to bring a degraded model…
