MLOps

AI Model Optimization Has Never Been Easier

Compress the model for your edge device without losing the accuracy

Michał Oleszak
8 min read · Nov 2, 2022


As the popular cliché has it, data scientists spend 80% of their time preparing data and only 20% developing models. While there might be some truth to that, we should never underestimate the effort required for the remaining 20%. Choosing the architecture, training, fine-tuning, and evaluating the model is no mean feat, especially when developing models for edge devices, where criteria other than performance metrics need to be considered. I recently got to try NetsPresso, a platform that promises to take care of all the model optimization in an automated manner. Let me show you how it works.

Optimizing machine learning models

The typical machine learning pipeline has become a more or less established process these days. We query or download the raw data, parse and clean it, and extract and engineer features to finally obtain a data set ready for training. Then, we iterate over model architectures and a multitude of training and data processing hyperparameters to hopefully arrive at a model that satisfies some relevant performance metrics.
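The iteration over architectures and hyperparameters can be sketched as a simple grid search. This is a toy illustration, not NetsPresso's approach: the architecture names, learning rates, and the `train_and_evaluate` stand-in are all hypothetical.

```python
from itertools import product

# Hypothetical search space: two toy architectures, two learning rates.
architectures = ["small_cnn", "mobilenet_like"]
learning_rates = [1e-3, 1e-4]

def train_and_evaluate(arch, lr):
    """Stand-in for real training; returns a made-up validation accuracy."""
    base = {"small_cnn": 0.85, "mobilenet_like": 0.90}[arch]
    return base + (0.01 if lr == 1e-3 else 0.0)

# Train every combination and keep the configuration with the best metric.
results = {
    (arch, lr): train_and_evaluate(arch, lr)
    for arch, lr in product(architectures, learning_rates)
}
best_config = max(results, key=results.get)
print(best_config)  # → ('mobilenet_like', 0.001)
```

In practice each `train_and_evaluate` call is hours of compute, which is exactly why automating this loop is attractive.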

But what if the model is destined to run on edge devices? In this case, performance metrics aren't the only criteria for model selection. We should also pay attention to the latency, memory footprint, and power consumption, as measured on the particular device we are interested in. After all, our cool AI-powered app won't bring any value to a user whose mobile phone dies due to insufficient memory or battery drain.

When building models for edge devices, latency, memory footprint, and power consumption are important criteria for model selection.

However, as we compress our models to be faster and lighter, we would like to sacrifice as little accuracy as possible, if any. I refer to the process of finding the balance between machine learning performance (accuracy metrics) and computational performance (latency, memory usage) as model optimization. Let’s see how to optimize our models with NetsPresso.
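One simple way to frame this balance is to set an accuracy budget and pick the fastest model that stays within it. The candidate models, their numbers, and the 2-point budget below are all illustrative assumptions:

```python
# Hypothetical candidates produced by different compression settings.
candidates = [
    {"name": "fp32 baseline",  "accuracy": 0.912, "latency_ms": 48.0, "size_mb": 96.0},
    {"name": "int8 quantized", "accuracy": 0.905, "latency_ms": 14.0, "size_mb": 24.0},
    {"name": "pruned + int8",  "accuracy": 0.861, "latency_ms": 9.0,  "size_mb": 11.0},
]

ACC_BUDGET = 0.02                      # tolerate at most a 2-point accuracy drop
baseline_acc = candidates[0]["accuracy"]

# Discard models that lose too much accuracy, then take the fastest survivor.
viable = [c for c in candidates if baseline_acc - c["accuracy"] <= ACC_BUDGET]
best = min(viable, key=lambda c: c["latency_ms"])
print(best["name"])  # → int8 quantized
```

Here the most aggressive compression is rejected despite its speed, because it blows the accuracy budget; the quantized model wins by being over 3x faster at a negligible accuracy cost.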
