🚀 Machine Learning Engineer @ Cosmose AI | 👨‍🏫 Data Science Instructor @ DataCamp

A practical overview with examples and Python code.

Probability distributions are mathematical functions describing probabilities of things happening. Many processes taking place in the world around us can be described by a handful of distributions that have been well-researched and analyzed. Getting one’s head around these few goes a long way towards being able to statistically model a range of phenomena. Let’s take a look at six useful probability distributions!


Or how to double your chances of winning a car by getting your probabilities right

The Monty Hall problem is a decades-old brain teaser that’s still confusing people today. It is loosely based on an old American TV game show and is named after its host, Monty Hall. At the final stage of the game, the contestants would face a choice in which, by choosing correctly, they could double their chance of winning a brand-new car. But guess what: most of them did not! Would you be wiser? Read on to find out!
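The "double your chance" claim can be checked with a quick simulation. The sketch below assumes the standard textbook rules, which the teaser does not spell out: three doors, one car, and a host who always opens a door that is neither the contestant's pick nor the car before offering the switch. The function names `play` and `win_rate` are made up for this illustration.

```python
import random


def play(switch, rng):
    """Play one round; return True if the contestant wins the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # Assumed rule: the host opens a door that is neither the pick nor the car.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Move to the only door that is neither picked nor opened.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car


def win_rate(switch, n_rounds=100_000, seed=0):
    """Estimate the probability of winning for a fixed strategy."""
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(n_rounds)) / n_rounds


stay_rate = win_rate(switch=False)
switch_rate = win_rate(switch=True)
print(f"stay: {stay_rate:.3f}, switch: {switch_rate:.3f}")
```

Under these assumptions, staying should win about one time in three and switching about two times in three, which is where the doubling comes from.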

On the relevance of the cornerstone of statistical inference for data scientists.

The Central Limit Theorem, or CLT, is taught in every STATS101 class. A typical way of introducing this topic is by presenting the formulae, discussing the assumptions, and going through a couple of calculations involving the normal density function. What’s missing is the CLT’s relevance for data scientists’ day-to-day work. Let me try to point it out.

A short primer on why we can reject hypotheses, but cannot accept them, with examples and visuals.

Hypothesis testing is the basis of classical statistical inference. It’s a framework for making decisions under uncertainty, with the goal of preventing you from making stupid decisions, provided there is data to verify their stupidity. If there is no such data… ¯\_(ツ)_/¯

The goal of hypothesis testing is to prevent you from making stupid decisions — provided there is data to verify their stupidity.

The catch here is that you can only use hypothesis testing to dismiss a choice as a stupid…

A statistician’s perspective on how (not) to do it to keep your machine learning workflow unflawed.

Recently, I couldn't help but notice something alarming about popular machine learning books. Even the best titles, which do a great job explaining the algorithms and their applications, tend to neglect one important aspect. In cases where statistical rigor is needed to do things properly, they often suggest dangerously over-simplified solutions, causing a severe headache to a statistician-by-training such as myself and harming the machine learning workflow.

Even the best machine learning books tend to neglect topics in which statistical rigor is needed to do things properly, proposing dangerously over-simplified solutions instead.

A couple of weeks back, I have…

A statistician’s perspective on the types of variables, their meaning, and implications for machine learning.

I’ve been reading a popular book on machine learning recently. Once I reached the chapter on feature engineering, the author noted that, since most machine learning algorithms require numeric data as input, categorical variables need to be encoded as numeric ones. For instance, to paraphrase the example, we could encode a categorical variable education_level which takes the values: elementary, high_school, university, as numbers 1, 2, and 3, respectively. At that point, even though I’m an ML Engineer by trade, I heard the inner statistician-by-training within me cry out loud! Do people just run .fit_predict()
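To make the paraphrased example concrete, here is a small sketch of the integer encoding of education_level next to one-hot encoding. One-hot is shown only as a common alternative, not necessarily the one the article goes on to recommend, and the sample values and helper names are invented for illustration.

```python
levels = ["elementary", "high_school", "university"]
sample = ["high_school", "university", "elementary"]  # made-up observations

# Integer encoding from the paraphrased example: 1, 2, 3.
integer_code = {level: i + 1 for i, level in enumerate(levels)}
as_integers = [integer_code[value] for value in sample]


def one_hot(value):
    """Return one indicator per level, with a single 1 for the matching level."""
    return [int(value == level) for level in levels]


as_one_hot = [one_hot(value) for value in sample]

print(as_integers)
print(as_one_hot)
```

The integer codes carry an implicit assumption: the step from elementary to high_school is treated as exactly as large as the step from high_school to university. The indicator columns make no such claim about spacing, at the cost of also dropping the ordering.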

Hands-on Tutorials

Get to the neighborhood of optimal values quickly without costly searches.

The learning rate is arguably the most important hyperparameter to tune in a neural network. Unfortunately, it is also one of the hardest to tune properly. But don’t despair, for the Learning Rate Finder will get you to pretty decent values quickly! Let’s see how it works and how to implement it in TensorFlow.

Improve your neural network for free with one small trick, getting a model uncertainty estimate as a bonus.

There ain’t no such thing as a free lunch, at least according to the popular adage. Well, not anymore! Not when it comes to neural networks, that is to say. Read on to see how to improve your network’s performance with an incredibly simple yet clever trick called Monte Carlo Dropout.


The magic trick we are about to introduce only works if your neural network has dropout layers, so let’s kick off by briefly introducing them. Dropout boils down to simply switching off some neurons at each training step. At each step, a different set of neurons is switched off…
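The switching-off described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the article's implementation: the `dropout` helper and the activation values are hypothetical, and rescaling the surviving activations by the keep probability (so-called inverted dropout) is an assumption added here, not something stated in the text.

```python
import random


def dropout(activations, rate, rng):
    """Zero each activation with probability `rate` and rescale the survivors."""
    keep_prob = 1.0 - rate
    # A fresh random mask is drawn on every call, i.e. at every training step.
    mask = [1 if rng.random() < keep_prob else 0 for _ in activations]
    # Assumption (inverted dropout): dividing by keep_prob keeps the
    # expected value of each activation unchanged.
    dropped = [a * m / keep_prob for a, m in zip(activations, mask)]
    return dropped, mask


rng = random.Random(42)
activations = [0.5, 1.2, -0.3, 0.8, 2.0, -1.1]  # made-up layer outputs
for step in range(3):
    dropped, mask = dropout(activations, rate=0.5, rng=rng)
    print(f"step {step}: mask={mask}")
```

Each call draws a new mask, so the set of switched-off neurons generally differs from one step to the next.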

How I made my Dockerfiles stop ignoring .dockerignores

I have recently been working on a project in which a couple of Docker containers are built along the way and end up being sent to different third-party servers. For privacy reasons, some specific files must not be sent to particular servers. Hence, each container has its own blacklist of files it should not accept inside. This should be handled by the .dockerignore files, except that my .dockerignores got, well, ignored (no pun intended).

It took me hours to find the solution, which, obviously, turned out to be a one-liner.

I hope I can save you some miserable…

An intuitive visual explanation

You may have heard about the so-called kernel trick, a maneuver that allows support vector machines, or SVMs, to work well with non-linear data. The idea is to map the data into a high-dimensional space in which it becomes linearly separable and then apply a simple, linear SVM. Sounds sophisticated, and to some extent it is. However, while it might be hard to understand how the kernels work, it is pretty easy to grasp what they are trying to achieve. Read on to see it for yourself!
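A toy example of what the mapping achieves, under heavy simplifications: the data points, the feature map x → (x, x²), and the hand-picked separating line are all invented for illustration, and no SVM is actually trained. A real kernel SVM also avoids computing the mapped points explicitly; the explicit map is used here only to make the geometry visible.

```python
points = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
labels = [-1, -1, 1, 1, 1, -1, -1]  # the inner points form the positive class


def separable_by_threshold(xs, ys):
    """True if a single cut on the number line puts each class on its own side."""
    sorted_labels = [y for _, y in sorted(zip(xs, ys))]
    changes = sum(a != b for a, b in zip(sorted_labels, sorted_labels[1:]))
    return changes <= 1


def feature_map(x):
    """Lift a 1-D point into 2-D by adding its square as a second coordinate."""
    return (x, x * x)


# Hand-picked line in the lifted space: score = -x^2 + 2.5 (not a fitted SVM).
w, b = (0.0, -1.0), 2.5


def linear_classify(z):
    score = w[0] * z[0] + w[1] * z[1] + b
    return 1 if score > 0 else -1


predictions = [linear_classify(feature_map(x)) for x in points]
print(separable_by_threshold(points, labels))
print(predictions == labels)
```

On the original line, no single threshold separates the classes, because the positive points sit between the negative ones. After the lift, a straight line does the job.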
