Train in Data, learn machine learning online

SMOTE isn’t the Only Method That Distorts Probability Calibration


Welcome to Data Bites!



Every Monday, I’ll drop a no-fluff, straight-to-the-point tip on a data science skill, tool, or
method to help you stay sharp in the field. I hope you find it useful!

Share your thoughts on LinkedIn!

If you’re currently taking our Forecasting with Machine Learning or Feature Engineering for Time Series Forecasting courses, share your thoughts on LinkedIn!


A short post about what you’ve learned or how you’re applying forecasting in your work can help others find their way. And we'll thank you with 30% off!


Please tag Kishan and me so we can connect.


As a thank you, I’ll send you a DM (direct message) with a 30% discount towards your next course, book, or specialisation with us.


Thanks for being part of this journey!

Post on LinkedIn & Get 30% Off

SMOTE isn’t the Only Method That Distorts Probability Calibration

Lately, we hear a lot that oversampling and SMOTE distort probability calibration.


👉🏻 And that’s correct. But they are not the only methods that affect calibration.


👉🏻 Cost-sensitive learning also does (see the short sketch after this list).


👉🏻 Undersampling also affects probability calibration.


👉🏻 And in fact, many classifiers, like random forests, naive Bayes and gradient boosting machines (GBMs), also return uncalibrated probabilities out of the box.
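
Here is a minimal sketch of the cost-sensitive learning point. It is only an illustration on a toy dataset: logistic regression with class_weight="balanced" stands in for any cost-sensitive model, and the numbers are not from a real project. The class-weighted model's average predicted probability drifts far above the true positive rate, which is exactly what "uncalibrated" means here.

# Minimal sketch: class weighting (cost-sensitive learning) shifts the
# predicted probabilities away from the true base rate.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Toy imbalanced data: roughly 5% positives.
X, y = make_classification(
    n_samples=20_000, n_features=10, weights=[0.95, 0.05], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

plain = LogisticRegression(max_iter=1000).fit(X_train, y_train)
weighted = LogisticRegression(max_iter=1000, class_weight="balanced").fit(
    X_train, y_train
)

print("true positive rate:         ", y_test.mean())
print("mean probability (plain):   ", plain.predict_proba(X_test)[:, 1].mean())
print("mean probability (weighted):", weighted.predict_proba(X_test)[:, 1].mean())
# The weighted model's probabilities no longer average to the base rate:
# they are systematically inflated, i.e. uncalibrated.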


Why does it matter? Because calibrated probabilities tell us how much confidence we can place in a prediction.


➡️ A well-calibrated classifier will correctly estimate the probability of an event occurring.


➡️ What this means is that, for example, if a fraud classifier outputs 0.8 for an observation, that observation has an 80% chance of being fraudulent; in other words, roughly 80 out of 100 observations with a similar score will indeed be fraudulent. The sketch below shows one way to check this.
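
Here is a hedged sketch of how you could check that with scikit-learn's calibration_curve (the random forest and the toy dataset are purely illustrative). It bins the test observations by predicted probability and compares the mean prediction in each bin with the fraction that were actually positive:

# Minimal sketch: compare predicted probabilities with observed frequencies
# using scikit-learn's calibration_curve (a reliability diagram in numbers).
from sklearn.calibration import calibration_curve
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=20_000, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
proba = clf.predict_proba(X_test)[:, 1]

# For a well-calibrated model the two columns match: observations scored
# around 0.8 should turn out positive about 80% of the time.
observed, predicted = calibration_curve(y_test, proba, n_bins=10)
for obs, pred in zip(observed, predicted):
    print(f"mean predicted: {pred:.2f}   observed frequency: {obs:.2f}")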


🤔 I think people like repeating that SMOTE returns uncalibrated probabilities because one or two recent articles claim that the probabilities returned by classifiers trained with SMOTE are uncalibrated beyond repair. In other words, that you can’t recalibrate a classifier if you trained it with SMOTE.


Anyhow, what I wanted to discuss is that we can recalibrate uncalibrated probabilities.


There are various methods. The two implemented in scikit-learn are Platt scaling and isotonic regression.
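
As a rough sketch (the random forest and the toy dataset are again just placeholders), both methods are available through scikit-learn's CalibratedClassifierCV: method="sigmoid" gives Platt scaling and method="isotonic" gives isotonic regression.

# Minimal sketch: recalibrating a classifier with CalibratedClassifierCV.
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=20_000, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

forest = RandomForestClassifier(n_estimators=200, random_state=0)

# method="sigmoid" fits Platt scaling; method="isotonic" fits isotonic regression.
# The calibrator is fitted on out-of-fold predictions (cv=5) to avoid overfitting.
platt = CalibratedClassifierCV(forest, method="sigmoid", cv=5).fit(X_train, y_train)
isotonic = CalibratedClassifierCV(forest, method="isotonic", cv=5).fit(X_train, y_train)

proba_platt = platt.predict_proba(X_test)[:, 1]
proba_isotonic = isotonic.predict_proba(X_test)[:, 1]

As a rule of thumb, Platt scaling assumes a sigmoid-shaped distortion and works with little data, while isotonic regression is more flexible but needs more data to avoid overfitting.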


If you want to learn more about probability calibration and how to recalibrate classifiers, check out my course Machine Learning with Imbalanced Data.


I hope this information was useful!



Wishing you a successful week ahead - see you next Monday! 👋🏻


Sole

Ready to enhance your skills?

Our specializations, courses and books are here to assist you:

More courses

Did someone share this email with you? Think it's pretty cool? Then just hit the button and subscribe to Data Bites. Don’t miss out on any of our tips and propel your data science career to new heights.

Subscribe

Hi…I’m Sole



I’m the main instructor at Train in Data. My work as a data scientist includes creating and implementing machine learning models for evaluating insurance claims, managing credit risk, and detecting fraud. In 2018, I was honoured with a Data Science Leaders' award, and in 2019 and again in 2024, I was acknowledged as one of LinkedIn's voices in data science and analytics.

View

You are receiving this email because you subscribed to our newsletter, signed up on our website, or purchased or downloaded products from us.

Follow us on social media

Copyright (C) 2025 Train in Data. All rights reserved.

If you would like to unsubscribe, please click here.