The modelling dilemma in Machine Learning — Optimization vs Generalization

Nicolas Pogeant
4 min read · Feb 12, 2022

--

Photo by Moses Londo on Unsplash

As a young Data Scientist, I always thought that the best model you could build was the one that gave the best metrics when predicting some test outputs.

However, aiming for the lowest MSE or the highest classification accuracy is not the whole story. But why?

Intuition — the importance of training

Photo by Michael Browning on Unsplash

Let’s think about a situation where two food critics go to the same Colombian establishment to eat the same dish.

The first critic is highly respected: he knows the food of every continent, every country, and even every region of those countries very well. He has trained and learned a lot by eating in the best places in the world and keeping all the flavors in mind.

The second critic is young and inexperienced, and has just started his career. He has been to a few restaurants and eaten a few dishes, but is very confident in his choices and judgment.

After eating the dish, each critic gave a rating:

  • the 1st one gave 8 out of 10
  • the 2nd one gave 3 out of 10

It appears that the dish had been rated by many famous and distinguished critics before, and the average score is 9.1 out of 10.

What went wrong?

It is exactly the same issue as with two Machine Learning models that were not trained in the same way.

Indeed, the first critic knows how to evaluate a dish along many characteristics, such as taste, cooking… He is able to use a kind of regression to “calculate” the grade he will give, because he has a long career and many dishes behind him:
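Something like the following, where the features and weights are of course invented for the sake of the analogy, standing for everything his career taught him:

grade ≈ w₁ · taste + w₂ · cooking + w₃ · presentation + … + b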

The second critic does not know all of this, but relies on a single characteristic to determine the grade, the country/region the dish comes from:
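His “model” might look as simple as this, again an invented sketch:

grade ≈ f(country/region of origin)

One feature, with a value of f learned from only a couple of examples.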

Why did he give a 3 out of 10? Because he had eaten Colombian food twice, in very bad restaurants in his hometown.

This story illustrates the difference between a model that generalizes well and one that does not.

Generalization refers to models that are able to predict outcomes accurately on new, unseen data. It is what every company has to look for, because these are the models that are actually useful. Of course, the model has to perform well during training, but its complexity has to stay low enough for it to remain general.
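In practice, generalization is estimated on data the model never saw during training. Here is a minimal scikit-learn sketch of that idea (the dataset and the classifier are placeholders I chose for illustration):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hold out a test set that the model never sees during training
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# An unconstrained decision tree can memorize the training set
model = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

print("train accuracy:", model.score(X_train, y_train))  # essentially 1.0
print("test accuracy:", model.score(X_test, y_test))     # noticeably lower
```

The gap between the two scores is a first, rough measure of how well the model generalizes.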

The first critic has a model that is both optimized and generalized: he is able to use what he has learned to correctly grade new dishes, even if his “prediction” does not match the mean of all the previous grades.

The second critic thinks he is using an optimized model too, because of his experiences, but it is in fact an overfitted model: almost perfect during training, and terrible when it comes to predicting new data.

Here is an example of how different models can fit the same data (from underfitting to overfitting); the blue line is the model’s predictions. We can clearly see that the model in the middle is the best one:

scikit-learn library
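If you want to reproduce this kind of figure yourself, here is a minimal sketch in the spirit of scikit-learn’s underfitting/overfitting example; the cosine “true function”, the noise level and the three polynomial degrees are assumptions chosen for illustration:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Noisy samples from an assumed "true" function
rng = np.random.RandomState(0)
X = np.sort(rng.rand(30))
y = np.cos(1.5 * np.pi * X) + rng.randn(30) * 0.1

X_plot = np.linspace(0, 1, 200)
fig, axes = plt.subplots(1, 3, figsize=(12, 4))
for ax, degree in zip(axes, [1, 4, 15]):  # underfit, good fit, overfit
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X[:, None], y)
    ax.scatter(X, y, s=15, label="samples")
    ax.plot(X_plot, model.predict(X_plot[:, None]), label=f"degree {degree}")
    ax.set_ylim(-2, 2)
    ax.legend()
plt.show()
```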

The last plot represents an overfitted model that knows the relations between the inputs and the outputs precisely. However, it has learned them too well and gets lost when confronted with new data (as shown by the large oscillations of its curve). Such models are said to have high variance in their predictions and to be sensitive to noise within the data. A model trained on a small dataset is especially likely to face this issue.

On the contrary, the first plot represents the underfitting issue, which corresponds to a model with high bias, meaning that it is too simple to capture the relations between the inputs and the outputs.

As I said at the beginning, raw accuracy is not the only thing to have in mind when training a model. This is explained by what we just saw: an overfitted model has high accuracy and low prediction error on the training data, yet fails on new data. This defeats the real purpose of machine learning, which is generalization: obtaining a model that can be used in production.
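To make that concrete, we can compute the training and test MSE of the three polynomial models from the figure above; the exact numbers depend on the random seed, but the pattern does not:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Same toy data as in the plotting sketch above
rng = np.random.RandomState(0)
X = np.sort(rng.rand(30))[:, None]
y = np.cos(1.5 * np.pi * X[:, 0]) + rng.randn(30) * 0.1
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

for degree in [1, 4, 15]:
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"degree {degree:2d} | train MSE: {train_mse:.3f} | test MSE: {test_mse:.3f}")

# Typically: degree 15 reaches the lowest training MSE but the worst test MSE
```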

The fundamental issue in machine learning is the tension between optimization and generalization (François Chollet, Deep Learning with Python).

This problem is especially well known in Deep Learning, where algorithms build models by learning their own rules, rules that might end up too specific and fail to generalize…

However, with good data and some useful methods (regularization, for example), it can be overcome!
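As a quick sketch of what that can look like, swapping LinearRegression for Ridge (L2 regularization) keeps the degree-15 model’s capacity but penalizes large coefficients; the alpha value below is an arbitrary starting point that would normally be tuned by cross-validation:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Same toy data as before
rng = np.random.RandomState(0)
X = np.sort(rng.rand(30))[:, None]
y = np.cos(1.5 * np.pi * X[:, 0]) + rng.randn(30) * 0.1
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

for name, reg in [("plain", LinearRegression()), ("ridge", Ridge(alpha=1e-2))]:
    # Same degree-15 features in both cases; only the penalty on the weights differs
    model = make_pipeline(PolynomialFeatures(15), StandardScaler(), reg)
    model.fit(X_train, y_train)
    print(name, "test MSE:", round(mean_squared_error(y_test, model.predict(X_test)), 3))

# The penalized model usually generalizes far better with the exact same capacity
```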

Thank you for reading, I hope you enjoyed it and learned something new!
