I am going to clearly explain the fundamentals of bias and variance. Imagine we measured the weight and height of a bunch of mice and plotted the data on a graph. Lighter mice tend to be shorter and heavier mice tend to be taller. But after a certain weight, a mouse does not get any taller, just more obese.
Given this data, we would like to predict a mouse's height from its weight. For example, if you told me how much your mouse weighed, then we could predict how tall it is. Ideally, we would know the exact mathematical formula that describes the relationship between weight and height.
But we don’t know the formula, so we are going to use 2 machine learning methods to approximate this relationship.
However, I will leave the true relationship curve in the figure for reference. The first thing we do is split the data into two sets: one for training the machine learning algorithms and one for testing them.
The blue dots are the training set and the green dots are the testing set.
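The split described above can be sketched in a few lines of NumPy. Everything here is made up for illustration: the weights, the noise level, and the tanh curve standing in for the unknown true relationship between weight and height.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the split is reproducible

# Synthetic mouse data (hypothetical numbers): height rises with weight,
# then plateaus, plus a little measurement noise.
weight = rng.uniform(10, 40, size=20)
height = 5 + 4 * np.tanh((weight - 20) / 8) + rng.normal(0, 0.3, size=20)

# Shuffle the indices, then take half for training and half for testing.
idx = rng.permutation(len(weight))
train_idx, test_idx = idx[:10], idx[10:]
w_train, h_train = weight[train_idx], height[train_idx]
w_test, h_test = weight[test_idx], height[test_idx]

print(len(w_train), len(w_test))  # 10 10
```

Shuffling before splitting matters: if the data were sorted by weight, a naive first-half/second-half split would put all the light mice in one set and all the heavy mice in the other.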
Here is just the training set. The first machine learning algorithm that we will use is linear regression (the least-squares method). Linear regression fits a straight line to the training set.
Note that the straight line does not have the flexibility to accurately replicate the arc in the true relationship. No matter how we try to fit the line, it will never curve. Thus, the straight line will never capture the true relationship between weight and height, no matter how well we fit it to the training set.
The inability of a machine learning method, like linear regression, to capture the true relationship is called bias. Because the straight line cannot curve like the true relationship, it has a relatively large amount of bias.
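A minimal sketch of the straight-line fit, again on made-up data (the tanh curve is just a stand-in for the true relationship). The nonzero residuals are the bias in action: no choice of slope and intercept can bend the line around the arc.

```python
import numpy as np

# Hypothetical training data: heights follow a curve that rises, then plateaus.
weight = np.array([10., 14., 18., 22., 26., 30., 34., 38.])
height = 5 + 4 * np.tanh((weight - 20) / 8)

# Least-squares straight line: height ≈ slope * weight + intercept.
slope, intercept = np.polyfit(weight, height, deg=1)
residuals = height - (slope * weight + intercept)

# The line cannot bend, so some residuals are unavoidable: that is bias.
print("slope:", round(float(slope), 3), " SSR:", round(float(np.sum(residuals**2)), 3))
```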
Another machine learning method might fit a squiggly line to the training set. The squiggly line is super flexible and hugs the training set along the arc of the true relationship.
Because the squiggly line can handle the arc in the true relationship between weight and height, it has very little bias.
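One simple way to get a "squiggly line" is a high-degree polynomial: with 8 training mice, a degree-7 polynomial can pass through every single point. The data below are the same hypothetical numbers as before, with a bit of alternating noise added.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Hypothetical training data with a little measurement noise.
weight = np.array([10., 14., 18., 22., 26., 30., 34., 38.])
noise = 0.25 * np.array([1., -1., 1., -1., 1., -1., 1., -1.])
height = 5 + 4 * np.tanh((weight - 20) / 8) + noise

# A degree-7 polynomial through 8 points is flexible enough to hit
# every training point exactly: the "squiggly line".
squiggle = Polynomial.fit(weight, height, deg=7)

print("largest training miss:", float(np.max(np.abs(height - squiggle(weight)))))
```

`Polynomial.fit` rescales the weights internally, which keeps the high-degree fit numerically stable; the largest miss on the training set comes out at essentially zero.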
We can compare how well the straight line and the squiggly line fit the training set by calculating their sums of squares. In other words, we measure the distances from the fitted lines to the data, square them, and add them up. The distances are squared so that negative distances do not cancel out positive distances.
Notice how the squiggly line fits the data so well that the distances between the line and the data are all zero. In the contest to see whether the straight line or the squiggly line fits the training set better, the squiggly line wins.
But remember, so far we have only calculated the sums of squares for the training set. We also have a testing set, so now let us calculate the sums of squares for the testing set and hold a second contest to see whether the straight line or the squiggly line fits the testing set better.
The straight line wins. Even though the squiggly line did a great job fitting the training set, it did a terrible job fitting the testing set.
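The whole contest can be run end to end. All the numbers here are invented for illustration: the testing mice come from the same made-up true curve, and one of them (weight 42) sits just past the training range, where the squiggly line swings wildly.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Hypothetical training mice: evenly spaced weights, heights from a curve
# that plateaus, plus alternating "measurement noise" (made-up numbers).
w_train = np.array([10., 14., 18., 22., 26., 30., 34., 38.])
noise = 0.25 * np.array([1., -1., 1., -1., 1., -1., 1., -1.])
h_train = 5 + 4 * np.tanh((w_train - 20) / 8) + noise

# Testing mice drawn from the same true curve (no noise, for simplicity).
w_test = np.array([12., 20., 28., 36., 42.])
h_test = 5 + 4 * np.tanh((w_test - 20) / 8)

def ssr(y, y_hat):
    """Sum of squared residuals (distances squared, then added up)."""
    return float(np.sum((y - y_hat) ** 2))

line = Polynomial.fit(w_train, h_train, deg=1)      # straight line: high bias
squiggle = Polynomial.fit(w_train, h_train, deg=7)  # hits every training point

print("training SSR  line:", ssr(h_train, line(w_train)),
      " squiggle:", ssr(h_train, squiggle(w_train)))
print("testing  SSR  line:", ssr(h_test, line(w_test)),
      " squiggle:", ssr(h_test, squiggle(w_test)))
```

On the training set the squiggly line wins with an SSR of essentially zero; on the testing set the straight line wins by a wide margin, because the degree-7 polynomial has also memorized the noise.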
In machine learning language, the difference in fits between data sets is called variance. The squiggly line has low bias, since it is flexible and can adapt to the curve in the relationship between weight and height.
But the squiggly line has high variance, because it results in vastly different sums of squares for different data sets. In other words, it is hard to predict how well the squiggly line will perform with future data sets: it might do well sometimes, and other times it might do terribly.
In contrast, the straight line has a relatively high bias since it cannot capture the curve in the relationship between weight and height. But the straight line has relatively low variance because the sums of squares are very similar for different data sets.
In other words, the straight line might give only good predictions, not great ones, but they will be consistently good predictions.
Because the squiggly line fits the training set really well but not the testing set, we say that the squiggly line is overfitted. In machine learning, the ideal algorithm has low bias and can accurately model the true relationship.
It also has low variability, producing consistent predictions across different data sets. This is achieved by finding the sweet spot between a simple model and a complex model.
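As a small taste of how the sweet spot can be found, here is a minimal sketch of one such method, regularization (specifically ridge regression), on the same made-up data. The penalty value and all the numbers are assumptions for illustration: the penalty shrinks the high-order polynomial coefficients, trading a little bias for a large reduction in variance.

```python
import numpy as np

# Hypothetical training data, as before.
w = np.array([10., 14., 18., 22., 26., 30., 34., 38.])
h = 5 + 4 * np.tanh((w - 20) / 8) + 0.25 * np.array([1., -1., 1., -1., 1., -1., 1., -1.])

t = (w - 24.0) / 14.0                       # rescale weights to roughly [-1, 1]
X = np.vander(t, N=8, increasing=True)      # polynomial features 1, t, ..., t^7

def ridge_fit(X, y, lam):
    # Closed-form ridge solution: (X^T X + lam * I)^(-1) X^T y.
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

coef_free = ridge_fit(X, h, lam=0.0)        # no penalty: interpolates (squiggly)
coef_reg = ridge_fit(X, h, lam=1.0)         # penalized: smaller, smoother fit

# The penalty shrinks the coefficients, taming the squiggle.
print(float(np.sum(coef_free**2)), float(np.sum(coef_reg**2)))
```

The regularized coefficient vector always has a smaller norm than the unpenalized one, which is exactly what keeps the fitted curve from whipping through every noisy point.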
Conclusion – Fundamentals of Bias and Variance
3 commonly used methods for finding the sweet spot between simple and complicated models are regularization, boosting, and bagging. I have clearly explained the fundamentals of bias and variance. Feel free to give your suggestions on this article.