Navigating the Balance of Bias and Variance in Machine Learning

The Challenge of Bias and Variance

Many challenges in both regression and classification come down to the interplay of bias and variance and the related problems of underfitting and overfitting. These concepts capture the fundamental trade-off in building models that generalize well to new, unseen data: a model has to be flexible enough to capture the underlying patterns, yet not so flexible that it memorizes the noise. In this article, we will touch upon bias and variance.

Bias in Machine Learning: The Dartboard Analogy

What is Bias?
- Imagine you have a target to hit, and you keep throwing darts. If you consistently throw the darts off to one side but still close to each other, you have a bias.
- Bias is like the consistent error in your model. If your model has high bias, it means it consistently misses the mark, perhaps oversimplifying the problem.
Characteristics:
- High bias models tend to underfit the training data.
- They may fail to capture the complexity of the true underlying patterns in the data.
- Common sources of bias include overly simple model architectures or insufficient features.
Effect on Performance:
- High bias can result in poor performance on both the training and test data.
- The model is not flexible enough to represent the underlying patterns, leading to consistently inaccurate predictions.
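
To make underfitting concrete, here is a minimal sketch that fits a straight line to noisy samples of a curve. The true function (`cos(1.5 * pi * x)`), the noise level, the sample sizes, and the seed are all assumptions picked for illustration; they are not taken from the figure discussed later in this article.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the numbers are repeatable
NOISE_STD = 0.1                 # assumed noise level

def true_function(x):
    # Assumed ground truth: a smooth sinusoidal curve on [0, 1].
    return np.cos(1.5 * np.pi * x)

def sample(n):
    """Draw n noisy observations of the true function."""
    x = rng.uniform(0.0, 1.0, n)
    return x, true_function(x) + rng.normal(0.0, NOISE_STD, n)

def mse(model, x, y):
    """Mean squared error of a fitted model on (x, y)."""
    return float(np.mean((model(x) - y) ** 2))

x_train, y_train = sample(50)
x_test, y_test = sample(50)

# Degree 1 is a straight line, which cannot bend to follow the curve.
line = np.polynomial.Polynomial.fit(x_train, y_train, deg=1)

train_mse = mse(line, x_train, y_train)
test_mse = mse(line, x_test, y_test)
print(f"noise variance: {NOISE_STD ** 2:.3f}")
print(f"train MSE:      {train_mse:.3f}")
print(f"test MSE:       {test_mse:.3f}")
```

Both errors should come out well above the noise variance. That is the signature of high bias: the model is wrong in the same way on data it has seen and on data it has not.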

Variance in Machine Learning: The Dart Scatter

What is Variance?
- Now, let’s say you throw darts, but they are all over the place. Some are far to the left, some to the right, and there’s a lot of spread. That’s variance.
- Variance is the model’s sensitivity to the training data. A high-variance model pays too much attention to the specific details of the training data, even the noise or random fluctuations, so its predictions change a lot whenever the training data changes.
Characteristics:
- High variance models tend to overfit the training data.
- They are overly sensitive to the specific training examples and fail to generalize well to new, unseen data.
- Complex model architectures and excessive use of features can contribute to high variance.
Effect on Performance:
- High variance can result in excellent performance on the training data but poor generalization to new data.
- The model may perform well on specific training examples but fail to capture the broader patterns in the data.
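
The opposite failure can be sketched the same way: a degree-15 polynomial fitted to only 30 noisy points. As before, the true function, noise level, sample sizes, and seed are illustrative assumptions rather than the setup behind the figure in this article.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the numbers are repeatable
NOISE_STD = 0.3                 # assumed noise level

def true_function(x):
    # Assumed ground truth: a smooth sinusoidal curve on [0, 1].
    return np.cos(1.5 * np.pi * x)

def sample(n):
    """Draw n noisy observations of the true function."""
    x = rng.uniform(0.0, 1.0, n)
    return x, true_function(x) + rng.normal(0.0, NOISE_STD, n)

def mse(model, x, y):
    """Mean squared error of a fitted model on (x, y)."""
    return float(np.mean((model(x) - y) ** 2))

x_train, y_train = sample(30)   # a small training set
x_test, y_test = sample(200)    # fresh data the model has never seen

# Degree 15 means 16 free coefficients for only 30 points:
# plenty of room to chase the noise.
wiggly = np.polynomial.Polynomial.fit(x_train, y_train, deg=15)

train_mse = mse(wiggly, x_train, y_train)
test_mse = mse(wiggly, x_test, y_test)
print(f"train MSE: {train_mse:.3f}")
print(f"test MSE:  {test_mse:.3f}")
```

The gap between the two numbers is what to look at: a training error that looks great next to a much worse test error suggests the model has memorized its training set instead of learning the pattern.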

The Bias-Variance Trade-Off: Finding the Sweet Spot

Ideally, you want your darts (predictions) to be close to the target (the actual outcome). So, you want to balance bias and variance. Too much bias, and you consistently miss in the same way; too much variance, and your predictions are all over the place.
Striking the right balance means your model generalizes well. It doesn’t oversimplify (bias) or overcomplicate (variance) the problem. It hits the target, even with new, unseen data.
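
The dartboard picture can be turned into numbers with a small simulation. This is only a sketch: the three "players", their offsets and spreads, and the helper names `throw_darts` and `summarize` are made up for illustration.

```python
import math
import random
import statistics

random.seed(42)      # fixed seed so the numbers are repeatable
TARGET = (0.0, 0.0)  # the bullseye

def throw_darts(offset, spread, n=1000):
    """Simulate n throws; `offset` shifts every throw, `spread` scatters them."""
    return [
        (random.gauss(TARGET[0] + offset[0], spread),
         random.gauss(TARGET[1] + offset[1], spread))
        for _ in range(n)
    ]

def summarize(throws):
    """Return (bias, variance, mean squared distance from the target)."""
    xs = [t[0] for t in throws]
    ys = [t[1] for t in throws]
    center = (statistics.fmean(xs), statistics.fmean(ys))
    # Bias: how far the average throw lands from the target.
    bias = math.dist(center, TARGET)
    # Variance: how widely the throws scatter around their own average.
    variance = statistics.pvariance(xs) + statistics.pvariance(ys)
    # Overall error: average squared distance from the target.
    msd = statistics.fmean(
        (x - TARGET[0]) ** 2 + (y - TARGET[1]) ** 2 for x, y in throws
    )
    return bias, variance, msd

players = {
    "high bias, low variance": throw_darts(offset=(2.0, 0.0), spread=0.2),
    "low bias, high variance": throw_darts(offset=(0.0, 0.0), spread=2.0),
    "low bias, low variance": throw_darts(offset=(0.0, 0.0), spread=0.2),
}
summary = {name: summarize(throws) for name, throws in players.items()}

for name, (bias, variance, msd) in summary.items():
    print(f"{name}: bias={bias:.2f} variance={variance:.2f} "
          f"error={msd:.2f} bias^2+variance={bias ** 2 + variance:.2f}")
```

For every player, the average squared distance from the bullseye equals squared bias plus variance. That is why both matter: a thrower can rack up a large error through either one.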

Visualizing Bias vs Variance

*bias vs variance at various model complexity degrees*

In the context of the bias-variance tradeoff, the true function is the ideal or target function that you aim to approximate with your model. Bias is the gap between the true function and the model’s average prediction, where the average is taken over many possible training datasets. Variance is how much the predicted function changes from one training dataset to the next.
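
For readers who like to see it written out, this is the standard textbook decomposition of the expected squared error at a point x. The notation is introduced here and is not from the original figure: observations are assumed to follow y = f(x) + ε, where f is the true function and ε is zero-mean noise with variance σ² that is independent of the training set, and f̂_D is the model fitted to a randomly drawn training set D.

```latex
\mathbb{E}_{D,\varepsilon}\!\left[\big(y - \hat{f}_D(x)\big)^2\right]
  = \underbrace{\big(\mathbb{E}_D[\hat{f}_D(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}_D\!\left[\big(\hat{f}_D(x) - \mathbb{E}_D[\hat{f}_D(x)]\big)^2\right]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

The first term measures how far the average fit is from the truth, the second how much individual fits scatter around that average, and the third is noise that no model can remove.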

In the provided graph, the true function is plotted in green, and the predicted functions for different polynomial degrees are shown in red. The difference between the true function and the predicted functions can give you a sense of how well your model captures the underlying pattern in the data and how much it deviates due to noise and overfitting.
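
A figure along these lines could be produced with something like the sketch below. It is not the code behind the graph above: the true function, noise level, sample size, seed, and the helper names `fit_polynomials` and `plot_fits` are assumptions made for illustration. Only the polynomial degrees and the green/red colour scheme come from the article.

```python
import numpy as np

DEGREES = [1, 2, 3, 13, 14, 15]  # the degrees discussed in this article

def true_function(x):
    # Assumed stand-in for the green curve.
    return np.cos(1.5 * np.pi * x)

def fit_polynomials(x, y, degrees=DEGREES):
    """Return {degree: least-squares polynomial fit of that degree}."""
    return {d: np.polynomial.Polynomial.fit(x, y, deg=d) for d in degrees}

def plot_fits(x, y, models):
    """Draw the true function in green and each fit in red, one panel per degree."""
    # Imported here so the fitting code above runs without matplotlib installed.
    import matplotlib.pyplot as plt

    grid = np.linspace(x.min(), x.max(), 200)
    fig, axes = plt.subplots(2, 3, figsize=(12, 6), sharey=True)
    for ax, (degree, model) in zip(axes.ravel(), models.items()):
        ax.scatter(x, y, s=10, color="gray", label="training data")
        ax.plot(grid, true_function(grid), color="green", label="true function")
        ax.plot(grid, model(grid), color="red", label="predicted function")
        ax.set_ylim(-2, 2)  # keep wild oscillations from flattening the panel
        ax.set_title(f"Degree = {degree}")
    axes[0, 0].legend()
    fig.tight_layout()
    plt.show()

rng = np.random.default_rng(1)  # fixed seed so the numbers are repeatable
x = np.sort(rng.uniform(0.0, 1.0, 30))
y = true_function(x) + rng.normal(0.0, 0.2, 30)

models = fit_polynomials(x, y)
train_mse = {d: float(np.mean((m(x) - y) ** 2)) for d, m in models.items()}
for d, err in train_mse.items():
    print(f"degree {d:2d}: training MSE = {err:.4f}")
```

Calling `plot_fits(x, y, models)` draws the six panels. Note that the training error keeps shrinking as the degree grows, so on its own it says nothing about which panel generalizes best.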

Observations:

Degree = 1:

The model is a simple linear regression (degree = 1).
Bias is relatively high because the linear model cannot capture the sinusoidal pattern of the true function.
Variance is relatively low as the predicted function doesn’t deviate much across different training datasets.

Degree = 2:

The model is a quadratic regression (degree = 2).
Bias decreases compared to degree 1 as the model becomes more flexible.
Variance slightly increases, and the predicted function better fits the data, capturing some curvature.

Degree = 3:

The model is a cubic regression (degree = 3).
Bias continues to decrease, and the model captures more of the underlying sinusoidal pattern.
Variance increases as the model becomes more complex, fitting the training data closely.

Degree = 13:

The model is a polynomial regression of high degree (degree = 13).
Bias is very low, and the model fits the training data extremely well, capturing even the fine details of the sinusoidal pattern.
Variance is very high as the model is overfitting to noise, and small variations in the training data lead to significant changes in predictions.

Degree = 14:

Similar to degree 13, the model is highly complex.
Bias remains very low, but variance is extremely high, indicating a significant overfitting to noise in the training data.

Degree = 15:

The model is even more complex, and overfitting is pronounced.
Bias remains very low, but variance is extremely high, and the predicted function may start to oscillate wildly between data points.

In summary, as the degree of the polynomial increases, bias tends to decrease, and variance tends to increase. The optimal model complexity is usually found where the total error, which is squared bias plus variance (plus noise that no model can remove), is smallest.
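
You can check this trend numerically by refitting each polynomial on many simulated training sets and measuring how the predictions behave. The sketch below does that under simplifying assumptions that are not from the article: an assumed true function `cos(1.5 * pi * x)`, Gaussian noise, 60 fixed and evenly spaced training inputs (only the noise changes between training sets), and 200 simulated training sets.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the numbers are repeatable
NOISE_STD = 0.3                 # assumed noise level
N_DATASETS = 200                # number of simulated training sets

x_train = np.linspace(0.0, 1.0, 60)  # fixed inputs; only the noise is redrawn
x_eval = np.linspace(0.0, 1.0, 200)  # where predictions are compared

def true_function(x):
    # Assumed ground truth: a smooth sinusoidal curve on [0, 1].
    return np.cos(1.5 * np.pi * x)

def bias_variance(degree):
    """Estimate (squared bias, variance) for polynomial fits of one degree."""
    preds = np.empty((N_DATASETS, x_eval.size))
    for i in range(N_DATASETS):
        y = true_function(x_train) + rng.normal(0.0, NOISE_STD, x_train.size)
        model = np.polynomial.Polynomial.fit(x_train, y, deg=degree)
        preds[i] = model(x_eval)
    mean_pred = preds.mean(axis=0)
    # Squared bias: gap between the average prediction and the truth.
    bias_sq = float(np.mean((mean_pred - true_function(x_eval)) ** 2))
    # Variance: spread of the predictions across training sets.
    variance = float(np.mean(preds.var(axis=0)))
    return bias_sq, variance

results = {d: bias_variance(d) for d in (1, 2, 3, 13, 14, 15)}
for d, (bias_sq, variance) in results.items():
    print(f"degree {d:2d}: bias^2={bias_sq:.4f} variance={variance:.4f} "
          f"sum={bias_sq + variance:.4f}")
```

With these settings the squared bias should fall and the variance should rise as the degree goes up, with their sum smallest at a moderate degree. The exact numbers depend on the assumed function, noise level, and sample size, so treat them as an illustration rather than a benchmark.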


ML Made Simple