Lec 3: Bias-Variance Tradeoff

Bias-Variance Tradeoff in Machine Learning

Introduction to Bias and Variance

  • The lecture introduces the concepts of bias and variance within the context of machine learning, specifically focusing on classification and regression problems.
  • High bias indicates a significant error in both training and testing phases, suggesting that a very simple model is being used.
  • In contrast, high variance results in minimal training error but significant testing error due to the use of a complex model.

Understanding Overfitting and Underfitting

  • A balance between bias and variance is essential; models should neither be too simple (high bias) nor too complex (high variance).
  • The lecture illustrates how different model complexities affect training and testing errors using graphical representations.
  • Overfitting occurs with high variance where the model performs well on training data but poorly on unseen data, while underfitting arises from high bias.

Visual Representation of Bias and Variance

  • Graphical representation shows that as model complexity increases, training error decreases while testing error may increase due to overfitting.
  • The trade-off between underfitting (high bias) and overfitting (high variance) is emphasized; both scenarios are undesirable for effective modeling.
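
The complexity sweep described above can be sketched numerically; a minimal illustration, assuming a synthetic sine target fitted with NumPy polynomials (the target function, degrees, sample sizes, and noise level are all invented for the example):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ground truth: y = sin(2*pi*x) + noise
x_train = rng.uniform(0, 1, 30)
x_test = rng.uniform(0, 1, 200)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.3, 30)
y_test = np.sin(2 * np.pi * x_test) + rng.normal(0, 0.3, 200)

def fit_and_score(degree):
    """Fit a polynomial of the given degree; return (train MSE, test MSE)."""
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train_mse, test_mse

for d in (1, 3, 12):
    tr, te = fit_and_score(d)
    print(f"degree {d:2d}: train MSE {tr:.3f}, test MSE {te:.3f}")
```

As complexity grows, training error keeps shrinking, while test error first drops (leaving the high-bias regime) and eventually rises again (entering the high-variance regime).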

Decomposing Test Error

  • The generalization test error can be decomposed into three components: bias error, variance error, and irreducible error.
  • Bias error stems from incorrect assumptions about the model leading to systematic errors; high variance reflects sensitivity to fluctuations in training data.

Irreducible Error Explained

  • Irreducible error is attributed to inherent noise within the problem itself, independent of any modeling efforts.
  • This noise can arise from data quality issues or inaccuracies during data collection processes.

Understanding Overfitting and Underfitting in Machine Learning

Introduction to Class Problems

  • The discussion begins with a simple model that struggles with classification, resulting in numerous misclassifications due to an inadequate decision boundary.

Complex Models and Overfitting

  • A complex model is introduced that fits the training data very closely but produces high error on unseen test data, illustrating the concept of overfitting.

Balancing Bias and Variance

  • The speaker emphasizes the need for a compromise between overfitting (high variance) and underfitting (high bias), highlighting the importance of finding a balanced decision boundary.
  • High bias leads to significant errors in both training and testing phases, while high variance results in minimal training error but substantial testing error.

Mathematical Formulation of the Problem

  • The problem is mathematically defined with one independent variable X and one dependent variable Y, where Y depends on X.

Noise in Data Modeling

  • The relationship is expressed as Y = f(X) + ε, indicating that noise (ε) affects the dependent variable. This noise has mean zero and variance σ_ε².
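
This data-generating assumption can be simulated directly; a small sketch, where the linear choice of f and the noise level are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(42)

def f(x):
    """Hypothetical true function; the learner never observes it directly."""
    return 2.0 * x + 1.0

sigma_eps = 0.5                             # noise standard deviation
x = rng.uniform(0, 1, 100_000)
eps = rng.normal(0.0, sigma_eps, x.size)    # zero-mean noise, variance sigma_eps^2
y = f(x) + eps                              # observed targets: Y = f(X) + eps

print(f"noise mean     ~ {eps.mean():.4f}")   # close to 0
print(f"noise variance ~ {eps.var():.4f}")    # close to sigma_eps^2 = 0.25
```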

Understanding Variance and Uncertainty

  • The magnitude of the noise variance σ_ε² represents the uncertainty inherent in the underlying phenomenon being modeled.

Finding an Optimal Function

  • The goal is to find a function f̂ that closely approximates the true function f, learned from training data by minimizing a loss function.

Loss Function: Mean Squared Error (MSE)

  • MSE is introduced as the loss function, aiming to minimize the average squared difference between predicted values f̂(x) and observed values y.
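
A minimal helper matching this definition (the sample numbers are made up):

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: average squared gap between observations and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)

# (0.25 + 0.0 + 1.0) / 3 = 1.25 / 3
print(mse([1.0, 2.0, 3.0], [1.5, 2.0, 2.0]))
```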

Defining Bias in Predictions

  • Bias is mathematically defined as the difference between the expected prediction E[f̂(x)], averaged across different training datasets, and the true underlying function f(x) at unseen points.

Defining Variance in Predictions

Understanding Variance and Mean Squared Error

Definition of Variance

  • Variance is defined as the mean squared deviation of f̂(x) from its expected value, taken over different realizations of the training data.
  • The expected value E[f̂(x)] is computed over various training datasets, emphasizing the variability in predictions.
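
The expectation over training sets can be estimated by Monte Carlo: refit the model on many freshly drawn training sets and examine the spread of predictions at one fixed test point. A sketch, with an invented sine target and a degree-3 polynomial fit:

```python
import numpy as np

rng = np.random.default_rng(7)

def true_f(x):
    return np.sin(2 * np.pi * x)    # hypothetical ground truth

x0 = 0.3             # a fixed test point
n_datasets = 500     # number of independent training sets

preds = []
for _ in range(n_datasets):
    # Fresh training set each round: new inputs and a new noise draw.
    x = rng.uniform(0, 1, 25)
    y = true_f(x) + rng.normal(0, 0.3, x.size)
    coeffs = np.polyfit(x, y, 3)          # degree-3 fit each round
    preds.append(np.polyval(coeffs, x0))  # this round's prediction f_hat(x0)

preds = np.array(preds)
print(f"E[f_hat(x0)]   ~ {preds.mean():.3f}")   # compare with true_f(0.3)
print(f"Var[f_hat(x0)] ~ {preds.var():.4f}")
```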

Decomposing Mean Squared Error (MSE)

  • The goal is to connect MSE to bias, variance, and irreducible error by decomposing it into these components. This relationship is crucial for understanding model performance.
  • The expression for MSE can be represented as:
  • E[(y − f̂(x))²] = Bias[f̂(x)]² + Var[f̂(x)] + σ_ε². This highlights how each component contributes to the overall error.

Importance of Expectations in MSE

  • The first expectation in the MSE formula is taken over unseen test points x, while the second relates to the training data and the random noise ε. Understanding this distinction is vital for accurate interpretation.
  • It’s emphasized that both expectations play a significant role in determining how well a model generalizes beyond its training set.

Proof of Bias-Variance Decomposition

  • To prove the bias-variance decomposition, we start with the definition of MSE:
  • E[(y − f̂(x))²] = E[(f(x) + ε − f̂(x))²]. This sets up our foundational equation for further expansion.
  • By substituting y = f(x) + ε, we expand this equation into simpler terms involving expectations and variances, leading us toward isolating the bias and variance components.

Expansion and Final Formulation

  • Upon expanding the squared term, we derive expressions that separate out bias and variance contributions:
  • Key relationships emerge showing how independent random variables affect expectations when calculating products or sums involving them.
  • Ultimately, through careful manipulation of these equations, we arrive at a final formulation where MSE can be expressed distinctly as contributions from the irreducible error and expected deviations from predicted values.
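
The manipulation sketched above can be written out; a compact version of the first step in standard notation (here ε is the test-point noise, with mean zero and independent of the training data, so the cross term factorizes and vanishes):

```latex
\begin{aligned}
\mathbb{E}\big[(y - \hat f(x))^2\big]
  &= \mathbb{E}\big[(f(x) + \epsilon - \hat f(x))^2\big] \\
  &= \mathbb{E}\big[(f(x) - \hat f(x))^2\big]
     + 2\,\mathbb{E}[\epsilon]\,\mathbb{E}\big[f(x) - \hat f(x)\big]
     + \mathbb{E}[\epsilon^2] \\
  &= \mathbb{E}\big[(f(x) - \hat f(x))^2\big] + \sigma_\epsilon^2
     \qquad \big(\mathbb{E}[\epsilon] = 0,\ \mathbb{E}[\epsilon^2] = \sigma_\epsilon^2\big).
\end{aligned}
```

The remaining term E[(f(x) − f̂(x))²] is then split into squared bias and variance, as developed in the next section.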

Understanding Bias-Variance Decomposition

Expanding Expected Values

  • The process begins by subtracting and adding the expected value E[f̂(x)] inside the square, so that E[(f(x) − f̂(x))²] = E[(f(x) − E[f̂(x)] + E[f̂(x)] − f̂(x))²].
  • Expanding the square yields (f(x) − E[f̂(x)])² + E[(f̂(x) − E[f̂(x)])²] plus a cross term.
  • The cross term, −2(f(x) − E[f̂(x)]) · E[f̂(x) − E[f̂(x)]], vanishes because E[f̂(x) − E[f̂(x)]] = 0.

Deriving Key Equations

  • The derived equation (Equation 5) illustrates the expanded form of the squared difference between expected values.
  • Equation 6 highlights that one term represents the squared bias of f̂(x), while the other is its variance.

Understanding Bias and Variance

  • The bias is defined as the difference between the expected value E[f̂(x)] and the true value f(x); for a fixed x both of these are constants, so the squared bias is itself a constant.
  • Because f(x) and E[f̂(x)] are constants, applying an expectation to the squared bias leaves its value unchanged.

Finalizing Equations

  • From Equation 6, we derive that the squared bias plus the variance equals the expected value E[(f(x) − f̂(x))²].
  • This leads to Equation 8, which states that the mean squared error (MSE), apart from the irreducible noise term, can be expressed as the sum of squared bias and variance.

Proof of Bias Variance Decomposition

  • Combining the previous equations shows that, for a set of test points, MSE equals squared bias plus variance plus irreducible error.
  • This decomposition clarifies how MSE can be broken down into these three components: bias, variance, and irreducible error.
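
The three-way decomposition can be checked empirically at a single test point; a Monte Carlo sketch, with an invented sine target, a degree-3 polynomial fit, and an assumed noise level σ_ε = 0.3:

```python
import numpy as np

rng = np.random.default_rng(1)

def true_f(x):
    return np.sin(2 * np.pi * x)    # hypothetical ground truth

sigma_eps = 0.3   # assumed noise standard deviation
x0 = 0.3          # fixed test point
trials = 2000     # independent training sets

preds, sq_errors = [], []
for _ in range(trials):
    # Train on a fresh noisy dataset.
    x = rng.uniform(0, 1, 25)
    y = true_f(x) + rng.normal(0, sigma_eps, x.size)
    coeffs = np.polyfit(x, y, 3)
    pred = np.polyval(coeffs, x0)
    preds.append(pred)
    # Fresh noisy observation at x0 for the MSE expectation.
    y0 = true_f(x0) + rng.normal(0, sigma_eps)
    sq_errors.append((y0 - pred) ** 2)

preds = np.array(preds)
mse = np.mean(sq_errors)
bias_sq = (preds.mean() - true_f(x0)) ** 2
var = preds.var()

print(f"MSE                     ~ {mse:.4f}")
print(f"bias^2 + var + sigma^2  ~ {bias_sq + var + sigma_eps ** 2:.4f}")
```

Up to Monte Carlo noise, the two printed quantities agree, which is exactly the content of the decomposition.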

Implications on Model Complexity

  • High bias suggests an overly simplistic model (underfitting), while high variance indicates a complex model prone to overfitting.

Video description

Machine Learning and Deep Learning - Fundamentals and Applications, https://onlinecourses.nptel.ac.in/noc23_ee87/preview. Prof. M.K. Bhuyan, Dept. of Electrical and Electronics Engineering, IIT Guwahati.