Lec 3: Bias-Variance Tradeoff

Bias-Variance Tradeoff in Machine Learning

Introduction to Bias and Variance

  • The lecture introduces the concepts of bias and variance within the context of machine learning, specifically focusing on classification and regression problems.
  • High bias indicates a significant error in both training and testing phases, suggesting that a very simple model is being used.
  • In contrast, high variance results in minimal training error but significant testing error due to the use of a complex model.

Understanding Overfitting and Underfitting

  • A balance between bias and variance is essential; models should neither be too simple (high bias) nor too complex (high variance).
  • The lecture illustrates how different model complexities affect training and testing errors using graphical representations.
  • Overfitting occurs with high variance where the model performs well on training data but poorly on unseen data, while underfitting arises from high bias.

Visual Representation of Bias and Variance

  • Graphical representation shows that as model complexity increases, training error decreases while testing error may increase due to overfitting.
  • The trade-off between underfitting (high bias) and overfitting (high variance) is emphasized; both scenarios are undesirable for effective modeling.
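
The complexity sweep described above can be sketched numerically; a minimal illustration, assuming a synthetic sine target fitted with NumPy polynomials (the target function, degrees, sample sizes, and noise level are all invented for the example):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ground truth: y = sin(2*pi*x) + noise
x_train = rng.uniform(0, 1, 30)
x_test = rng.uniform(0, 1, 200)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.3, 30)
y_test = np.sin(2 * np.pi * x_test) + rng.normal(0, 0.3, 200)

def fit_and_score(degree):
    """Fit a polynomial of the given degree; return (train MSE, test MSE)."""
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train_mse, test_mse

for d in (1, 3, 12):
    tr, te = fit_and_score(d)
    print(f"degree {d:2d}: train MSE {tr:.3f}, test MSE {te:.3f}")
```

As complexity grows, training error keeps shrinking, while test error first drops (leaving the high-bias regime) and eventually rises again (entering the high-variance regime).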

Decomposing Test Error

  • The generalization test error can be decomposed into three components: bias error, variance error, and irreducible error.
  • Bias error stems from incorrect assumptions about the model leading to systematic errors; high variance reflects sensitivity to fluctuations in training data.

Irreducible Error Explained

  • Irreducible error is attributed to inherent noise within the problem itself, independent of any modeling efforts.
  • This noise can arise from data quality issues or inaccuracies during data collection processes.

Understanding Overfitting and Underfitting in Machine Learning

Introduction to Class Problems

  • The discussion begins with a simple model that struggles with classification, resulting in numerous misclassifications due to an inadequate decision boundary.

Complex Models and Overfitting

  • A complex model is introduced that fits the training data very closely but produces high error on unseen test data, illustrating the concept of overfitting.

Balancing Bias and Variance

  • The speaker emphasizes the need for a compromise between overfitting (high variance) and underfitting (high bias), highlighting the importance of finding a balanced decision boundary.
  • High bias leads to significant errors in both training and testing phases, while high variance results in minimal training error but substantial testing error.

Mathematical Formulation of the Problem

  • The problem is mathematically defined with one independent variable X and one dependent variable Y, where Y depends on X.

Noise in Data Modeling

  • The relationship is expressed as Y = f(X) + ε, indicating that noise (ε) affects the dependent variable. This noise has mean zero and variance σ_ε².
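
This data-generating assumption can be simulated directly; a small sketch, where the linear choice of f and the noise level are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(42)

def f(x):
    """Hypothetical true function; the learner never observes it directly."""
    return 2.0 * x + 1.0

sigma_eps = 0.5                             # noise standard deviation
x = rng.uniform(0, 1, 100_000)
eps = rng.normal(0.0, sigma_eps, x.size)    # zero-mean noise, variance sigma_eps^2
y = f(x) + eps                              # observed targets: Y = f(X) + eps

print(f"noise mean     ~ {eps.mean():.4f}")   # close to 0
print(f"noise variance ~ {eps.var():.4f}")    # close to sigma_eps^2 = 0.25
```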

Understanding Variance and Uncertainty

  • The magnitude of the noise variance σ_ε² represents the uncertainty inherent in the underlying phenomenon being modeled.

Finding an Optimal Function

  • The goal is to find a function f̂ that closely approximates the true function f, learned from training data by minimizing a loss function.

Loss Function: Mean Squared Error (MSE)

  • MSE is introduced as the loss function, aiming to minimize the average squared difference between predicted values f̂(x) and observed values y.
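
A minimal helper matching this definition (the sample numbers are made up):

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: average squared gap between observations and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)

# (0.25 + 0.0 + 1.0) / 3 = 1.25 / 3
print(mse([1.0, 2.0, 3.0], [1.5, 2.0, 2.0]))
```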

Defining Bias in Predictions

  • Bias is mathematically defined as the difference between the expected prediction E[f̂(x)], averaged across different training datasets, and the true underlying function f(x) at unseen points.

Defining Variance in Predictions

Understanding Variance and Mean Squared Error

Definition of Variance

  • Variance is defined as the mean squared deviation of f̂(x) from its expected value, taken over different realizations of the training data.
  • The expected value E[f̂(x)] is computed over various training datasets, emphasizing the variability in predictions.
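
The expectation over training sets can be estimated by Monte Carlo: refit the model on many freshly drawn training sets and examine the spread of predictions at one fixed test point. A sketch, with an invented sine target and a degree-3 polynomial fit:

```python
import numpy as np

rng = np.random.default_rng(7)

def true_f(x):
    return np.sin(2 * np.pi * x)    # hypothetical ground truth

x0 = 0.3             # a fixed test point
n_datasets = 500     # number of independent training sets

preds = []
for _ in range(n_datasets):
    # Fresh training set each round: new inputs and a new noise draw.
    x = rng.uniform(0, 1, 25)
    y = true_f(x) + rng.normal(0, 0.3, x.size)
    coeffs = np.polyfit(x, y, 3)          # degree-3 fit each round
    preds.append(np.polyval(coeffs, x0))  # this round's prediction f_hat(x0)

preds = np.array(preds)
print(f"E[f_hat(x0)]   ~ {preds.mean():.3f}")   # compare with true_f(0.3)
print(f"Var[f_hat(x0)] ~ {preds.var():.4f}")
```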

Decomposing Mean Squared Error (MSE)

  • The goal is to connect MSE to bias, variance, and irreducible error by decomposing it into these components. This relationship is crucial for understanding model performance.
  • The expression for MSE can be represented as:
  • E[(y − f̂(x))²] = Bias[f̂(x)]² + Var[f̂(x)] + σ_ε². This highlights how each component contributes to the overall error.

Importance of Expectations in MSE

  • The first expectation in the MSE formula is taken over unseen test points x, while the second relates to the training data and the random noise ε. Understanding this distinction is vital for accurate interpretation.
  • It’s emphasized that both expectations play a significant role in determining how well a model generalizes beyond its training set.

Proof of Bias-Variance Decomposition

  • To prove the bias-variance decomposition, we start with the definition of MSE:
  • E[(y − f̂(x))²] = E[(f(x) + ε − f̂(x))²]. This sets up our foundational equation for further expansion.
  • By substituting y = f(x) + ε, we expand this equation into simpler terms involving expectations and variances, leading us toward isolating the bias and variance components.

Expansion and Final Formulation

  • Upon expanding the squared term, we derive expressions that separate out bias and variance contributions:
  • Key relationships emerge showing how independent random variables affect expectations when calculating products or sums involving them.
  • Ultimately, through careful manipulation of these equations, we arrive at a final formulation where MSE can be expressed distinctly as contributions from the irreducible error and expected deviations from predicted values.
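
The manipulation sketched above can be written out; a compact version of the first step in standard notation (here ε is the test-point noise, with mean zero and independent of the training data, so the cross term factorizes and vanishes):

```latex
\begin{aligned}
\mathbb{E}\big[(y - \hat f(x))^2\big]
  &= \mathbb{E}\big[(f(x) + \epsilon - \hat f(x))^2\big] \\
  &= \mathbb{E}\big[(f(x) - \hat f(x))^2\big]
     + 2\,\mathbb{E}[\epsilon]\,\mathbb{E}\big[f(x) - \hat f(x)\big]
     + \mathbb{E}[\epsilon^2] \\
  &= \mathbb{E}\big[(f(x) - \hat f(x))^2\big] + \sigma_\epsilon^2
     \qquad \big(\mathbb{E}[\epsilon] = 0,\ \mathbb{E}[\epsilon^2] = \sigma_\epsilon^2\big).
\end{aligned}
```

The remaining term E[(f(x) − f̂(x))²] is then split into squared bias and variance, as developed in the next section.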

Understanding Bias-Variance Decomposition

Expanding Expected Values

  • The process begins by subtracting and adding the expected value E[f̂(x)] inside the square, so that E[(f(x) − f̂(x))²] = E[(f(x) − E[f̂(x)] + E[f̂(x)] − f̂(x))²].
  • Expanding the square yields (f(x) − E[f̂(x)])² + E[(f̂(x) − E[f̂(x)])²] plus a cross term.
  • The cross term, −2(f(x) − E[f̂(x)]) · E[f̂(x) − E[f̂(x)]], vanishes because E[f̂(x) − E[f̂(x)]] = 0.

Deriving Key Equations

  • The derived equation (Equation 5) illustrates the expanded form of the squared difference between expected values.
  • Equation 6 highlights that one term represents the squared bias of f̂(x), while the other is its variance.

Understanding Bias and Variance

  • The bias is defined as the difference between the expected value E[f̂(x)] and the true value f(x); for a fixed x both of these are constants, so the squared bias is itself a constant.
  • Because f(x) and E[f̂(x)] are constants, applying an expectation to the squared bias leaves its value unchanged.

Finalizing Equations

  • From Equation 6, we derive that the squared bias plus the variance equals the expected value E[(f(x) − f̂(x))²].
  • This leads to Equation 8, which states that the mean squared error (MSE), apart from the irreducible noise term, can be expressed as the sum of squared bias and variance.

Proof of Bias Variance Decomposition

  • Combining the previous equations shows that, for a set of test points, MSE equals squared bias plus variance plus irreducible error.
  • This decomposition clarifies how MSE can be broken down into these three components: bias, variance, and irreducible error.
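
The three-way decomposition can be checked empirically at a single test point; a Monte Carlo sketch, with an invented sine target, a degree-3 polynomial fit, and an assumed noise level σ_ε = 0.3:

```python
import numpy as np

rng = np.random.default_rng(1)

def true_f(x):
    return np.sin(2 * np.pi * x)    # hypothetical ground truth

sigma_eps = 0.3   # assumed noise standard deviation
x0 = 0.3          # fixed test point
trials = 2000     # independent training sets

preds, sq_errors = [], []
for _ in range(trials):
    # Train on a fresh noisy dataset.
    x = rng.uniform(0, 1, 25)
    y = true_f(x) + rng.normal(0, sigma_eps, x.size)
    coeffs = np.polyfit(x, y, 3)
    pred = np.polyval(coeffs, x0)
    preds.append(pred)
    # Fresh noisy observation at x0 for the MSE expectation.
    y0 = true_f(x0) + rng.normal(0, sigma_eps)
    sq_errors.append((y0 - pred) ** 2)

preds = np.array(preds)
mse = np.mean(sq_errors)
bias_sq = (preds.mean() - true_f(x0)) ** 2
var = preds.var()

print(f"MSE                     ~ {mse:.4f}")
print(f"bias^2 + var + sigma^2  ~ {bias_sq + var + sigma_eps ** 2:.4f}")
```

Up to Monte Carlo noise, the two printed quantities agree, which is exactly the content of the decomposition.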

Implications on Model Complexity

  • High bias suggests an overly simplistic model (underfitting), while high variance indicates a complex model prone to overfitting.

Video description

Machine Learning and Deep Learning - Fundamentals and Applications, https://onlinecourses.nptel.ac.in/noc23_ee87/preview. Prof. M.K. Bhuyan, Dept. of Electrical and Electronics Engineering, IIT Guwahati.