Supervised Learning: Regression

Introduction to Supervised Learning

Overview of Supervised Learning

  • The lecture introduces supervised learning as a primary paradigm in machine learning, emphasizing its prevalence and importance.
  • Key tasks within supervised learning include regression and classification, which will be explored in detail throughout the lecture.

Notation Setup

  • The set of real numbers is denoted by R, encompassing all scalar values such as 2.3 or -7.6.
  • R^d denotes the set of d-dimensional real vectors; for example, (3.6, 5.2, -1.8) is an element of R^3.
  • Subscripts index coordinates (e.g., x_j is the jth coordinate of x), while the norm of a vector denotes its length, written with the usual norm symbol.

Vector Representation

  • The norm (length) of a vector x in R^d is given by the Euclidean formula √(x_1² + x_2² + … + x_d²); for R^3 this is √(x_1² + x_2² + x_3²).
  • When several vectors are in play, superscripts distinguish the vectors and subscripts their coordinates; e.g., x^i_j is the jth coordinate of the ith vector x^i.
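The Euclidean norm above can be checked with a short sketch, using the example vector (3.6, 5.2, -1.8) from the notation setup:

```python
import numpy as np

# The example vector in R^3 from the text: (3.6, 5.2, -1.8)
x = np.array([3.6, 5.2, -1.8])

# Euclidean norm: sqrt(x_1^2 + x_2^2 + x_3^2)
norm = np.sqrt(np.sum(x ** 2))

# np.linalg.norm computes the same quantity
assert np.isclose(norm, np.linalg.norm(x))
```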

Indicator Variables and Their Use

Understanding Indicator Variables

  • Indicator variables are represented as 1 or 0 based on boolean conditions; for instance, "2 is even" yields an indicator value of 1.
  • This notation aids in translating mathematical concepts into practical applications throughout the lecture.
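A minimal sketch of the indicator notation, with the "2 is even" example from the text:

```python
# Indicator of a boolean condition: 1 if the condition holds, 0 otherwise.
def indicator(condition: bool) -> int:
    return 1 if condition else 0

print(indicator(2 % 2 == 0))  # "2 is even" -> 1
print(indicator(3 % 2 == 0))  # "3 is even" -> 0
```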

Core Concept: Curve-Fitting

Simplifying Supervised Learning

  • At its essence, supervised learning can be simplified to curve-fitting—finding a function that approximates data points.
  • Data points consist of pairs (x_i, y_i), where each x_i is a d-dimensional vector and y_i represents corresponding labels or outputs.

Model Development

  • The goal is to develop a model f that maps input vectors from R^d to output labels so that f(x_i) closely matches y_i.

Example Application: Regression Problem

Practical Example in Regression

  • An example regression problem involves predicting house prices based on features like room area and distance from metro stations.

Understanding Model Training and Evaluation

Overview of Training Data

  • The training data consists of instances x_i in a d-dimensional space, with corresponding labels y_i representing the price of a house.
  • A learning algorithm outputs a model f, a function mapping R^d to R.
  • The goal is for the model to approximate the relationship so that f(x_i) ≈ y_i.

Loss Function and Model Evaluation

  • The loss function measures how far predictions deviate from actual values using squared differences: (f(x_i) − y_i)².
  • A loss of 0 indicates perfect predictions; learning algorithms prefer models with lower loss.
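The squared loss, averaged over the training points, can be sketched directly:

```python
def squared_loss(predictions, labels):
    """Mean of (f(x_i) - y_i)^2 over the training points."""
    return sum((p - y) ** 2 for p, y in zip(predictions, labels)) / len(labels)

# Perfect predictions give a loss of exactly 0
print(squared_loss([2.0, 4.0], [2.0, 4.0]))  # 0.0
```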

Linear Parameterization of Models

  • Models are often parameterized as linear functions f(x) = w^T x + b, with weights (w_1, w_2, ..., w_d) and bias b as the parameters.
  • For example, in predicting house prices, the model might consider factors like number of rooms and area.
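The linear parameterization f(x) = w^T x + b can be sketched as follows; the specific weight values are hypothetical, chosen only to illustrate the form:

```python
import numpy as np

def linear_model(w, b):
    # Returns the function f(x) = w^T x + b
    return lambda x: float(np.dot(w, x) + b)

# Hypothetical parameters: weight 2.0 on rooms, 0.5 on area, bias 1.0
f = linear_model(np.array([2.0, 0.5]), b=1.0)

price = f(np.array([3.0, 10.0]))  # 2*3 + 0.5*10 + 1 = 12.0
```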

Simple Example with One-Dimensional Data

  • An illustration uses one-dimensional input data points (e.g., 1, 2, 3, 6, 7), with corresponding output values (e.g., 2.1, 3.9).
  • This simple dataset allows for easy visualization on a graph where x-axis represents input and y-axis represents output.

Evaluating Different Models

  • Two models are evaluated: f(x) = 2x_1 and g(x) = x_1 + 3.
  • The learning algorithm computes the loss for each model to determine which has smaller error on the given data points.
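The comparison between f(x) = 2x and g(x) = x + 3 can be sketched on the one-dimensional data above; note that only the first two labels (2.1, 3.9) appear in the text, so the remaining labels here are hypothetical values chosen to lie near y ≈ 2x:

```python
# Inputs from the text; labels beyond (2.1, 3.9) are hypothetical.
xs = [1, 2, 3, 6, 7]
ys = [2.1, 3.9, 6.2, 11.8, 14.1]

def loss(model):
    # Average squared error over the data points
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

loss_f = loss(lambda x: 2 * x)   # model f(x) = 2x
loss_g = loss(lambda x: x + 3)   # model g(x) = x + 3
# f fits this data far better, so the learning algorithm prefers f
```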

Conclusion on Model Selection

  • By calculating losses for both models using their respective predicted outputs against actual values, it becomes clear that one model may perform better than another.

Choosing the Best Model in Regression

Understanding Function f and g

  • The discussion begins with selecting the best model from a library rather than just two models, illustrating function f as f = 2x_1, which can be plotted.
  • Function f fits the data points better than g (g = x_1 + 3), so f incurs a lower loss and is the model the learning algorithm prefers.

Simplified Learning Algorithm Illustration

  • A simple illustration of regression is presented, emphasizing that while only two models are considered here, real-world scenarios involve choosing from an extensive set of models.
  • Two example models are introduced: f = 2 × rooms − 0.5 × distance and g = rooms + 2 × distance.

Data Representation and Predictions

  • The context involves three-dimensional space for data representation, where labels are scalar values (e.g., prices).
  • To determine which model performs better, predictions for both functions f and g are computed based on training points to assess their respective losses.

Loss Calculation Methodology

  • Predictions made by both models are compared; loss is calculated by averaging the squared differences between actual values and predicted values for each model.
  • It becomes evident that loss for model f is significantly smaller than that of model g, indicating that f provides a more accurate estimate of price based on its parameters.
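The loss comparison between the two house-price models can be sketched as below; the training points are hypothetical (the lecture does not list them), chosen so that prices lie close to model f, mirroring the outcome described:

```python
# Hypothetical training points: ((rooms, distance), price).
data = [((3, 2), 5.1), ((4, 1), 7.4), ((2, 5), 1.6)]

def f(rooms, distance):
    return 2 * rooms - 0.5 * distance

def g(rooms, distance):
    return rooms + 2 * distance

def avg_loss(model):
    # Average of squared differences between predictions and prices
    return sum((model(*x) - y) ** 2 for x, y in data) / len(data)

loss_f, loss_g = avg_loss(f), avg_loss(g)
# loss_f is much smaller, so f gives the better price estimate
```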

Insights from Model Selection

Video description

Supervised Learning, Regression — Machine Learning Foundations, by Harish Guruprasad Ramaswamy, Arun Rajkumar, and Prashanth L.A., IIT Madras.