Supervised Learning: Regression

Introduction to Supervised Learning

Overview of Supervised Learning

  • The lecture introduces supervised learning as a primary paradigm in machine learning, emphasizing its prevalence and importance.
  • Key tasks within supervised learning include regression and classification, which will be explored in detail throughout the lecture.

Notation Setup

  • The set of real numbers is denoted by R, encompassing all scalar values such as 2.3 or -7.6.
  • R^d denotes the set of d-dimensional real vectors; for example, (3.6, 5.2, -1.8) is an element of R^3.
  • Subscripts index coordinates (e.g., x_j is the jth coordinate of x), while the norm of a vector denotes its length, written with the usual norm symbol.

Vector Representation

  • The norm (length) of a vector x in R^d is given by the Euclidean formula √(x_1² + x_2² + … + x_d²); for R^3 this is √(x_1² + x_2² + x_3²).
  • When several vectors are in play, superscripts distinguish the vectors and subscripts their coordinates; e.g., x^i_j is the jth coordinate of the ith vector x^i.
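The Euclidean norm above can be checked with a short sketch, using the example vector (3.6, 5.2, -1.8) from the notation setup:

```python
import numpy as np

# The example vector in R^3 from the text: (3.6, 5.2, -1.8)
x = np.array([3.6, 5.2, -1.8])

# Euclidean norm: sqrt(x_1^2 + x_2^2 + x_3^2)
norm = np.sqrt(np.sum(x ** 2))

# np.linalg.norm computes the same quantity
assert np.isclose(norm, np.linalg.norm(x))
```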

Indicator Variables and Their Use

Understanding Indicator Variables

  • Indicator variables are represented as 1 or 0 based on boolean conditions; for instance, "2 is even" yields an indicator value of 1.
  • This notation aids in translating mathematical concepts into practical applications throughout the lecture.
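A minimal sketch of the indicator notation, with the "2 is even" example from the text:

```python
# Indicator of a boolean condition: 1 if the condition holds, 0 otherwise.
def indicator(condition: bool) -> int:
    return 1 if condition else 0

print(indicator(2 % 2 == 0))  # "2 is even" -> 1
print(indicator(3 % 2 == 0))  # "3 is even" -> 0
```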

Core Concept: Curve-Fitting

Simplifying Supervised Learning

  • At its essence, supervised learning can be simplified to curve-fitting—finding a function that approximates data points.
  • Data points consist of pairs (x_i, y_i), where each x_i is a d-dimensional vector and y_i represents corresponding labels or outputs.

Model Development

  • The goal is to develop a model f that maps input vectors from R^d to output labels so that f(x_i) closely matches y_i.

Example Application: Regression Problem

Practical Example in Regression

  • An example regression problem involves predicting house prices based on features like room area and distance from metro stations.

Understanding Model Training and Evaluation

Overview of Training Data

  • The training data consists of instances x_i in a d-dimensional space, with corresponding labels y_i representing the price of a house.
  • A learning algorithm outputs a model f, a function mapping R^d to R.
  • The goal is for the model to approximate the relationship so that f(x_i) ≈ y_i.

Loss Function and Model Evaluation

  • The loss function measures how far predictions deviate from actual values using squared differences: (f(x_i) − y_i)².
  • A loss of 0 indicates perfect predictions; learning algorithms prefer models with lower loss.
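The squared loss, averaged over the training points, can be sketched directly:

```python
def squared_loss(predictions, labels):
    """Mean of (f(x_i) - y_i)^2 over the training points."""
    return sum((p - y) ** 2 for p, y in zip(predictions, labels)) / len(labels)

# Perfect predictions give a loss of exactly 0
print(squared_loss([2.0, 4.0], [2.0, 4.0]))  # 0.0
```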

Linear Parameterization of Models

  • Models are often parameterized as linear functions f(x) = w^T x + b, with weights (w_1, w_2, ..., w_d) and bias b as the parameters.
  • For example, in predicting house prices, the model might consider factors like number of rooms and area.
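The linear parameterization f(x) = w^T x + b can be sketched as follows; the specific weight values are hypothetical, chosen only to illustrate the form:

```python
import numpy as np

def linear_model(w, b):
    # Returns the function f(x) = w^T x + b
    return lambda x: float(np.dot(w, x) + b)

# Hypothetical parameters: weight 2.0 on rooms, 0.5 on area, bias 1.0
f = linear_model(np.array([2.0, 0.5]), b=1.0)

price = f(np.array([3.0, 10.0]))  # 2*3 + 0.5*10 + 1 = 12.0
```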

Simple Example with One-Dimensional Data

  • An illustration uses one-dimensional input data points (e.g., 1, 2, 3, 6, 7), with corresponding output values (e.g., 2.1, 3.9).
  • This simple dataset allows for easy visualization on a graph where x-axis represents input and y-axis represents output.

Evaluating Different Models

  • Two models are evaluated: f(x) = 2x_1 and g(x) = x_1 + 3.
  • The learning algorithm computes the loss for each model to determine which has smaller error on the given data points.
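The comparison between f(x) = 2x and g(x) = x + 3 can be sketched on the one-dimensional data above; note that only the first two labels (2.1, 3.9) appear in the text, so the remaining labels here are hypothetical values chosen to lie near y ≈ 2x:

```python
# Inputs from the text; labels beyond (2.1, 3.9) are hypothetical.
xs = [1, 2, 3, 6, 7]
ys = [2.1, 3.9, 6.2, 11.8, 14.1]

def loss(model):
    # Average squared error over the data points
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

loss_f = loss(lambda x: 2 * x)   # model f(x) = 2x
loss_g = loss(lambda x: x + 3)   # model g(x) = x + 3
# f fits this data far better, so the learning algorithm prefers f
```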

Conclusion on Model Selection

  • By calculating losses for both models using their respective predicted outputs against actual values, it becomes clear that one model may perform better than another.

Choosing the Best Model in Regression

Understanding Function f and g

  • The discussion begins with selecting the best model from a library rather than just two models, illustrating function f as f = 2x_1, which can be plotted.
  • Function f fits the data points better than g (g = x_1 + 3), so f incurs a lower loss and is the model the learning algorithm prefers.

Simplified Learning Algorithm Illustration

  • A simple illustration of regression is presented, emphasizing that while only two models are considered here, real-world scenarios involve choosing from an extensive set of models.
  • Two example models are introduced: f = 2 × rooms − 0.5 × distance and g = rooms + 2 × distance.

Data Representation and Predictions

  • The context involves three-dimensional space for data representation, where labels are scalar values (e.g., prices).
  • To determine which model performs better, predictions for both functions f and g are computed based on training points to assess their respective losses.

Loss Calculation Methodology

  • Predictions made by both models are compared; loss is calculated by averaging the squared differences between actual values and predicted values for each model.
  • It becomes evident that loss for model f is significantly smaller than that of model g, indicating that f provides a more accurate estimate of price based on its parameters.
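The loss comparison between the two house-price models can be sketched as below; the training points are hypothetical (the lecture does not list them), chosen so that prices lie close to model f, mirroring the outcome described:

```python
# Hypothetical training points: ((rooms, distance), price).
data = [((3, 2), 5.1), ((4, 1), 7.4), ((2, 5), 1.6)]

def f(rooms, distance):
    return 2 * rooms - 0.5 * distance

def g(rooms, distance):
    return rooms + 2 * distance

def avg_loss(model):
    # Average of squared differences between predictions and prices
    return sum((model(*x) - y) ** 2 for x, y in data) / len(data)

loss_f, loss_g = avg_loss(f), avg_loss(g)
# loss_f is much smaller, so f gives the better price estimate
```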

Insights from Model Selection

Video description

Supervised Learning, Regression — Machine Learning Foundations, by Harish Guruprasad Ramaswamy, Arun Rajkumar, and Prashanth L.A., IIT Madras.