Supervised Learning: Classification
Introduction to Classification Learning
Overview of the Lecture
- The lecture focuses on the classification learning problem, building upon previous discussions about supervised learning and regression.
- A simple example is introduced: predicting whether a house has more than three rooms based on its area and price.
Key Concepts in Classification
- In classification problems, labels are binary (e.g., +1 or -1), contrasting with regression where labels are real-valued.
- The model function f maps input features from R^d to binary outputs (+1 or -1).
Evaluating Classification Models
Loss Function Definition
- The loss of a model is defined as the fraction of misclassified instances, calculated using an indicator function that checks if predictions match actual labels.
- For correct predictions (f(x_i) = y_i), there is no loss; otherwise, the instance contributes to the total loss.
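The 0-1 loss described above can be sketched in a few lines. The model, data, and threshold below are illustrative, not the lecture's exact numbers:

```python
# The 0-1 loss: the fraction of training points the model misclassifies.

def zero_one_loss(f, xs, ys):
    """Fraction of points where the prediction f(x) differs from the label y."""
    wrong = sum(1 for x, y in zip(xs, ys) if f(x) != y)
    return wrong / len(xs)

# Toy one-dimensional data and a simple threshold model (illustrative).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [-1, -1, +1, +1]
f = lambda x: +1 if x >= 2.5 else -1

print(zero_one_loss(f, xs, ys))  # 0.0 — f classifies every point correctly
```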
Model Parameterization
- Unlike regression models which can output continuous values, classification models typically use a sign function to ensure outputs are either +1 or -1.
- A common approach for linear classifiers is a linear separator of the form f(x) = sign(w^T x + b).
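A minimal sketch of such a linear classifier, using the convention (stated later in the lecture) that sign(0) = +1; the weights and inputs below are illustrative:

```python
# Linear classifier f(x) = sign(w^T x + b).

def linear_classifier(w, b):
    def f(x):
        score = sum(wi * xi for wi, xi in zip(w, x)) + b  # w^T x + b
        return +1 if score >= 0 else -1                   # sign, with sign(0) = +1
    return f

f = linear_classifier(w=[1.0, -1.0], b=0.5)
print(f([2.0, 1.0]))  # +1: score = 2 - 1 + 0.5 = 1.5 >= 0
print(f([0.0, 3.0]))  # -1: score = 0 - 3 + 0.5 = -2.5 < 0
```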
Illustration of Classification with Data Points
Example Dataset
- An example dataset consists of six two-dimensional data points labeled as positive (+1 for first three points) and negative (-1 for last three points).
Visualization of Data Points
- Positive data points are plotted in red while negative ones are shown in blue on a 2D graph. This visual representation aids understanding of how classification works.
Model Evaluation Process
Comparing Two Models
- The lecture introduces two models, f and g, each defined by a different mathematical expression involving the input features.
Loss Calculation for Models
Understanding Classifiers and Loss Functions
Defining the Sign Convention
- The sign convention is established as follows:
- sign(0) is defined as +1.
- sign(1 − 2) = sign(−1) = −1.
- Other examples include:
- sign(4 − 8) = −1
- sign(3 − 8) = −1
- sign(4 − 6) = −1.
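The sign convention above, written as a small function (note that sign(0) is defined as +1):

```python
# Sign function with the lecture's convention sign(0) = +1.

def sign(z):
    return +1 if z >= 0 else -1

print(sign(0))      # +1 by convention
print(sign(1 - 2))  # -1
print(sign(4 - 8))  # -1
```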
Calculating Loss for Models f and g
- The loss for model f is calculated as:
- Loss(f) = (1/6) × (number of wrong predictions).
- Model f correctly predicts all data points, leading to a loss of 0.
- For model g, it incorrectly predicts one out of six data points:
- Thus, Loss(g) = 1/6.
Preference for Model f
- Given that model f has a loss of 0 while model g has a loss of 1/6, the learning algorithm would prefer model f over g.
- In more complex scenarios, the algorithm would derive an optimal function from training data rather than just choosing between two models.
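The comparison above can be sketched as choosing the candidate with the smaller 0-1 loss. The six-point dataset and the definition of g below are illustrative stand-ins for the lecture's numbers, chosen so that f has loss 0 and g has loss 1/6:

```python
# Model selection by 0-1 loss: keep the candidate with the smaller loss.

def sign(z):
    return +1 if z >= 0 else -1

def loss(model, data):
    return sum(model(x) != y for x, y in data) / len(data)

# Six 2D points: the first three positive, the last three negative (illustrative).
data = [([1.0, 1.0], +1), ([1.5, 2.0], +1), ([0.5, 3.0], +1),
        ([3.0, 1.0], -1), ([4.0, 2.0], -1), ([2.5, 0.5], -1)]

f = lambda x: sign(2 - x[0])    # positive iff x_1 <= 2; classifies all six correctly
g = lambda x: sign(1.2 - x[0])  # a second candidate; misclassifies one point

best = min([f, g], key=lambda m: loss(m, data))
print(loss(f, data), loss(g, data))  # 0.0 vs 1/6 — so f is preferred
```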
Visualizing Classifiers
- A classifier can be visualized by separating input regions into positive (+1) and negative (-1):
- For model f, inputs are classified as positive when sign(2 − x_1) = +1, i.e., when x_1 ≤ 2.
- This visualization helps understand why model f achieves zero loss on training data; it accurately classifies all points based on their respective regions.
Analyzing Model g's Performance
- When analyzing model g:
- It misclassifies one point which leads to its non-zero loss.
- The discussion emphasizes that there are multiple classifiers available beyond just models f and g; however, in this case, model f proves superior due to its performance metrics.
Exploring More Complex Classification Examples
Introducing New Models Based on Room Count
- A new scenario involves three models (f, g, and h) for predicting whether a house has more than three rooms:
- Model f: f(x) = sign(area − 10)
- Predicts whether the house has more than three rooms based on its area.
Encoding Room Data
- Houses with at most three rooms are labeled −1, while houses with more than three rooms are labeled +1.
- This encoding simplifies predictions regarding whether the number of rooms exceeds three or not.
Predictions from Different Models
- Predictions made by each model for six houses with varying areas:
- Model f outputs:
[−1, −1, +1, +1, +1, +1], indicating that four houses have areas of at least ten.
- Model g uses price criteria:
[−1, −1, +1, +1, +1, +1], reflecting similar outcomes based on price thresholds.
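Model f's predictions can be reproduced directly from its definition. The six areas below are illustrative, chosen so that the output matches the pattern shown above:

```python
# Applying f(x) = sign(area - 10) to six house areas (areas are illustrative).

def sign(z):
    return +1 if z >= 0 else -1

areas = [6, 8, 12, 15, 20, 30]
predictions = [sign(a - 10) for a in areas]
print(predictions)  # [-1, -1, 1, 1, 1, 1]
```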
Understanding Model Loss in Classification
Evaluating Model Performance
- The loss of model f is calculated to be 0, indicating it perfectly captures the true pattern of the data.
- Similarly, model g also has a loss of 0. In contrast, model h has a loss of 3/6 = 1/2, meaning it misclassifies half of the data points.
- Both models f and g are equally effective based on training data since they achieve zero loss; however, model h is deemed ineffective due to its non-zero loss.
- The discussion emphasizes that real classification algorithms typically choose from an infinite variety of models rather than a finite set like f, g, or h.
Importance of Test Data
- To evaluate a chosen model (e.g., f), it's crucial not to use the training data for assessment but instead utilize separate test data.
- A counterexample illustrates that even if model f achieves zero loss on training data by predicting correctly for known examples, it may fail with unseen data.
Generalization Over Memorization
- The primary goal in machine learning is to develop models that generalize well to new instances rather than just performing well on existing training datasets.
- Evaluating performance solely on training data can lead to overoptimistic assessments; thus, using held-out test data is essential for accurate evaluation.
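The memorization-versus-generalization point can be sketched as follows. The data and both models are illustrative: a lookup table that memorizes the training set achieves zero training loss but fails on unseen points, while a simple threshold model generalizes:

```python
# Why training loss alone is over-optimistic: compare a memorizer to a
# generalizing model on held-out test data.

def loss(model, data):
    return sum(model(x) != y for x, y in data) / len(data)

train = [(1, -1), (2, -1), (3, +1), (4, +1)]
test  = [(1.5, -1), (3.5, +1), (5, +1)]

# A "memorizer": perfect on training points, arbitrary (-1) everywhere else.
lookup = dict(train)
memorizer = lambda x: lookup.get(x, -1)

# A generalizing threshold model.
threshold = lambda x: +1 if x >= 2.5 else -1

print(loss(memorizer, train), loss(memorizer, test))  # 0.0 on train, 2/3 on test
print(loss(threshold, train), loss(threshold, test))  # 0.0 on both
```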
Real-world Application Example
- When predicting house prices using historical data (1990–2020), the effectiveness of the model should be judged based on its predictions for future houses (e.g., those listed in 2021).
Distinction Between Data Sets
Understanding Model Selection in Learning Algorithms
The Role of Parameterization
- The learning algorithm aims to find optimal parameters (a, b, c) for a model predicting price based on various factors.
- Alternative parameterizations are possible; for instance, one could use combinations like area multiplied by rooms or distance squared.
- There is no inherent reason to prefer one parameterization over another; both can be valid approaches.
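Two such parameterizations side by side, as a sketch; the functional forms follow the bullets above, while the coefficient values are illustrative:

```python
# Two alternative parameterizations of a price model. Both map a house's
# (area, rooms, distance) to a predicted price; neither is inherently better.

def price_linear(area, rooms, distance, a, b, c):
    # price = a*area + b*rooms + c*distance
    return a * area + b * rooms + c * distance

def price_alternative(area, rooms, distance, a, b):
    # price = a*(area*rooms) + b*distance^2
    return a * (area * rooms) + b * distance ** 2

print(price_linear(100, 4, 2, a=3.0, b=10.0, c=-5.0))  # 330.0
print(price_alternative(100, 4, 2, a=0.8, b=-5.0))     # 300.0
```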
Human Intuition in Model Selection
- Parameterization choices fall under the broader category of model selection, which is not dictated by the learning algorithm itself.
- Humans typically make these decisions based on common sense and intuition regarding how price relates to features like area, distance, and room count.
- Multiple intuitive choices exist for parameterization; thus, selecting the best option requires careful consideration.
Validation Data in Model Evaluation
- To determine the most effective collection of models, validation data is utilized as a held-out dataset for evaluation.
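The role of validation data can be sketched as a three-way split: candidates are compared on the validation set, and only the chosen model is evaluated on the test set. All data and candidate models below are illustrative:

```python
# Model selection with a held-out validation set, then a final test evaluation.

def loss(model, data):
    return sum(model(x) != y for x, y in data) / len(data)

train      = [(1, -1), (2, -1), (3, +1), (4, +1)]
validation = [(1.5, -1), (3.5, +1)]
test       = [(0.5, -1), (4.5, +1)]

candidates = {
    "threshold@2.5": lambda x: +1 if x >= 2.5 else -1,
    "threshold@3.8": lambda x: +1 if x >= 3.8 else -1,
}

# Choose the candidate with the smallest validation loss...
best_name = min(candidates, key=lambda n: loss(candidates[n], validation))
# ...and only then evaluate it once on the test set.
print(best_name, loss(candidates[best_name], test))
```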