Unsupervised Learning: Dimensionality Reduction

Introduction to Unsupervised Learning

Overview of Unsupervised Learning

  • The lecture introduces the unsupervised learning paradigm, contrasting it with supervised learning, which focuses on regression and classification tasks.
  • Unsupervised learning is a more loosely specified problem setting and typically serves as a pre-processing step rather than an end goal in itself.
  • The primary aim of unsupervised learning is to build models that compress, explain, and group data.

Practical Example of Unsupervised Learning

  • An example illustrates how unsupervised learning can be applied in marketing by summarizing large volumes of tweets about Coca-Cola.
  • The challenge involves grouping millions of tweets into manageable categories for effective reporting to management.
  • Grouping tweets allows for easier interpretation and actionable insights, highlighting the importance of human interpretation post-analysis.

Dimensionality Reduction in Unsupervised Learning

Purpose and Application

  • Dimensionality reduction aims at compression and simplification, particularly useful in handling large datasets like gene expression levels from numerous individuals.
  • A practical scenario involves managing a massive matrix representing gene expressions across many subjects, emphasizing the need for data transmission efficiency.

Mechanism of Dimensionality Reduction

  • The process involves creating two models: an encoder that compresses high-dimensional data into lower dimensions (d') and a decoder that reconstructs the original data from this compressed format.

Understanding Encoder-Decoder Mechanisms

The Goal of Encoding and Decoding

  • The primary objective is to ensure that g(f(x_i)) approximates x_i. A perfect reconstruction would make the error zero, but an approximation is acceptable.
  • To measure this approximation, one computes the squared norm ||g(f(x_i)) - x_i||^2, aiming for a minimal total value across all training inputs.
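As a concrete sketch of this objective (the helper names here are my own, not from the lecture), the total squared reconstruction error can be computed as:

```python
# Total squared reconstruction error sum_i ||g(f(x_i)) - x_i||^2 for a given
# encoder f and decoder g. Names are illustrative, not from the lecture.
def reconstruction_error(f, g, points):
    total = 0.0
    for x in points:
        x_hat = g(f(x))  # encode to d' dimensions, then decode back to d
        total += sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return total
```

A perfect encoder/decoder pair drives this quantity to zero; for instance, when d' = d, the (useless) identity pair f(x) = x, g(u) = u achieves exactly 0.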

Dimensionality Reduction Example

  • An example illustrates dimensionality reduction where input data has 2 dimensions (d = 2) and is reduced to 1 dimension (d' = 1).
  • Four training points are provided: (1, 0.8), (2, 2.2), (3, 3.2), and (4, 3.8).

Encoder and Decoder Functions

  • The encoder function f(x) maps a two-dimensional vector to a scalar by computing x_1 - x_2.
  • The decoder function g(u) takes this scalar and returns the two-dimensional vector (u, u).

Evaluation of Initial Encoder/Decoder Pair

  • Testing the first encoder-decoder pair shows poor performance: the distinct inputs (1, 0.8) and (4, 3.8) both encode to 0.2 and reconstruct to the identical output (0.2, 0.2), indicating ineffective compression.
  • This initial setup fails to retain original input information despite reducing dimensionality.
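This failure can be reproduced directly (a sketch of the lecture's example; the variable names are mine):

```python
# First encoder/decoder pair: f(x) = x1 - x2 compresses to one dimension,
# g(u) = (u, u) attempts to reconstruct the original two-dimensional point.
f = lambda x: x[0] - x[1]
g = lambda u: (u, u)

points = [(1, 0.8), (2, 2.2), (3, 3.2), (4, 3.8)]
codes = [f(x) for x in points]            # approximately [0.2, -0.2, -0.2, 0.2]
reconstructions = [g(u) for u in codes]
# (1, 0.8) and (4, 3.8) collapse to the same code, so both reconstruct to
# roughly (0.2, 0.2): the encoding destroys the information that
# distinguishes them.
```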

Improved Encoder/Decoder Functions

  • A new pair of functions is introduced: f̃(x) = (x_1 + x_2)/2 and g̃(u) = (u, u).
  • Applying these functions yields better results: for example, (1, 0.8) reconstructs to (0.9, 0.9), which is much closer to the original point.
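The improvement is easy to verify numerically (again a sketch; variable names are mine):

```python
# Improved pair: f~(x) = (x1 + x2) / 2 keeps the average of the coordinates,
# g~(u) = (u, u) reconstructs by repeating it.
f_tilde = lambda x: (x[0] + x[1]) / 2
g_tilde = lambda u: (u, u)

points = [(1, 0.8), (2, 2.2), (3, 3.2), (4, 3.8)]
reconstructions = [g_tilde(f_tilde(x)) for x in points]
# approximately [(0.9, 0.9), (2.1, 2.1), (3.1, 3.1), (3.9, 3.9)]:
# every reconstruction sits within 0.1 of each coordinate of its original point.
```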

Comparison of Performance

  • The improved encoder-decoder pair demonstrates significantly better accuracy in reconstructing original data compared to the first pair.
  • Visual representation confirms that outputs from the second set closely align with original data points while maintaining proximity among them.

Dimensionality Reduction Algorithm Overview

Simplified Dimensionality Reduction Process

  • The discussion introduces a simplified dimensionality reduction algorithm that operates by selecting between two pairs of encoder-decoder functions, denoted f, g and f̃, g̃.
  • In practice, a more complex algorithm would evaluate an infinite number of potential functions for both encoding and decoding processes to determine the optimal pair for dimensionality reduction.
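Restricted to the lecture's two candidates, the simplified algorithm reduces to an argmin over reconstruction error (a sketch; the function and candidate names are mine):

```python
# Pick whichever candidate (encoder, decoder) pair gives the smaller total
# squared reconstruction error on the training points.
def total_error(f, g, points):
    return sum(sum((a - b) ** 2 for a, b in zip(x, g(f(x)))) for x in points)

points = [(1, 0.8), (2, 2.2), (3, 3.2), (4, 3.8)]
candidates = {
    "difference": (lambda x: x[0] - x[1], lambda u: (u, u)),
    "average":    (lambda x: (x[0] + x[1]) / 2, lambda u: (u, u)),
}
best = min(candidates, key=lambda name: total_error(*candidates[name], points))
# "average" wins: its total error is about 0.08, versus about 60.8 for "difference".
```

A full algorithm would search over a far richer (in principle infinite) family of encoders and decoders in exactly the same way, minimizing the same reconstruction error.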
Video description

Unsupervised Learning, Dimensionality Reduction. Machine Learning Foundations by Harish Guruprasad Ramaswamy, Arun Rajkumar, and Prashanth LA, IIT Madras.