Unsupervised Learning: Density Estimation
Introduction to Density Estimation
- Density estimation is an unsupervised learning problem that outputs a probabilistic model, which scores different configurations of reality.
- An example goal is to create a model that generates tweets similar to those from a specific account, treating them as independently generated.
Practical Example: Generating Tweets
- A site called wisdomofchopra.com illustrates this concept by generating random tweets that resemble Mr. Chopra's, suggesting his actual tweets are hard to distinguish from randomly generated profound-sounding phrases.
- The main tool for creating such robotic accounts is density estimation, which provides a scoring function for potential tweets.
Understanding the Probabilistic Model
- A tweet can be represented as an array of 128 characters; the density estimation algorithm assigns a probability score to every possible tweet.
- The scoring function evaluates how likely it is for any given tweet (e.g., "apple is bad") to have been generated by Mr. Chopra.
Mathematical Framework of Density Estimation
- The data consists of n tweets represented as d-dimensional vectors; the goal is to learn a probability mapping p from R^d to the nonnegative real numbers.
- This mapping must sum (or, for continuous outputs, integrate) to 1 over all possible outputs, ensuring that not all tweets can receive high scores simultaneously.
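As a minimal sketch of the normalization requirement (in Python with NumPy, an assumption since the lecture names no tools), the uniform density used in the later toy example can be checked numerically:

```python
import numpy as np

# A candidate density: uniform on [0, 10], the same model used in the
# later toy example.  A valid density is nonnegative everywhere and
# must integrate to 1 over all possible outputs.
def p(x):
    return np.where((x >= 0) & (x <= 10), 1 / 10, 0.0)

# Numerically check normalization with a simple Riemann sum.
xs = np.linspace(-5.0, 15.0, 200001)
dx = xs[1] - xs[0]
total = float(p(xs).sum() * dx)
print(total)  # approximately 1.0
```

Because the density integrates to 1, raising the score of some tweets necessarily lowers the score of others.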
Goals and Loss Function in Density Estimation
- The objective of density estimation algorithms is to develop models where high probabilities correspond with actual data points while maintaining low probabilities for non-data points.
- The loss function is the negative log likelihood: a good model assigns high probabilities to the observed data points, which corresponds to minimizing the sum of -log p(x) over the data.
Example Illustration of One-Dimensional Data
- An illustration involves one-dimensional data with four sample points: 2.3, 2.7, 4.6, and 4.9 plotted on a number line from 0 to 10.
Density Estimation Models and Their Evaluation
Introduction to Density Estimation Models
- The discussion begins with the introduction of four data points, emphasizing the use of density estimation algorithms to select the best model from a set of proposed models.
Proposed Probability Models
- The first model is defined as p_1(x) = 1/10 for x in [0, 10], a uniform distribution over this interval.
- A second model is introduced: p_2(x) = 1/5 for x in [0, 5], which is also a valid probability model.
- The third model is specified as p_3(x) = 1/5 for x in [3, 8]. All three are valid probability distributions.
Evaluating Model Performance
- The data points under consideration are 2.3, 2.7, 4.6, and 4.9; each point is scored under the proposed models.
- Under model p_1, all four points receive a density of 1/10; under model p_2, all four receive 1/5.
- Under model p_3, the points outside its support (2.3 and 2.7) receive zero probability, which makes the negative log likelihood infinite.
Comparing Negative Log Likelihood
- The loss calculation shows that the negative log likelihood is finite for models p_1 and p_2, while for model p_3 it is infinite due to the zero probabilities.
- Between models p_1 and p_2, since -log(1/5) < -log(1/10), the loss of p_2 is lower, so p_2 is the better probabilistic explanation of the data.
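The comparison above can be reproduced numerically; this sketch (Python, an assumption) evaluates the negative log likelihood of the four points under each uniform model:

```python
import math

data = [2.3, 2.7, 4.6, 4.9]  # the four sample points from the example

# The three candidate uniform densities.
def p1(x): return 1 / 10 if 0 <= x <= 10 else 0.0
def p2(x): return 1 / 5 if 0 <= x <= 5 else 0.0
def p3(x): return 1 / 5 if 3 <= x <= 8 else 0.0

def nll(p, xs):
    """Negative log likelihood: sum of -log p(x); infinite if any point scores 0."""
    return sum(-math.log(p(x)) if p(x) > 0 else math.inf for x in xs)

for name, p in [("p1", p1), ("p2", p2), ("p3", p3)]:
    print(name, nll(p, data))
```

p_2 attains the lowest loss (4·ln 5 ≈ 6.44, versus 4·ln 10 ≈ 9.21 for p_1), while p_3's loss is infinite because two points fall outside its support.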
Implications and Further Considerations
- While discussing these toy examples, it's noted that real-world density estimation involves selecting from an infinite set of potential models rather than just three options.
Gaussian Mixture Models vs Other Approaches
Introduction to More Complex Models
- A new example introduces nine data points represented graphically on a Cartesian plane with two competing models: one being a Gaussian mixture centered at specific coordinates.
Model Comparison
- Another proposed model suggests different centers, e.g., (5, 5), (8, 9), and (-1, -2). Observationally, the data appears to form three distinct clusters.
Conclusion on Model Selection
- It's suggested that computing the negative log likelihood under the two models would likely show that the Gaussian mixture whose centers match the three visible clusters fits the data better, consistent with the visual assessment.
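To make the comparison concrete, the sketch below (Python with NumPy, an assumption; the nine sampled points and unit covariances are hypothetical, since the lecture doesn't list them) scores clustered data under an equal-weight three-component isotropic Gaussian mixture at the centers mentioned, versus a single Gaussian placed at the data mean:

```python
import numpy as np

rng = np.random.default_rng(0)
centers = np.array([[5.0, 5.0], [8.0, 9.0], [-1.0, -2.0]])
# Nine hypothetical points: three drawn near each center.
data = np.vstack([c + 0.5 * rng.standard_normal((3, 2)) for c in centers])

def log_gauss(x, mu):
    """Log density of an isotropic unit-variance Gaussian in 2D."""
    return -np.sum((x - mu) ** 2, axis=-1) / 2 - np.log(2 * np.pi)

def nll_mixture(data, centers):
    # Equal-weight mixture: p(x) = (1/K) * sum_k N(x; mu_k, I)
    comp = np.stack([log_gauss(data, mu) for mu in centers])  # shape (K, n)
    logp = np.logaddexp.reduce(comp, axis=0) - np.log(len(centers))
    return -logp.sum()

def nll_single(data):
    # A single unit-variance Gaussian at the empirical mean.
    return -log_gauss(data, data.mean(axis=0)).sum()

print(nll_mixture(data, centers), nll_single(data))
```

The mixture whose centers match the three clusters attains a much lower negative log likelihood than the single Gaussian, mirroring the visual assessment.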
Final Thoughts on Density Estimation
Introduction to Unsupervised Learning Problems
Overview of Key Concepts
- The introductory week of the machine learning foundations course covers unsupervised learning problems, specifically dimensionality reduction and density estimation.
- Future lessons will explore how to construct learning algorithms that can identify the best model from an infinite class of models.