Mathematics of Language Modeling (NLP for High School)
What Language Models Do
In this section, the speaker explains what language models do: they predict the next word by assigning a probability distribution over a vocabulary.
Predicting Next Words
- Predicting the next word means weighing several possible continuations, each with a different likelihood.
- A language model defines a probability distribution over a fixed vocabulary: every word in the vocabulary gets a probability of being the next word.
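As a minimal sketch, a next-word distribution over a tiny vocabulary can be written as a Python dictionary that maps each word to its probability. The context sentence, the five-word vocabulary, and the probabilities are made-up illustration values, not output from any real model, and `most_likely_word` is a hypothetical helper that simply picks the highest-probability word.

```python
# Hypothetical context: "The cat sat on the ..."
# Made-up probabilities over a tiny, hypothetical five-word vocabulary.
next_word_probs = {
    "mat": 0.50,
    "sofa": 0.25,
    "floor": 0.15,
    "roof": 0.09,
    "banana": 0.01,
}

def most_likely_word(probs):
    """Return the word with the highest probability (greedy prediction)."""
    return max(probs, key=probs.get)

print(most_likely_word(next_word_probs))  # mat
```

Always choosing the top word is only one way to use the distribution; the point here is that the model's output is the whole table of probabilities, not a single word.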
Probability Distribution
- Probability distributions assign probabilities to possible outcomes.
- For a distribution to be valid, every probability must be greater than or equal to zero, and all the probabilities together must sum to one.
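The two rules above can be checked directly. This is a small sketch using a hypothetical helper, `is_valid_distribution`; the tolerance is an assumption added because adding decimal numbers on a computer can leave tiny rounding errors.

```python
import math

def is_valid_distribution(probs, tol=1e-9):
    """Check both rules: no negative probabilities, and a total of one."""
    values = list(probs.values())
    all_non_negative = all(p >= 0 for p in values)
    # Compare with a small tolerance to allow for floating-point rounding.
    sums_to_one = math.isclose(sum(values), 1.0, abs_tol=tol)
    return all_non_negative and sums_to_one

print(is_valid_distribution({"yes": 0.7, "no": 0.3}))   # True
print(is_valid_distribution({"yes": 0.7, "no": 0.4}))   # False: sums to 1.1
print(is_valid_distribution({"yes": 1.2, "no": -0.2}))  # False: negative value
```

The third example shows why both rules are needed: its values do add up to one, but a negative probability is still not allowed.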
Vocabulary Size Impact
- The same idea applies to large vocabularies: the model assigns a probability to every word in the vocabulary, however many words that is.
- Even rare words receive a small but non-zero probability.
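The notes do not say how a model ends up giving every word a non-zero probability. One common mechanism, assumed here, is the softmax function, which turns arbitrary raw scores (often called logits) into probabilities. Mathematically, e raised to any power is positive, so every word gets a probability above zero, and dividing by the total makes them sum to one. The scores below are invented for illustration.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that are positive and sum to one."""
    m = max(scores.values())  # subtract the max score for numerical stability
    exps = {w: math.exp(s - m) for w, s in scores.items()}
    total = sum(exps.values())
    return {w: e / total for w, e in exps.items()}

# Hypothetical raw scores; "aardvark" stands in for a rare word.
scores = {"the": 5.0, "a": 4.0, "dog": 2.0, "aardvark": -3.0}
probs = softmax(scores)
for word, p in probs.items():
    print(f"{word}: {p:.6f}")
```

The rare word ends up with a very small probability, but not zero.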
Model Probabilities and Examples
This part uses examples to show how language models assign probabilities to candidate words and what those probabilities mean for generating text.
Model Probability Representation
- Language models such as ChatGPT assign each candidate word a probability based on how likely it is given the preceding context.
- The model assigns low but non-zero probabilities even to unlikely words or phrases.
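Low but non-zero probabilities matter when generating text. If the next word is drawn at random according to the distribution (sampling, assumed here as the generation strategy), an unlikely word will still be chosen now and then. The sketch below uses made-up probabilities and a hypothetical helper, `sample_next_word`, and counts how often each word is drawn; the random seed is fixed only so the run is repeatable.

```python
import random
from collections import Counter

# Made-up next-word probabilities; "banana" is unlikely but not impossible.
next_word_probs = {"mat": 0.50, "sofa": 0.25, "floor": 0.15, "roof": 0.09, "banana": 0.01}

def sample_next_word(probs, rng):
    """Draw one word, where each word's chance equals its probability."""
    words = list(probs)
    weights = list(probs.values())
    return rng.choices(words, weights=weights, k=1)[0]

rng = random.Random(0)  # fixed seed so the counts are reproducible
counts = Counter(sample_next_word(next_word_probs, rng) for _ in range(10_000))
print(counts.most_common())
```

Over many draws, "banana" should show up roughly once in every hundred picks, which is why a model can occasionally produce a surprising word.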
Example Analysis
- Earlier systems such as GPT-3 let users view the probabilities the model assigned to specific words.
- Different prompts lead to different probability assignments for the next word, and therefore to different responses.
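To make the dependence on the prompt concrete, this sketch stores a separate, hand-written next-word distribution for each of two hypothetical prompts and lists the most likely continuations with a hypothetical `top_k` helper. A real model computes these probabilities from the prompt rather than looking them up in a table; the numbers here are invented and only mimic the effect.

```python
# Hypothetical prompts with hand-written, made-up next-word probabilities.
distributions = {
    "I drank a cup of": {"coffee": 0.55, "tea": 0.35, "water": 0.09, "soup": 0.01},
    "The capital of France is": {"Paris": 0.97, "a": 0.02, "beautiful": 0.01},
}

def top_k(prompt, k=2):
    """Return the k most likely next words for a prompt, highest first."""
    probs = distributions[prompt]
    return sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]

for prompt in distributions:
    print(prompt, "->", top_k(prompt))
```

The first prompt spreads probability over several reasonable words, while the second concentrates almost all of it on one answer.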