Data, Models and ML Task

What is Data in Machine Learning?

Understanding Data

  • The course introduces fundamental terms in machine learning, focusing on the definitions of data, models, and ML tasks.
  • In machine learning, data typically refers to a collection of vectors rather than just bits or bytes.
  • An example illustrates that each house can be represented by a vector (e.g., dimensions like number of rooms and price), forming a dataset.
  • Metadata provides context for the data, explaining what each number in the vector represents (e.g., number of rooms, area).
  • While metadata makes data interpretable for humans, computers only require consistency in how data is represented.
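
The idea of a dataset as a collection of vectors plus metadata can be sketched in a few lines of Python. This is only an illustration: the feature names, the numbers, and the helper `describe` are made up for this example and do not come from the lecture.

```python
# Metadata: what each position in a vector means (hypothetical feature names).
FEATURE_NAMES = ("rooms", "area_sqm", "price")

# Dataset: one vector per house. The values are invented for illustration.
dataset = [
    (2, 60.0, 45.0),
    (3, 85.0, 70.0),
    (4, 120.0, 110.0),
]


def describe(vector, names=FEATURE_NAMES):
    """Attach metadata to a raw vector so a human can read it."""
    return dict(zip(names, vector))


# The computer only needs every vector to follow the same layout.
assert all(len(vector) == len(FEATURE_NAMES) for vector in dataset)

for house in dataset:
    print(describe(house))
```

The raw tuples are all a program needs, as long as every tuple uses the same ordering; `FEATURE_NAMES` exists only to make the numbers readable for people.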

What is a Model?

Defining Models

  • A model serves as a mathematical simplification of reality and has been utilized across various scientific fields for centuries.
  • The Ideal Gas model exemplifies how models represent complex realities through simplified equations (e.g., PV = nRT).
  • Another example is Newton's law of gravitation, which approximates gravitational attraction very well in everyday settings but ignores effects such as relativity.
  • Models are essential in economics to predict outcomes based on changes in supply and demand; they help quantify relationships even if exact answers are unknown.
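
As a small worked example of a model as a simplified equation, the Ideal Gas relation PV = nRT can be turned into a function that predicts pressure. This is a minimal sketch; the function name `ideal_gas_pressure` and the sample inputs are assumptions made for illustration.

```python
R = 8.314462618  # molar gas constant in J/(mol*K)


def ideal_gas_pressure(n_mol, temperature_k, volume_m3):
    """Pressure in pascals predicted by the Ideal Gas model, P = nRT / V."""
    if volume_m3 <= 0:
        raise ValueError("volume must be positive")
    return n_mol * R * temperature_k / volume_m3


# One mole at 0 degrees Celsius in about 22.4 litres: close to one atmosphere.
pressure = ideal_gas_pressure(1.0, 273.15, 0.022414)
print(f"{pressure:.0f} Pa")
```

The model ignores molecular size and intermolecular forces, so real gases deviate from it, yet the estimate is good enough to be useful in many situations.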

Understanding Models in Machine Learning

The Nature of Models

  • Models are mathematical simplifications of reality, not exact representations. They allow for analysis and planning but cannot encapsulate the full complexity of real-world scenarios.
  • George Box's famous quote highlights that "all models are wrong, but some are useful," emphasizing that while models may be approximations, they can still provide valuable insights into complex systems.

Types of Models in Machine Learning

  • In machine learning, models are broadly of two types: predictive and probabilistic. Predictive models forecast outcomes from input data, while probabilistic models assign probabilities to events or configurations, scoring how likely they are rather than predicting a specific output.

Predictive Models

  • Two primary types of predictive models are regression and classification. Regression predicts continuous values (e.g., prices), while classification predicts discrete categories (e.g., yes/no outcomes).

Regression Model Insights

  • A regression model can predict house prices based on factors like area and distance from a metro station. For example, a model might suggest that price increases with area and decreases with distance from the metro station.
  • While there will always be exceptions to these trends, such general rules help guide decisions about property searches within budget constraints.

Application of Regression Models

  • Using a regression model allows one to estimate unknown values based on known variables. For instance, predicting the price of a new house using its size and location relative to public transport.
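
A regression model of this kind can be sketched as a plain function from known variables to an estimated price. The linear form and the coefficient values below are assumptions chosen only to show the trend described above (price rises with area and falls with distance); they are not taken from the lecture.

```python
def predict_price(area_sqm, distance_km, a=0.9, b=-5.0, c=10.0):
    """Estimate a price as a * area + b * distance + c.

    a, b and c are hypothetical coefficients: a > 0 makes price grow with
    area, b < 0 makes it shrink with distance from the metro station.
    """
    return a * area_sqm + b * distance_km + c


# Estimate the price of a new house from its size and location.
print(predict_price(area_sqm=100.0, distance_km=3.0))
```

The output is a continuous value, which is what distinguishes regression from classification.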

Classification Model Insights

  • Classification models differ by predicting categorical outcomes instead of continuous values. An example could involve determining if a house is close or far from a metro station based on its price and area.

Example of Classification Model Usage

  • A simple classification rule might state that if 2 × (number of rooms) − price is less than one, the house is considered close to the metro; otherwise, it is considered far away.
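
The rule above can be written as a short function. This sketch assumes the rule reads as "twice the number of rooms, minus the price", with the price expressed in whatever unit makes the threshold of one meaningful; the function name and the sample inputs are hypothetical.

```python
def classify_metro_proximity(num_rooms, price):
    """Return "close" or "far" using the rule 2 * rooms - price < 1."""
    score = 2 * num_rooms - price
    return "close" if score < 1 else "far"


print(classify_metro_proximity(num_rooms=1, price=3))  # score = -1
print(classify_metro_proximity(num_rooms=3, price=2))  # score = 4
```

Unlike the regression example, the output here is one of a fixed set of labels rather than a number on a continuous scale.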

Understanding Probabilistic Models

Understanding Probability Models in Machine Learning

Evaluating Events and Configurations

  • Probability models assess the likelihood of specific events or configurations, such as determining the probability of a randomly chosen person being located at given latitude and longitude coordinates.
  • The probability varies significantly based on location; for instance, being in the Sahara Desert yields a low probability compared to being in a populated area like Bombay.
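
One very simple way to build such a model is to divide the map into grid cells and use the fraction of observed people in each cell as its probability. The sketch below assumes this histogram approach, a 10-degree cell size, and a handful of made-up sample coordinates; none of these details come from the lecture.

```python
from collections import Counter

CELL_DEG = 10.0  # assumed grid resolution in degrees


def cell_of(lat, lon):
    """Map a coordinate to the grid cell that contains it."""
    return (lat // CELL_DEG, lon // CELL_DEG)


def fit_grid_model(locations):
    """Estimate P(cell) as the fraction of sample locations in each cell."""
    counts = Counter(cell_of(lat, lon) for lat, lon in locations)
    total = sum(counts.values())
    return {cell: n / total for cell, n in counts.items()}


def location_probability(model, lat, lon):
    """Probability that a random person falls in the cell containing (lat, lon)."""
    return model.get(cell_of(lat, lon), 0.0)


# Made-up sample of where people are (latitude, longitude).
SAMPLE_LOCATIONS = [
    (19.08, 72.88), (19.10, 72.90), (18.95, 72.83),  # around Bombay
    (28.61, 77.21),
    (13.08, 80.27),
]

model = fit_grid_model(SAMPLE_LOCATIONS)
print(location_probability(model, 19.0, 72.9))  # populated cell
print(location_probability(model, 23.0, 13.0))  # a cell in the Sahara
```

With realistic data the desert cell would get a small but nonzero probability; here it is zero only because the toy sample contains no points there.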

Scoring Reality with Probability Models

  • Different geographical coordinates have varying probabilities of containing randomly selected individuals, effectively "scoring" reality based on these configurations.
  • A practical application includes evaluating whether a tweet was generated by a specific individual (e.g., Mr. Chopra), where similar tweets receive higher scores than random strings of characters.
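
The tweet example can be illustrated with a toy character-frequency model: text that resembles the training tweets receives a higher score than a random string. This is only a sketch under strong assumptions (a unigram character model with add-one smoothing over an assumed alphabet of 256 symbols, and invented training sentences); a realistic model of someone's writing would be far richer.

```python
import math
from collections import Counter


def train_char_model(texts):
    """Count how often each character appears in the training texts."""
    counts = Counter(ch for text in texts for ch in text.lower())
    return counts, sum(counts.values())


def avg_log_prob(text, model, alphabet_size=256):
    """Average per-character log-probability under the model.

    Add-one smoothing gives unseen characters a small nonzero probability.
    """
    if not text:
        raise ValueError("text must be non-empty")
    counts, total = model
    log_prob = sum(
        math.log((counts[ch] + 1) / (total + alphabet_size))
        for ch in text.lower()
    )
    return log_prob / len(text)


# Invented training "tweets" standing in for one person's writing.
training_tweets = [
    "good morning everyone, have a great day",
    "great day for a morning walk",
    "have a good evening everyone",
]
model = train_char_model(training_tweets)

print(avg_log_prob("have a great morning", model))  # similar text
print(avg_log_prob("zqxjzqxjzqxj", model))          # random characters
```

The first score is higher (less negative) than the second, which is the sense in which a probabilistic model "scores" candidate configurations.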

Learning Algorithms: Converting Data into Models

  • Learning algorithms play a crucial role in machine learning by transforming data into predictive models.
  • These algorithms select from a family of candidate models, typically choosing the one whose parameter values best fit the data.

Model Parameters and Predictions

  • For example, when predicting house prices, one might fix a model structure before looking at any data: price depends on inputs such as area and distance to the metro station, combined through a few unknown parameters.
  • The learning algorithm then uses the data to determine values for these parameters (A, B, C), and those values fully determine the model's predictions.
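
To make the division of labor concrete, the sketch below assumes a linear structure, price = A × area + B × distance + C, and uses ordinary least squares as the learning algorithm. Both choices are assumptions for illustration (the notes name the parameters A, B, C but do not specify the formula or the fitting method), and the training data is invented.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    # Augment each row with its right-hand-side entry.
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(n):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [x - factor * y for x, y in zip(aug[row], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_price_model(houses, prices):
    """Least-squares fit of price = A * area + B * distance + C."""
    rows = [(area, distance, 1.0) for area, distance in houses]
    # Normal equations: (X^T X) params = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * p for r, p in zip(rows, prices)) for i in range(3)]
    return solve_linear_system(xtx, xty)


# Invented data: (area, distance to metro) and the observed price.
houses = [(50.0, 1.0), (80.0, 2.5), (120.0, 0.5), (65.0, 4.0), (100.0, 3.0)]
prices = [50.0, 69.5, 115.5, 48.5, 85.0]

A, B, C = fit_price_model(houses, prices)
print(f"A={A:.3f}, B={B:.3f}, C={C:.3f}")
```

The human supplies the structure (a linear formula with three unknowns); the algorithm supplies the numbers. Because the toy prices were generated from A = 0.9, B = -5, C = 10 without noise, the fit recovers those values up to rounding error.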

Human Guidance in Model Development

  • In machine learning tasks, humans provide broad outlines rather than explicit instructions for model creation; they guide the process without detailing every aspect.
  • The learning algorithm utilizes historical data to construct accurate models based on human-defined guidelines and existing datasets.
Video description

"What is data? What is a model?" Machine Learning Foundations - Harish Guruprasad Ramaswamy, Arun Rajkumar, Prashanth LA. IIT Madras, BSc Degree in Programming and Data Science - https://onlinedegree.iitm.ac.in/