Lecture 1 "Supervised Learning Setup" -Cornell CS4780 Machine Learning for Decision Making SP17

Introduction

The speaker introduces himself and reminds the audience of the "no laptop" rule. He also mentions a technical issue with his MacBook Pro and projector compatibility.

  • The speaker welcomes the audience and introduces himself.
  • The "no laptop" rule is mentioned.
  • A technical issue with the speaker's MacBook Pro and projector compatibility is discussed.

Julia vs Python

The speaker discusses Julia as an alternative to Python for machine learning, mentioning its similarity in syntax to MATLAB.

  • The speaker notes that many people like Python, and points out that Julia has syntax similar to MATLAB.
  • The audience is polled on who favors using Julia for machine learning.
  • Despite some enthusiasm for Julia, the vote remains roughly tied with Python.

What is Machine Learning?

The speaker provides an overview of traditional computer science versus machine learning, explaining how machine learning algorithms generate programs based on input data and desired output.

  • Traditional computer science involves writing a program that generates output from input data.
  • In contrast, machine learning algorithms generate programs based on input data and desired output.
  • Machine learning can be used when there is no existing method for generating desired output from input data (e.g., diagnosing Alzheimer's disease from fMRI scans).

Terminator Motivation

The speaker shares his motivation for getting into machine learning: wanting his own terminator to beat up his brother.

Traditional Computer Science

The speaker explains how traditional computer science involves writing a program that generates output from input data, such as playing an mp3 file.

  • Traditional computer science involves writing a program that generates output from input data (e.g., playing an mp3 file).
  • This process typically involves looking up specifications and writing code to decode the input data.

Machine Learning Example

The speaker provides an example of a machine learning problem involving diagnosing Alzheimer's disease from fMRI scans.

  • Machine learning can be used when there is no existing method for generating desired output from input data (e.g., diagnosing Alzheimer's disease from fMRI scans).
  • Data for which the desired output is known can be collected, allowing a machine learning algorithm to generate a program that maps inputs to those outputs.

Introduction to Machine Learning

This section provides an introduction to machine learning, including the difference between training and testing, and a brief history of the field.

What is Machine Learning?

  • Machine learning involves algorithms that improve on tasks with experience.
  • Machine learning generates programs automatically for tasks we do not know how to program by hand.
  • Tom Mitchell's 1997 definition: "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E."

History of Machine Learning

  • Arthur Samuel's checkers player (1952) was the first algorithm that learned from experience.
  • Frank Rosenblatt invented the perceptron in 1957, which is still used today and was revolutionary at the time. It ultimately led to artificial neural networks and deep learning.

Conclusion

This section concludes the lecture on machine learning by summarizing key points discussed throughout.

Key Takeaways

  • Machine learning involves algorithms that improve on tasks with experience.
  • Machine learning generates programs automatically for tasks we do not know how to program by hand.
  • Tom Mitchell's 1997 definition states that a computer program learns from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E.
  • The first algorithm that learned from experience was Arthur Samuel's checkers player in 1952.
  • Frank Rosenblatt invented the perceptron in 1957, which ultimately led to artificial neural networks and deep learning.

The AI Winter

This section discusses the AI winter, a period of time when funding for AI research drastically decreased due to the limitations of single layer perceptrons.

The Limitations of Single Layer Perceptrons

  • In 1957, computers were not powerful enough to train large networks.
  • The excitement around AI was due to its ability to learn from experience and play games like checkers.
  • Minsky and Papert's book analyzed the limitations of single-layer perceptrons using a simple data set: the XOR data set.
  • They showed that a single-layer perceptron cannot separate four points arranged in the XOR pattern (two pluses and two circles), because no single line divides the two classes.
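The XOR failure can be seen directly with the standard perceptron update rule (a minimal sketch, not from the lecture; the update rule is the textbook formulation):

```python
# Demonstration that a single-layer perceptron cannot fit the XOR data set:
# no line separates the two classes, so the update rule never stops making mistakes.

def perceptron_epoch(w, b, X, Y):
    """One pass of the perceptron update rule; returns updated (w, b, mistakes)."""
    mistakes = 0
    for x, y in zip(X, Y):
        activation = w[0] * x[0] + w[1] * x[1] + b
        if y * activation <= 0:                    # misclassified (or on the boundary)
            w = (w[0] + y * x[0], w[1] + y * x[1]) # move the separator toward x
            b = b + y
            mistakes += 1
    return w, b, mistakes

# XOR: label +1 when exactly one coordinate is 1, otherwise -1.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
Y = [-1, +1, +1, -1]

w, b = (0.0, 0.0), 0.0
for epoch in range(100):
    w, b, mistakes = perceptron_epoch(w, b, X, Y)

print(mistakes)  # still positive: the perceptron never converges on XOR
```

Because XOR is not linearly separable, every epoch makes at least one mistake, no matter how long training runs.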

The Resulting AI Winter

  • Funding for AI research decreased drastically in most countries around the world.
  • However, some researchers argued that they were doing machine learning instead of AI, which led to new approaches being taken seriously.

Machine Learning vs. AI

This section discusses the differences between machine learning and traditional AI.

Machine Learning as a Bottom-Up Approach

  • Machine learning is more focused on finding programs that can learn from data rather than trying to build something like a human.
  • It is a bottom-up approach that starts with a computer rather than trying to replicate human thinking.

Statistics and Optimization vs. Logic

  • Machine learning focuses on statistics and optimization rather than logic, which is traditionally used in computer science.
  • There is more freedom in what can be done with machine learning because it does not have to follow strict logical rules.

The Importance of Uncertainty and Statistics

In this section, the speaker discusses how humans are not completely logical and how uncertainty is an important factor that needs to be taken into account. He explains that statistics is the right approach to deal with uncertainty.

Humans are Not Completely Logical

  • Some researchers assumed that humans are completely logical.
  • On this view, everything in the brain is explicit, obvious logical inference.
  • Ultimately, this assumption turned out to be wrong.

Importance of Uncertainty

  • Sometimes we hold beliefs about something without being certain of the truth.
  • Classical logic forces crisp statements, such as "Is the light on? Yes or no?", and systems acted accordingly.
  • When you are not sure about something, statistics is the right framework for dealing with that uncertainty.

Statistics as a Solution

  • Statisticians have dealt with uncertainty for generations.
  • Machine learning took a much more optimization-based approach than classical statistics.

The Power of Artificial Neural Networks in Game Playing

In this section, the speaker describes how artificial neural networks were used in game playing: Jerry Tesauro wrote a backgammon program that used an artificial neural network instead of the minimax algorithm.

Minimax Algorithm vs Artificial Neural Network

  • The traditional approach to game playing was the minimax algorithm.
  • Jerry Tesauro instead used an artificial neural network for his backgammon program.

Reinforcement Learning

  • Jerry Tesauro let his program play against itself and reinforced it based on positive feedback when it won and negative feedback when it lost.
  • This feedback loop allowed the program to learn and improve its game playing ability.

Success of the Program

  • Jerry Tesauro's program played against itself overnight and beat him in the morning.
  • He challenged other people to play against his program, and it beat every single one of them.
  • He contacted the organization that runs the world championship in backgammon and challenged the world champion. His program won.

The History of AI in Gaming

This section discusses the history of AI in gaming, starting with backgammon and moving on to chess.

Backgammon

  • IBM created a program that played backgammon against a world champion.
  • The program made a move the world champion initially thought was crazy but that turned out to be strong.
  • The opening discovered by the program is now standard in backgammon.

Chess

  • IBM challenged Garry Kasparov, a world champion at chess, to play against their program.
  • Kasparov won the first match and the program won the second; in both cases the traditional minimax AI approach was used rather than learning approaches.

Machine Learning in Search Engines

This section discusses how machine learning is used in search engines like Google and Yahoo.

How Search Engines Work

  • Humans have predictable expectations for what they want to see when searching for something.
  • Hundreds of people label web pages as good or bad answers; these labels are then fed into a large learning algorithm.
  • Spam filters are subjective and difficult to write because spam emails are defined subjectively and spammers can easily change their tactics.

Introduction to Machine Learning

In this section, the speaker introduces machine learning and its applications in various fields.

What is Machine Learning?

  • Machine learning is a type of artificial intelligence that allows computers to learn from data without being explicitly programmed.
  • It involves training algorithms on large datasets to identify patterns and make predictions or decisions based on new data.

Applications of Machine Learning

  • Spam filters and Google News are examples of machine learning applications that learn from user behavior to improve their performance.
  • Self-driving cars are made possible by machine learning algorithms that can learn how to drive safely.
  • Machine learning is being applied in various fields such as biology, chemistry, and data science.

The Human Brain vs. Machine Learning

In this section, the speaker discusses the differences between human brains and machine learning algorithms.

The Human Brain

  • The human brain is a big computer with different hardware than traditional computers.
  • Humans are good at learning, which makes them capable of performing many tasks.

Machine Learning Algorithms

  • Machine learning algorithms are not close to matching the capabilities of the human brain.
  • Deep learning algorithms may be better than humans at specific tasks but cannot match humans' versatility.

Types of Machine Learning

In this section, the speaker explains the three types of machine learning: supervised, unsupervised, and reinforcement.

Supervised Learning

  • Supervised learning involves training an algorithm on labeled data to predict outputs for new inputs.
  • Examples include spam filters and search engines.

Unsupervised Learning

  • Unsupervised learning involves training an algorithm on unlabeled data to identify patterns or similarities.
  • Examples include data science and finding patterns in MRI images.

Reinforcement Learning

  • Reinforcement learning involves training an algorithm to make decisions based on rewards or punishments.
  • Examples include game-playing algorithms and robotics.

Introduction to Supervised Learning

In this section, the speaker introduces the concept of supervised learning and explains its high-level goal.

What is Supervised Learning?

  • Supervised learning involves making predictions from data.
  • The data set consists of n pairs (x_i, y_i), where x is the input data and y is the output that needs to be generated.
  • x is called the feature vector, while y is called the label.
  • These data points are sampled from some distribution P to which we have no direct access.

Terminology

  • Curly X denotes the feature space, the set the x_i come from.
  • Curly Y denotes the label space, the set the y_i come from.

Examples

  • The speaker gives examples of spam filtering and stock market prediction as use cases for supervised learning.

Formalizing Supervised Learning

In this section, the speaker formalizes supervised learning by defining key terms and concepts.

Key Terms

  • i.i.d. stands for independent and identically distributed.
  • The (x_i, y_i) pairs are drawn i.i.d. from the distribution P.
  • Curly X denotes the feature space the x_i come from.
  • Curly Y denotes the label space the y_i come from.

Data Points

  • These data points are sampled from some distribution P.
  • If (X_i, Y_i) ~ P i.i.d., then observing one pair tells you nothing about the other pairs beyond what the distribution P itself tells you.
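The setup above can be summarized in standard notation (a sketch consistent with the lecture's definitions):

```latex
D = \{(x_1, y_1), \dots, (x_n, y_n)\}, \qquad
x_i \in \mathcal{X}, \quad y_i \in \mathcal{Y}, \qquad
(x_i, y_i) \overset{\text{i.i.d.}}{\sim} P
```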

Conclusion

In this section, the speaker concludes by summarizing what was covered in this lecture on supervised learning.

Recap

  • Supervised learning involves making predictions from data.
  • The data set consists of n pairs (x_i, y_i), where x is the input data and y is the output that needs to be generated.
  • X is called the feature vector, while Y is called the label.
  • These data points are sampled from some distribution P that we have no access to.

Introduction to Machine Learning

In this section, the speaker introduces machine learning and explains how data is stored on a computer.

What is Machine Learning?

  • Machine learning involves training computers to learn from data.
  • The goal of machine learning is to develop algorithms that can make predictions or decisions based on input data.

How Data is Stored on a Computer

  • Data is stored as a long file of numbers, a vector of length d.
  • x represents the data sample, such as an email or an MRI scan.
  • d is the dimensionality of the feature space, i.e., the number of dimensions in which the data is represented.
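As an illustration (a sketch with a hypothetical sample, not from the lecture), any sample can be flattened into a single vector of length d, e.g. a small grayscale scan:

```python
import numpy as np

# A hypothetical 4x4 grayscale "scan"; real samples (emails, fMRI scans)
# are likewise flattened into one long vector before learning.
scan = np.arange(16).reshape(4, 4)

x = scan.flatten()   # feature vector of length d = 16
d = x.shape[0]

print(d)  # 16
```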

Types of Classification

In this section, the speaker discusses different types of classification used in machine learning.

Binary Classification

  • Binary classification involves classifying samples into one of two categories.
  • Examples include email spam filtering and face detection.

Multi-Class Classification

  • Multi-class classification involves classifying samples into one of K different categories.
  • Examples include classifying articles on the web into categories such as sports, politics, and gadgets.

Regression

  • Regression involves predicting a continuous value for Y.
  • Examples include predicting house prices based on features such as square footage and proximity to schools.
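The three settings above differ only in the label space (notation consistent with the lecture):

```latex
\text{Binary classification:} \quad \mathcal{Y} = \{-1, +1\} \\
\text{Multi-class classification:} \quad \mathcal{Y} = \{1, 2, \dots, K\} \\
\text{Regression:} \quad \mathcal{Y} = \mathbb{R}
```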

Q&A Session

In this section, the speaker answers questions from the audience about machine learning and classification.

Questions About Classification

  • The audience asks questions about binary classification, multi-class classification, and regression.
  • The speaker provides examples for each type of classification.

Chalk Request

  • The speaker requests larger chalk for the next lecture.

Predictive Modeling in Healthcare

In this section, the speaker discusses how predictive modeling is used in healthcare to predict if a patient will return to the hospital within six weeks of being discharged.

Predicting Patient Readmission

  • Hospitals are incentivized not to discharge patients who are likely to return quickly.
  • Penalties are imposed on hospitals if patients return within six weeks of discharge.
  • Features such as gender (a binary feature) and age can be used to predict readmission.
  • Text documents can also be represented as vectors using a bag-of-words approach.

Bag-of-Words Representation

  • The bag-of-words approach represents text documents as vectors of word counts.
  • This representation works well for identifying spam emails or categorizing news articles based on topic.
  • Word order is not considered, only the occurrence of words in the document.
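A minimal bag-of-words sketch (the vocabulary and documents are hypothetical, chosen for illustration):

```python
from collections import Counter

def bag_of_words(document, vocabulary):
    """Represent a document as a vector of word counts over a fixed vocabulary.
    Word order is discarded; only occurrence counts are kept."""
    counts = Counter(document.lower().split())
    return [counts[word] for word in vocabulary]

# Hypothetical vocabulary and documents.
vocab = ["free", "money", "meeting", "tomorrow"]

spam = "free money free money now"
ham = "meeting tomorrow about money"

print(bag_of_words(spam, vocab))  # [2, 2, 0, 0]
print(bag_of_words(ham, vocab))   # [0, 1, 1, 1]
```

Words outside the vocabulary (like "now" and "about") are simply ignored, and two documents with the same words in different orders map to the same vector.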

Overall, predictive modeling is an important tool in healthcare for predicting patient outcomes and reducing costs. The bag-of-words approach is a useful technique for representing text data as vectors.

Video description

Cornell class CS4780. (Online version: https://tinyurl.com/eCornellML )
Official class webpage: http://www.cs.cornell.edu/courses/cs4780/2018fa/
Written lecture notes: http://www.cs.cornell.edu/courses/cs4780/2018fa/lectures/index.html
Past 4780 exams: https://www.dropbox.com/s/zfr5w5bxxvizmnq/Kilian%20past%20Exams.zip?dl=0
Past 4780 homeworks: https://www.dropbox.com/s/tbxnjzk5w67u0sp/Homeworks.zip?dl=0
If you want to take the course for credit and obtain an official certificate, there is now a revamped version with 118 new high-quality videos, made just for this course, offered through eCornell ( https://tinyurl.com/eCornellML ). Note, however, that eCornell does charge tuition for this version.