Machine Learning on OCI for League of Legends- Data Extraction

Introduction

The speaker introduces the workshop and provides information on how to access the lab guide.

Workshop Introduction

  • Tara Vancleave introduces the workshop on machine learning on OCI for League of Legends data extraction.
  • She mentions that there is a special promotion with extra credits for those who do not have an Oracle Cloud account.
  • The session will be recorded and sent via email after the workshop concludes.
  • Questions can be asked in the Slack channel, and there is a dedicated space for Q&A using hashtag #labLeagueofLegends.
  • Ignacio Martinez and Victor Martin will present the workshop, with Wojak Pluta helping with Q&A.
  • The lab guide can be accessed through a link provided by Tara.

Setting up an Oracle Cloud Free Tier Account

Victor Martin walks through the steps to sign up for an Oracle Cloud free tier account.

Signing Up for an Oracle Cloud Free Tier Account

  • Victor Martin explains that signing up for an Oracle Cloud Free Tier account gives you access to Always Free cloud services such as autonomous databases, virtual machines, load balancers, block, object, and archive storage, monitoring, and more.
  • He mentions that you also get a 30-day trial with free credits to use on other services, from analytics to Kubernetes.
  • If you are signing up for an upcoming workshop, make sure to use the same email used during registration, as a special offer has been activated for it.
  • You need to add your country, then your first and last name, and provide your email. Oracle Cloud checks that you are human by asking you to solve an easy challenge. Finally, click "Verify my email" to move to the next step.
  • Confirm the password by typing it again, and fill in the company name if applicable. The cloud account name is essential, as it is a unique identifier for your account, so select a name that is not already taken.
  • The home region is the region where you are most likely to deploy resources. You can deploy to other regions by subscribing to them; however, the free trial supports only one region. If you are not sure, select a region geographically close to you.
  • Fill in the address information (city, postcode, country) and your phone number, then accept the agreement and click "Start my free trial".
  • Oracle Cloud starts provisioning your account, which takes one to two minutes on average. When it is fully provisioned, you will receive an email.

Conclusion

The workshop provides insights on how to leverage AI on OCI for League of Legends data extraction, while Victor Martin walks through signing up for an Oracle Cloud free tier account.

Oracle Cloud Account Setup

This section covers the setup process for an Oracle Cloud account.

Setting up an Oracle Cloud Account

  • Select a region when signing up.
  • Icons on the right side of the screen include APEX instances, Cloud Shell, announcements, help, language selection, and profile information.
  • Your tenancy and sign out button are located in the top right corner.
  • The message saying that your account is being set up will disappear after a while.
  • You will receive a getting started email and another one when your trial account is fully provisioned.

Introduction to Machine Learning and Gaming

This section introduces machine learning and gaming with a focus on League of Legends.

Introduction to League of Legends

  • Access the lab guide 0928 to get started.
  • A three-minute video explains all concepts related to League of Legends.
  • A free tier account is needed to go through this tutorial.

Machine Learning and Gaming

  • Nacho Martinez introduces himself as the speaker for this session.
  • The project focuses on developing something that helps players while they're playing League of Legends.
  • The video explains how League of Legends works.

General Concepts

This section provides an overview of the general concepts of League of Legends.

How to Win Games

  • The objective is to kill minions and enemy champions to earn gold, which can be used to buy items that increase your stats.
  • Winning is about gaining an advantage over the enemy team by killing more minions and champions.
  • Human errors and factors make the game complex, so filtering for the best players is necessary when working with data.

Creating a Machine Learning Model

This section discusses creating a machine learning model for predicting game outcomes.

Two Models

  • Two models will be created: one for helping during matches and another trained on historical data from professional players.
  • Region selection when signing up does not matter much. Phoenix in US West is recommended.
  • OCI elements such as Data Science Cloud Notebooks, Cloud Shell, OCI Compute, and Autonomous JSON Database will be used.

Data Sets

  • Offline and online data sets will be generated. A robust data set downloaded from Riot Games API will also be provided.

Conclusion

This section concludes the presentation on using machine learning to predict League of Legends game outcomes.

Infrastructure Deployment

  • Infrastructure deployment takes about 10 minutes after finishing this part of the presentation.

Introduction to the Lab

In this section, the instructor introduces the lab and explains that they will be using Terraform and Ansible to deploy cloud resources automatically. They also explain that they will be using a Cloud Shell instance to configure everything.

Setting up the Environment

  • The instructor explains that they will use Terraform to deploy cloud resources automatically and Ansible to configure the compute instance.
  • The first step is to open a Cloud Shell instance, which takes about 30 seconds. Once it's open, download the source code containing all necessary files.
  • Navigate to the dev directory where you'll find both Terraform and Ansible directories.

Configuring Terraform Variables

  • To populate the Terraform variables file, you can use either your code editor or a command-line tool like nano.
  • Fill out the region, tenancy OCID, compartment OCID (if applicable), Riot Games API key (if applicable), and SSH public key in the Terraform variables file.

Generating an SSH Key

  • Generate an SSH key with ssh-keygen -t rsa so you can connect to your compute instance.

Deploying with OCI

In this section, the speaker explains how to deploy a project using OCI.

Modifying and deploying the file

  • Modify the file with all the variables.
  • Execute the start.sh script to deploy everything automatically.
  • Deployment takes 7 to 10 minutes depending on traffic.

Extracting API key

  • Go to developer.riotgames.com and sign in with your League of Legends account.
  • Verify your email address to get an API key.
  • The API key is needed for deployment.

Getting Started with League of Legends API

In this section, the speaker explains how to get started with League of Legends API.

Registering for League of Legends

  • Register for League of Legends at developer.riotgames.com.
  • Verify your email address before getting an API key.

Getting the API Key

  • After signing in, you will see your API key.
  • Regenerate another one if it has already expired.

Deployed Infrastructure and Project Details

In this section, the speaker talks about deployed infrastructure and project details.

Deployed Infrastructure

  • A simple infrastructure has been deployed using OCI.
  • Compute instance is used to extract data and generate datasets.
  • Autonomous JSON database is used to store data safely.

Project Details

  • The source code is open-source.
  • You can download the whole game's player base, which is about 50 million players.

Introduction to Terraform and Data Mining

In this section, the speaker introduces the idea of generating a dataset using Terraform and working with it. They also mention that they will answer any questions in chat.

How Terraform Works

  • Terraform keeps state files that describe everything deployed in your account.
  • It discovers what is already deployed in Oracle Cloud and saves a copy of that state.
  • It then determines which resources need to be created: a database password, a wallet password for the database, a VCN (Virtual Cloud Network), and all required connectivity.
  • Once everything is created, it outputs a Data Science notebook session URL for accessing the environment.

SSH into Machine

  • The speaker recommends using Cloud Shell to connect to the machine via an SSH key.
  • The deployment process took about 9 minutes and 51 seconds. Ansible connected to the machine, installed Python, installed the database wallets, and downloaded the source code.
  • The speaker SSHes into the machine and shows that there is only one directory, which is the source code of the repo.

Data Mining Library

  • From the src directory of the repository we can go to the optimizer module, which offers many different ways to extract data.
  • We need to consider players who are really good at League of Legends as entry points for data mining.
  • The API key from Riot Games expires every 24 hours, so if you plan on running extraction for longer than that, keep in mind that your code will break once the key is no longer valid.

Overview of League of Legends Ranking System

In this section, the speaker provides an overview of the ranking system in League of Legends and explains how players are ranked based on their performance.

Ranking System in League of Legends

  • The top 200 Challenger players in each region are considered the best players.
  • Grandmaster players are ranked below the Challenger tier.
  • Players below Master tier are not considered among the best.
  • The speaker will consider Challenger down to Master players for the data set.

Extracting Player Data

In this section, the speaker explains how they extract player data from League of Legends using APIs.

Storing Player Data

  • Players with hundreds or thousands of games in a season are considered for their dataset.
  • The extracted player data is stored automatically in a non-relational database as JSON structures.

Sorting by Role

  • It is possible to sort player data by role, such as bottom lane, mid lane, top lane, jungle or support.
  • The machine learning model can be trained to learn from specific roles.
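
As a small illustration, the role filter can be sketched in pandas; the column names below (summoner, lane, kills) are assumptions for illustration, not the exact schema used in the workshop.

```python
import pandas as pd

# Hypothetical sample of extracted player rows; the column names are
# illustrative only, not the real dataset schema.
rows = pd.DataFrame({
    "summoner": ["a", "b", "c", "d", "e"],
    "lane":     ["top", "jungle", "mid", "bottom", "support"],
    "kills":    [7, 3, 11, 9, 1],
})

# Keep only mid-lane entries so a model can be trained on a single role.
mid_only = rows[rows["lane"] == "mid"]
print(mid_only)
```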

Extracting Match Data

In this section, the speaker explains how they extract match data from League of Legends using APIs.

Downloading Matches

  • They download matches for every player in their database using API requests.
  • They search for the latest thousand games for each player and iterate through all previous matches.
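
The paging described above can be sketched as follows. The URL shape follows the public Riot match-v5 "match IDs by puuid" endpoint, but the puuid is a placeholder and no request is actually sent, so treat this as a sketch rather than the workshop's exact code.

```python
# Sketch of paging through a player's latest 1000 match IDs in pages of
# 100 (the per-request cap documented by Riot). No network call is made.
BASE = "https://{routing}.api.riotgames.com/lol/match/v5/matches/by-puuid/{puuid}/ids"

def match_id_pages(puuid, routing="americas", total=1000, page_size=100):
    """Yield one request URL per page of match IDs."""
    url = BASE.format(routing=routing, puuid=puuid)
    for start in range(0, total, page_size):
        yield f"{url}?start={start}&count={page_size}"

# "PLACEHOLDER-PUUID" stands in for a real player identifier.
pages = list(match_id_pages("PLACEHOLDER-PUUID"))
# Each page would then be fetched with a request carrying the
# X-Riot-Token header set to your API key.
print(len(pages))  # 10
```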

Downloading Match Details

  • They download match details using a command that retrieves a huge JSON file containing all events that happen during a game.
  • This includes X Y coordinates, kills, assists and items bought by each player.
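
Extracting those events can be sketched as a walk over the timeline JSON. The nested payload below is fabricated to mirror the info → frames → events shape of a match timeline; the field names are assumptions based on that description.

```python
# Fabricated mini-timeline mimicking the nested JSON described above.
timeline = {
    "info": {
        "frames": [
            {"events": [
                {"type": "CHAMPION_KILL", "killerId": 3, "victimId": 8,
                 "position": {"x": 1200, "y": 3400}, "timestamp": 61000},
                {"type": "ITEM_PURCHASED", "participantId": 3, "itemId": 1055},
            ]},
        ]
    }
}

# Flatten every frame's event list and keep only champion kills.
kills = [
    event
    for frame in timeline["info"]["frames"]
    for event in frame["events"]
    if event["type"] == "CHAMPION_KILL"
]
print(kills[0]["position"])  # {'x': 1200, 'y': 3400}
```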

Processing Data for Machine Learning

In this section, the speaker explains how they process extracted data into a pipeline for machine learning.

Converting Data

  • They convert the extracted data into a pipeline that can be used by machine learning.
  • The converted data is then used to train their machine learning model.

Introduction

The speaker introduces themselves and discusses the steps required to change directories in the infrastructure Cloud shell.

Diane's Instructions

  • Diane provides instructions on how to change directories in the infrastructure Cloud shell.
  • She advises users to follow each step carefully and ensure that they have downloaded the repository.
  • When asked if it is possible to complete the labs without an Oracle account, she confirms that it is possible by downloading the GitHub repository and working on a personal computer.

Data Ingestion

The speaker discusses data ingestion into the AJD (Autonomous JSON Database) instance and shows how much data can be ingested.

Amount of Data Ingested

  • The speaker mentions that after a couple hundred lines, the execution stops. However, if more players are added to the database, it would never stop.
  • He shows how much data is being inserted into the database and explains that all data from players and matches are stored in the cloud database.

JSON Databases

The speaker explains why JSON databases are popular and demonstrates what kind of information League of Legends provides about a player.

Advantages of JSON Databases

  • The speaker highlights one advantage of JSON databases over a relational design: you can insert into a collection even if it did not exist beforehand; it is simply created on the fly.

Information Provided by League of Legends

  • The speaker demonstrates what kind of information League of Legends provides about a player such as their identifier, rank, best rank in solo queue or Flex, hot streak status, inactive status, and number of wins in the season.

Connecting with Other Oracle Products

The speaker explains how the cloud database can be connected to other Oracle products.

Connecting with Other Oracle Products

  • The speaker explains that the cloud database can be connected to other Oracle products, such as Oracle Analytics Cloud, for visualizations over player data.

Data Science Environment

The speaker discusses accessing the data science environment and setting up a python environment.

Accessing the Data Science Environment

  • The speaker demonstrates how to access the data science environment by going into the active project that has been deployed and opening the notebook session.

Setting Up Python Environment

  • The speaker explains that users need to download notebooks from the repository and datasets provided by them.

Getting Started with the Notebook

In this section, we learn how to get started with the notebook and install all necessary dependencies.

Downloading and Opening the Notebook

  • To get started, SSH into the compute node created for you.
  • Download the notebook called "Hall one offline analysis.ipynb" from the Notebooks folder inside the League of Legends Optimizer repository.
  • Open the notebook in Jupyter.

Installing Dependencies

  • Install all necessary Python dependencies before running any code.
  • The required libraries include NumPy, Pandas, and JSON.
  • Use read_csv to read in two datasets: matchups.csv and 1v1.csv.
  • Wait for these large files to load.
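
The loading step can be sketched as below; a tiny in-memory sample stands in for matchups.csv so the snippet runs anywhere, and the column names are illustrative only.

```python
import io
import pandas as pd

# In the notebook this would be pd.read_csv("matchups.csv"); a small
# in-memory CSV with made-up columns stands in here.
sample = io.StringIO(
    "champ1,champ2,win\n"
    "Ashe,Jinx,1\n"
    "Garen,Darius,0\n"
)
matchups = pd.read_csv(sample)
print(matchups.shape)  # (2, 3)
```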

Understanding Data Structure

  • Learn about the data structure that will be used throughout upcoming workshops.
  • The data includes information such as total gold, kills, assists, deaths, champion played, and lane played (top, mid, bottom, jungle, or support).
  • This data will be transformed using machine learning techniques.

Using Lab Four Without Completing Lab Two

In this section we learn that it is possible to go straight to lab four without completing lab two.

Skipping Ahead

  • If you have completed task four of lab two, but not the other tasks or labs in between, you can skip ahead to lab four.
  • No need to worry about missing out on important information.

Exploring Data Structure Further

In this section we explore further details about the data structure used throughout upcoming workshops.

Transforming Data Structure

  • Machine learning techniques will be applied to transform the data into a simple format of "I won" or "I lost".
  • Each match has an identifier number and lists which champions were played.
  • If champion one won, there is a "1" in the win column. If champion two won, there is a "0" in the win column.

Understanding Accuracy

  • The machine learning model used in this workshop has an accuracy of around 83%.
  • However, the main goal of this workshop is to teach how to create a machine learning model from scratch and not to focus on accuracy alone.

Importing Dependencies

  • Import necessary dependencies for the project including NumPy, Pandas, and JSON.
  • Use read_csv to read in two datasets: matchups.csv and 1v1.csv.

Scaling Data

  • Learn about scaling data using techniques such as StandardScaler or MinMaxScaler.
  • These techniques are important for ensuring that all features have equal weight when training models.
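
A minimal sketch of both scalers on a toy column of gold totals:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Toy gold totals used only for illustration.
gold = np.array([[5000.0], [12000.0], [20000.0], [8000.0]])

standardized = StandardScaler().fit_transform(gold)  # mean 0, unit variance
normalized = MinMaxScaler().fit_transform(gold)      # squeezed into [0, 1]

print(standardized.ravel(), normalized.ravel())
```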

Introduction to Machine Learning

In this section, the instructor introduces the basic concepts of machine learning and explains how null values can be difficult to deal with in most cases.

Basic Concepts of Machine Learning

  • The aim is not to build the best possible machine learning model, but to teach you the basic concepts.
  • Decision trees are especially good for dealing with missing or null values.
  • Before anything, we need to decide what to do with null values through data manipulation or processing.

Data Preparation

  • Identifier columns like match ID have high cardinality and are useless as features in machine learning models.
  • Splitting data into training and testing sets is a technique used to divide your dataset into percentages for training and validating your model's accuracy.
  • Most people use an 80/20 split for training/testing respectively.
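
The 80/20 split can be sketched on synthetic data:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 100 synthetic rows: two feature columns and a binary "team one wins"
# target, standing in for the real matchup data.
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 2))
y = rng.integers(0, 2, size=100)

# test_size=0.2 gives the 80/20 split mentioned above.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(len(X_train), len(X_test))  # 80 20
```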

Variable Splitting

  • We need to tell the model which variable we want to predict. In this case, we want to predict if team one wins or not.
  • This model has about 51% accuracy due to the large number of rows (2.5 million).

Categorical Variables

In this section, the instructor explains what categorical variables are and why they are important in machine learning models.

Understanding Categorical Variables

  • A categorical variable is defined as a variable that is not a number.
  • Machine learning models learn from numbers, so it's important to convert categorical variables into numerical ones before using them in a model.

One-Hot Encoding

  • One-hot encoding is a technique used to convert categorical variables into numerical ones.
  • It creates a binary column for each category and assigns a 1 or 0 to indicate whether that category is present in the original data.
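
A quick sketch with pandas' get_dummies, using made-up champion names:

```python
import pandas as pd

# Champion names are categorical; get_dummies expands them into one
# binary indicator column per champion.
df = pd.DataFrame({"champ": ["Ashe", "Garen", "Ashe"]})
encoded = pd.get_dummies(df, columns=["champ"])
print(list(encoded.columns))  # ['champ_Ashe', 'champ_Garen']
```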

Label Encoding

  • Label encoding is another technique used to convert categorical variables into numerical ones.
  • It assigns a unique number to each category, but this can lead to issues with ordinality and should be used carefully.

Label Encoding and Scaling

In this section, the speaker explains label encoding and scaling. They describe how to use label encoding to assign a number to each unique string in a dataset, and how one-hot encoding can be used as an alternative. The speaker also explains the importance of scaling data to ensure consistency.

Label Encoding

  • Label encoding assigns a number to each unique string in a dataset.
  • This creates an identifying variable for each word that is encoded.
  • One-hot encoding is similar but uses binary representation instead of numbers.
  • If a new category is added after the encoder has been created, it will cause an error.
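
Both behaviors can be sketched with scikit-learn's LabelEncoder (champion names made up):

```python
from sklearn.preprocessing import LabelEncoder

encoder = LabelEncoder()
encoder.fit(["Ashe", "Garen", "Jinx"])

# Each unique string is assigned an integer (alphabetical in scikit-learn).
codes = encoder.transform(["Garen", "Ashe"])
print(codes)  # [1 0]

# A category the encoder never saw raises an error, as noted above.
try:
    encoder.transform(["Zed"])
except ValueError as exc:
    print("unseen category:", exc)
```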

Scaling

  • Scaling removes magnitude differences in numerical data so that machine learning models do not favor certain values over others.
  • StandardScaler is one method of scaling: it subtracts the mean from each data point and divides by the standard deviation.
  • All variables must be transformed into numbers before training the model.

Logistic Regression Model

In this section, the speaker discusses logistic regression models and how they are used in machine learning.

Logistic Regression

  • A logistic regression model applies the logistic (sigmoid) function to a linear combination of the inputs and is used for classification tasks.
  • The model predicts probabilities for each event based on input data.
  • The test dataset is used to validate the accuracy of the model.
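
A hedged sketch on synthetic data (the real notebook trains on the matchup datasets, not on these random features):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the matchup features and win labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression().fit(X_train, y_train)

# predict_proba returns one probability per class (lose, win); the
# held-out test set validates accuracy.
proba = model.predict_proba(X_test[:1])
print(proba, model.score(X_test, y_test))
```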

Model Inference

In this section, the speaker discusses how to test a model by exporting and importing it again. They also explain how to create new data and pass it to the model for testing.

Testing the Model

  • To test a model, export it and import it again.
  • Create new data and pass it to the model for testing.
  • The computer doesn't understand the dataset, so we need to perform all of the encoding and scaling that was done before.
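
One common way to do the export/import round trip for scikit-learn models is joblib; the workshop may use a different persistence mechanism, so this is a sketch.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression

# Train a small model on synthetic data as a stand-in for the real one.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = (X[:, 0] > 0).astype(int)
model = LogisticRegression().fit(X, y)

# Export the trained model to disk, then import it again for testing.
path = os.path.join(tempfile.mkdtemp(), "model.joblib")
joblib.dump(model, path)
restored = joblib.load(path)

# New data passed to the restored model matches the original's prediction.
new_data = np.array([[0.4, -1.2]])
print(restored.predict(new_data), model.predict(new_data))
```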

Encoding and Scaling

  • Reuse objects like label encoders to perform encoding of new values automatically.
  • Scale all numbers into perspective before passing them through the model.
  • If values are not scaled, predictions will be inaccurate.
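
The reuse of fitted objects at inference time can be sketched as follows; the champion names and gold values are made up:

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Objects fitted during training are kept and reused, so new inputs go
# through exactly the same encoding and scaling as the training data.
champ_encoder = LabelEncoder().fit(["Ashe", "Garen", "Jinx"])
gold_scaler = StandardScaler().fit(np.array([[5000.0], [12000.0], [20000.0]]))

# A new row for prediction: transform only, never refit.
champ = champ_encoder.transform(["Jinx"])            # integer code
gold = gold_scaler.transform(np.array([[9000.0]]))   # scaled value
features = np.array([[champ[0], gold[0, 0]]])
print(features.shape)  # (1, 2)
```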

Working with Time Series Data

In this section, the speaker talks about working with time series data in League of Legends matches. They discuss how performance early on in a match can affect its outcome.

Time Series Data

  • League of Legends matches are time series data because what happens early on affects later outcomes.
  • Additional calculations apart from probabilities can add robustness to models.
  • Consider five players against five players when making predictions.

Making Predictions

  • Make five different predictions, one for each of the lanes, just as we've done here.
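
A hedged sketch of that idea with a stand-in model (predict_lane_win is a hypothetical helper, not part of the workshop code):

```python
# One prediction per lane, combined into a single dictionary.
LANES = ["top", "jungle", "mid", "bottom", "support"]

def predict_lane_win(lane, matchup):
    """Hypothetical stand-in for a trained per-lane model."""
    fixed = {"top": 0.60, "jungle": 0.50, "mid": 0.70,
             "bottom": 0.55, "support": 0.45}
    return fixed[lane]

# Made-up 5v5 matchup: each lane pits one champion against another.
matchups = {lane: ("champA", "champB") for lane in LANES}
predictions = {lane: predict_lane_win(lane, matchups[lane]) for lane in LANES}
print(predictions)
```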