UNIT-1 PYTHON PROGRAMMING-II || ARTIFICIAL INTELLIGENCE || CLASS-12 AI 843 || CBSE 2025-26
Introduction to Python Programming for Class 12 AI Students
Overview of the Unit
- This video is aimed at Class 12 AI students, focusing on the first unit: Python programming. The unit will be evaluated through practicals rather than theory exams.
- The content builds upon knowledge from Class 11, making it easier for students as they have already covered relevant libraries like Pandas and NumPy.
Libraries in Python
- The discussion highlights that pre-written codes in libraries simplify tasks such as creating bar charts using Matplotlib, which allows users to pass parameters easily.
- The statistics module was also mentioned: functions like `mean()` can be applied without learning the underlying formulas; the data is simply passed as an argument and the result comes back directly.
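The mean example above can be sketched with Python's built-in statistics module (the marks list is illustrative):

```python
import statistics

marks = [80, 75, 90, 85]
# mean() computes the average without writing the formula yourself
print(statistics.mean(marks))  # 82.5
```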
Understanding Data Manipulation with Pandas and NumPy
Features of Pandas and NumPy
- These libraries are essential for data manipulation, allowing storage of data in arrays (NumPy) and creation of DataFrames (Pandas) for organizing data efficiently.
- Functions within these libraries enable analysis by accessing individual rows and columns, checking dimensions, and identifying missing values in datasets.
Importance in Data Science
- The use of Pandas and NumPy is crucial in data science due to their ability to handle large datasets effectively, including importing/exporting CSV files and performing various analyses.
- A question may arise regarding the full form of NumPy—Numerical Python—which emphasizes its role in numerical computing applications. Understanding this can be beneficial during viva sessions.
Working with Arrays in NumPy
Array Operations
- NumPy supports operations on arrays such as addition or subtraction between different arrays containing student marks across subjects (e.g., Math and AI). This facilitates easy calculations for total scores.
- It provides an n-dimensional array structure that stores values efficiently; understanding how these structures work is fundamental for effective programming practices.
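The subject-marks idea above can be sketched as follows (the marks are illustrative):

```python
import numpy as np

# Hypothetical marks for three students in two subjects
math = np.array([80, 75, 90])
ai = np.array([85, 70, 95])

# Element-wise addition gives each student's total score
total = math + ai
print(total)  # [165 145 185]
```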
Types of Arrays
- Arrays can be one-dimensional (1D), two-dimensional (2D), or even higher dimensions; each type has specific indexing methods that allow access to elements based on their position within the structure. For example, a 1D array starts indexing from zero.
- Homogeneous nature of arrays means all elements must be of the same data type; if mixed types are supplied, NumPy coerces them to a common type rather than storing genuinely mixed values within a single array structure.
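A small sketch of this homogeneity in practice; the exact dtype names printed can vary by platform:

```python
import numpy as np

a = np.array([1, 2, 3])
print(a.dtype)       # an integer dtype, e.g. int64

# Mixing an int and a string does not keep both types:
# NumPy coerces everything to a common (string) dtype
mixed = np.array([1, "two"])
print(mixed.dtype)   # a Unicode string dtype, e.g. <U21
```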
Creating One-Dimensional Arrays
Implementation Details
- To create a one-dimensional array, you define a linear structure that holds a sequence of elements all sharing the same data type, accessed via a single index starting from zero (e.g., `a = np.array([1, 2, 3])`).
- Accessing elements involves referencing their index position; printing the element at index zero displays its value directly (e.g., printing `a[0]` yields `1`).
Understanding Array Objects in NumPy
Introduction to Array Objects
- The array object in NumPy is referred to as an "ndarray," which can have multiple dimensions.
- The number of dimensions of the array is called the "rank" of the array; for example, a three-dimensional array has a rank of three.
Creating Arrays with Rank One
- To create a one-dimensional array, the NumPy library must be imported using `import numpy as np`.
- The keyword `import` is used to include any library, such as Pandas or NumPy, in your code.
- An array object can be created by assigning the result of the `array` function, called with a list argument, to a variable.
Displaying Array Values
- After creating an array named `arr`, its values (e.g., 1, 2, 3, 4, 5) can be printed along with a message indicating its rank.
- A message can be displayed alongside the output by using double quotes for strings and separating them from variables with commas.
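A minimal sketch of creating the array and printing a message with its rank:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
# ndim gives the rank (number of dimensions) of the ndarray;
# the string message and the variables are separated by commas
print("Array:", arr, "Rank:", arr.ndim)
```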
Creating Two-Dimensional Arrays
Methodology for Creating 2D Arrays
- Two lists are utilized to create a two-dimensional array; for instance, `[1, 2, 3]` and `[4, 5, 6]`.
- This results in a matrix structure where data is organized into rows and columns.
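The two-list construction above can be sketched as:

```python
import numpy as np

# Two inner lists become two rows of a 2-by-3 matrix
m = np.array([[1, 2, 3], [4, 5, 6]])
print(m.shape)  # (2, 3): 2 rows, 3 columns
print(m[1][2])  # row index 1, column index 2 -> 6
```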
Understanding Matrices
- A matrix is defined as a two-dimensional array that organizes elements effectively into grid-like structures.
- Essentially, it consists of arrays within arrays (an "array of arrays"), allowing complex data organization.
Using Tuples in Array Creation
Characteristics of Tuples
- Unlike lists that allow updates and appending items, tuples are immutable once created.
- Elements within tuples can still be accessed via index values but cannot be modified or extended after creation.
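A short sketch of tuple immutability:

```python
t = (1, 2, 3)   # a tuple is fixed once created
print(t[0])     # indexing still works -> 1

try:
    t[0] = 99   # attempting to modify raises TypeError
except TypeError:
    print("tuples cannot be modified after creation")
```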
Introduction to Pandas Library
Overview of Pandas Functionality
- The name pandas comes from "Panel Data," a term for datasets that record observations across different entities over time.
- Pandas facilitates loading datasets and displaying summary statistics while enabling group-wise analysis for performance evaluation.
Data Structures in Pandas
- Pandas primarily provides two data structures: Series (one-dimensional labeled arrays capable of holding various data types), and DataFrame (two-dimensional labeled data structure).
Creating Series and DataFrames in Pandas
Introduction to Series
- A Series can be created from scalar values or a list of data items; by default its indices are labeled 0, 1, 2, 3, and so on.
- To create a Series, the `Series` function from the Pandas library is used, similarly to how arrays were created with NumPy.
Using the Series Function
- A variable named `my_var` is introduced as a Series object by passing a list (e.g., `[1, 7, 2]`) into the `pd.Series()` function.
- The output of printing `my_var` shows both the values and their corresponding indices (0, 1, 2), demonstrating how elements can be accessed via these indices.
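The `my_var` example can be reproduced as:

```python
import pandas as pd

# A Series gets a default integer index: 0, 1, 2
my_var = pd.Series([1, 7, 2])
print(my_var)
print(my_var[0])  # access the first element via its index -> 1
```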
Limitations of Series
- It is noted that a series can only store one-dimensional data. For example, if we want to store marks for multiple subjects for students, this cannot be done effectively with a series.
- Instead of using a series for multi-dimensional data storage (like multiple subjects), it’s recommended to use DataFrames in Pandas.
Understanding DataFrames
- A DataFrame is described as a two-dimensional data structure that organizes data in rows and columns similar to matrices or tables.
- The creation of DataFrames will involve methods such as creating them from NumPy arrays or dictionaries containing lists.
Methods for Creating DataFrames
- Two methods are discussed:
- First method involves creating a DataFrame using NumPy arrays.
- Second method involves creating it from dictionaries where keys become column names and values become column entries.
Important Considerations When Creating DataFrames
- When using NumPy arrays to create DataFrames:
- Each array corresponds to a row; thus understanding this mapping is crucial when structuring your data correctly.
Example Creation Process
- An example illustrates how three rows and four columns can be structured through an array setup.
- In dictionary-based creation:
- Keys represent column names while their associated lists represent the values under those columns.
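Both creation methods can be sketched together; all of the values here are illustrative:

```python
import numpy as np
import pandas as pd

# Method 1: from a NumPy array -- each inner array becomes one row
arr = np.array([[1, 2, 3, 4],
                [5, 6, 7, 8],
                [9, 10, 11, 12]])  # 3 rows, 4 columns
df1 = pd.DataFrame(arr)

# Method 2: from a dictionary -- keys become column names,
# and each key's list becomes that column's values
df2 = pd.DataFrame({"Math": [80, 75], "AI": [85, 70]})

print(df1.shape)             # (3, 4)
print(df2.columns.tolist())  # ['Math', 'AI']
```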
Common Mistakes in Structuring DataFrames
- Emphasis on avoiding confusion between rows and columns when defining structures; incorrect assignments could lead to errors during execution.
Final Steps in Creating DataFrame Objects
- The process concludes with ensuring proper imports (e.g., importing pandas as pd).
- After setting up arrays/lists correctly within the `DataFrame()` call, users can define column headers if necessary.
DataFrame Creation Techniques in Pandas
Using the index Attribute
- The `index` attribute of a DataFrame is essential for defining row labels, while column headers can be assigned using the `columns` attribute.
- When creating a DataFrame, if you need to define index values explicitly, you can use the `index` attribute to set them accordingly.
Assigning Column Headers
- Column headers can be assigned as lists or tuples. This flexibility allows for various data structures when defining DataFrame columns.
- Both lists and tuples can also be used for index values, providing versatility in how data is structured within a DataFrame.
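A sketch combining the `index` and `columns` attributes; the labels are illustrative, and note that a list works for one and a tuple for the other:

```python
import numpy as np
import pandas as pd

arr = np.array([[1, 2, 3], [4, 5, 6]])
# Row labels via index= (a list), column headers via columns= (a tuple)
df = pd.DataFrame(arr, index=["r1", "r2"], columns=("A", "B", "C"))
print(df)
```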
Creating DataFrames from Dictionaries
- A common method for creating a DataFrame involves passing a dictionary of lists or arrays. Each key-value pair represents column names and their corresponding data.
- The syntax requires using curly braces `{}` to denote key-value pairs, where keys serve as column names and values are the associated data.
Handling Missing Values
- When constructing a DataFrame from dictionaries, missing values will appear as NaN (Not a Number). This behavior is consistent across different methods of creation.
- If no row labels are specified during creation, default indexing (0, 1, 2, ...) will apply unless defined otherwise with the `index` parameter.
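A minimal sketch of NaN appearing for a missing entry (the names and marks are made up):

```python
import pandas as pd

# The second student's marks are missing; pandas stores this as NaN
data = {"Name": ["Asha", "Ravi"], "Marks": [88, None]}
df = pd.DataFrame(data)
print(df)
print(df["Marks"].isnull().tolist())  # [False, True]
```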
Creating DataFrames from Lists of Dictionaries
- Another approach involves creating a DataFrame from a list of dictionaries. Each dictionary corresponds to one row in the resulting DataFrame.
- You can also assign Series directly within this structure; each Series acts like an individual column with its own index.
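The list-of-dictionaries approach can be sketched as follows (keys and values are illustrative):

```python
import pandas as pd

# Each dictionary becomes one row; a key missing from a row becomes NaN
rows = [{"Math": 80, "AI": 85},
        {"Math": 75}]           # no "AI" key in the second row
df = pd.DataFrame(rows)
print(df)
```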
Adding New Columns to Existing DataFrames
- To add new columns to an existing DataFrame, reference the DataFrame name followed by square brackets containing the new column name and assign it values directly.
- For example, adding a new column named "Fatima" would involve specifying its name in square brackets and assigning it appropriate values.
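Adding a column can be sketched like this; the column name `Total` and the marks are illustrative, not from the transcript:

```python
import pandas as pd

df = pd.DataFrame({"Math": [80, 75], "AI": [85, 70]})
# Name the new column in square brackets and assign its values
df["Total"] = df["Math"] + df["AI"]
print(df)
```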
Data Manipulation in Python: Inserting and Deleting Rows and Columns
Inserting New Rows into a DataFrame
- To add a new row to a DataFrame, use the syntax `dataframe_name.loc[]`, where you specify the index label for the new row.
- If there are incorrect values (e.g., wrong marks), you can modify them by accessing the specific row and column using `dataframe_name.at[]`.
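A sketch of inserting a row with `.loc[]` and correcting a single value with `.at[]` (the values are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"Math": [80, 75], "AI": [85, 70]})
df.loc[2] = [90, 95]    # insert a new row at index label 2
df.at[0, "Math"] = 82   # fix one value by row label and column name
print(df)
```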
Deleting Rows and Columns from a DataFrame
- The method `dataframe_name.drop()` is used to delete rows or columns. The parameter `axis=0` indicates that rows will be deleted, while `axis=1` indicates columns.
- For deleting multiple rows, list their index labels within square brackets and set `axis=0`. This allows for efficient removal of several rows at once.
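Deleting rows and columns with `drop()` can be sketched as follows; note that `drop()` returns a new DataFrame rather than modifying the original in place:

```python
import pandas as pd

df = pd.DataFrame({"Math": [80, 75, 90], "AI": [85, 70, 95]})
df_rows = df.drop([0, 2], axis=0)  # drop rows with index labels 0 and 2
df_cols = df.drop("AI", axis=1)    # drop the "AI" column
print(df_rows)
print(df_cols.columns.tolist())
```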
Accessing DataFrame Properties
- Key properties of a DataFrame include its index, which holds the row labels; use `dataframe.index` to access them.
- To get column names, use `dataframe.columns`, while the shape of the DataFrame (number of rows and columns) can be accessed with `dataframe.shape`.
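These properties can be inspected like so:

```python
import pandas as pd

df = pd.DataFrame({"Math": [80, 75], "AI": [85, 70]})
print(df.index)    # row labels (default RangeIndex 0..1)
print(df.columns)  # column names
print(df.shape)    # (rows, columns) -> (2, 2)
```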
Displaying Portions of a DataFrame
- The methods `.head(n)` and `.tail(n)` display the first or last n rows respectively. By default, they show five rows if no argument is provided.
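A quick sketch of `.head()` and `.tail()`:

```python
import pandas as pd

df = pd.DataFrame({"n": range(10)})
print(df.head(3))  # first 3 rows
print(df.tail())   # last 5 rows (default when no argument is given)
```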
Understanding CSV Files
- CSV stands for Comma-Separated Values; it is commonly used for storing tabular data in plain text format where each line represents a row.
- CSV files are easy to read/write for both humans and computers, making them ideal for data storage. They allow straightforward import/export operations between applications like Notepad or Excel.
Importance of CSV Files in Data Analysis
- CSV files play a crucial role in data analysis as they facilitate changes and manipulations on datasets stored within them.
- Using libraries like Pandas allows users to load data from CSV files into DataFrames efficiently, enabling complex operations on structured data.
Creating and Handling CSV Files in Google Colab
Steps to Create a CSV File
- Begin by creating a student CSV file: input your data into an Excel sheet, then save it as a CSV file.
- Navigate to the "File" menu, select "Save As," and choose the comma-delimited (CSV) format for saving. Name the file "student_mod.csv" for consistency with future tasks.
Uploading and Accessing the CSV File
- Open Google Colab and click on the folder icon to upload your newly created CSV file. Ensure that it appears in your list of files after a successful upload.
- Use the provided code from your reference book (`import pandas as pd`) to read data from the uploaded CSV file into a DataFrame.
Exporting Data from DataFrame
- After reading data into a DataFrame (stored in 'df'), you can export this data back into another CSV file named "result_add.csv" without including index values.
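The read/export round trip can be sketched as follows; a small stand-in CSV is created first so the snippet is self-contained (the filenames follow the transcript, the data is made up):

```python
import pandas as pd

# Stand-in for the uploaded student file
pd.DataFrame({"Name": ["Asha", "Ravi"], "Marks": [88, 92]}).to_csv(
    "student_mod.csv", index=False)

df = pd.read_csv("student_mod.csv")       # load the CSV into a DataFrame
df.to_csv("result_add.csv", index=False)  # export without index values
print(df)
```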
Handling Missing Values
Identifying Missing Values
- Discusses how to handle missing values within a DataFrame. If any value is missing due to various reasons, functions are available to check for these gaps.
- To eliminate rows with missing values, use specific features that allow dropping such rows from your dataset.
Strategies for Managing Missing Values
- You can either drop rows containing missing values or estimate them based on other available data points.
- The `isnull()` function checks for missing values in your DataFrame, returning True or False for each entry.
Dropping Rows with Missing Values
- Utilize the `dropna()` function, which removes all rows containing any NaN (missing value), resulting in a cleaner dataset suitable for analysis.
Estimating Missing Values
- The `fillna()` function allows you to replace NaN values with specified estimates, such as a constant or an average derived from the remaining data.
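The three functions can be sketched together; the marks and NaN positions are illustrative, and filling with the column mean is one common choice:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Math": [80, np.nan, 90], "AI": [85, 70, np.nan]})
print(df.isnull())             # True wherever a value is missing
clean = df.dropna()            # drop any row containing NaN
filled = df.fillna(df.mean())  # replace NaN with each column's average
print(clean.shape)             # only the fully complete row survives
print(filled["Math"].tolist())
```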
Practical Application of Functions
Checking Specific Columns for Missing Values
- You can check specific columns for missing values by referencing the column name followed by `.isnull().any()`, which indicates whether any NaN exists in that column.
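A column-specific check can be sketched as:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Math": [80, np.nan], "AI": [85, 70]})
print(df["Math"].isnull().any())  # True: Math has a missing value
print(df["AI"].isnull().any())    # False: AI is complete
```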
Conclusion on Handling Missing Values
- Understanding how to effectively manage missing values is crucial; functions like `dropna()` and `fillna()` are essential tools in ensuring robust data analysis processes.
Data Analysis Techniques for Missing Values
Understanding Missing Values in Data Frames
- The speaker discusses functions for identifying missing values within a specific row of a DataFrame, emphasizing the importance of understanding data completeness.
- A method is introduced to calculate the total number of missing values across an entire DataFrame by chaining summation functions, highlighting its utility in data analysis.
- The explanation includes practical steps on how to implement these functions effectively, ensuring clarity for viewers who may be new to data manipulation techniques.
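The row-wise and overall counts can be sketched by chaining `isnull()` with `sum()` (the data is illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Math": [80, np.nan, 90], "AI": [np.nan, 70, np.nan]})
per_row = df.isnull().sum(axis=1)  # missing values in each row
total = df.isnull().sum().sum()    # grand total across the DataFrame
print(per_row.tolist(), total)     # [1, 1, 1] 3
```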
- The speaker expresses hope that viewers will find value in the video content, indicating a focus on educational outcomes and viewer engagement.