Introduction to Numpy and data visualization by Ravi Bhandari

Introduction to Geoprocessing with Python

Overview of Previous Session

  • The session begins with a welcome and a brief recap of the previous discussion on basic constructs of Python programming.
  • Key topics covered included variable creation, manipulation, and different data structures such as lists, dictionaries, and sets.
  • Emphasis is placed on the fact that geospatial processing does not require understanding every aspect of Python; only a subset of the language is needed.

Introduction to Fundamental Libraries

  • Two essential libraries in Python for data science applications are introduced: NumPy and Matplotlib.
  • NumPy (Numerical Python) is highlighted as crucial for handling multi-dimensional arrays and matrices.
  • Matplotlib is mentioned as a library primarily used for data visualization.

Deep Dive into NumPy

Importance of NumPy

  • NumPy is described as an open-source library vital across various fields in science and engineering when using Python.
  • It serves as the core foundation for many scientific libraries in Python, including Pandas, SciPy, Scikit-learn, and others.

Features of NumPy

  • The primary class in NumPy is ndarray, which allows efficient storage and manipulation of n-dimensional arrays.
  • Mathematical operations can be performed easily on these arrays without complex loops typical in other programming languages.

Understanding Array Dimensions

Types of Arrays

  • A one-dimensional array holds multiple values indexed by a single index; two-dimensional arrays are organized in rows and columns requiring two indices to access elements.
  • Stacking multiple two-dimensional arrays creates three-dimensional arrays; this concept extends indefinitely to higher dimensions (fourth dimension, fifth dimension, etc.).

Accessing Elements

  • Accessing elements from one-dimensional arrays requires only one index while two-dimensional arrays require both row and column indices.
  • This indexing system continues similarly for higher dimensional arrays where more indices are needed to uniquely identify an element.

Understanding Numpy Arrays and Their Usage

Introduction to Numpy

  • A three-dimensional array requires three indexes: the stack (which two-dimensional array), the row number, and the column number. The same pattern extends to arrays with more dimensions.
  • To use Numpy on a local system, it must be installed via package managers like Conda or pip. Google Colab has Numpy pre-installed.

Importing Numpy

  • After installation, libraries need to be imported into the Python interpreter for use. The standard convention is import numpy as np.
  • Using np as shorthand simplifies code by reducing typing when accessing functions within the library.

Creating Arrays

  • To create a one-dimensional array, you can use np.arange(), which generates a sequence of numbers (e.g., np.arange(6) creates an array with the six values 0 through 5).
  • Numpy arrays are specialized containers that handle numerical data more efficiently than standard Python lists.

Differences Between Numpy Arrays and Lists

  • Numpy arrays provide faster and more efficient methods for creating and manipulating numerical data than generic Python lists.
  • While Python lists are versatile, they are slower due to their general-purpose nature; in contrast, Numpy arrays are optimized for numerical operations.

Limitations of Numpy Arrays

  • Appending elements to a numpy array is slower than appending to a list, because the array has a fixed size and a new, larger array must be created each time; changing the size of an array after creation is generally not recommended.
  • Despite some limitations in resizing, numpy arrays excel in performance for numerical computations.

Understanding Array Structure

  • An array serves as the central data structure in the numpy library (ndarray), representing grids of values across multiple dimensions.
  • The rank of an array refers to its number of dimensions; this should not be confused with matrix rank.

Array Shape and Creation

  • The shape of an array indicates its size along each dimension using a tuple of integers.
  • One-dimensional arrays (vectors), which we will explore further, can be created easily using functions like np.arange() or similar methods.
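
The ideas of rank and shape can be checked directly on small arrays. The following is a minimal sketch; the variable names and the use of reshape() to build a two-dimensional example are assumptions added for illustration, not part of the session.

```python
import numpy as np

v = np.arange(6)                 # one-dimensional array: 0, 1, 2, 3, 4, 5
m = np.arange(6).reshape(2, 3)   # same values arranged as 2 rows x 3 columns (illustrative)

print(v.ndim, v.shape)   # rank 1, shape (6,)
print(m.ndim, m.shape)   # rank 2, shape (2, 3)
```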

Creating and Manipulating Numpy Arrays

Introduction to Numpy Arrays

  • To create a one-dimensional array (vector), use the function np.array() with a list of values. For example, np.array([1, 2, 3]) creates a numpy array.
  • A single-dimensional array allows access to elements using a single index, such as a[0] for the first item or a[1] for the second.

Characteristics of Numpy Arrays

  • Unlike Python lists, numpy arrays have a fixed size once created; extending one means building a new array and copying the data, which is slower than appending to a list.
  • It is common practice to maintain a Python list for appending items and then convert that list into a numpy array for efficiency.
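
The list-then-convert pattern described above might look like the sketch below. The list name and the values appended are hypothetical placeholders.

```python
import numpy as np

readings = []                    # hypothetical list that collects values one at a time
for i in range(5):
    readings.append(i * 0.5)     # appending to a Python list is cheap

arr = np.array(readings)         # convert once, after all values are collected
print(arr, arr.dtype)
```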

Creating Specialized Arrays

  • Use np.zeros() to create an array initialized with zeros. You can specify the data type (e.g., integer).
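  • If you want an array with the same shape and size as an existing one but filled with different values, functions like np.zeros_like(), np.ones_like(), and np.full_like() can be used; np.zeros(), np.ones(), and np.full() do the same when given the existing array's shape.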
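
A short sketch of these constructors follows. The shapes and fill value are arbitrary choices for illustration, and np.zeros_like() is shown as one way to copy the shape of an existing array.

```python
import numpy as np

z = np.zeros(4, dtype=int)      # four integer zeros
o = np.ones((2, 3))             # 2 x 3 array of floating-point ones
f = np.full((2, 3), 7)          # 2 x 3 array filled with the value 7
z_like = np.zeros_like(f)       # zeros with the same shape and dtype as f

print(z, o, f, z_like, sep="\n")
```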

Generating Sequences

  • The function np.arange() generates sequences of numbers. It can take parameters for start, stop, and step size.
  • Alternatively, use np.linspace() to generate evenly spaced numbers over a specified range without manually calculating the delta between numbers.

Handling Floating Point Numbers

  • When generating sequences with floating-point numbers using arange(), it may not yield expected results due to precision issues. Instead, prefer using linspace() for better control over the output range.
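
The difference between the two sequence generators can be sketched as follows; the start, stop, and count values are arbitrary examples.

```python
import numpy as np

a = np.arange(2, 10, 2)         # start, stop (excluded), step -> 2, 4, 6, 8
b = np.linspace(0.0, 1.0, 5)    # 5 evenly spaced values, both endpoints included

print(a)
print(b)   # 0.0, 0.25, 0.5, 0.75, 1.0
```

With linspace() the number of points is stated directly, so the length of the result does not depend on floating-point rounding of a step size.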


Generating Random Numbers and Numpy Arrays

Generating Random Numbers

  • The process of generating random numbers from a uniform distribution between 0 and 10 is discussed, emphasizing that each number has an equal probability of being selected.
  • To generate numbers uniformly in the range of 0 to 1, call the uniform function without start and end arguments; the defaults yield values from 0 (inclusive) up to 1 (exclusive).
  • For generating numbers from a normal distribution, functions like np.random.normal are used, with parameters such as the mean and standard deviation (sigma) specified.
  • The numpy library offers more advanced random number generation through its random number generator (RNG) interface, np.random.default_rng(), providing better control than Python's basic random library.
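
A minimal sketch of these draws using NumPy's generator interface is shown below. The seed, the sample sizes, and the variable names are assumptions chosen only to make the example repeatable.

```python
import numpy as np

rng = np.random.default_rng(seed=42)         # seed chosen only for repeatability

u10 = rng.uniform(0, 10, size=5)             # uniform on [0, 10)
u01 = rng.uniform(size=5)                    # default bounds give uniform on [0, 1)
n = rng.normal(loc=0.0, scale=1.0, size=5)   # normal with mean 0 and sigma 1

print(u10, u01, n, sep="\n")
```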

Working with Numpy Arrays

  • Once a numpy array is created, it’s essential to extract values for manipulation or information retrieval. Accessing elements by their index is straightforward in numpy arrays.
  • Each element in a one-dimensional array has an index starting from zero. For example, accessing the second element requires using index one.
  • To extract multiple consecutive elements from an array, slicing can be done using start and end indices; note that the end index is exclusive.

Indexing Techniques

  • Negative indexing allows access to elements from the end of the array. For instance, -1 refers to the last element while -2 refers to the second last.
  • Slicing with negative indices can also be performed; for example, a[-2:] retrieves the last two elements of the array.
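
The indexing rules above can be tried on a small array. This is an illustrative sketch; the values 10 to 19 are chosen so that indices and values are easy to tell apart.

```python
import numpy as np

a = np.arange(10, 20)   # 10, 11, ..., 19

second = a[1]       # index 1 is the second element -> 11
middle = a[2:5]     # indices 2, 3, 4 (end index excluded) -> [12 13 14]
last = a[-1]        # last element -> 19
tail = a[-2:]       # last two elements -> [18 19]

print(second, middle, last, tail)
```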

Advanced Array Manipulation

  • Alternate elements can be selected with a step in the slice (e.g., a[::2]); specific non-contiguous elements (like the first and fifth) can be selected by providing a list of indices within the square brackets.
  • Modifying existing values in an array is possible by directly assigning new values at specified indices. Boolean indexing allows filtering based on conditions (e.g., selecting values greater than five).
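
The selection and assignment techniques in this section can be sketched as follows; the array contents and the replacement value 99 are arbitrary.

```python
import numpy as np

a = np.arange(10)        # 0 .. 9

every_other = a[::2]     # step of 2 -> 0, 2, 4, 6, 8
picked = a[[0, 4]]       # first and fifth elements via a list of indices

a[3] = 99                # direct assignment changes the array in place
print(every_other, picked, a)
```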

Boolean Indexing and Functions

  • Using boolean indexing simplifies data extraction; for instance, creating a binary mask where true indicates values greater than five eliminates loops typically required for such checks.
  • Functions like np.any() allow checking if any element meets certain criteria without iterating through each item manually. This enhances efficiency in data analysis tasks.
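
A boolean mask and np.any() might be used as in this small sketch; the sample values are made up for illustration.

```python
import numpy as np

a = np.array([2, 7, 4, 9, 1])

mask = a > 5                 # [False  True False  True False]
has_large = np.any(mask)     # True: at least one value exceeds five
large = a[mask]              # values where the mask is True -> [7 9]

print(mask, has_large, large)
```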

Numpy Array Manipulation Techniques

Boolean Indexing and Conditional Assignment

  • The np.all function checks whether every element satisfies a condition; for example, np.all(a > 5) returns False if any element is not greater than five.
  • A boolean array created by comparing values (e.g., a > 5) can be used to index another array, selecting only the elements that correspond to true values.
  • You can assign new values based on conditions; for example, setting all elements greater than three but less than or equal to five to zero takes a single assignment through a combined boolean mask.
  • The np.where function returns indices of elements that satisfy a condition (e.g., where values are greater than five).
  • You can perform binary thresholding by replacing all values below a certain threshold with zero and those above it with one using simple assignment.
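
The conditional operations listed above can be sketched on one small array. The sample values and the threshold of five are illustrative; copies are used so that the original array stays unchanged.

```python
import numpy as np

a = np.array([1, 4, 5, 6, 8])

all_large = np.all(a > 5)            # False: 1, 4 and 5 fail the test

b = a.copy()
b[(b > 3) & (b <= 5)] = 0            # values in (3, 5] become 0 -> [1 0 0 6 8]

idx = np.where(a > 5)[0]             # indices where the condition holds -> [3 4]

binary = a.copy()
binary[a <= 5] = 0                   # at or below the threshold -> 0
binary[a > 5] = 1                    # above the threshold -> 1, giving [0 0 0 1 1]

print(all_large, b, idx, binary)
```

Note that np.where() called with only a condition returns a tuple with one index array per dimension, which is why the sketch takes element [0] for a one-dimensional array.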

Vector Operations and Broadcasting

  • Numpy allows for efficient manipulation of arrays through vector operations without explicit loops; operations like addition or multiplication can be performed directly between arrays of the same size.
  • When adding a scalar value to an array, Numpy automatically broadcasts the scalar across all elements, simplifying operations without needing to create additional arrays.
  • Broadcasting enables you to add a single value (like three) to every element in an array seamlessly, enhancing code efficiency and readability.
  • Various mathematical functions such as squaring (**2), square roots, and exponentiation are readily available for element-wise operations on arrays.
  • For more complex calculations like dot products or cross products of vectors, Numpy provides specific functions (np.dot, np.cross) that streamline these processes.
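
The vector operations above can be illustrated with two three-element vectors; the values are arbitrary.

```python
import numpy as np

u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0, 6.0])

total = u + v             # element-wise sum -> [5. 7. 9.]
product = u * v           # element-wise product -> [ 4. 10. 18.]
shifted = u + 3           # scalar 3 is broadcast to every element -> [4. 5. 6.]
squares = u ** 2          # [1. 4. 9.]
roots = np.sqrt(v)        # element-wise square root
dot = np.dot(u, v)        # 1*4 + 2*5 + 3*6 = 32.0
cross = np.cross(u, v)    # [-3.  6. -3.]

print(total, product, shifted, squares, roots, dot, cross)
```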

Working with Two-Dimensional Arrays

  • Transitioning from one-dimensional to two-dimensional arrays involves organizing data into rows and columns, which is referred to as creating matrices in Numpy.
  • To create a two-dimensional array using Numpy's np.array, write each row in the same bracketed form as a one-dimensional array and separate the rows with commas inside an outer pair of brackets.

Understanding Two-Dimensional and Three-Dimensional Arrays in NumPy

Creating Two-Dimensional Arrays

  • To create a two-dimensional array, pass np.array a list of rows enclosed in an outer pair of brackets (hence the double brackets). Each row is written like a one-dimensional array, rows are separated by commas, and all rows must have the same length.
  • The shape of the created array indicates the number of elements along each axis: rows and columns. For example, an array with 2 rows and 3 columns has a shape of (2, 3).

Indexing in Two-Dimensional Arrays

  • Accessing elements in a two-dimensional array requires two indexes: the row number and the column number, both counted from zero. For instance, a[0, 1] retrieves the element in the first row and second column.
  • You can specify all columns for a particular row using syntax like a[1, :], which selects all columns from the row at index 1 (the second row).
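
Creation, shape, and indexing of a two-dimensional array can be sketched together; the 2 x 3 values below are made up for illustration.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])    # two rows, three columns

shape = a.shape      # (2, 3)
item = a[0, 1]       # first row, second column -> 2
row = a[1, :]        # every column of the row at index 1 -> [4 5 6]

print(shape, item, row)
```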

Performing Operations on Arrays

  • Functions such as sum can take an axis argument to perform operations like summation across rows or columns. Specifying axis=0 adds down the rows and returns one sum per column, while axis=1 adds across the columns and returns one sum per row.
  • Element-wise operations are straightforward; for example, adding two matrices of the same shape results in element-wise addition without needing loops.
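
The effect of the axis argument is easiest to see on a small matrix. This sketch uses arbitrary values.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])

col_sums = a.sum(axis=0)    # one sum per column -> [5 7 9]
row_sums = a.sum(axis=1)    # one sum per row -> [ 6 15]
doubled = a + a             # element-wise addition of two same-shape matrices

print(col_sums, row_sums, doubled, sep="\n")
```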

Broadcasting in NumPy

  • Broadcasting allows operations between arrays of different shapes. For instance, dividing every element of a matrix by 9 will apply the operation across all elements due to broadcasting rules.
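
A sketch of broadcasting on a matrix follows. The division by 9 mirrors the example above; adding a row vector to every row is an extra illustration that was not part of the session.

```python
import numpy as np

m = np.array([[9.0, 18.0],
              [27.0, 36.0]])

scaled = m / 9                        # scalar broadcast to every element -> [[1. 2.] [3. 4.]]
shifted = m + np.array([1.0, 2.0])    # row vector broadcast across both rows (illustrative)

print(scaled)
print(shifted)
```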

Transitioning to Three-Dimensional Arrays

  • A three-dimensional array is formed by stacking multiple two-dimensional arrays together. The shape reflects how many two-dimensional arrays are included along with their respective rows and columns.
  • Accessing elements within a three-dimensional array requires three indexes: which two-dimensional array to access, followed by the specific row and column numbers.
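
Stacking can be sketched with np.stack(), which is one of several ways to build a three-dimensional array; the two small layers below are hypothetical placeholders.

```python
import numpy as np

layer0 = np.zeros((2, 3), dtype=int)    # first 2 x 3 array
layer1 = np.ones((2, 3), dtype=int)     # second 2 x 3 array

cube = np.stack([layer0, layer1])       # shape (2, 2, 3): 2 layers, 2 rows, 3 columns
value = cube[1, 0, 2]                   # layer 1, row 0, column 2 -> 1

print(cube.shape, value)
```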


Understanding Image Representation and Plotting in Python

Image Representation with Numpy Arrays

  • Images can be represented as a single two-dimensional array if they have one band. RGB images or those with multiple bands are represented using a three-dimensional array, in which each two-dimensional slice along the band axis holds one color channel (e.g., red, green, blue).
  • Accessing elements in a three-dimensional array requires three indices: in a stack of two-dimensional arrays, the first selects which 2D array to pick, followed by the row and column numbers.
  • The manipulation of images will primarily utilize Numpy arrays throughout the course.

Introduction to Matplotlib

  • Matplotlib is a Python plotting library used for creating publication-quality figures. A basic understanding of Numpy arrays is essential since Matplotlib operates on them.
  • In Matplotlib, the entire drawing area is referred to as a "figure," which can be divided into smaller plotting regions called axes. Each axes object contains an x-axis and a y-axis for plotting data.

Creating Figures in Matplotlib

  • There are two methods for creating figures: explicitly creating a figure and adding axes manually, or allowing Matplotlib to create the axes automatically when a plot function is called.
  • Both methods have their advantages and disadvantages; users should refer to documentation to choose based on their needs.

Basic Plotting Techniques

  • To use Matplotlib, it must first be imported (commonly as plt). Users typically create one-dimensional Numpy arrays representing data points (e.g., angles).
  • By calling plt.subplots, users can explicitly create figures and axes before plotting data from one-dimensional arrays like x and y coordinates.

Implicit vs. Explicit Plotting

  • If users prefer not to create figures explicitly, they can directly call plt.plot, which automatically generates necessary components without manual setup.
  • Users can also combine multiple series in plots easily or arrange subplots vertically or horizontally based on convenience.
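
The explicit and implicit styles can be compared side by side. This is a minimal sketch: the sine and cosine data are illustrative, and the Agg backend line is an assumption that lets the script run without a display (it can be dropped in a notebook such as Colab).

```python
import matplotlib
matplotlib.use("Agg")                   # non-interactive backend (assumption for scripts)
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)      # angles in radians

# Explicit style: create the figure and axes first, then plot on the axes.
fig, ax = plt.subplots()
ax.plot(x, np.sin(x), label="sin")
ax.plot(x, np.cos(x), label="cos")      # second series on the same axes
ax.legend()

# Implicit style: open a fresh figure and let plt.plot create the axes itself.
plt.figure()
plt.plot(x, np.sin(x))

# Two subplots arranged vertically (2 rows, 1 column).
fig2, (top, bottom) = plt.subplots(2, 1)
top.plot(x, np.sin(x))
bottom.plot(x, np.cos(x))
```

Passing (1, 2) to plt.subplots would arrange the two subplots horizontally instead.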

Visualizing Two-Dimensional Data

  • Since images are essentially matrices of cells, plain plot functions cannot represent two-dimensional arrays effectively. Instead, Matplotlib provides the imshow function, specifically designed for displaying images.
  • The session includes reading a simple JPEG image using Matplotlib; however, future sessions will utilize another library called GDAL for more complex image handling tasks.

Preparing for Future Sessions

  • While today’s focus is on visualizing images with limited formats supported by Matplotlib, future lessons will involve libraries capable of handling various image types that include additional metadata such as geographic coordinates.

How to Use Google Colab and Image Processing with NumPy

Mounting Google Drive in Google Colab

  • To mount Google Drive in Google Colab, click on the folder icon and select "Connect to Google Drive." This will prompt for permission, which needs to be granted.
  • Once connected, the entire Google Drive becomes accessible within the Colab environment, allowing users to navigate through their files.

Reading and Displaying Images

  • The function imread (plt.imread in Matplotlib) is used for reading images. The shape of the image data is that of a three-dimensional array (height, width, channels), indicating it is a color image.
  • To display an image from a two-dimensional array, use the imshow function instead of plot, which is meant for one-dimensional data.
  • It’s emphasized that for plotting images or two-dimensional arrays on screen, imshow should be utilized.

Extracting Color Bands from Images

  • Users can extract individual color bands (e.g., red band) from a three-dimensional array using indexing techniques.
  • By setting the colormap parameter of imshow to 'gray' (cmap='gray'), users can display a single band in grayscale.
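
Band extraction and grayscale display can be sketched without an image file. Here a small synthetic array stands in for the result of reading an image; its size and pixel values are made up, and the Agg backend line is an assumption for running outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")                   # non-interactive backend (assumption for scripts)
import matplotlib.pyplot as plt
import numpy as np

# Synthetic 4 x 6 RGB image (height, width, channels) standing in for a file read.
img = np.zeros((4, 6, 3), dtype=np.uint8)
img[:, :, 0] = np.linspace(0, 255, 6).astype(np.uint8)   # red band: left-to-right ramp
img[:, :, 1] = 100                                        # green band: constant

red = img[:, :, 0]                      # two-dimensional array of shape (4, 6)

fig, (left, right) = plt.subplots(1, 2)
left.imshow(img)                        # full color image
right.imshow(red, cmap="gray")          # single band shown in grayscale
```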

Overview of Key Concepts Discussed

  • A summary of discussions includes an introduction to NumPy as a library designed for handling n-dimensional numerical data efficiently.
  • The session covered creating vectors (one-dimensional arrays), various indexing methods, and extending these concepts to two and three-dimensional arrays.
  • An overview of Matplotlib was provided, highlighting two primary ways to create plots: explicitly creating the figure and axes with plt.subplots, or implicitly by calling plt.plot directly.
Video description

IIRS-ISRO