อัลกอริทึม : 1.9 ทบทวนโครงสร้างข้อมูล

อัลกอริทึม : 1.9 ทบทวนโครงสร้างข้อมูล

Understanding Data Structures in Computational Problems

Overview of Data Processing

  • The course focuses on computational problems related to data processing, emphasizing the importance of well-designed algorithms that understand systematic and organized data storage.
  • A review of data storage concepts will be conducted, summarizing ideas and efficiencies associated with various types of data structures.

Types of Data Storage

  • Discussion includes different forms of data storage from a usage perspective: Collection, Set, List, Stack, Queue, and Priority Queue.
  • Details will also cover methods for creating these structures using arrays, linked lists, trees, and tables.

Defining Collections and Sets

  • A Collection is defined as an unordered structure that allows duplicate entries; it can be visualized as a container where items are simply added without regard to order.
  • In contrast, a Set is similar but does not allow duplicates; adding a duplicate item either replaces the old one or rejects the new entry.

Lists and Their Characteristics

  • A List maintains order among its elements; items are stored sequentially with possible duplicates. Each element has a specific position within the list.
  • Stacks operate on a Last In First Out (LIFO) principle where the last item added is the first to be removed. Operations include "Push" for adding items and "Pop" for removing them.

Queues: Order Matters

  • Queues follow a First In First Out (FIFO) approach where items are processed in the order they arrive. Adding an item is referred to as "enqueue," while removing it is called "dequeue."
  • There exists another type known as Priority Queue which organizes elements based on their importance; higher priority items are dequeued before lower ones regardless of their arrival time.

Creating Data Structures

  • Various methods exist for constructing these data structures: primarily through arrays or linked references (pointers), each offering flexibility in managing data but differing in memory efficiency.
  • Trees can also be utilized for organizing information hierarchically while hash tables combine array-like access with linked lists to enhance search efficiency.

Practical Examples: Arrays vs Linked Lists

  • An example illustrates creating a List using an array consisting of two components: the array itself holding values and another tracking how many values it contains.
  • When inserting into an array at specific indices, existing elements may need to shift positions to maintain order—this can slow performance if done frequently at the beginning of the list.

Efficiency Considerations

  • Deleting from an array's start requires shifting all subsequent elements down by one position which impacts performance negatively compared to linked lists where nodes can easily be rearranged without shifting others.

Understanding Linked Lists and Their Efficiency

Characteristics of Linked Lists

  • Linked lists differ from arrays in data storage, allowing for efficient addition and deletion of elements when the position is known.
  • They can have various structures, including bidirectional links that enhance programming efficiency by reducing special cases in code.

Programming with Linked Lists

  • The concept of circular linked lists is introduced, where the last node points back to the first node, although this structure may not always be practical due to traversal speed issues.
  • Efficient traversal is emphasized; both forward and backward navigation through a list can be optimized with proper linking.

Implementing Stacks and Queues with Arrays

Stack Implementation

  • Stacks can be created using arrays where elements are added or removed from the end, maintaining constant time complexity for these operations.
  • The size variable indicates both the number of elements and the next available position for new entries in the stack.

Queue Implementation

  • Similar to stacks, queues also utilize arrays but require tracking the front element's position for efficient enqueue (addition) and dequeue (removal).
  • Operations on queues maintain constant time complexity as well; adding an item involves calculating its position based on current size.

Circular Queues for Space Efficiency

Managing Queue Size

  • To optimize space usage in a queue implemented with an array, a circular approach allows wrapping around when reaching the end of the array.
  • This method ensures all slots are utilized effectively while keeping operations at constant time complexity.

Binary Search Trees: Structure and Searching

Basics of Binary Search Trees

  • A binary search tree organizes data such that left children are less than their parent node while right children are greater.

Searching Mechanism

  • Searching begins at the root; comparisons determine whether to traverse left or right until finding or concluding absence of a value.

Finding Extremes in BST

Binary Search Tree Operations

Understanding Binary Search Trees (BST)

  • The efficiency of a binary search tree (BST) is related to its height, which affects operations like insertion and searching.
  • When adding new data to the BST, it must not already exist in the tree; if it does not exist, it becomes a new leaf node.
  • The order of data insertion significantly impacts the height of the BST. For example, inserting numbers in ascending order can lead to a taller tree.

Insertion Process

  • Different sequences of inserting the same set of numbers can yield trees with varying heights. For instance, inserting 1, 2, 3 results in a shorter tree than inserting 1, 5, 3.

Deletion Scenarios

  • Deleting nodes from a BST can vary in complexity. Simple cases include deleting leaf nodes or nodes with one child.
  • To delete a node with two children (e.g., node 40), find its successor (the smallest value greater than it), replace it with that value, and then delete the successor.

Complexity and Performance

  • The time complexity for operations such as searching and deleting is proportional to the height of the tree. A balanced tree has better performance compared to an unbalanced one.
  • The minimum height of a balanced BST is log(N), while the maximum height could be N - 1 for completely unbalanced trees.

Average Case Analysis

  • Empirical tests show that when random data is inserted into a BST, its average depth tends to be around 1.39 log(N), indicating efficient search times even for large datasets.
  • In scenarios where searches fail (i.e., when elements are not found), expected depths increase slightly but still remain logarithmic relative to N.

Conclusion on Tree Height

Understanding AVL Trees and Hashing Techniques

Characteristics of AVL Trees

  • The operations of adding, deleting, or searching in an AVL tree maintain the height balance of the tree. This results in a structure that is generally shorter compared to other binary trees.
  • The height of an AVL tree can be significantly less than that of a binary search tree (BST), with heights being approximately 40% to 14% lower than the shortest BST for large datasets.

Data Sorting Using Binary Search Trees

  • To sort data using a binary search tree, one can create the tree from all data points and then perform an in-order traversal to retrieve sorted data.
  • An example illustrates sorting a dataset (e.g., 215, 36) by constructing a binary search tree and traversing it in order to yield sorted output.

Alternative Data Structures: Hash Tables

  • Hash tables store data using hash functions that convert input into index positions for storage. A well-designed hash function ensures efficient distribution across the table.
  • Challenges arise when two pieces of data map to the same index (collisions). Various strategies exist to handle collisions effectively.

Collision Resolution Strategies

Separate Chaining

  • In separate chaining, colliding items are stored together in a list at their respective index. For instance, if multiple values hash to the same index, they are linked within that position.

Linear Probing

  • Linear probing involves finding the next available slot sequentially when a collision occurs. If an item hashes to an occupied index, it checks subsequent indices until it finds an empty one.

Quadratic Probing

  • Quadratic probing uses a quadratic function to find new slots upon collision. It starts with checking adjacent indices and increases its step size quadratically until it finds an open slot.

Understanding Binary Heap and Priority Queue

Introduction to Binary Heap

  • The sequence of numbers (1, 4, 9, 16) illustrates the concept of perfect squares, leading to the next number being 5^2, which is 25. This example emphasizes the frequent collisions in data structures.
  • A hash table structure allows for rapid searching and deleting of data if managed properly. Expanding the table size periodically can enhance performance.

Priority Queue Basics

  • The priority queue can be constructed using various methods; however, a simple approach involves utilizing a binary heap stored in an array while visualizing it as a tree for easier understanding.
  • A binary heap is a balanced tree where each node has two children and is filled from left to right at every level except possibly the last one.

Storage and Calculation in Binary Heap

  • Data in a binary heap is stored compactly in an array. Each node's position can be calculated based on its index using specific formulas.
  • The left child of a node at index k is located at 2k + 1, while the right child is found at 2k + 2. Conversely, the parent node can be determined by (k - 1)/2.

Properties of Binary Heap

  • In a binary heap, parent nodes must have higher values than their children. This property ensures that the maximum value resides at the root (index 0).
  • The height of a complete binary tree correlates with log base 2 of the number of elements, making operations efficient even with large datasets.

Efficiency in Operations

  • Adding or removing elements from a binary heap operates efficiently due to its logarithmic height. For instance, inserting an element requires only about log N swaps.
  • When adding new data to maintain structural integrity, if an inserted value violates parent-child relationships (e.g., inserting 16 when its parent has a lower value), swapping occurs until order is restored.

Conclusion on Deletion Process

  • Removing the root element involves replacing it with another element but requires careful management to maintain structural properties without losing relationships among nodes.

Understanding Data Structures: Array Manipulation and Heaps

Deleting Elements from an Array

  • The simplest way to delete elements from an array is by removing the last element. However, in this case, we need to remove a specific element (15).
  • To achieve this, we can take the value of 15 out for use and replace it with the value of 1, then simply reduce the size of the data structure.
  • This reduction means that if our data size was originally 6 and we remove one element, it becomes 5. Thus, in terms of data interpretation, there are now only 5 slots available.

Maintaining Parent-Child Relationships

  • When moving values up in a tree structure after deletion, it may disrupt parent-child relationships. For instance, if '1' is not greater than its children '13' and '9', this relationship must be corrected.
  • A simple solution involves comparing the problematic parent node with its larger child nodes and swapping them as necessary until all relationships are correct.

Efficiency of Deletion Operations

  • The process of deleting the maximum value maintains efficiency because it operates within logarithmic time complexity (log n base 2), which is efficient for binary trees.

Introduction to Min-Heaps

  • In some scenarios, having smaller values is more critical than larger ones; thus, changing storage rules leads us to Min-Heaps where smaller values are prioritized.
  • A Min-Heap ensures that each parent node has a value less than its children. This structure allows easy access to minimum values.

Practical Application: The 15 Puzzle Game

  • An example illustrating these concepts is the "15 puzzle," consisting of 15 numbered tiles with one empty space used for sliding tiles into place.
  • The goal is to arrange tiles sequentially from 1 to 15 through strategic movements involving sliding adjacent tiles into the empty space.

Problem-Solving Approach: State Space Search

  • To solve such puzzles algorithmically requires designing an approach known as state space search without delving into specifics at this moment.

Exploring Movement Options

  • Each movement involves shifting the empty space up/down/left/right while generating new configurations or states for evaluation against desired outcomes.

Queue Utilization for State Management

  • To manage generated states effectively during problem-solving, a queue (Q) can be employed where new configurations are added and processed systematically.

Conclusion on Problem Solving Structure

Understanding the Java Method for Table Manipulation

Overview of the Method

  • The method in Java receives an original table and returns a modified version after processing it through a queue.
  • It begins by removing data from the queue to generate new tables by shifting empty spaces in four directions (up, down, left, right).
  • If a generated table is not null and matches the solution, it is returned; otherwise, it is added back into the queue for further processing.

Handling Duplicates

  • The process may create duplicate tables due to similar moves leading to identical configurations.
  • To prevent infinite loops caused by repeated states, a set is used to track previously generated tables.
  • Each time a new table is added to the queue, it must also be checked against this set to avoid duplicates.

Data Structure Choices

  • Various data structures can be employed for managing sets: arrays, binary search trees (BST), AVL trees, or hash tables.
  • Each structure has its advantages; for instance, hash tables are efficient for insertions and lookups but may not always be optimal depending on specific use cases.

Performance Testing

  • The speaker tested solving the 15-puzzle problem with three different initial configurations of varying difficulty levels.
  • Results showed that simpler configurations produced fewer unique tables (around 500), while more complex ones resulted in thousands of configurations (over 100,000).

Efficiency of Data Structures

  • Using different data structures significantly impacted performance; using binary search trees reduced processing time dramatically from half an hour to just seven seconds for complex cases.
  • The choice of data structure affects overall efficiency during algorithm execution; thus careful selection based on context is crucial.

Summary of Data Structure Performance

  • Lists created with arrays perform adequately based on data volume. Linked lists have variable operation times due to pointer adjustments.
  • Binary search trees provide logarithmic average time complexity for operations like insertion and deletion. AVL trees maintain consistent log(n).

Adjusting Performance in HCH Tables

Key Insights on HCH Table Operations

  • The performance of HCH tables can be adjusted during operation if users are not satisfied with the current efficiency.
  • It is important to note that HCH tables are not suitable for operations involving data ranking, as their functions disrupt the order of data.
  • When data is inserted into an HCH table, it gets distributed rather than maintaining a sequential arrangement.
Video description

การบรรยายวิชา การออกแบบและวิเคราะห์อัลกอริทึม โดย สมชาย ประสิทธิ์จูตระกูล http://www.cp.eng.chula.ac.th/~somchai/books