Operating Systems Lecture 19: File system implementation
Understanding File Systems
What is a File System?
- A file system organizes files and directories on disk, managing how data is stored and accessed.
- Disks are divided into fixed-size blocks (sectors are typically 512 bytes; file systems often use larger blocks, e.g., 4 KB), which the file system uses to store file data and metadata when reading or writing files.
Key Components of File Systems
- Operating systems implement one or more file systems, differing in details but sharing core concepts.
- Data structures are crucial for organizing both file data and metadata, which includes information such as file size and access times.
Understanding Metadata
- Metadata refers to information about the data itself, not part of the actual content but essential for management.
System Calls in File Systems
- System calls are critical for manipulating data structures when performing operations like opening or reading files.
Components of a Simple File System
Structure of a Simple File System
- A simple example file system consists of a disk with 64 blocks organized into regions: a superblock, bitmaps, inode blocks, and a data region.
Data Blocks
- The largest portion of any file system is dedicated to storing actual file data within these data blocks.
Inodes
- Each file's metadata is stored in an inode, containing information such as permissions and last access time.
Bitmaps
- Bitmaps track free versus occupied blocks for both inodes and data blocks, facilitating efficient space management.
Superblock
- The superblock contains an overview of the entire file system layout, detailing where the different regions are located.
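As an illustrative sketch, the regions of such a 64-block file system can be laid out as follows (the region sizes and block numbers here are assumptions for illustration, not a real on-disk format):

```python
# Hypothetical layout of a 64-block file system (all numbers illustrative).
BLOCK_SIZE = 4096        # bytes per block (assumed)
TOTAL_BLOCKS = 64

layout = {
    "superblock":   range(0, 1),    # block 0: file-system-wide metadata
    "inode_bitmap": range(1, 2),    # block 1: free/used bits for inodes
    "data_bitmap":  range(2, 3),    # block 2: free/used bits for data blocks
    "inode_table":  range(3, 8),    # blocks 3-7: the inodes themselves
    "data_region":  range(8, 64),   # blocks 8-63: file contents
}

for region, blocks in layout.items():
    print(f"{region:12s} blocks {blocks.start}-{blocks.stop - 1}")
```

Note that the data region dominates the layout, matching the point above that most of the disk holds actual file data.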
Detailed Look at Inodes
Organization of Inodes
- The initial part of the file system includes multiple components: the superblock, bitmaps for inodes and data blocks, and the inode table itself.
Unique Identification
- Each file has a unique inode number that indexes into the inode table to locate its metadata.
Information Stored in an Inode
- An inode stores vital metadata including permissions, last access time, size, and pointers to where the actual data resides on disk.
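Because inodes have a fixed size, an inode number translates directly into an on-disk location by arithmetic. A sketch, assuming a 256-byte inode, 4 KB blocks, and an inode table starting at block 3 (all assumed values):

```python
INODE_SIZE = 256          # bytes per inode (assumed)
BLOCK_SIZE = 4096         # bytes per block (assumed)
INODE_TABLE_START = 3     # first block of the inode table (assumed)

def inode_location(inode_number: int) -> tuple[int, int]:
    """Return (block number, byte offset within block) holding this inode."""
    byte_addr = inode_number * INODE_SIZE
    block = INODE_TABLE_START + byte_addr // BLOCK_SIZE
    offset = byte_addr % BLOCK_SIZE
    return block, offset

print(inode_location(0))    # (3, 0)
print(inode_location(20))   # (4, 1024): 20*256 = 5120 bytes into the table
```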
Pointers in Multi-Level Indexing
Managing Large Files with Pointers
- For large files requiring many blocks, an inode employs multi-level indexing to manage pointers efficiently.
Direct vs. Indirect Pointers
Understanding Indirect Blocks and File Allocation
Indirect Blocks in File Systems
- An indirect block is a block whose contents are themselves block numbers: the inode stores the block number of the indirect block, which in turn points to blocks containing file data (or, with double and triple indirection, to further indirect blocks). Multiple levels of indirection accommodate very large files.
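A back-of-the-envelope calculation shows why indirection helps. Assuming 4 KB blocks, 4-byte block numbers, and an ext2-style inode with 12 direct pointers plus single, double, and triple indirect pointers (all assumed parameters):

```python
BLOCK_SIZE = 4096
PTR_SIZE = 4
PTRS_PER_BLOCK = BLOCK_SIZE // PTR_SIZE   # 1024 block numbers per block
DIRECT = 12

blocks_addressable = (
    DIRECT                      # direct pointers in the inode
    + PTRS_PER_BLOCK            # single indirect: one extra block of pointers
    + PTRS_PER_BLOCK ** 2       # double indirect: pointers to pointer blocks
    + PTRS_PER_BLOCK ** 3       # triple indirect
)
max_bytes = blocks_addressable * BLOCK_SIZE
print(f"max file size ~ {max_bytes / 2**40:.1f} TiB")   # ~ 4.0 TiB
```

With direct pointers alone the file could only reach 12 × 4 KB = 48 KB; three levels of indirection push the limit into the terabyte range.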
Multi-Level Index vs. File Allocation Table
- A multi-level index is the common inode structure in modern Unix-like operating systems, allowing efficient tracking of file blocks. Alternatively, the File Allocation Table (FAT) used in older Windows systems manages file blocks with one table entry per disk block.
- In FAT, the directory entry stores only the file's first block number; each table entry then holds the number of the file's next block, and the last block's entry holds an end-of-file marker. This simplifies the on-disk structure, but random access within a file is slower than with a multi-level index because the chain must be walked from the start.
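A minimal sketch of following a FAT chain (the table contents are made up for illustration; real FAT variants use reserved bit patterns rather than -1 as the end-of-chain marker):

```python
EOF = -1  # end-of-chain marker (illustrative; real FAT uses reserved values)

# fat[b] = block number that follows block b in some file, or EOF.
fat = {4: 7, 7: 2, 2: EOF}   # a file occupying blocks 4 -> 7 -> 2

def file_blocks(first_block: int) -> list[int]:
    """Follow the FAT chain from a file's first block to its last."""
    blocks, b = [], first_block
    while b != EOF:
        blocks.append(b)
        b = fat[b]
    return blocks

print(file_blocks(4))   # [4, 7, 2]
```

Reaching the N-th block of a file requires N table lookups, which is the efficiency cost mentioned above.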
Directory Structures and Their Functions
Directory as a Mapping Structure
- A directory serves as a mapping from file names to inode numbers. It typically consists of records that store an inode number together with the filename and its length.
- Directories can be structured using various data structures such as linked lists, hash tables, or binary search trees depending on the filesystem design choices.
Special Characteristics of Directories
- A directory is itself a special type of file: it has its own inode, which tracks the data blocks containing its records of filenames and associated inode numbers.
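A directory's contents can be pictured as a list of (inode number, name) records; a toy sketch of the lookup the kernel performs (the entries and inode numbers are invented):

```python
# Toy directory contents, as stored in the directory file's data blocks.
# "." and ".." are the conventional self and parent entries.
directory = [
    (2, "."), (2, ".."), (12, "notes.txt"), (13, "src"),
]

def lookup(name: str) -> int:
    """Return the inode number for `name`, as done during path resolution."""
    for inum, entry_name in directory:
        if entry_name == name:
            return inum
    raise FileNotFoundError(name)

print(lookup("notes.txt"))   # 12
```

A real file system might store these records in a linear list, a hash table, or a B-tree, as noted above; the interface (name in, inode number out) is the same.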
Tracking Free Blocks in Filesystems
Methods for Tracking Free Blocks
- To manage free space within filesystems, methods like bitmaps or free lists are employed. A bitmap uses one bit per block: '1' indicates occupied while '0' signifies free.
- An alternative approach involves maintaining a linked list where each free block contains pointers to subsequent free blocks, allowing efficient management of available space.
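A sketch of bitmap-based allocation, using the convention above (1 = occupied, 0 = free; the sizes are illustrative):

```python
# 64 data blocks, one bit each; 1 = occupied, 0 = free (assumed sizes).
bitmap = [0] * 64
bitmap[0:8] = [1] * 8        # pretend blocks 0-7 are already in use

def alloc_block() -> int:
    """Find the first free block, mark it occupied, and return it."""
    for b, bit in enumerate(bitmap):
        if bit == 0:
            bitmap[b] = 1
            return b
    raise OSError("no free blocks")

def free_block(b: int) -> None:
    """Mark a block free again."""
    bitmap[b] = 0

print(alloc_block())   # 8: the first free block
```

A free list avoids scanning but scatters its pointers across the disk; a bitmap makes it easy to find runs of contiguous free blocks.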
File Operations: Opening and Accessing Files
Purpose of Opening Files
- The open system call's primary function is to load a file's inode into memory for quick access during read/write operations. This ensures that necessary metadata about the file is readily available when needed.
Understanding the Open File Process
The Goal of Opening a File
- The primary objective of the open operation is to locate a file's inode by traversing its path name.
- Starting from the root directory's inode, the kernel fetches its data blocks to find the next component in the path, repeating for each directory along the way.
Traversing Path Names
- The process recursively walks from a parent directory's inode to its data blocks, resolving each path component until the final file name is reached.
- If creating a new file, the kernel allocates a fresh inode and data blocks and records the name-to-inode mapping as a new directory entry.
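The traversal above can be sketched as a loop over path components, with each directory modeled as a toy name-to-inode dictionary (the inode numbers and structures here are invented for illustration; a real kernel fetches directory data blocks from disk at each step):

```python
ROOT_INODE = 2   # the root directory's inode number (conventionally 2 in ext2)

# Toy disk: directory inode number -> its entries (name -> inode number).
dirs = {
    2:  {"home": 5},
    5:  {"alice": 9},
    9:  {"notes.txt": 12},
}

def path_to_inode(path: str) -> int:
    """Walk the path from the root inode, one component at a time."""
    inum = ROOT_INODE
    for component in path.strip("/").split("/"):
        inum = dirs[inum][component]   # read dir contents, look up next inode
    return inum

print(path_to_inode("/home/alice/notes.txt"))   # 12
```

Each step costs at least one disk read for the directory's data, which is why deep paths make open() relatively expensive.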
Open File Table Structures
- The kernel maintains an open file table that tracks all open files globally, including sockets and pipes.
- Each process has its own per-process open file table with entries pointing to the global table for easy access.
Importance of Global File Table
- A global file table allows multiple processes to reference shared files efficiently while maintaining individual process tables for convenience.
- By default, each process starts with three standard streams open (standard input, output, and error, i.e., descriptors 0, 1, and 2); subsequently opened files receive higher descriptor numbers.
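The two-level table structure can be sketched as follows (the field names and inode numbers are invented; real kernels also track reference counts, flags, and more):

```python
# Global open file table: one entry per open file, shared by all processes.
# Each entry records the inode and the current file offset (assumed fields).
global_table = []

def sys_open(inode_number: int) -> int:
    """Add a global entry and return a per-process descriptor for it."""
    global_table.append({"inode": inode_number, "offset": 0})
    global_index = len(global_table) - 1
    fd_table.append(global_index)       # per-process entry -> global entry
    return len(fd_table) - 1            # the file descriptor

# Descriptors 0, 1, 2 (stdin/stdout/stderr) are opened at process start.
fd_table = []                           # this process's descriptor table
for std_inode in (10, 11, 11):          # made-up inode numbers
    sys_open(std_inode)

fd = sys_open(42)                       # the next open gets descriptor 3
print(fd)                               # 3
```

Because descriptors only index the per-process table, two processes can share one global entry (and hence one file offset), e.g., after fork().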
Accessing Files via Descriptors
- Once a file descriptor is obtained from an open call, it provides access to the in-memory inode of the open file.
- File operations typically touch several blocks (indirect blocks, data blocks, metadata), so a single read or write may require multiple disk accesses.
Virtual File System Optimization
Virtual File System and Disk Buffer Cache
Understanding the Virtual File System (VFS)
- The VFS provides a higher layer of abstraction for system calls, viewing file systems as a collection of objects such as files, directories, and super blocks.
- Operations on these objects include looking up file names and reading or writing data blocks. System call logic in Linux is written in terms of these VFS objects.
- Implementing a new file system requires creating data structures for VFS objects and defining operations like lookups without rewriting the entire system call logic.
- This design allows modern operating systems to manage complexity by reusing existing system call logic while providing specific implementations for different file systems.
- When system calls require operations on directories, they invoke corresponding functions in the underlying file system through pointers provided by the VFS.
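The dispatch can be sketched with per-file-system operation tables; here plain Python functions stand in for the function pointers a kernel would use (the names are illustrative, not the real Linux VFS API):

```python
# Each file system registers a table of operations; the generic system
# call layer only ever calls through the table, never the concrete code.
def ext2_lookup(name: str) -> str:
    return f"ext2 inode for {name}"

def fat_lookup(name: str) -> str:
    return f"FAT entry for {name}"

filesystems = {
    "ext2": {"lookup": ext2_lookup},
    "fat":  {"lookup": fat_lookup},
}

def vfs_lookup(fs_type: str, name: str) -> str:
    """Generic lookup logic, delegating to the mounted file system's table."""
    ops = filesystems[fs_type]
    return ops["lookup"](name)

print(vfs_lookup("ext2", "notes.txt"))   # ext2 inode for notes.txt
```

Adding a new file system means registering a new operations table; the generic system call logic above it is untouched, which is the reuse the VFS design buys.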
Disk Buffer Cache Mechanism
- The disk buffer cache stores recently accessed disk blocks to minimize disk I/O operations, improving performance significantly.
- On a cache miss (when requested data isn't found), the OS retrieves it from the disk, caches it, and returns it to the requesting process.
- Write operations also utilize caching; changes are first made in memory before being pushed to disk later, enhancing efficiency.
Types of Buffer Cache Implementations
- There are two main types of buffer cache: synchronous (write-through), which immediately writes changes to disk, and asynchronous (write-back), which delays writing until later.
- The file system interacts with the buffer cache without needing to understand its complexities; it simply requests reads or writes.
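A minimal sketch of a write-back buffer cache (the disk is a plain dict here, and eviction is omitted; a real cache would also evict blocks, e.g., with an LRU policy):

```python
disk = {0: b"super", 8: b"hello"}    # block number -> contents (toy disk)
cache = {}                           # block number -> contents in memory
dirty = set()                        # blocks modified in memory only

def read_block(b: int) -> bytes:
    """Serve from the cache; on a miss, fetch from disk and cache it."""
    if b not in cache:
        cache[b] = disk[b]
    return cache[b]

def write_block(b: int, data: bytes) -> None:
    """Write-back policy: update memory now, mark the block dirty."""
    cache[b] = data
    dirty.add(b)

def sync() -> None:
    """Push all dirty blocks to disk later (as write-back delays do)."""
    for b in dirty:
        disk[b] = cache[b]
    dirty.clear()

read_block(8)                        # miss: fetched from disk, then cached
write_block(8, b"world")             # disk still holds b"hello"
sync()                               # now disk holds b"world"
```

A write-through cache would simply call `sync()` inside `write_block`, trading performance for durability.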
Memory Allocation for Buffer Cache
- Memory for the buffer cache is typically allocated dynamically, using free memory not needed by other processes.
- This improves performance: when multiple processes request the same block, it is served from memory instead of requiring another disk access.
Advantages and Limitations of Buffer Caching
- Important blocks frequently accessed—like kernel code or hot files—are often cached, reducing overall disk access time significantly.