Operating Systems Lecture 19: File system implementation
Understanding File Systems
What is a File System?
- A file system organizes files and directories on disk, managing how data is stored and accessed.
- Disks are divided into fixed-size blocks (sectors are typically 512 bytes; file systems often use larger blocks, e.g., 4 KB), which the file system uses to store file data and metadata when reading or writing files.
Key Components of File Systems
- Operating systems implement one or more file systems, differing in details but sharing core concepts.
- Data structures are crucial for organizing both file data and metadata, which includes information such as file size and access times.
Understanding Metadata
- Metadata refers to information about the data itself, not part of the actual content but essential for management.
System Calls in File Systems
- System calls are critical for manipulating data structures when performing operations like opening or reading files.
Components of a Simple File System
Structure of a Simple File System
- A simple example file system consists of a disk with 64 blocks organized into regions: a superblock, bitmaps, inode blocks, and a data region.
Data Blocks
- The largest portion of any file system is dedicated to storing actual file data within these data blocks.
Inodes
- Each file's metadata is stored in an inode, containing information such as permissions and last access time.
Bitmaps
- Bitmaps track free versus occupied blocks for both inodes and data blocks, facilitating efficient space management.
Superblock
- The superblock contains an overview of the entire file system layout, detailing where the different regions are located.
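As an illustrative sketch, the regions of such a 64-block file system can be laid out as follows (the region sizes and block numbers here are assumptions for illustration, not a real on-disk format):

```python
# Hypothetical layout of a 64-block file system (all numbers illustrative).
BLOCK_SIZE = 4096        # bytes per block (assumed)
TOTAL_BLOCKS = 64

layout = {
    "superblock":   range(0, 1),    # block 0: file-system-wide metadata
    "inode_bitmap": range(1, 2),    # block 1: free/used bits for inodes
    "data_bitmap":  range(2, 3),    # block 2: free/used bits for data blocks
    "inode_table":  range(3, 8),    # blocks 3-7: the inodes themselves
    "data_region":  range(8, 64),   # blocks 8-63: file contents
}

for region, blocks in layout.items():
    print(f"{region:12s} blocks {blocks.start}-{blocks.stop - 1}")
```

Note that the data region dominates the layout, matching the point above that most of the disk holds actual file data.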
Detailed Look at Inodes
Organization of Inodes
- The initial part of the file system includes multiple components: the superblock, bitmaps for inodes and data blocks, and the inode table itself.
Unique Identification
- Each file has a unique inode number that indexes into the inode table to locate its metadata.
Information Stored in an Inode
- An inode stores vital metadata including permissions, last access time, size, and pointers to where the actual data resides on disk.
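Because inodes have a fixed size, an inode number translates directly into an on-disk location by arithmetic. A sketch, assuming a 256-byte inode, 4 KB blocks, and an inode table starting at block 3 (all assumed values):

```python
INODE_SIZE = 256          # bytes per inode (assumed)
BLOCK_SIZE = 4096         # bytes per block (assumed)
INODE_TABLE_START = 3     # first block of the inode table (assumed)

def inode_location(inode_number: int) -> tuple[int, int]:
    """Return (block number, byte offset within block) holding this inode."""
    byte_addr = inode_number * INODE_SIZE
    block = INODE_TABLE_START + byte_addr // BLOCK_SIZE
    offset = byte_addr % BLOCK_SIZE
    return block, offset

print(inode_location(0))    # (3, 0)
print(inode_location(20))   # (4, 1024): 20*256 = 5120 bytes into the table
```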
Pointers in Multi-Level Indexing
Managing Large Files with Pointers
- For large files requiring many blocks, an inode employs multi-level indexing to manage pointers efficiently.
Direct vs. Indirect Pointers
Understanding Indirect Blocks and File Allocation
Indirect Blocks in File Systems
- An indirect block is a block whose contents are themselves block numbers: the inode stores the block number of the indirect block, which in turn points to blocks containing file data (or, with double and triple indirection, to further indirect blocks). Multiple levels of indirection accommodate very large files.
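A back-of-the-envelope calculation shows why indirection helps. Assuming 4 KB blocks, 4-byte block numbers, and an ext2-style inode with 12 direct pointers plus single, double, and triple indirect pointers (all assumed parameters):

```python
BLOCK_SIZE = 4096
PTR_SIZE = 4
PTRS_PER_BLOCK = BLOCK_SIZE // PTR_SIZE   # 1024 block numbers per block
DIRECT = 12

blocks_addressable = (
    DIRECT                      # direct pointers in the inode
    + PTRS_PER_BLOCK            # single indirect: one extra block of pointers
    + PTRS_PER_BLOCK ** 2       # double indirect: pointers to pointer blocks
    + PTRS_PER_BLOCK ** 3       # triple indirect
)
max_bytes = blocks_addressable * BLOCK_SIZE
print(f"max file size ~ {max_bytes / 2**40:.1f} TiB")   # ~ 4.0 TiB
```

With direct pointers alone the file could only reach 12 × 4 KB = 48 KB; three levels of indirection push the limit into the terabyte range.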
Multi-Level Index vs. File Allocation Table
- A multi-level index is the common inode structure in modern Unix-like operating systems, allowing efficient tracking of file blocks. Alternatively, the File Allocation Table (FAT) used in older Windows systems manages file blocks with one table entry per disk block.
- In FAT, the directory entry stores only the file's first block number; each table entry then holds the number of the file's next block, and the last block's entry holds an end-of-file marker. This simplifies the on-disk structure, but random access within a file is slower than with a multi-level index because the chain must be walked from the start.
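A minimal sketch of following a FAT chain (the table contents are made up for illustration; real FAT variants use reserved bit patterns rather than -1 as the end-of-chain marker):

```python
EOF = -1  # end-of-chain marker (illustrative; real FAT uses reserved values)

# fat[b] = block number that follows block b in some file, or EOF.
fat = {4: 7, 7: 2, 2: EOF}   # a file occupying blocks 4 -> 7 -> 2

def file_blocks(first_block: int) -> list[int]:
    """Follow the FAT chain from a file's first block to its last."""
    blocks, b = [], first_block
    while b != EOF:
        blocks.append(b)
        b = fat[b]
    return blocks

print(file_blocks(4))   # [4, 7, 2]
```

Reaching the N-th block of a file requires N table lookups, which is the efficiency cost mentioned above.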
Directory Structures and Their Functions
Directory as a Mapping Structure
- A directory serves as a mapping from file names to inode numbers. It typically consists of records that store an inode number together with the filename and its length.
- Directories can be structured using various data structures such as linked lists, hash tables, or binary search trees depending on the filesystem design choices.
Special Characteristics of Directories
- A directory is itself a special type of file: it has its own inode, which tracks the data blocks containing its records of filenames and associated inode numbers.
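A directory's contents can be pictured as a list of (inode number, name) records; a toy sketch of the lookup the kernel performs (the entries and inode numbers are invented):

```python
# Toy directory contents, as stored in the directory file's data blocks.
# "." and ".." are the conventional self and parent entries.
directory = [
    (2, "."), (2, ".."), (12, "notes.txt"), (13, "src"),
]

def lookup(name: str) -> int:
    """Return the inode number for `name`, as done during path resolution."""
    for inum, entry_name in directory:
        if entry_name == name:
            return inum
    raise FileNotFoundError(name)

print(lookup("notes.txt"))   # 12
```

A real file system might store these records in a linear list, a hash table, or a B-tree, as noted above; the interface (name in, inode number out) is the same.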
Tracking Free Blocks in Filesystems
Methods for Tracking Free Blocks
- To manage free space within filesystems, methods like bitmaps or free lists are employed. A bitmap uses one bit per block: '1' indicates occupied while '0' signifies free.
- An alternative approach involves maintaining a linked list where each free block contains pointers to subsequent free blocks, allowing efficient management of available space.
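A sketch of bitmap-based allocation, using the convention above (1 = occupied, 0 = free; the sizes are illustrative):

```python
# 64 data blocks, one bit each; 1 = occupied, 0 = free (assumed sizes).
bitmap = [0] * 64
bitmap[0:8] = [1] * 8        # pretend blocks 0-7 are already in use

def alloc_block() -> int:
    """Find the first free block, mark it occupied, and return it."""
    for b, bit in enumerate(bitmap):
        if bit == 0:
            bitmap[b] = 1
            return b
    raise OSError("no free blocks")

def free_block(b: int) -> None:
    """Mark a block free again."""
    bitmap[b] = 0

print(alloc_block())   # 8: the first free block
```

A free list avoids scanning but scatters its pointers across the disk; a bitmap makes it easy to find runs of contiguous free blocks.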
File Operations: Opening and Accessing Files
Purpose of Opening Files
- The open system call's primary function is to load a file's inode into memory for quick access during read/write operations. This ensures that necessary metadata about the file is readily available when needed.
Understanding the Open File Process
The Goal of Opening a File
- The primary objective of the open operation is to locate a file's inode by traversing its path name.
- Starting from the root directory's inode, the kernel fetches its data blocks to find the next component in the path, repeating for each directory along the way.
Traversing Path Names
- The process recursively walks from a parent directory's inode to its data blocks, resolving each path component until the final file name is reached.
- If creating a new file, the kernel allocates a fresh inode and data blocks and records the name-to-inode mapping as a new directory entry.
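The traversal above can be sketched as a loop over path components, with each directory modeled as a toy name-to-inode dictionary (the inode numbers and structures here are invented for illustration; a real kernel fetches directory data blocks from disk at each step):

```python
ROOT_INODE = 2   # the root directory's inode number (conventionally 2 in ext2)

# Toy disk: directory inode number -> its entries (name -> inode number).
dirs = {
    2:  {"home": 5},
    5:  {"alice": 9},
    9:  {"notes.txt": 12},
}

def path_to_inode(path: str) -> int:
    """Walk the path from the root inode, one component at a time."""
    inum = ROOT_INODE
    for component in path.strip("/").split("/"):
        inum = dirs[inum][component]   # read dir contents, look up next inode
    return inum

print(path_to_inode("/home/alice/notes.txt"))   # 12
```

Each step costs at least one disk read for the directory's data, which is why deep paths make open() relatively expensive.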
Open File Table Structures
- The kernel maintains an open file table that tracks all open files globally, including sockets and pipes.
- Each process has its own per-process open file table with entries pointing to the global table for easy access.
Importance of Global File Table
- A global file table allows multiple processes to reference shared files efficiently while maintaining individual process tables for convenience.
- By default, each process starts with three standard streams open (standard input, output, and error, i.e., descriptors 0, 1, and 2); subsequently opened files receive higher descriptor numbers.
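The two-level table structure can be sketched as follows (the field names and inode numbers are invented; real kernels also track reference counts, flags, and more):

```python
# Global open file table: one entry per open file, shared by all processes.
# Each entry records the inode and the current file offset (assumed fields).
global_table = []

def sys_open(inode_number: int) -> int:
    """Add a global entry and return a per-process descriptor for it."""
    global_table.append({"inode": inode_number, "offset": 0})
    global_index = len(global_table) - 1
    fd_table.append(global_index)       # per-process entry -> global entry
    return len(fd_table) - 1            # the file descriptor

# Descriptors 0, 1, 2 (stdin/stdout/stderr) are opened at process start.
fd_table = []                           # this process's descriptor table
for std_inode in (10, 11, 11):          # made-up inode numbers
    sys_open(std_inode)

fd = sys_open(42)                       # the next open gets descriptor 3
print(fd)                               # 3
```

Because descriptors only index the per-process table, two processes can share one global entry (and hence one file offset), e.g., after fork().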
Accessing Files via Descriptors
- Once a file descriptor is obtained from an open call, it provides access to the in-memory inode of the open file.
- File operations typically touch several blocks (indirect blocks, data blocks, metadata), so a single read or write may require multiple disk accesses.
Virtual File System Optimization
Virtual File System and Disk Buffer Cache
Understanding the Virtual File System (VFS)
- The VFS provides a higher layer of abstraction for system calls, viewing file systems as a collection of objects such as files, directories, and super blocks.
- Operations on these objects include looking up file names and reading or writing data blocks. System call logic in Linux is written in terms of these VFS objects.
- Implementing a new file system requires creating data structures for VFS objects and defining operations like lookups without rewriting the entire system call logic.
- This design allows modern operating systems to manage complexity by reusing existing system call logic while providing specific implementations for different file systems.
- When system calls require operations on directories, they invoke corresponding functions in the underlying file system through pointers provided by the VFS.
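The dispatch can be sketched with per-file-system operation tables; here plain Python functions stand in for the function pointers a kernel would use (the names are illustrative, not the real Linux VFS API):

```python
# Each file system registers a table of operations; the generic system
# call layer only ever calls through the table, never the concrete code.
def ext2_lookup(name: str) -> str:
    return f"ext2 inode for {name}"

def fat_lookup(name: str) -> str:
    return f"FAT entry for {name}"

filesystems = {
    "ext2": {"lookup": ext2_lookup},
    "fat":  {"lookup": fat_lookup},
}

def vfs_lookup(fs_type: str, name: str) -> str:
    """Generic lookup logic, delegating to the mounted file system's table."""
    ops = filesystems[fs_type]
    return ops["lookup"](name)

print(vfs_lookup("ext2", "notes.txt"))   # ext2 inode for notes.txt
```

Adding a new file system means registering a new operations table; the generic system call logic above it is untouched, which is the reuse the VFS design buys.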
Disk Buffer Cache Mechanism
- The disk buffer cache stores recently accessed disk blocks to minimize disk I/O operations, improving performance significantly.
- On a cache miss (when requested data isn't found), the OS retrieves it from the disk, caches it, and returns it to the requesting process.
- Write operations also utilize caching; changes are first made in memory before being pushed to disk later, enhancing efficiency.
Types of Buffer Cache Implementations
- There are two main types of buffer cache: synchronous (write-through), which immediately writes changes to disk, and asynchronous (write-back), which delays writing until later.
- The file system interacts with the buffer cache without needing to understand its complexities; it simply requests reads or writes.
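A minimal sketch of a write-back buffer cache (the disk is a plain dict here, and eviction is omitted; a real cache would also evict blocks, e.g., with an LRU policy):

```python
disk = {0: b"super", 8: b"hello"}    # block number -> contents (toy disk)
cache = {}                           # block number -> contents in memory
dirty = set()                        # blocks modified in memory only

def read_block(b: int) -> bytes:
    """Serve from the cache; on a miss, fetch from disk and cache it."""
    if b not in cache:
        cache[b] = disk[b]
    return cache[b]

def write_block(b: int, data: bytes) -> None:
    """Write-back policy: update memory now, mark the block dirty."""
    cache[b] = data
    dirty.add(b)

def sync() -> None:
    """Push all dirty blocks to disk later (as write-back delays do)."""
    for b in dirty:
        disk[b] = cache[b]
    dirty.clear()

read_block(8)                        # miss: fetched from disk, then cached
write_block(8, b"world")             # disk still holds b"hello"
sync()                               # now disk holds b"world"
```

A write-through cache would simply call `sync()` inside `write_block`, trading performance for durability.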
Memory Allocation for Buffer Cache
- Memory for the buffer cache is typically allocated dynamically, using free memory not needed by other processes.
- This improves performance: when multiple processes request the same block, it is served from memory instead of requiring another disk access.
Advantages and Limitations of Buffer Caching
- Important blocks frequently accessed—like kernel code or hot files—are often cached, reducing overall disk access time significantly.