Online Book Reader

Managing RAID on Linux - Derek Vadala [106]

one block needs to be accessed to read a file that is 3 KB. On a filesystem with a block size of 1024 bytes, three different blocks must be accessed for I/O on that file. Now, consider a file that is many megabytes in length. The increase in the number of blocks that must be accessed to read that file is substantial when a smaller block size is used. When the blocks holding the file's data are not contiguous, additional operations to locate the data blocks must also be performed.
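The arithmetic behind this comparison can be sketched in a few lines of Python. This is only an illustration: the function name blocks_needed and the 100 MB example size are assumptions chosen for the sketch, and the count ignores indirect blocks and other metadata.

```python
def blocks_needed(file_size: int, block_size: int) -> int:
    """Return how many data blocks a file of file_size bytes occupies."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    # Ceiling division: a partially filled last block still counts as one block.
    return -(-file_size // block_size)


KB = 1024
MB = 1024 * KB

# The 3 KB file from the text, plus a hypothetical 100 MB file.
for size in (3 * KB, 100 * MB):
    for block_size in (1 * KB, 4 * KB):
        count = blocks_needed(size, block_size)
        print(f"{size} bytes with {block_size}-byte blocks: {count} blocks")
```

For any file larger than a few blocks, the 1 KB filesystem touches roughly four times as many blocks as the 4 KB filesystem.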

Remember my discussion of sequential disk I/O from Chapter 2. Using bigger block sizes helps improve sequential data access for large files. Larger block sizes reduce file fragmentation by ensuring that bigger chunks of files are contiguous. This translates into improved performance because the disk performs fewer seeks when reading or writing large files.

In general, use smaller block sizes when you anticipate creating many small files that could fit into single small blocks. Use a larger block size when you expect to be working with larger files. As a rule, you can safely use the 4 KB default block size on filesystems larger than a few hundred megabytes. Unless you have sound reasons for going with a smaller block size, 4 KB is likely to be a good choice.
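The cost of a large block size on small files is the unused space at the end of each file's last block. The sketch below estimates that waste for a hypothetical workload of 10,000 files of 500 bytes each. The function name slack_bytes and the workload figures are assumptions for illustration, and the calculation ignores inode and other metadata overhead.

```python
def slack_bytes(file_size: int, block_size: int) -> int:
    """Return the unused bytes in the last block allocated to a file."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    blocks = -(-file_size // block_size)  # ceiling division
    return blocks * block_size - file_size


# Hypothetical workload: many small files of identical size.
FILE_COUNT = 10_000
FILE_SIZE = 500  # bytes

for block_size in (1024, 4096):
    wasted = FILE_COUNT * slack_bytes(FILE_SIZE, block_size)
    print(f"{block_size}-byte blocks: {wasted} bytes of slack")
```

Under these assumptions the 4 KB filesystem wastes several times more space than the 1 KB filesystem, which is the kind of sound reason that might justify a smaller block size.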

Organization

Different filesystems implement different methods for organizing data. Traditional Unix filesystems relied on linked lists to organize inodes and data blocks. A table of inodes pointed to a physical disk block. This arrangement obviously doesn't scale well. So, newer filesystems have sought to optimize the process. ext2, for example, applies a block bitmap and splits up the inode table so that it is distributed across the entire disk. Rather than look up a data block from a single, large table, a filesystem such as ext2 needs to examine only a small subset of inodes to perform I/O.
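To make the idea of a distributed inode table concrete, here is a small sketch of how ext2 maps an inode number to a block group and to a position within that group's slice of the inode table. The function name locate_inode is hypothetical, and the value of 8192 inodes per group is an assumed example. On a real filesystem that number is read from the superblock.

```python
def locate_inode(inode_number: int, inodes_per_group: int) -> tuple:
    """Return (block_group, index_within_group) for an ext2 inode number."""
    if inode_number < 1:
        raise ValueError("ext2 inode numbers start at 1")
    if inodes_per_group < 1:
        raise ValueError("inodes_per_group must be positive")
    # Inode numbers are 1-based, so shift to 0-based before dividing.
    return divmod(inode_number - 1, inodes_per_group)


INODES_PER_GROUP = 8192  # assumed example value, normally taken from the superblock

for inode in (1, 8192, 8193, 50_000):
    group, index = locate_inode(inode, INODES_PER_GROUP)
    print(f"inode {inode}: block group {group}, entry {index}")
```

Only the inode table slice belonging to the computed block group needs to be consulted, rather than one table covering the whole disk.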

As the complexity of applications and operating systems has evolved, so has filesystem design. Today, many new filesystems implement a data structure known as a B-tree to organize the filesystem. B-trees have been used in database design for many years. A B-tree is optimized so that it can be quickly accessed, even when it's stored on a hard disk. This usually means that the size of a leaf in a B-tree is equal to, or is some function of, the size of a filesystem data block.

A B-tree is similar to a balanced binary tree, with a few notable exceptions. B-trees have a large branching factor. Where a node in a binary tree has at most two children, a B-tree node can have many, which makes the path to access data much shorter. In turn, the height of a B-tree is small compared with that of a traditional binary tree. Some filesystems use B-trees exclusively, while others implement a combination with the traditional linked-list/block bitmap approach. A thorough discussion of data structures and algorithms is beyond the scope of this chapter, however. I humbly refer you to more learned texts on the subject, such as Readings in Database Systems, edited by Michael Stonebraker and Joseph M. Hellerstein (Morgan Kaufmann); The Art of Computer Programming, by Donald E. Knuth (Addison-Wesley); and Algorithms in C, by Robert Sedgewick (Addison-Wesley).
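The effect of the branching factor on tree height can be estimated with a rough calculation. The sketch below uses an idealized model in which every node is completely full, so each additional level multiplies the number of reachable items by the branching factor. Real B-trees allow partially filled nodes, so actual heights can be somewhat larger. The function name min_levels and the sample branching factor of 100 are assumptions for illustration.

```python
def min_levels(num_items: int, fanout: int) -> int:
    """Return the fewest levels needed so that fanout ** levels >= num_items."""
    if num_items < 1:
        raise ValueError("num_items must be at least 1")
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    levels, reach = 0, 1
    # Integer arithmetic avoids floating-point rounding in logarithms.
    while reach < num_items:
        reach *= fanout
        levels += 1
    return levels


ITEMS = 1_000_000

print("binary tree levels:", min_levels(ITEMS, 2))
print("fanout-100 tree levels:", min_levels(ITEMS, 100))
```

When each level costs a disk access, a handful of levels instead of about twenty is a substantial saving.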

Journaling Filesystems

Journaling offers improved filesystem reliability and fast crash recovery through the use of a transaction log, or journal. The journal is an on-disk log of metadata, or data about the filesystem, that is kept up-to-date as the filesystem changes.

Filesystems without journaling hold pending changes in memory. These changes are periodically flushed from memory and written to disk. If a crash occurs before the buffers are flushed, data that has not been written to disk is lost. Instead of storing these changes only in memory, a journaling filesystem writes a log of the changes to disk. The actual data is kept in memory until enough free system resources are available for the full write operations to be performed efficiently. When the data is committed to disk, the journal is updated.
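The sequence just described can be modeled with a toy class. This is a deliberately simplified sketch, not how any real filesystem is implemented: the class ToyJournalFS and all of its methods are hypothetical, Python dictionaries and lists stand in for the disk and the journal, and the journal here records whole values rather than metadata only, which is a simplification of the metadata journal described above.

```python
class ToyJournalFS:
    """Toy model of journaled writes. All names here are illustrative."""

    def __init__(self):
        self.disk = {}     # stands in for data committed to disk
        self.journal = []  # stands in for the on-disk journal (survives a crash)
        self.cache = {}    # in-memory buffers (lost in a crash)

    def write(self, name, data):
        # Record the change in the journal first, then buffer it in memory.
        self.journal.append((name, data))
        self.cache[name] = data

    def flush(self):
        # Commit buffered data to disk, then update the journal to match.
        self.disk.update(self.cache)
        self.cache.clear()
        self.journal.clear()

    def crash(self):
        # A crash discards everything held only in memory.
        self.cache.clear()

    def recover(self):
        # Replay journal entries in order to bring the disk up to date.
        for name, data in self.journal:
            self.disk[name] = data
        self.journal.clear()


fs = ToyJournalFS()
fs.write("report.txt", "draft 1")
fs.crash()    # the buffered change is gone, but the journal entry remains
fs.recover()  # replaying the journal restores the change
print(fs.disk)
```

Without the journal, the crash in this example would leave no record that the write was ever requested.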

The journal allows
