3 . 4
I R I X X F S
While XFS supports all the traditional abstractions of a file system, it de-
parts dramatically in its implementation of those abstractions. XFS differs
from the straightforward implementation of a file system in its management
of free disk space, i-nodes, file data, and directory contents.
As previously discussed, the most common way to manage free disk blocks
in a file system is to use a bitmap with 1 bit per block. XFS instead uses a
pair of B+trees to manage free disk space. XFS divides a disk up into large-
sized chunks called allocation groups (a term with a similar meaning in BFS).
Each allocation group maintains a pair of B+trees that record information
about free space in the allocation group. One of the B+trees records free space
sorted by starting block number. The other B+tree sorts the free blocks by
their length. This scheme offers the ability for the file system to find free
disk space based on either the proximity to already allocated space or based
on the size needed. Clearly this organization offers significant advantages for
efficiently finding the right block of disk space for a given file. The only po-
tential drawback to such a scheme is that the B+trees both maintain the same
information in different forms. This duplication can cause inconsistencies if,
for whatever reason, the two trees get out of sync. Because XFS is journaled,
however, this is not generally an issue.
XFS also does not preallocate i-nodes as is done in traditional Unix file sys-
tems. In XFS, instead of having a fixed-size table of i-nodes, each allocation
group allocates disk blocks for i-nodes on an as-needed basis. XFS stores the
locations of the i-nodes in a B+tree in each allocation group--a very unusual
organization. The benefits are clear: no wasted disk space for unneeded files
and no limits on the number of files after creating the file system. However,
this organization is not without its drawbacks: when the list of i-nodes is a
table, looking up an i-node is a constant-time index operation, but XFS must
do a B+tree lookup to locate the i-node.
XFS uses extent maps to manage the blocks allocated to a file. An ex-
tent map is a starting block address and a length (expressed as a number of
blocks). Instead of simply maintaining a list of fixed-size blocks with direct,
indirect, double-indirect, and triple-indirect blocks, XFS again uses B+trees.
The B+tree is indexed by the block offset in the file that the extent maps.
That is, the extents that make up a file are stored in a B+tree sorted by which
position of the file they correspond to.
The B+trees allow XFS to use variable-sized extents. The cost is that the
implementation is considerably more difficult than using fixed-size blocks.
The benefit is that a small amount of data in an extent can map very large
regions of a file. XFS can map up to two million blocks with one extent map.
Another departure from a traditional file system is that XFS uses B+trees
to store the contents of a directory. A traditional file system stores the con-
tents of a directory in a linear list. Storing directory entries linearly does not
scale well when there are hundreds or thousands of items. XFS again uses
B+trees to store the entries in a directory. The B+tree sorts the entries based
Practical File System Design:The Be File System
, Dominic Giampaolo