78
5
AT T R I B U T E S , I N D E X I N G , A N D Q U E R I E S
cat
man
train
acme
buck
deluxe
indigo
navel
style
able
ball
deft
edge
mean
rowdy
Root node
Figure 5-4 An example B-tree.
Knowing that B-tree nodes are sorted and the links for each entry point
to nodes with keys less than the current key, we can perform a search of
the B-tree. Normally searching each node uses a binary search, but we will
illustrate using a sequential search to simplify the discussion. If we wanted to
find the word deft we would start at the root node and search through its keys
for the word deft. The first key, cat, is less than deft, so we continue. The
word deft is less than man, so we know it is not in this node. The word man
has a link though, so we follow the link to the next node. At the second-level
node (deluxe indigo) we compare deft against deluxe. Again, deft is less than
deluxe
, so we follow the associated link. The final node we reach contains
the word deft, and our search is successful. Had we searched for the word
depend
, we would have followed the link from deluxe and discovered that
our key was greater than deft, and thus we would have stopped the search
because we reached a leaf node and our key was greater than all the keys in
the node.
The important part to observe about the search algorithm is how few nodes
we needed to look at to do the search (3 out of 10 nodes). When there are
many thousands of nodes, the savings is enormous. When a B-tree is well
balanced, as in the above example, the time it takes to search a tree of N keys
is proportional to log
k
(N). The base of the logarithm, k, is the number of keys
per node. This is a very good search time when there are many keys and is
the primary reason that B-trees are popular as an indexing technique.
The key to the performance of B-trees is that they maintain a reasonable
balance. An important property of B-trees is that no one branch of the tree
is significantly taller than any other branch. Maintaining this property is
a requirement of the insertion and deletion operations, which makes their
implementation much more complex than the search operation.
Insertion into a B-tree first locates the desired insertion position (by doing
a search operation), and then it attempts to insert the key. If inserting the key
would cause the node to become overfull (each node has a fixed maximum
size), then the node is split into two nodes, each getting half of the keys.
Splitting a node requires modifications to the parent nodes of the node that
Practical File System Design:The Be File System
, Dominic Giampaolo
page 78