THE DISK BLOCK CACHE
The BeOS cache supports obtaining pointers to cached blocks of data, and
BFS takes advantage of this to reference i-node data directly. This fact,
coupled with the requirements of journaling, presents an interesting problem.
If a modification is made to an i-node, the i-node data is written to the log
(which locks the corresponding disk block in the cache). When the transaction
is complete, the journaling code unlocks the block and requests a callback
when the block is flushed to disk. However, the rest of BFS already has
a pointer to the block (since it is an i-node), and so the block is not actually
free to be flushed to disk until the rest of the file system relinquishes access
to the block. This is not the problem, though.
The problem is that the journal expects the current version of the block to
be written to disk, but because other parts of the system still have pointers
to this block of data, it could potentially be modified before it is flushed to
disk. To ensure the integrity of journaling, when the cache sets a callback for
a block, the cache clones the block in its current state. The cloned half of
the block is what the cache will flush when the opportunity presents itself. If
the block already has a clone, the clone is written to disk before the current
block is cloned. Cloning of cached blocks is necessary because the rest of the
system has pointers directly to the cached data. If i-node data were modified
after the journal was through with it but before it was written to disk, the file
system could be left in an inconsistent state.
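The cloning rule described above can be sketched in a few lines of C. This is only an illustration under stated assumptions, not the BeOS cache code: the names cache_block, set_flush_callback, flush_clone, count_flush, and fake_disk, as well as the 1024-byte block size, are all hypothetical, and the "disk" is a single in-memory buffer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define BLOCK_SIZE 1024  /* illustrative block size, not taken from BFS */

/* Hypothetical cache entry: live data that other code points at, plus an
   optional frozen clone that is still owed to disk. */
typedef struct cache_block {
    unsigned char  data[BLOCK_SIZE];  /* live copy; BFS may hold pointers here */
    unsigned char *clone;             /* snapshot to flush, or NULL */
    void         (*callback)(void *); /* journal's flush notification */
    void          *callback_arg;
} cache_block;

/* Stand-in for the disk: storage for exactly one block. */
static unsigned char fake_disk[BLOCK_SIZE];

/* Write the pending clone to "disk", notify the journal, drop the clone. */
void flush_clone(cache_block *b)
{
    if (b->clone == NULL)
        return;
    memcpy(fake_disk, b->clone, BLOCK_SIZE);
    if (b->callback != NULL)
        b->callback(b->callback_arg);
    free(b->clone);
    b->clone = NULL;
}

/* Called when the journal is done with a block: snapshot the block as it is
   now, so that later in-place changes cannot leak into this flush. */
int set_flush_callback(cache_block *b, void (*cb)(void *), void *arg)
{
    flush_clone(b);                   /* an older clone goes to disk first */
    b->clone = malloc(BLOCK_SIZE);
    if (b->clone == NULL)
        return -1;
    memcpy(b->clone, b->data, BLOCK_SIZE);
    b->callback = cb;
    b->callback_arg = arg;
    return 0;
}

/* Example callback: counts how many flushes the journal was told about. */
void count_flush(void *arg)
{
    ++*(int *)arg;
}
```

If a caller changes b.data after set_flush_callback() returns, a later flush_clone() still writes the bytes captured at callback time, which is the property the journal depends on.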
When Not to Use the Cache
Despite all the benefits of the cache, there are times when it makes sense not
to use it. For example, if a user copies a very large file, the cache becomes
filled with two copies of the same data; if the file is large enough, the cache
won't be able to hold all of the data either. Another example is when a program
is streaming a large amount of data (such as video or audio data) to disk.
In this case the data is not likely to be read again after it is written, and since
the amount of data being written is larger than the size of the cache, it will
have to be flushed anyway. In these situations the cache simply winds up
causing an extra copy from a user buffer into the cache, and the cache
has zero effectiveness. This is not optimal. In cases such as this it is better
to bypass the cache altogether and do the I/O directly.
The BeOS disk cache supports bypassing the cache in an implicit manner.
Any I/O that is 64K in size or larger bypasses the cache. This allows programs
to easily skip the cache and perform their I/O directly to the underlying
device. In practice this works out quite well. Programs manipulating large
amounts of data can easily bypass the cache by specifying a large I/O buffer
size. Those programs that do not care will likely use the default buffer
size of 4K and therefore operate in a fully buffered manner.
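The implicit bypass rule amounts to a size comparison on each request. A minimal sketch, assuming a hypothetical choose_io_path() helper and io_path enum (neither is a BeOS interface), might look like the following; it covers only the size test and none of the consistency checks a real cache would also need.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_BYPASS_SIZE (64 * 1024)  /* the 64K threshold given in the text */

/* Hypothetical labels for the two ways a request can be serviced. */
typedef enum { IO_THROUGH_CACHE, IO_DIRECT } io_path;

/* Requests of 64K or more skip the cache; smaller ones are buffered. */
io_path choose_io_path(size_t num_bytes)
{
    return (num_bytes >= CACHE_BYPASS_SIZE) ? IO_DIRECT : IO_THROUGH_CACHE;
}
```

With this rule, a program writing through a 4K buffer stays fully cached, while one that issues 64K or larger requests goes straight to the device without any special flag.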
There are two caveats to this. The cache cannot simply pass large I/O
transactions straight through without first checking that the disk blocks be-
Practical File System Design: The Be File System, Dominic Giampaolo