On Thu, 2 Aug 2001, Jay R. Ashworth wrote:
> Granted, a write failure during the journal write won't horque the
> filesystem, but the *data* still didn't make it onto the disk, right?
Journalling can't protect against failures at the drive; that isn't
possible even in theory. For that you need RAID. Journalling is for when
the machine itself dies.
Excuse a brief digression into how journalling works. Let's say you're
appending to a file. Things that might change include the file's inode
(block count increases, block pointers and indirect block pointers may
change, atime), directory [am]time, block usage bitmap, and a bunch of
data blocks. Each write is a single transaction as far as the app is
concerned: leave out any part and the rest is garbage. But the disk sees
them as a bunch of independent transactions, separated by a bunch of
costly seeks.
Normal filesystems say let's do it the drive's way: break all our
transactions into a bunch of blocks, combine them with a bunch of other
blocks, and then sort them for maximum performance. This is fine until we
get interrupted in the middle of our block list, leaving a bunch of
transactions partially written. Now everything touched by a partial
transaction is toast.
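
To make the failure mode concrete, here is a toy sketch in Python. It is
not how any real filesystem is implemented: the "disk" is just a dict, and
the block names (inode_7, bitmap, data_40 and so on) and the crash_after
parameter are invented for illustration. Sorting by block name stands in
for sorting by disk position.

```python
def write_in_place(disk, transactions, crash_after):
    """Flatten transactions into blocks, sort them, write until the 'crash'."""
    # Break every transaction into independent block writes.
    blocks = [(name, value) for txn in transactions for name, value in txn]
    # Sorting by name is a stand-in for sorting by position on the platter.
    blocks.sort(key=lambda b: b[0])
    for written, (name, value) in enumerate(blocks):
        if written == crash_after:
            break  # the machine dies here
        disk[name] = value
    return disk


def is_torn(disk, txn):
    """True if some, but not all, of a transaction's blocks reached the disk."""
    landed = [disk.get(name) == value for name, value in txn]
    return any(landed) and not all(landed)


# Two hypothetical appends, each touching several blocks.
append_a = [("inode_7", "size=2"), ("bitmap", "blk 40 used"), ("data_40", "hello")]
append_b = [("inode_9", "size=5"), ("data_41", "world")]

disk = write_in_place({}, [append_a, append_b], crash_after=3)
print(is_torn(disk, append_a), is_torn(disk, append_b))
```

Because the sort interleaves the two appends, a crash after three block
writes leaves both of them half-done: the data blocks and the bitmap made
it, but neither inode did.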
Journalling says let's gather up all the parts of the transaction, write
them into a big contiguous section of disk or memory, and mark them as a
single unit, very likely with a single fast write that requires no seeks.
When we get a spare second, or when the log is full, we'll take the
transactions and actually write them to all the places they should have
been written to in the first place, marking each transaction completed as
we go. We never lose more than the last partially-written transaction (and
we can do no better), and we can always quickly bring the filesystem back
to a consistent state. We can also do fsync()-type operations much more
rapidly for applications that care enough to insist that data has made it
safely to disk before proceeding (and not have to worry that some other
non-fsyncing() app will hose the filesystem for us anyway).
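
The scheme above can be sketched as a toy write-ahead log in Python. This
is only an illustration under simplifying assumptions: the ToyJournal
class, its record format, and the crash_before_commit flag are all made
up here, and a real journal also has to deal with checksums, ordering of
writes, and log wraparound.

```python
class ToyJournal:
    def __init__(self):
        self.disk = {}  # the blocks' final locations
        self.log = []   # the contiguous journal area

    def log_transaction(self, txn_id, blocks, crash_before_commit=False):
        """Append a whole transaction to the journal as one unit."""
        self.log.append(("begin", txn_id, None))
        for name, value in blocks:
            self.log.append(("block", txn_id, (name, value)))
        if crash_before_commit:
            return  # the machine dies before the commit record is written
        self.log.append(("commit", txn_id, None))

    def replay(self):
        """Apply committed transactions to their real locations; drop the rest."""
        committed = {txn_id for kind, txn_id, _ in self.log if kind == "commit"}
        for kind, txn_id, payload in self.log:
            if kind == "block" and txn_id in committed:
                name, value = payload
                self.disk[name] = value
        self.log.clear()
        return self.disk


journal = ToyJournal()
journal.log_transaction(1, [("inode_7", "size=2"), ("data_40", "hello")])
journal.log_transaction(2, [("inode_9", "size=5"), ("data_41", "world")],
                        crash_before_commit=True)
recovered = journal.replay()
print(recovered)
```

After replay, transaction 1 is entirely on disk and transaction 2 is
entirely absent. Only the last partially-written transaction is lost, and
nothing is left half-applied.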
It's a bit more complicated than that (most journalling systems will
coalesce multiple 'adjacent' transactions into one, slightly increasing
latency for some data to reach the disk but greatly decreasing latency for
other data). The big picture is that journalling does indeed greatly
improve your data integrity.
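
As a rough illustration of the coalescing mentioned above, here is a
minimal Python sketch. The coalesce function and the block names are
hypothetical, and real journalling systems decide what counts as
'adjacent' in their own ways; this one simply merges a batch so that the
last write to each block wins.

```python
def coalesce(transactions):
    """Merge a batch of transactions into one; the last write to a block wins."""
    merged = {}
    for txn in transactions:
        for name, value in txn:
            merged[name] = value
    return list(merged.items())


# Three small appends to the same file, each rewriting the inode and a data block.
appends = [
    [("inode_7", "size=1"), ("data_40", "a")],
    [("inode_7", "size=2"), ("data_40", "ab")],
    [("inode_7", "size=3"), ("data_40", "abc")],
]
combined = coalesce(appends)
print(len(combined), combined)
```

Six block writes collapse into two. The first append waits a little
longer to reach the disk, since it goes out with the batch, but the later
ones get there with far less total work.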
Never mind that fsck on a modern 100G drive takes an eternity.
--
"Love the dolphins," she advised him. "Write by W.A.S.T.E.."