Hi,

we believe we were able to resolve the zero hole bugs we were suffering
from.

The symptoms were incomplete file writes: parts of the files turned out to be
replaced by chunks of binary zeros of varying length (usually 1024-4096 bytes).
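
For anyone who wants to check their own files for this symptom, here is a
minimal sketch of a scanner for long zero runs in a buffer. The function name
next_zero_run and its interface are made up for this mail; it is not part of
any Hurd tool. Note that files can legitimately contain long runs of zeros,
so a hit is only a hint, not proof of corruption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: find the next run of at least min_len zero bytes
   in buf[from..len).  Returns 1 and stores the run's offset and length,
   or returns 0 if there is no such run. */
static int next_zero_run(const unsigned char *buf, size_t len, size_t from,
                         size_t min_len, size_t *start, size_t *run_len)
{
    assert(buf != NULL && start != NULL && run_len != NULL);

    size_t i = from;
    while (i < len) {
        if (buf[i] != 0) {
            i++;
            continue;
        }
        /* buf[i] is zero: find where this run of zeros ends. */
        size_t j = i;
        while (j < len && buf[j] == 0)
            j++;
        if (j - i >= min_len) {
            *start = i;
            *run_len = j - i;
            return 1;
        }
        i = j;   /* run too short, keep scanning after it */
    }
    return 0;
}
```

Calling it in a loop with from set to start + run_len of the previous hit
lists all runs; comparing the offsets against the 4096-byte page size shows
how each run is aligned.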

What made the situation difficult was the fact that those zero "holes" were
created by two independent bugs!

A fixed Debian binary package will follow tomorrow. Here is a short
explanation of the bugs and the fixes.

1. Page Out Race

When a page of a file was partially filled, then paged out, then filled
further and paged out again, the second page-out could complete before
the first, so that the incomplete first version of the page was written over
the complete second version:

Pages as they are filled:

    AAAA BBBB CCCC D000       -> pageout
               ... DDDD EEEE  -> pageout

On disk:

       AAAA ____ ____ DDDD EEEE (second pageout completes while first is running)
later: AAAA BBBB CCCC D000 EEEE (3*1024 bytes lost in D-block)


The solution is to serialize both writes, so that they not only start one after
another (this was already guaranteed), but the second does not start before
the first has finished. This has to be done for all *overlapping* page-outs.
The necessary changes were made by Thomas today in libpager/* (see the
ChangeLog entry): a PM_WRITEWAIT attribute is set and checked before
starting an overlapping page-out. The actual bug was in
libpager/data-return.c.

This bug only created holes smaller than 4096 bytes (the end of each hole was
aligned with the end of a page).
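
To make the ordering problem concrete, here is a small single-threaded toy
model in C. It is only a sketch and not the libpager code: toy_disk,
pageout_start, pageout_complete and lost_bytes are names invented for this
mail, and "waiting" for an earlier write is modelled by simply completing it
first, which stands in for what the PM_WRITEWAIT check achieves.

```c
#include <assert.h>
#include <string.h>

#define PAGE_SIZE   4096
#define NPAGES      2
#define MAX_WRITES  8

/* A page-out that has been started but has not reached the disk yet. */
struct pending_write {
    int  active;
    int  page;
    char data[PAGE_SIZE];
};

/* Toy disk: a few pages plus the list of writes started so far. */
struct toy_disk {
    char pages[NPAGES][PAGE_SIZE];
    struct pending_write writes[MAX_WRITES];
    int  nwrites;
};

/* Start a page-out: snapshot the page contents, return a handle. */
static int pageout_start(struct toy_disk *d, int page, const char *data)
{
    assert(d->nwrites < MAX_WRITES);
    int slot = d->nwrites++;
    d->writes[slot].active = 1;
    d->writes[slot].page = page;
    memcpy(d->writes[slot].data, data, PAGE_SIZE);
    return slot;
}

/* Complete a page-out: the snapshot reaches the disk.  No-op if done. */
static void pageout_complete(struct toy_disk *d, int slot)
{
    assert(slot >= 0 && slot < d->nwrites);
    struct pending_write *w = &d->writes[slot];
    if (!w->active)
        return;
    memcpy(d->pages[w->page], w->data, PAGE_SIZE);
    w->active = 0;
}

/* Serialized start: finish every pending write to the same page first. */
static int pageout_start_serialized(struct toy_disk *d, int page,
                                    const char *data)
{
    for (int i = 0; i < d->nwrites; i++)
        if (d->writes[i].active && d->writes[i].page == page)
            pageout_complete(d, i);
    return pageout_start(d, page, data);
}

/* Replay the D-block scenario; return the zero bytes left on disk. */
static int lost_bytes(int serialize)
{
    static struct toy_disk d;
    char partial[PAGE_SIZE], full[PAGE_SIZE];
    int first, second, lost = 0;

    memset(&d, 0, sizeof d);
    memset(partial, 0, PAGE_SIZE);
    memset(partial, 'D', 1024);          /* "D000" */
    memset(full, 'D', PAGE_SIZE);        /* "DDDD" */

    if (serialize) {
        first  = pageout_start_serialized(&d, 0, partial);
        second = pageout_start_serialized(&d, 0, full);
    } else {
        first  = pageout_start(&d, 0, partial);
        second = pageout_start(&d, 0, full);
    }
    /* The second request happens to finish before the first one. */
    pageout_complete(&d, second);
    pageout_complete(&d, first);

    for (int i = 0; i < PAGE_SIZE; i++)
        if (d.pages[0][i] == 0)
            lost++;
    return lost;
}
```

In this model lost_bytes(0) leaves 3*1024 zero bytes in the page, as in the
diagram above, while lost_bytes(1) leaves none, because the stale write has
already reached the disk before the newer one is started.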

2. Delay in writing freed indirect blocks

A file consists of data blocks, and the locations of those blocks are stored
in indirect blocks[2]. The indirect blocks are not managed by the file pager,
but by the global disk pager. When they are freed, they are returned to the
free disk block pool and can later be used for data or indirect blocks of
other files. However, the contents of the indirect block of the deleted file
(an empty page) were not flushed from the disk_pager's cache. So it could
happen that they were flushed at a much later time, when the block was already
in use for some other purpose.

Block X is in use as an indirect block for some file.
The file is deleted, and block X is released, but still in the cache.
Block X is reused for the data of a file.
The disk cache is flushed, and the empty (former indirect) block X is
written to disk, overwriting the file's content.

The fix is to forcefully flush the disk pager contents of block X when
deleting the file. This is done by calling pager_flush_some on the
disk_pager in trunc_indirect() in ext2fs/truncate.c.

This bug created holes exactly 4096 bytes long (page-aligned).
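
The sequence can be replayed with a toy model of a write-back cache. Again,
this is only a sketch under simplifying assumptions: toy_fs, meta_write,
file_write, cache_sync, flush_block, free_block and hole_after_reuse are
invented names, the free block pool is not modelled, and flush_block merely
stands in for the effect of the pager_flush_some call. It does not use the
real libpager interface.

```c
#include <assert.h>
#include <string.h>

#define BLOCK_SIZE 4096
#define NBLOCKS    8

/* Toy filesystem: the disk, plus a write-back cache for metadata blocks
   (the role the disk pager plays in the description above). */
struct toy_fs {
    char disk[NBLOCKS][BLOCK_SIZE];
    char cache[NBLOCKS][BLOCK_SIZE];
    int  dirty[NBLOCKS];
};

/* Metadata (e.g. an indirect block) is modified in the cache only. */
static void meta_write(struct toy_fs *fs, int blk, const char *data)
{
    memcpy(fs->cache[blk], data, BLOCK_SIZE);
    fs->dirty[blk] = 1;
}

/* File data takes a separate path (the file pager) straight to disk. */
static void file_write(struct toy_fs *fs, int blk, const char *data)
{
    memcpy(fs->disk[blk], data, BLOCK_SIZE);
}

/* Write one dirty cached block to disk and mark it clean. */
static void flush_block(struct toy_fs *fs, int blk)
{
    if (fs->dirty[blk]) {
        memcpy(fs->disk[blk], fs->cache[blk], BLOCK_SIZE);
        fs->dirty[blk] = 0;
    }
}

/* Periodic sync: write back every dirty cached block. */
static void cache_sync(struct toy_fs *fs)
{
    for (int b = 0; b < NBLOCKS; b++)
        flush_block(fs, b);
}

/* Release a block.  With flush_on_free set, its cached copy is written
   out immediately so nothing stale can hit the disk later. */
static void free_block(struct toy_fs *fs, int blk, int flush_on_free)
{
    if (flush_on_free)
        flush_block(fs, blk);
}

/* Replay the block X scenario; return the zero bytes in the new file. */
static int hole_after_reuse(int flush_on_free)
{
    static struct toy_fs fs;
    char buf[BLOCK_SIZE];
    const int x = 3;                     /* "block X" */
    int zeros = 0;

    memset(&fs, 0, sizeof fs);

    memset(buf, 'P', BLOCK_SIZE);        /* X holds block pointers */
    meta_write(&fs, x, buf);
    cache_sync(&fs);

    memset(buf, 0, BLOCK_SIZE);          /* file deleted: X is emptied */
    meta_write(&fs, x, buf);
    free_block(&fs, x, flush_on_free);

    memset(buf, 'X', BLOCK_SIZE);        /* X reused for file data */
    file_write(&fs, x, buf);

    cache_sync(&fs);                     /* the much later flush */

    for (int i = 0; i < BLOCK_SIZE; i++)
        if (fs.disk[x][i] == 0)
            zeros++;
    return zeros;
}
```

Without the flush, hole_after_reuse(0) ends with all 4096 bytes of the new
file data zeroed by the late write-back; with it, hole_after_reuse(1) leaves
the data intact.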

From this description, it should follow that Bug 1 can be reproduced on a
ufs filesystem as well, while Bug 2 is specific to ext2fs.

I hope my description is accurate as far as it goes[1]. The details can be
found out by staring at the code long enough and banging your head with the
Mach Kernel Principles. :)

Thanks,
Marcus

[1] If you find an error, let us know.
[2] The locations of the first twelve data blocks are stored in the inode
    itself; for more data blocks, one indirect block is used (for bigger
    files, even more are necessary).

-- 
`Rhubarb is no Egyptian god.' Debian http://www.debian.org Check Key server 
Marcus Brinkmann              GNU    http://www.gnu.org    for public PGP Key 
[EMAIL PROTECTED],     [EMAIL PROTECTED]    PGP Key ID 36E7CD09
http://homepage.ruhr-uni-bochum.de/Marcus.Brinkmann/       [EMAIL PROTECTED]
