Bug#555456: Processed: reassign 555456 to e2fsprogs

micah anderson Sun, 15 Nov 2009 10:41:36 -0800

Hi Ted,

Thanks for the reply; see my responses inline below:


Excerpts from Theodore Tso's message of Sat Nov 14 16:21:51 -0500 2009:
> Do you have the full e2fsck transcript (it looks like what you
> submitted to BTS was only a partial transcript)?

Unfortunately, no. Although we were running this fsck within a screen'd
serial console, the output vastly exceeded the buffer of both the xterm
and the screen process. The fsck itself took a very long time (it's a
terabyte drive), so I followed it for some time and paid very close
attention to what I saw. Everything was typical up to the point where it
started asking me the questions I included in the bug report. Those
questions were the bulk of the log: the same question repeated thousands
of times. Twice, the PROGRAMMING BUG message appeared in the middle of
those questions, and I managed to capture that part of the log.

> Also, can you tell me something about the files which got the
> PROGRAMMING BUG error?  It would be useful to see the pathname and
> inode breakdown of the inode(s) in question.  For example, for inode
> 223806323, the following debugfs commands will give the pathname and
> inode:
> 
> % debugfs /dev/mapper/vg_hoopoe0-backups
> debugfs: ncheck <223806323>
> debugfs: stat <223806323>
> debugfs: quit

Sure, I would be happy to do this. However, this will have to wait until
the non-destructive read/write tests we have been doing on the drive
finish.

As of this writing we are here:

 75.30% done, 138:02:35 elapsed            

Once it finishes, I'll provide this additional information.
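
For when the tests are done: the debugfs lookups quoted above could also
be run non-interactively, so the output lands in a file rather than a
scrollback buffer. This is only a minimal sketch. It assumes debugfs's -R
option (run a single request and exit) and that Python is available on
the box. The helper names are made up for illustration, and the request
strings simply mirror the commands Ted gave.

```python
import subprocess


def debugfs_requests(device, inode):
    """Build one debugfs command line per request for the given inode.

    The request strings copy the quoted debugfs session; nothing is run here.
    """
    requests = [f"ncheck <{inode}>", f"stat <{inode}>"]
    return [["debugfs", "-R", request, device] for request in requests]


def run_requests(device, inode):
    """Run each request and return the combined output.

    Assumes e2fsprogs is installed and the caller may read the device.
    """
    chunks = []
    for cmd in debugfs_requests(device, inode):
        result = subprocess.run(cmd, capture_output=True, text=True)
        chunks.append(result.stdout + result.stderr)
    return "\n".join(chunks)


if __name__ == "__main__":
    # Device and inode number are the ones from this bug report.
    print(run_requests("/dev/mapper/vg_hoopoe0-backups", 223806323))
```

Redirecting the script's output to a file would keep the full result for
the bug report.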

> The other thing that might be worth trying is re-running e2fsck to see
> what you get, via "e2fsck -f /dev/mapper/vg_hoopoe0-backups".  The
> PROGRAMMING BUG error can also result from a hard drive returning
> different data when a particular inode table block is read at
> different times.  So if there is something flaky in your storage
> device --- for example, if you have a RAID 1 setup and the two
> mirrors aren't synchronized, e2fsck might read from disk #1 during
> pass 1, and then in pass 4 the read might come from disk #2 and
> return different data --- you will also get the PROGRAMMING BUG error.

I will also do as you suggest when I can. The system is *not* set up with
a RAID 1 configuration, but it does have file-system encryption set up
via dmcrypt, with the LVM layer on top of it.

> It should also be the case after a single run of e2fsck, if all
> questions are answered with 'yes', that a subsequent run of e2fsck
> should find no problems.  This, of course, is assuming that there are
> no e2fsck bugs and that the storage device is reliable.  (That is, data
> written to a block will be read back when the block is read, and data
> read from a block at time T and data read at time T+n will be the same,
> if there are no intervening writes to that block.)

Sounds reasonable. The odd thing about this particular system is that
this is the second time we have needed to do an fsck of this type on it
after a routine Debian kernel security upgrade. We aren't exactly sure
what is going on here: whether it's the disk that has an issue, the
controller, the memory, or something else. We have a typical burn-in
process to weed out bad memory before we deploy boxes (memtest86+ plus
some cpuburns or kernel compiles), but things can always change. This is
why we are doing the non-destructive read/write tests on the drive right
now. After that completes I can obtain the information you have
requested, and perhaps we will attempt some stress tests.
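
The reliability property described above (data read from a block at time
T and at time T+n is the same when nothing writes to it in between)
could be spot-checked with something like the sketch below. It is an
illustration only, not what our read/write tests actually do. The
function names, the offset and the length are assumptions, and repeated
reads may be served from the kernel page cache rather than from the
disk, so a passing result does not prove the hardware is sound.

```python
import hashlib
import time


def read_digest(path, offset, length):
    """Return the SHA-256 hex digest of `length` bytes at `offset` in `path`."""
    # buffering=0 avoids Python-level buffering; the OS may still cache reads.
    with open(path, "rb", buffering=0) as f:
        f.seek(offset)
        data = f.read(length)
    return hashlib.sha256(data).hexdigest()


def reads_are_stable(path, offset, length, delay=0.0, repeats=2):
    """Read the same region `repeats` times and report whether all reads match."""
    first = read_digest(path, offset, length)
    for _ in range(repeats - 1):
        time.sleep(delay)
        if read_digest(path, offset, length) != first:
            return False
    return True


if __name__ == "__main__":
    # Hypothetical region: the first 4 KiB of the device from this report.
    device = "/dev/mapper/vg_hoopoe0-backups"
    print(reads_are_stable(device, offset=0, length=4096, delay=1.0, repeats=3))
```

The check only reads, so it is safe to run on a quiescent device, but it
is only meaningful while nothing else is writing to that region.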

thanks,
micah


