Filesystem corruption on md (Software) RAID

Sebastian Flothow Mon, 06 Aug 2007 08:59:54 -0700

Hi,

I'm getting massive filesystem corruption on an md RAID comprising 4
SATA disks. I tried ext3, xfs and reiserfs on RAID level 5 as well as
ext3 on RAID level 1 (using only 2 disks); all can be crashed reliably
by running bonnie++ for just a few minutes. In the case of ext3, I
usually get dmesg output like this:


[...]
md0: rw=1, want=1482184800, limit=490223232
attempt to access beyond end of device
md0: rw=1, want=1482184800, limit=490223232
attempt to access beyond end of device
md0: rw=1, want=1482184800, limit=490223232
Buffer I/O error on device md0, logical block 185273099
lost page write due to I/O error on md0
Aborting journal on device md0.
EXT3-fs error (device md0) in ext3_reserve_inode_write: Journal has aborted
EXT3-fs error (device md0) in ext3_dirty_inode: Journal has aborted
EXT3-fs error (device md0) in ext3_new_blocks: Journal has aborted
ext3_abort called.
EXT3-fs error (device md0): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only
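
As a quick sanity check on these numbers, here is a minimal sketch of the arithmetic. It assumes that want and limit are counted in 512-byte sectors and that the ext3 filesystem uses 4 KiB blocks; neither is confirmed by the output above. Under those assumptions, the failing logical block maps exactly to the rejected request, and it lies roughly three times past the end of the array, which suggests the block number itself is bogus rather than slightly off.

```python
# Assumptions (not confirmed by the log): sector and block sizes.
SECTOR_BYTES = 512       # unit assumed for "want" and "limit"
FS_BLOCK_BYTES = 4096    # assumed ext3 block size

# Values copied from the dmesg output above.
want = 1482184800           # end sector of the rejected request
limit = 490223232           # size of md0 in sectors
logical_block = 185273099   # from the "Buffer I/O error" line

sectors_per_block = FS_BLOCK_BYTES // SECTOR_BYTES
first_sector = logical_block * sectors_per_block
end_sector = first_sector + sectors_per_block

device_blocks = limit // sectors_per_block
device_gib = limit * SECTOR_BYTES / 2**30

# Does the failing block correspond to the rejected request?
print("block matches request:", end_sector == want)
print("device size in 4 KiB blocks:", device_blocks)
print("device size in GiB:", round(device_gib, 1))
print("failing block / device blocks:", round(logical_block / device_blocks, 2))
```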

The filesystems are impossible to repair afterwards; e2fsck in
particular runs for ages and eventually segfaults.

By contrast, ext3 directly on the physical disk partition works fine and
withstood days of continuous bonnie++ runs.

This is with Etch, kernel 2.6.18-4-686-bigmem. FWIW, the machine used to
run Sarge with a 2.4 kernel, where the RAID worked fine.

Now, it seems quite unlikely that RAID is completely broken in 2.6, so I
suppose it might be related to the hardware: it's a Pentium 4 @ 2.8 GHz,
1.5 GiB RAM, the SATA Controller is a Promise S150 SX4 using the
sata_sx4 kernel module.

Any ideas on this?


