I have tried a number of suggestions in response to my earlier messages
about this problem, without success. I thought it might be useful to
summarise the problem, and the attempts made so far to solve it, in the
hope of narrowing down the possibilities.

What I am trying to achieve:

I am trying to create a server with root on a large RAID-5 across four
40GB IDE disks. I will have a small (16MB) /boot partition on hda,
perhaps mirrored on the second disk, around 180MB on each disk going to
create two RAID-1 swap devices, and the rest (around 39GB on each disk)
going to a root filesystem on RAID-5. This capacity will be used mainly
for storing multimedia files, such as MP3s. I want one large filesystem
so that I do not have to make arbitrary judgements about how much space
I will need for the various types of files.
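To make the plan concrete, this is the layout I have in mind on each disk (sizes approximate; the partition numbering here is illustrative, not necessarily what fdisk has assigned):

```shell
# Intended layout on each of the four disks (approximate sizes):
#   hdX1    16MB  -> /boot (on hda, perhaps mirrored to the second disk)
#   hdX2   180MB  -> RAID-1 member, swap
#   hdX3   ~39GB  -> RAID-5 member, root filesystem
```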


Where I'm at now:

I am using Mandrake 8.0 (kernel 2.4.3). I tried to install straight to
RAID, but I encountered massive corruption problems that effectively
killed the system, so I have temporarily installed on hda alone. I have
restored around 35GB of MP3s to this disk, which is therefore nearly
full and working without difficulty. I am now trying to create the
RAID-5 in degraded mode on the other three disks, using the failed-disk
option for hda. To see whether the problem is with the degraded nature
of the array, I have also tried to create a 3-disk array, but with no
more success. Swap is not yet on RAID; for the moment, I have a 184MB
swap partition on hda. /boot is not yet mirrored either; it is also a
small partition on hda.
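For reference, what I am running when the freeze occurs is essentially the standard raidtools sequence (sketched here; the raidtab is reproduced at the end of this message):

```shell
# Create the degraded array described in /etc/raidtab (hda5 is marked
# failed-disk), then make an ext2 filesystem on it. The mke2fs step is
# where the "switching cache buffer size" message and the freeze occur.
mkraid /dev/md0
mke2fs /dev/md0
```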


The problem:

When I try to create a filesystem on the degraded array, or when the
3-disk array tries to synchronise, I get the following message:

raid5: switching cache buffer size, 4096 --> 1024

and the system freezes, requiring a hard reset.


The equipment:

A Duron 800 with 128MB RAM on a Gigabyte GA-7ZXR mainboard. The disks
are identical Fujitsu MPG3409AT units. The mainboard uses the VIA KT133
chipset, and has an onboard Promise controller, which provides two
UDMA-100 channels in addition to the standard two IDE (UDMA-66)
channels. The Promise controller also offers IDE-RAID, but I am not
trying to use that, and have disabled it on the board. The board also
has onboard sound (AC97). The NIC is an Intel EtherExpress Pro 100, and
I also have an Adaptec 2940-UW SCSI controller for my CD-ROM and Travan
T20 tape drive. I have an old Promise PCI card offering two UDMA-33
channels, which I have tried as an alternative to the onboard
controller, but it made no difference. The disks are each in removable
containers, which accept only 40-pin connectors. The disks are
therefore running at UDMA-33 at best, even though the controllers and
disks are capable of 66 or 100. By default, I have the first two disks
(hda and hdc) on the mainboard's standard IDE channels, and the other
two disks (hde and hdg) on the two channels offered by the onboard
Promise controller.


Solutions attempted so far:

Mike Black suggested that it might be a problem with the Promise
controller. I tried putting the third disk as slave on the second IDE
channel, so that only one disk was on the Promise controller. This did
not help. I then put the fourth disk as slave on the first IDE channel,
and disabled the Promise controller. The disks now appeared as hda,
hdb, hdc and hdd. This still did not help.

[EMAIL PROTECTED] suggested, in reply to an earlier attempt to deal
with this problem, that it was a disk geometry issue. This is also my
strongest suspicion. I cannot gain access to the Promise BIOS, so I
could not adjust the geometry in BIOS of any disks on that controller.
However, with all disks now on the first two channels, I can play with
the geometry in BIOS. With the BIOS set to Auto (which I assume means
that the disks report their own geometry), /proc/ide/hdx/geometry
reported the physical and logical geometry of all the disks except hda
as 79428,16,63. The Large-Disk HOWTO suggests that this is more
cylinders than Linux can handle. I have tried changing the geometry in
BIOS to either 19623,16,255 or 4983,255,63. If I do not pass geometry
to Linux at the boot prompt, /proc/ide/hdx/geometry reports the
physical geometry of disks set to 4983,255,63 as 84723,15,63. The
physical and logical geometries are consistent for a setting of
19623,16,255, but fdisk reports the partitions falling across cylinder
boundaries with this setting. I have passed the appropriate geometries
through a lilo append line. None of this helps. All combinations of
geometry that I have tried have resulted in exactly the same problem
when I try to create the filesystem on the RAID-5.
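The append line I mean is of this form (values illustrative, matching the 19623,16,255 BIOS setting; the syntax is the standard hdX=cylinders,heads,sectors IDE boot parameter):

```shell
# /etc/lilo.conf excerpt: force the kernel's idea of the CHS geometry
append="hdb=19623,16,255 hdc=19623,16,255 hdd=19623,16,255"
```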

A search of the Web for this error message yielded only two articles.

One was a message on the lvm-devel list from Luca Berra regarding
problems combining LVM with RAID, but as I am not using LVM and know
very little about it, I cannot see how to relate this experience to my
own. I did not really understand the comments, but they seemed to
indicate that problems could occur if requests are not properly
aligned. I had noticed the following lines in dmesg when I did not pass
geometry to the kernel:

Partition check:
 hda: hda1 hda2 hda3 < hda5 >
 hdb: [DM6:MBR] [4983/255/63] hdb1 hdb2 hdb3
 hdc: [PTBL] [4983/255/63] hdc1 hdc2
 hdd: [PTBL] [4983/255/63] hdd1 hdd2

The Large-Disk HOWTO tells me that this means that hdb has OnTrack
DiskManager installed. I would expect that for at least some of the
disks, due to my erroneous attempts to get them working with an older
motherboard which, it turned out, did not support disks this large. I
wondered whether this could be causing requests not to be properly
aligned. I followed the HOWTO's instructions to remove DM, which
appears to have been successful, but this did not help. Moreover, I
also experienced the problem when passing geometry, which should force
DM to be ignored, so this does not seem to be the cause. I do not fully
understand from the HOWTO whether PTBL indicates that DM is also
installed on hdc and hdd, but I cannot eliminate it using the method
described in the HOWTO, so I am guessing that this is simply an
indication of how the kernel has calculated the geometry. If not, this
may be a cause of the problem.

The other related article was a message on the linux-kernel list
regarding deadlocks with a PDC20265 chipset. My Promise controller also
uses a PDC202xx chipset, and the kernel version in use (2.4.3) was the
same, but I get this problem even with the Promise disabled, so I cannot
see how this can be related either. Moreover, unlike the author of that
message, I did not experience lockups using disks on the Promise
controller unless I tried to use them in a RAID. I have successfully
created and used filesystems on each of the disks on any of the
controllers and channels. The problem only occurs when I try to combine
them in a RAID-5.

I would appreciate any suggestions as to what might be causing the
problem, or what other circumstances produce this error, and, ideally,
how I can solve it. I bought these disks months ago to create a larger
RAID than the previous four 17GB units provided, and I am becoming
increasingly doubtful that I will be able to achieve what I want with
Linux. Surely what I am trying to do is not that uncommon. Can I be the
only one who has experienced this problem?

Cheers,

Bruno Prior


Relevant sections of dmesg with BIOS geometry set to 19623,16,255 for
hdb, hdc, and hdd, and no geometry passed to kernel:

Uniform Multi-Platform E-IDE driver Revision: 6.31
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
VP_IDE: IDE controller on PCI bus 00 dev 39
VP_IDE: chipset revision 16
VP_IDE: not 100% native mode: will probe irqs later
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
VP_IDE: VIA vt82c686a (rev 22) IDE UDMA66 controller on pci00:07.1
    ide0: BM-DMA at 0xffa0-0xffa7, BIOS settings: hda:DMA, hdb:DMA
    ide1: BM-DMA at 0xffa8-0xffaf, BIOS settings: hdc:DMA, hdd:DMA
hda: FUJITSU MPG3409AT E, ATA DISK drive
hdb: FUJITSU MPG3409AT E, ATA DISK drive
hdc: FUJITSU MPG3409AT E, ATA DISK drive
hdd: FUJITSU MPG3409AT E, ATA DISK drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
ide1 at 0x170-0x177,0x376 on irq 15
hda: 80063424 sectors (40992 MB) w/2048KiB Cache, CHS=4983/255/63, UDMA(33)
hdb: 80063424 sectors (40992 MB) w/2048KiB Cache, CHS=4983/255/63, UDMA(33)
hdc: 80063424 sectors (40992 MB) w/2048KiB Cache, CHS=79428/16/63, UDMA(33)
hdd: 80063424 sectors (40992 MB) w/2048KiB Cache, CHS=79428/16/63, UDMA(33)
Partition check:
 hda: hda1 hda2 hda3 < hda5 >
 hdb: [DM6:MBR] [4983/255/63] hdb1 hdb2 hdb3
 hdc: [PTBL] [4983/255/63] hdc1 hdc2
 hdd: [PTBL] [4983/255/63] hdd1 hdd2
RAMDISK: Compressed image found at block 0
Uncompressing......done.
Freeing initrd memory: 126k freed
Serial driver version 5.05 (2000-12-13) with HUB-6 MANY_PORTS MULTIPORT SHARE_IRQ SERIAL_PCI ISAPNP enabled
ttyS00 at 0x03f8 (irq = 4) is a 16550A
md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27
md.c: sizeof(mdp_super_t) = 4096
autodetecting RAID arrays
autorun ...
... autorun DONE.
NET4: Linux TCP/IP 1.0 for NET4.0
IP Protocols: ICMP, UDP, TCP, IGMP
IP: routing cache hash table of 512 buckets, 4Kbytes
TCP: Hash tables configured (established 8192 bind 8192)
Linux IP multicast router 0.06 plus PIM-SM
NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
VFS: Mounted root (ext2 filesystem).
SCSI subsystem driver Revision: 1.00
PCI: Found IRQ 9 for device 00:09.0
scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.1.8
        <Adaptec 2940 Ultra SCSI adapter>
        aic7880: Wide Channel A, SCSI Id=7, 16/255 SCBs

  Vendor: PIONEER   Model: CD-ROM DR-U16S    Rev: 1.01
  Type:   CD-ROM                             ANSI SCSI revision: 02
  Vendor: HP        Model: T20               Rev: 3.01
  Type:   Sequential-Access                  ANSI SCSI revision: 02
VFS: Mounted root (ext2 filesystem) readonly.
change_root: old root has d_count=3
Trying to unmount old root ... okay
Freeing unused kernel memory: 696k freed
hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
hdb: DMA disabled
ide0: reset: success

(I think these errors are a cable issue, but they do not prevent hda
working adequately for now)


/proc/ide/hdx/geometry in the same circumstances as above:

[root@server /root]# cat /proc/ide/hda/geometry
physical     84723/15/63
logical      4983/255/63
[root@server /root]# cat /proc/ide/hdb/geometry
physical     19623/16/255
logical      4983/255/63
[root@server /root]# cat /proc/ide/hdc/geometry
physical     19623/16/255
logical      4983/255/63
[root@server /root]# cat /proc/ide/hdd/geometry
physical     19623/16/255
logical      4983/255/63
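One observation on these numbers: multiplying out C x H x S shows that only the native 79428/16/63 geometry covers the drive's full 80063424 sectors; every translated geometry falls slightly short, which may be relevant if the problem really is request alignment. A quick check:

```shell
# Multiply out C*H*S for each geometry seen above and compare with the
# drives' native capacity of 80063424 sectors (from the dmesg output)
total=80063424
for chs in 79428/16/63 19623/16/255 4983/255/63 84723/15/63; do
  c=${chs%%/*}; rest=${chs#*/}; h=${rest%%/*}; s=${rest##*/}
  sectors=$(( c * h * s ))
  echo "$chs -> $sectors sectors ($(( total - sectors )) short of native)"
done
```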


[root@server /root]# less /etc/raidtab
raiddev /dev/md0
    raid-level                5
    nr-raid-disks             4
    nr-spare-disks            0
    chunk-size                64
    persistent-superblock     1

    device                    /dev/hdb3
    raid-disk                 0
    device                    /dev/hdc2
    raid-disk                 1
    device                    /dev/hdd2
    raid-disk                 2
    device                    /dev/hda5
    failed-disk                3