Enrico Zini wrote:
>
> Hello!
>
> I'm posting this to three lists because I can't track down the problem to a
> single cause.
>
> We are setting up a Dell Poweredge 2450 to run the central MySQL database
> of our crowded web site. If we raise the load on the database by moving
> some services from the current server to the new one, we start
> experiencing random database indexes corruption. Sadly, we've been unable
> so far to track down the issue to a single query or some reproducible
> sequence of events. I'll try to include here all the details we have.
>
> The MySQL errors are all 127 "Record-file is crashed" or 134 "Record was
> already deleted (or record file crashed)". Here are some examples:
> --------------------------------------------------------------------
> Got error 134 from table handler executing query "UPDATE Delayed2 SET [...]
> WHERE ID=1 AND Tipo=1" (err: 1030)
> Got error 127 from table handler executing query "SELECT Simbolo, Prezzo,
> UNIX_TIMESTAMP(Ora), TotVol, NT FROM Realtime WHERE ID=23 AND Tipo = 5"
> (err: 1030)
> --------------------------------------------------------------------
>
> This happens both with the Debian MySQL version 3.23.36-6 (from testing)
> and with the 3.23.43 precompiled binaries downloaded from the
> www.mysql.com web site. We can fix the tables with myisamchk, but after
> some time (ranging from a couple of hours to a couple of days) the
> database get corrupted again.
>
> The system is a Debian testing with kernel 2.4.12 (an upgrade to 2.4.13 is
> planned). File system is ext2.
>
> Since I don't know if it's a MySQL fault or a hardware fault, I also
> include hardware and driver details:
>
> dmesg log of AIC7XXX and megaraid initialization:
> --------------------------------------------------------------------
> SCSI subsystem driver Revision: 1.00
> scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.1
> <Adaptec aic7899 Ultra160 SCSI adapter>
> aic7899: Ultra160 Wide Channel A, SCSI Id=7, 32/255 SCBs
>
> scsi1 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.1
> <Adaptec aic7899 Ultra160 SCSI adapter>
> aic7899: Ultra160 Wide Channel B, SCSI Id=7, 32/255 SCBs
>
> megaraid: v1.17a (Release Date: Fri Jul 13 18:44:01 EDT 2001)
> megaraid: found 0x8086:0x1960:idx 0:bus 0:slot 2:func 1
> scsi2 : Found a MegaRAID controller at 0xe0808000, IRQ: 20
> megaraid: [1.01:1p00] detected 1 logical drives
> megaraid: channel[1] is raid.
> megaraid: channel[2] is raid.
> scsi2 : AMI MegaRAID 1.01 254 commands 16 targs 2 chans 8 luns
> scsi2: scanning channel 1 for devices.
> Vendor: DELL Model: 1x4 U2W SCSI BP Rev: 1.16
> Type: Processor ANSI SCSI revision: 02
> scsi2: scanning channel 2 for devices.
> scsi2: scanning virtual channel for logical drives.
> Vendor: MegaRAID Model: LD0 RAID5 17136R Rev: 1.01
> Type: Direct-Access ANSI SCSI revision: 02
> Attached scsi disk sda at scsi2, channel 2, id 0, lun 0
> SCSI device sda: 35094528 512-byte hdwr sectors (17968 MB)
> Partition check:
> /dev/scsi/host2/bus2/target0/lun0: p1 p2 p3 < p5 p6 >
> --------------------------------------------------------------------
>
> Some /proc stats:
> --------------------------------------------------------------------
> service:~# cat /proc/megaraid/0/config
> Controller Type: 438/466/467/471/493
> Base = e0808000, Irq = 20, Logical Drives = 1, Channels = 2
> Version =1.01:1p00, DRAM = 128Mb
> Controller Queue Depth = 254, Driver Queue Depth = 126
> service:~# cat /proc/megaraid/0/stat
> Statistical Information for this controller
> Interrupts Collected = 2065925
> Logical Drive 0:
> Reads Issued = 136738, Writes Issued = 1929155
> Sectors Read = 2611796, Sectors Written = 25004064
>
> service:~# cat /proc/megaraid/0/mailbox
> Contents of Mail Box Structure
> Fw Command = 0x02
> Cmd Sequence = 0x66
> No of Sectors= 0008
> LBA = 0x16c5c8a
> DTA = 0x0230f000
> Logical Drive= 0x00
> No of SG Elmt= 0x00
> Busy = 0
> Status = 0x00
> service:~# cat /proc/megaraid/0/status
> TBD
> --------------------------------------------------------------------
>
> Sadly, I don't know if I should upgrade some firmware, nor I know what releases
> our firmware are, since the machine is hosted in a farm ~100Km from here and
> it's hard for me to track boot messages and run boot floppies. Is there a way
> to know that from Linux?
>
> Do you have any hints for me to try to solve this problem?
>
> Bye, Enrico
>
> --
> GPG key: 1024D/797EBFAB 2000-12-05 Enrico Zini (Unibo) <[EMAIL PROTECTED]>
>
Hi,

We experienced the same kind of problem on our Web database server
(a Dell PowerEdge 2450, dual-processor PIII/733) in January. It was
running Linux 2.2.14 (RedHat 6.2) and MySQL 3.23.30. At that time,
rebooting in single-processor mode "solved" the problem.

Ten days ago we installed a new database server (Dell 2550,
dual-processor PIII/1000) with Linux 2.4.3 (RedHat 7.1 with kernel
update) and MySQL 3.23.42, and it has been running without problems
since then.

Your report scares me, since I was convinced that our index corruption
problem was due to some weird behavior of the Linux 2.2.14 kernel in
SMP mode.

Regards
--
Joseph Bueno
NetClub/Trader.com
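
For anyone trying to narrow down which tables are hit, the "Got error N
from table handler" lines quoted in this thread can be tallied per table
and error code. The following is only a minimal sketch in Python: it
assumes each log entry looks exactly like the two examples quoted above,
and the names tally_handler_errors, ERROR_RE, TABLE_RE and SAMPLE are
hypothetical helpers invented for the illustration, not part of MySQL.

```python
import re
from collections import Counter

# Matches the handler error code and the quoted query text.
# Assumption: log entries look like the two examples quoted in this thread.
ERROR_RE = re.compile(
    r'Got error (\d+) from table handler executing query "(.*?)"', re.S
)
# Rough table-name guess: the first identifier after UPDATE, FROM or INTO.
TABLE_RE = re.compile(r'\b(?:UPDATE|FROM|INTO)\s+`?(\w+)`?', re.I)


def tally_handler_errors(log_text):
    """Count (table, error code) pairs found in a chunk of log text."""
    counts = Counter()
    for code, query in ERROR_RE.findall(log_text):
        match = TABLE_RE.search(query)
        table = match.group(1) if match else "?"
        counts[(table, int(code))] += 1
    return counts


# The two error lines from the original report, joined onto single lines.
SAMPLE = (
    'Got error 134 from table handler executing query '
    '"UPDATE Delayed2 SET [...] WHERE ID=1 AND Tipo=1" (err: 1030)\n'
    'Got error 127 from table handler executing query '
    '"SELECT Simbolo, Prezzo, UNIX_TIMESTAMP(Ora), TotVol, NT '
    'FROM Realtime WHERE ID=23 AND Tipo = 5" (err: 1030)\n'
)

if __name__ == "__main__":
    for (table, code), n in sorted(tally_handler_errors(SAMPLE).items()):
        print(table, code, n)
```

If the counts cluster on one or two tables, the write pattern on those
tables is a reasonable place to start looking; if they are spread evenly
across all tables, that may point more toward the hardware, driver or
kernel side discussed above. Either reading is a heuristic, not a
diagnosis.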