Folks,

I had two drives fail on a 13-drive RAID5 array with bad blocks
(confirmed with an external disk scan). I replaced them and hot-added the
new drives back into the array. The resync completed without incident. I
moved the machine back into production, and after a reboot the two new
drives get kicked out of the array for being non-fresh. Everything I try
results in these two drives getting kicked out.

Here's what I tried:
        searched and read for at least 10 hours for info on the
                "non-fresh" kicking
        hot-added then rebooted 5 times, with the same result
                using kernels 2.4.30, 2.6.11.8 and 2.6.16.8
                (a resync takes 4 hours to complete, so iterations take a while)
        mdadm version is v1.12
        after the resync but before the reboot, manually stopped and
                started the array
                it always operated correctly (no drives were kicked)

My questions are:
        1. How does a drive become non-fresh?
        2. Is the non-fresh status related to 'events'?
        3. How can I determine that all the drives are fresh before a reboot?
        4. The 2.4.30 and 2.6.11.8 dmesg output mentions kicking non-fresh
                drives; 2.6.16.8 doesn't even consider my new drives, see
                "after reboot" below. After a resync, how can I determine
                that all my drives are actually part of the array?
                mdadm -E /dev/sdX1 for each drive shows the same info.
        5. From everything I've tried, the array looks fine before the
                reboot, but no matter what I've tried, the drives are
                kicked upon reboot.
        6. /proc/mdstat reports "Personalities : [raid5] [raid4]", but the
                array is raid5. Where does raid4 come from?
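Regarding questions 2 and 3: the check I've been eyeballing is to pull the
Events counter out of every member's superblock and make sure they all
agree. This is a sketch, assuming the Events counter is what "fresh"
actually means (which is part of what I'm asking); the device list is
specific to my array:

```shell
#!/bin/sh
# Extract the Events counter from an "mdadm -E" dump, i.e. a line like
#          Events : 0.2681049
events_of() {
    awk -F': ' '/Events/ {print $2; exit}'
}

# Parsing a captured line from the mdadm -E output below:
printf '         Events : 0.2681049\n' | events_of    # prints 0.2681049

# In practice, loop over all members and compare the counters by eye:
#   for dev in /dev/sd[a-l]1 /dev/hdc1; do
#       printf '%s %s\n' "$dev" "$(mdadm -E "$dev" | events_of)"
#   done
```

Run against my array before the reboot, every member (including the two
new drives) reports the same 0.2681049 -- which is why the kicking after
reboot confuses me.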

Thanks for reading this and any suggestions you can offer.
Craig

-- 
------------------------------------------------------------
Dr. Craig Hollabaugh, [EMAIL PROTECTED], 970 240 0509
Author of Embedded Linux: Hardware, Software and Interfacing
www.embeddedlinuxinterfacing.com




The two drives in question are sdj1 and sdk1.

Here's the output after the resync, before the reboot:

[EMAIL PROTECTED]: cat /proc/mdstat
Personalities : [raid5] [raid4]
md0 : active raid5 sdj1[12](S) sdk1[9] sda1[0] sdl1[11] hdc1[10] sdd1[8]
sdh1[7] sdg1[6] sdf1[5] sde1[4] sdi1[3] sdc1[2] sdb1[1]
      1289056384 blocks level 5, 128k chunk, algorithm 2 [12/12]
[UUUUUUUUUUUU]

unused devices: <none>

[EMAIL PROTECTED]: mdadm -D /dev/md0
/dev/md0:
        Version : 00.90.03
  Creation Time : Thu Jan 16 09:10:52 2003
     Raid Level : raid5
     Array Size : 1289056384 (1229.34 GiB 1319.99 GB)
    Device Size : 117186944 (111.76 GiB 120.00 GB)
   Raid Devices : 12
  Total Devices : 13
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Thu May 25 05:36:58 2006
          State : clean
 Active Devices : 12
Working Devices : 13
 Failed Devices : 0
  Spare Devices : 1

         Layout : left-symmetric
     Chunk Size : 128K

           UUID : 4d862825:91140f1a:eb97e7f2:9bfa2403
         Events : 0.2681049

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1
       2       8       33        2      active sync   /dev/sdc1
       3       8      129        3      active sync   /dev/sdi1
       4       8       65        4      active sync   /dev/sde1
       5       8       81        5      active sync   /dev/sdf1
       6       8       97        6      active sync   /dev/sdg1
       7       8      113        7      active sync   /dev/sdh1
       8       8       49        8      active sync   /dev/sdd1
       9       8      161        9      active sync   /dev/sdk1
      10      22        1       10      active sync   /dev/hdc1
      11       8      177       11      active sync   /dev/sdl1

      12       8      145        -      spare   /dev/sdj1

[EMAIL PROTECTED]: mdadm -E /dev/sdj1
/dev/sdj1:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : 4d862825:91140f1a:eb97e7f2:9bfa2403
  Creation Time : Thu Jan 16 09:10:52 2003
     Raid Level : raid5
   Raid Devices : 12
  Total Devices : 13
Preferred Minor : 0

    Update Time : Thu May 25 05:36:58 2006
          State : clean
 Active Devices : 12
Working Devices : 13
 Failed Devices : 0
  Spare Devices : 1
       Checksum : 9943fc98 - correct
         Events : 0.2681049

         Layout : left-symmetric
     Chunk Size : 128K

      Number   Major   Minor   RaidDevice State
this    12       8      145       12      spare   /dev/sdj1

   0     0       8        1        0      active sync   /dev/sda1
   1     1       8       17        1      active sync   /dev/sdb1
   2     2       8       33        2      active sync   /dev/sdc1
   3     3       8      129        3      active sync   /dev/sdi1
   4     4       8       65        4      active sync   /dev/sde1
   5     5       8       81        5      active sync   /dev/sdf1
   6     6       8       97        6      active sync   /dev/sdg1
   7     7       8      113        7      active sync   /dev/sdh1
   8     8       8       49        8      active sync   /dev/sdd1
   9     9       8      161        9      active sync   /dev/sdk1
  10    10      22        1       10      active sync   /dev/hdc1
  11    11       8      177       11      active sync   /dev/sdl1
  12    12       8      145       12      spare   /dev/sdj1


------------------------------------------------------------------------------------------------
Now after reboot

[EMAIL PROTECTED]: uname -a
Linux vaughan 2.6.16.8 #1 Wed May 24 15:00:27 MDT 2006 i686 GNU/Linux

From dmesg:
md: Autodetecting RAID arrays.
md: autorun ...
md: considering sdl1 ...
md:  adding sdl1 ...
md:  adding sdi1 ...
md:  adding sdh1 ...
md:  adding sdg1 ...
md:  adding sdf1 ...
md:  adding sde1 ...
md:  adding sdd1 ...
md:  adding sdc1 ...
md:  adding sdb1 ...
md:  adding sda1 ...
md:  adding hdc1 ...
md: created md0

The kernel didn't add sdj or sdk.
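One thing I haven't tried yet: skipping kernel autodetect entirely and
assembling from userspace by UUID, with something like this in
/etc/mdadm.conf (UUID taken from the -D output above; untested on this
box, so treat it as a sketch):

```
DEVICE /dev/sd[a-l]1 /dev/hdc1
ARRAY /dev/md0 UUID=4d862825:91140f1a:eb97e7f2:9bfa2403
```

and then assembling with "mdadm --assemble --scan". If anyone knows
whether that would behave differently from the kernel's autorun with
respect to the non-fresh check, I'd like to hear it.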


[EMAIL PROTECTED]: cat /proc/mdstat
Personalities : [raid5] [raid4]
md0 : active raid5 sdl1[11] sdi1[3] sdh1[7] sdg1[6] sdf1[5] sde1[4]
sdd1[8] sdc1[2] sdb1[1] sda1[0] hdc1[10]
      1289056384 blocks level 5, 128k chunk, algorithm 2 [12/11]
[UUUUUUUUU_UU]

unused devices: <none>

[EMAIL PROTECTED]: mdadm -D /dev/md0
/dev/md0:
        Version : 00.90.03
  Creation Time : Thu Jan 16 09:10:52 2003
     Raid Level : raid5
     Array Size : 1289056384 (1229.34 GiB 1319.99 GB)
    Device Size : 117186944 (111.76 GiB 120.00 GB)
   Raid Devices : 12
  Total Devices : 11
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Thu May 25 05:36:58 2006
          State : clean, degraded
 Active Devices : 11
Working Devices : 11
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 128K

           UUID : 4d862825:91140f1a:eb97e7f2:9bfa2403
         Events : 0.2681049

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1
       2       8       33        2      active sync   /dev/sdc1
       3       8      129        3      active sync   /dev/sdi1
       4       8       65        4      active sync   /dev/sde1
       5       8       81        5      active sync   /dev/sdf1
       6       8       97        6      active sync   /dev/sdg1
       7       8      113        7      active sync   /dev/sdh1
       8       8       49        8      active sync   /dev/sdd1
       9       0        0        -      removed
      10      22        1       10      active sync   /dev/hdc1
      11       8      177       11      active sync   /dev/sdl1


 [EMAIL PROTECTED]: mdadm -E /dev/sdj1
/dev/sdj1:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : 4d862825:91140f1a:eb97e7f2:9bfa2403
  Creation Time : Thu Jan 16 09:10:52 2003
     Raid Level : raid5
   Raid Devices : 12
  Total Devices : 13
Preferred Minor : 0

    Update Time : Thu May 25 05:36:58 2006
          State : clean
 Active Devices : 12
Working Devices : 13
 Failed Devices : 0
  Spare Devices : 1
       Checksum : 9943fc98 - correct
         Events : 0.2681049

         Layout : left-symmetric
     Chunk Size : 128K

      Number   Major   Minor   RaidDevice State
this    12       8      145       12      spare   /dev/sdj1

   0     0       8        1        0      active sync   /dev/sda1
   1     1       8       17        1      active sync   /dev/sdb1
   2     2       8       33        2      active sync   /dev/sdc1
   3     3       8      129        3      active sync   /dev/sdi1
   4     4       8       65        4      active sync   /dev/sde1
   5     5       8       81        5      active sync   /dev/sdf1
   6     6       8       97        6      active sync   /dev/sdg1
   7     7       8      113        7      active sync   /dev/sdh1
   8     8       8       49        8      active sync   /dev/sdd1
   9     9       8      161        9      active sync   /dev/sdk1
  10    10      22        1       10      active sync   /dev/hdc1
  11    11       8      177       11      active sync   /dev/sdl1
  12    12       8      145       12      spare   /dev/sdj1




