Hello Miles,

On Fri, Oct 18, 2013 at 11:43:59AM -0400, Miles Fidelman wrote:
> Do a smartctl -A /dev/sd[abcd] - look for non-zero raw read errors
> and reallocated sector counts. I've found, at least for the WD
> drives I use in my servers - anything other than a 0 raw-read-error
> count is a sign of near-term disk failure. The first time I
> encountered the symptoms you report, it took me a LONG time to
> figure it out. The basic SMART test is useless.

IIUC, this is the output I should be looking at:

sda:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   095   094   006    Pre-fail  Always       -       230210521
  5 Reallocated_Sector_Ct   0x0033   096   096   036    Pre-fail  Always       -       5832

sdb:
  1 Raw_Read_Error_Rate     0x000f   119   082   006    Pre-fail  Always       -       234455192
  5 Reallocated_Sector_Ct   0x0033   099   099   036    Pre-fail  Always       -       2320

sdc:
  1 Raw_Read_Error_Rate     0x000f   115   099   006    Pre-fail  Always       -       87852008
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0

sdd:
  1 Raw_Read_Error_Rate     0x000f   118   099   006    Pre-fail  Always       -       187317944
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0

According to this, all drives are bad, but only sda behaves badly.

Three more values are reported as Pre-fail: Spin_Up_Time,
Seek_Error_Rate and Spin_Retry_Count. Full output for sda:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   095   094   006    Pre-fail  Always       -       230231673
  3 Spin_Up_Time            0x0003   097   097   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       3
  5 Reallocated_Sector_Ct   0x0033   096   096   036    Pre-fail  Always       -       5832
  7 Seek_Error_Rate         0x000f   087   060   030    Pre-fail  Always       -       471820319
  9 Power_On_Hours          0x0032   089   089   000    Old_age   Always       -       9750
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       4
183 Runtime_Bad_Block       0x0032   098   098   000    Old_age   Always       -       2
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       386
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   099   099   000    Old_age   Always       -       1
190 Airflow_Temperature_Cel 0x0022   063   056   045    Old_age   Always       -       37 (Min/Max 22/44)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       1
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       1266
194 Temperature_Celsius     0x0022   037   044   000    Old_age   Always       -       37 (0 22 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       8
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       174414326932768
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       23166370191361
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       174661697951516

> Note that this particularly applies if you're not using an
> enterprise-class drive. Standard drives try very hard to read from
> the medium, and take a long time before they give up. Enterprise
> drives assume they're part of a RAID array and just give up,
> throwing an error.

I'm using Seagate ST3000DM001-9YN166 drives, which are not
enterprise-class.

Regards,
Veljko
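
For anyone who wants to automate the check discussed in this thread, here is a minimal sketch, assuming Python 3 and text in the column layout that smartctl -A printed above. The names WATCHED_IDS, parse_attributes and suspicious are made up for this example, and the choice of attribute IDs 5, 187, 197 and 198 is my assumption, not an official list. Raw_Read_Error_Rate is deliberately left out, because its raw value is non-zero on all four drives above, including sdc and sdd, which report no reallocated sectors.

```python
import re

# Attribute IDs whose raw value is checked for being non-zero.
# This selection is an assumption made for this sketch, not an official list.
WATCHED_IDS = {5, 187, 197, 198}

# One attribute row: ID, name, flag, value, worst, thresh, type,
# updated, when_failed, then the leading integer of the raw value.
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]{4}"
    r"\s+(\d+)\s+(\d+)\s+(\d+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\d+)"
)


def parse_attributes(text):
    """Map attribute ID -> (name, raw value) for every row that parses."""
    rows = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            rows[int(match.group(1))] = (match.group(2), int(match.group(9)))
    return rows


def suspicious(text, watched=WATCHED_IDS):
    """Return {name: raw} for watched attributes with a non-zero raw value."""
    rows = parse_attributes(text)
    return {
        name: raw
        for attr_id, (name, raw) in sorted(rows.items())
        if attr_id in watched and raw != 0
    }


# A few rows copied from the sda output in this message.
SAMPLE_SDA = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   095   094   006    Pre-fail  Always       -       230231673
  5 Reallocated_Sector_Ct   0x0033   096   096   036    Pre-fail  Always       -       5832
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       386
190 Airflow_Temperature_Cel 0x0022   063   056   045    Old_age   Always       -       37 (Min/Max 22/44)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       8
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
"""

if __name__ == "__main__":
    print(suspicious(SAMPLE_SDA))
```

To run it against a live drive, the output of smartctl -A for that device can be saved to a file and its contents passed to suspicious(). Passing a different set as the watched argument changes which attributes are checked.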