On 31/12/11 08:55, Stan Hoeppner wrote:
On 12/30/2011 4:37 AM, Tony van der Hoff wrote:
I'm getting irregular reports on a Squeeze software raid1 array with
two 500GB disks like this:

Dec 30 09:53:39 tony-lx kernel: [143496.872684] ata3.00: exception Emask 0x10 SAct 0x3 SErr 0x400000 action 0x6 frozen
Dec 30 09:53:39 tony-lx kernel: [143496.872694] ata3.00: irq_stat 0x08000000, interface fatal error
Dec 30 09:53:39 tony-lx kernel: [143496.872703] ata3: SError: { Handshk }
Dec 30 09:53:39 tony-lx kernel: [143496.872710] ata3.00: failed command: WRITE FPDMA QUEUED
Dec 30 09:53:39 tony-lx kernel: [143496.872725] ata3.00: cmd 61/60:00:80:d0:56/00:00:13:00:00/40 tag 0 ncq 49152 out
Dec 30 09:53:39 tony-lx kernel: [143496.872729]          res 40/00:08:20:9b:5a/00:00:13:00:00/40 Emask 0x10 (ATA bus error)
Dec 30 09:53:39 tony-lx kernel: [143496.872736] ata3.00: status: { DRDY }
Dec 30 09:53:39 tony-lx kernel: [143496.872742] ata3.00: failed command: WRITE FPDMA QUEUED
Dec 30 09:53:39 tony-lx kernel: [143496.872755] ata3.00: cmd 61/20:08:20:9b:5a/00:00:13:00:00/40 tag 1 ncq 16384 out
Dec 30 09:53:39 tony-lx kernel: [143496.872758]          res 40/00:08:20:9b:5a/00:00:13:00:00/40 Emask 0x10 (ATA bus error)
Dec 30 09:53:39 tony-lx kernel: [143496.872765] ata3.00: status: { DRDY }
Dec 30 09:53:39 tony-lx kernel: [143496.872776] ata3: hard resetting link
Dec 30 09:53:40 tony-lx kernel: [143497.356148] ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Dec 30 09:53:40 tony-lx kernel: [143497.358983] ata3.00: configured for UDMA/133
Dec 30 09:53:40 tony-lx kernel: [143497.359005] ata3: EH complete
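
For anyone reading the log above, the SErr value can be decoded bit by bit. What follows is a minimal sketch, not an authoritative reference: the bit-to-name table is written from my understanding of the labels the Linux libata driver prints inside "SError: { ... }", so treat it as an assumption and check it against your kernel source. The names SERROR_BITS and decode_serror are hypothetical, chosen only for this example.

```python
# Bit positions of the SATA SError register, labelled with the short names
# that libata is assumed to print inside "SError: { ... }".
SERROR_BITS = {
    0: "RecovData",
    1: "RecovComm",
    8: "UnrecovData",
    9: "Persist",
    10: "Proto",
    11: "HostInt",
    16: "PHYRdyChg",
    17: "PHYInt",
    18: "CommWake",
    19: "10B8B",
    20: "Dispar",
    21: "BadCRC",
    22: "Handshk",
    23: "LinkSeq",
    24: "TrStaTrns",
    25: "UnrecFIS",
    26: "DevExch",
}


def decode_serror(value):
    """Return the names of all known bits set in an SError value, lowest bit first."""
    return [name for bit, name in sorted(SERROR_BITS.items()) if value & (1 << bit)]


if __name__ == "__main__":
    # SErr value taken from the log excerpt above.
    print(decode_serror(0x400000))
```

With the value from the log, 0x400000, only bit 22 is set, which agrees with the "Handshk" label the kernel printed on the next line.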

Can anyone please enlighten me as to what it means? Am I about to lose a disk?

Why haven't you looked at the SMART data for the disk on ATA3?  Normally
that will answer your question directly above.

Well, Stan, I did. Unfortunately, I didn't understand the reports. They contain a plethora of information for which I haven't been able to locate an authoritative explanation, but they seem to indicate that the drives are OK:


root@tony-lx:~# smartctl -a /dev/sda
smartctl 5.40 2010-07-12 r3124 [x86_64-unknown-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Device Model:     ST3500413AS
Serial Number:    6VMS3YG7
Firmware Version: JC45
User Capacity:    500,107,862,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Sat Dec 31 09:20:49 2011 GMT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 ( 600) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (  83) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x103f) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   120   099   006    Pre-fail  Always       -       243530983
  3 Spin_Up_Time            0x0003   100   100   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       30
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   072   060   030    Pre-fail  Always       -       18363743
  9 Power_On_Hours          0x0032   098   098   000    Old_age   Always       -       1893
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       30
183 Runtime_Bad_Block       0x0032   075   075   000    Old_age   Always       -       25
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   096   000    Old_age   Always       -       74
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   071   067   045    Old_age   Always       -       29 (Lifetime Min/Max 27/30)
194 Temperature_Celsius     0x0022   029   040   000    Old_age   Always       -       29 (0 16 0 0)
195 Hardware_ECC_Recovered  0x001a   033   024   000    Old_age   Always       -       243530983
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   199   000    Old_age   Always       -       455
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       158690451654579
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       2626825492
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       377807731

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

root@tony-lx:~# smartctl -a /dev/sdb
smartctl 5.40 2010-07-12 r3124 [x86_64-unknown-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Device Model:     ST3500413AS
Serial Number:    6VMS41GW
Firmware Version: JC45
User Capacity:    500,107,862,016 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Sat Dec 31 09:24:25 2011 GMT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 ( 600) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (  81) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x103f) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   117   099   006    Pre-fail  Always       -       138763088
  3 Spin_Up_Time            0x0003   100   100   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       30
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   061   060   030    Pre-fail  Always       -       1374378
  9 Power_On_Hours          0x0032   098   098   000    Old_age   Always       -       1893
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       30
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   070   065   045    Old_age   Always       -       30 (Lifetime Min/Max 27/31)
194 Temperature_Celsius     0x0022   030   040   000    Old_age   Always       -       30 (0 17 0 0)
195 Hardware_ECC_Recovered  0x001a   036   005   000    Old_age   Always       -       138763088
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       213915141146536
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       2186563909
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       2014986862

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
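
Since both drives are the same model with the same power-on hours, one way to read the two reports is to put the raw values side by side and see where they differ. The sketch below does that for a handful of rows copied from the attribute tables above. The helper names parse_attributes and differing are hypothetical. The reading of the result is also an assumption: UDMA_CRC_Error_Count is generally understood to count CRC errors on the link between drive and controller, which tends to point at cabling or connectors rather than the platters. Nothing in the reports confirms which of sda and sdb sits on ata3.

```python
# A few attribute rows copied from the two smartctl reports above.
SDA_ROWS = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
183 Runtime_Bad_Block       0x0032   075   075   000    Old_age   Always       -       25
188 Command_Timeout         0x0032   100   096   000    Old_age   Always       -       74
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x003e   200   199   000    Old_age   Always       -       455
"""

SDB_ROWS = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
"""


def parse_attributes(text):
    """Map attribute name to its RAW_VALUE string for each table row in text."""
    attrs = {}
    for line in text.splitlines():
        # Ten columns; the last one (RAW_VALUE) may itself contain spaces.
        fields = line.split(None, 9)
        if len(fields) < 10 or not fields[0].isdigit():
            continue  # skip headers and anything that is not an attribute row
        attrs[fields[1]] = fields[9].strip()
    return attrs


def differing(a, b):
    """Return {name: (raw_a, raw_b)} for attributes whose raw values differ."""
    return {name: (a[name], b[name]) for name in a if name in b and a[name] != b[name]}


if __name__ == "__main__":
    sda = parse_attributes(SDA_ROWS)
    sdb = parse_attributes(SDB_ROWS)
    for name, (raw_a, raw_b) in differing(sda, sdb).items():
        print(f"{name}: sda={raw_a} sdb={raw_b}")
```

On these rows the differences are Runtime_Bad_Block (25 vs 0), Command_Timeout (74 vs 0) and UDMA_CRC_Error_Count (455 vs 0), all on sda, while the reallocated and pending sector counts are zero on both drives.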




--
Tony van der Hoff        | mailto:t...@vanderhoff.org
Buckinghamshire, England |

