On 8 Oct 2024 11:29 -0400, from d...@randomstring.org (Dan Ritter):
>> The disk has been running continuously for seven years now and I am
>> running out of space anyway, so I already ordered a replacement. But I
>> do not fully understand what is happening.
>
> The drive is dying, slowly. In this case it's starting with a
> bad patch on a platter.
That would be my take too. The LBA addresses reported in a different
post in this thread being as close together as they are would also
corroborate the bad-patch-on-a-platter theory.

>> | SMART Attributes Data Structure revision number: 16
>> | Vendor Specific SMART Attributes with Thresholds:
>> | ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>> |   1 Raw_Read_Error_Rate     0x002f   199   169   051    Pre-fail  Always       -       81
>> |   3 Spin_Up_Time            0x0027   198   197   021    Pre-fail  Always       -       9100
>> |   4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       83
>> |   5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
>> |   7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
>> |   9 Power_On_Hours          0x0032   016   016   000    Old_age   Always       -       61794
>> |  10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
>> |  11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
>> |  12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       82
>> | 192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       54
>> | 193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       2219
>> | 194 Temperature_Celsius     0x0022   119   116   000    Old_age   Always       -       33
>> | 196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
>> | 197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
>> | 198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
>> | 199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
>> | 200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       43
>
> This looks like a drive which is old and starting to wear out
> but is not there yet. The raw read error rate is starting to
> creep up but isn't at a threshold.

I agree. The almost 62,000 hours works out to just over seven years of
run time (61794 / 24 / 365.25 ≈ 7.05 years), and based on the
start/stop count and power cycle count it's been running basically
continuously for that time (which is generally good for longevity, as
long as it's not subjected to excessive heat).

It's entirely possible that the mechanical components are degrading,
which in turn might also be interfering with the drive's ability to
reliably store and retrieve data. Yes, servo tracks and such things
are supposed to catch and compensate for that, but it might not be
quite that bad yet. Sometimes HDDs fail with a bang, and sometimes
they fail with a whimper.

Also note that some disks actually lie in their SMART data. I don't
know if yours does, but I would definitely question a raw value of 0
for failed sectors (Current_Pending_Sector and Offline_Uncorrectable)
_and_ for Reallocated_Sector_Ct on a disk that's reporting I/O errors,
for example. _At least_ one of those should be >0 for a truthful
storage device in that situation. (A quick way to cross-check those
counters is sketched at the end of this message.)

What I would not do at this point is subject the disk to any more
physical stress than is unavoidable. Unless you absolutely must, do
not physically unplug or remove that disk before the RAID array has
resilvered onto the new disk; how to check on that is also shown
below. It's currently providing value as a second source of truth
about what's stored; you don't want to remove it and then find during
the resilver that the other disk in the array has a problem.
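As a footnote to the point about SMART honesty, below is a minimal
sketch of the kind of cross-check I have in mind, assuming
smartmontools is installed and the script is run as root. /dev/sda is
a stand-in for the actual device; adjust as needed.

    #!/bin/sh
    # Sketch only: compare what the kernel has logged against what
    # SMART claims. DISK is a placeholder for the suspect drive.
    DISK=/dev/sda

    # Grab the attribute table once; column 10 of each row is RAW_VALUE.
    attrs=$(smartctl -A "$DISK")
    realloc=$(printf '%s\n' "$attrs" | awk '$1 == 5   { print $10 }')
    pending=$(printf '%s\n' "$attrs" | awk '$1 == 197 { print $10 }')
    uncorr=$(printf '%s\n' "$attrs" | awk '$1 == 198 { print $10 }')

    echo "Reallocated: $realloc, Pending: $pending, Uncorrectable: $uncorr"

    # I/O errors in the kernel log while all three counters read zero
    # is exactly the "disk is lying" pattern described above.
    if dmesg | grep -q "I/O error.*$(basename "$DISK")" &&
       [ "$realloc" = 0 ] && [ "$pending" = 0 ] && [ "$uncorr" = 0 ]
    then
        echo "Suspicious: kernel reports I/O errors, SMART reports no bad sectors."
    fi

It's only a heuristic, of course, but it's cheap to run; and note that
a flaky cable would normally also push up UDMA_CRC_Error_Count, which
is still 0 here.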
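And on not pulling the old disk too early: make sure the rebuild has
actually completed before you touch the hardware. How to check depends
on the RAID implementation; for the common Linux cases (the pool name
"tank" is just an example):

    # Linux md RAID: wait until no recovery/resync progress line is
    # shown and all members are up, e.g. [UU] rather than [U_].
    cat /proc/mdstat

    # ZFS: wait for "scan: resilvered ... with 0 errors" rather than
    # "scan: resilver in progress".
    zpool status tank

-- 
Michael Kjörling 🔗 https://michael.kjorling.se
“Remember when, on the Internet, nobody cared that you were a dog?”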