Hi guys! I am running Debian 12 Stable, up to date, on a low-spec Dell Inspiron 15 3000 Model 3511. Firmware is also up to date.
I have a 4 TB Western Digital external USB SATA HDD, model WDC WD40NDZW-11A8JS1. It has only one partition, formatted as ext4. The filesystem is labeled MSD00012. Every night, I use rsync to copy all contents of a (theoretically) identical drive, which has filesystem label MSD00014, to the drive with MSD00012.

Two nights ago, the copy failed. Apparently, as a safety measure, MSD00012 was automatically remounted read-only due to a filesystem error. I used the gnome-disks utility to unmount and then remount it. It was remounted read-write. Now it "works", BUT . . .

I ran:

sudo smartctl --test=long /dev/sdb

on it, and the results show one Current_Pending_Sector and a read failure at LBA 3259046960.

From sudo smartctl --all /dev/sdb > backup_drive_b_test.txt:

------------------------------------------------------------
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-18-amd64] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Elements / My Passport (USB, AF)
Device Model:     WDC WD40NDZW-11A8JS1
Serial Number:    WD-XXXXXXXXXXXX
LU WWN Device Id: 5 0014ee 269112168
Firmware Version: 01.01A01
User Capacity:    4,000,753,475,584 bytes [4.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Form Factor:      2.5 inches
TRIM Command:     Available, deterministic
Device is:        In smartctl database 7.3/5319
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Tue Feb 20 11:32:04 2024 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      ( 121) The previous self-test completed having
                                        the read element of the test failed.
Total time to complete Offline
data collection:                (12240) seconds.
Offline data collection
capabilities:                    (0x1b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        No Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  24) minutes.
SCT capabilities:              (0x30b5) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   198   051    Pre-fail  Always       -       41
  3 Spin_Up_Time            0x0027   253   253   021    Pre-fail  Always       -       4741
  4 Start_Stop_Count        0x0032   099   099   000    Old_age   Always       -       1175
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       1311
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       693
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       23
193 Load_Cycle_Count        0x0032   199   199   000    Old_age   Always       -       3045
194 Temperature_Celsius     0x0022   114   102   000    Old_age   Always       -       38
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%      1311         3259046960
# 2  Short offline       Completed without error       00%      1311         -
# 3  Short offline       Completed without error       00%      1311         -

Selective Self-tests/Logging not supported
--------------------------------------------------------------------

I ran "check" in gnome-disks. It showed a disk error, so I ran "repair" in gnome-disks. Then I ran "check" again. It reported no errors. I also ran the partition check/repair utility in GParted. It did not report any errors. Finally, I ran sudo smartctl --test=long /dev/sdb on it, as previously mentioned. According to the smartctl output above, there is one Current_Pending_Sector.

I have done some research online, which seems to say that the pending sector will remain until there is an attempt to write to the bad sector (block). Then it will be "re-mapped", and presumably taken out of service. But since the sector already cannot be read, how can it be rewritten to a "good" sector?

If I knew which file (if any) is using the bad sector, I could try just deleting that file from the "bad" drive, then copying the same file over from the "good" drive, at which time the bad sector "should" be retired and replaced by a good sector.

Or, as a more "brute force" solution, I could either simply format the bad drive, or do:

sudo dd if=/dev/zero of=/dev/sdb bs=4M status=progress conv=fdatasync

and then format the drive, or even do:

sudo dd if=/dev/urandom of=/dev/sdb bs=4M status=progress conv=fdatasync

and then format the drive. But since this is a 4 TB external USB drive, overwriting and formatting the whole drive might take days! And maybe even work the very low-spec computer until it cooks!

Any suggestions?
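
On the question of which file (if any) is using the bad sector: the LBA in the self-test log counts 512-byte logical sectors from the start of the whole drive, while ext4 counts blocks from the start of the partition, so the two can be related by simple arithmetic. The following is only a minimal sketch of that conversion, not a tested recovery procedure. The partition start (2048), the ext4 block size (4096), and the partition name /dev/sdb1 are assumptions that do not appear in the output above and would have to be checked on the real drive; the function name lba_to_fs_block is made up for illustration.

```shell
#!/bin/sh
# Sketch: convert a drive LBA (as reported by smartctl) into an ext4 block
# number counted from the start of the partition.
#
# Hypothetical inputs, NOT taken from the post:
#   part_start     first sector of the partition, e.g. from
#                    cat /sys/block/sdb/sdb1/start
#   fs_block_size  ext4 block size in bytes, e.g. from
#                    sudo tune2fs -l /dev/sdb1 | grep 'Block size'
lba_to_fs_block() {
    lba=$1
    part_start=$2
    fs_block_size=${3:-4096}   # assumed default, verify with tune2fs
    sector_size=${4:-512}      # logical sector size per the smartctl output
    echo $(( (lba - part_start) * sector_size / fs_block_size ))
}

# Example with the LBA from the self-test log and an ASSUMED partition
# start at sector 2048:
lba_to_fs_block 3259046960 2048   # prints 407380614

# With the real block number, debugfs can look up the owner (read-only):
#   sudo debugfs -R "icheck <block>" /dev/sdb1   # block -> inode
#   sudo debugfs -R "ncheck <inode>" /dev/sdb1   # inode -> path
```

If icheck reports that the block belongs to no inode, the sector would presumably lie in free space or in filesystem metadata, and deleting and re-copying a file would not touch it.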