I hate to suggest other tangents, but try re-seating all the connectors
and maybe running a power supply test. A power brown-out can cause
issues like this.
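
One cheap check before pulling cables: SMART attribute 199 (usually
reported as UDMA_CRC_Error_Count) counts interface CRC errors, and a
value that keeps climbing tends to point at cabling or connectors
rather than the platters. A minimal sketch, assuming the usual
`smartctl -A` table layout with the raw value in the tenth column; the
function name is made up for illustration and /dev/sda is a placeholder:

```shell
#!/bin/sh
# Print the raw value of SMART attribute 199 (interface CRC errors)
# from `smartctl -A` text on stdin. Matching on the numeric ID is an
# assumption made because the attribute name varies between vendors.
# Returns non-zero if the attribute is not present in the input.
crc_error_count() {
    awk '$1 == "199" { print $10; found = 1 } END { exit !found }'
}

# Example (needs root and smartmontools; /dev/sda is a placeholder):
#   smartctl -A /dev/sda | crc_error_count
```

Comparing the number before and after a heavy read of the array says
more than a single reading, since the counter never resets.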

On Mon, Jun 1, 2026 at 3:15 PM Charles Curley
<[email protected]> wrote:
>
> On Fri, 22 May 2026 09:53:17 -0600
> Charles Curley <[email protected]> wrote:
>
> > To be thorough, I have run extended SMART tests on the hard drives,
> > kicked mdadm into testing the RAID array, and fscked the LVM
> > partitions on the RAID array. Only fsck turned up issues, and that
> > has not stopped.
>
> Some additional testing.
>
> Suspecting a bad hard drive, I ran more extended tests on all four
> members of the RAID array. One showed problems:
>
>       "Error 1 [0] occurred at disk power-on lifetime: 6777 hours (282 days + 9 hours)",
>       "  When the command that caused the error occurred, the device was active or idle.",
>       "",
>       "  After command completion occurred, registers were:",
>       "  ER -- ST COUNT  LBA_48  LH LM LL DV DC",
>       "  -- -- -- == -- == == == -- -- -- -- --",
>       "  40 -- 51 00 01 00 00 00 00 00 00 40 00  Error: UNC 1 sectors at LBA = 0x00000000 = 0",
>       "",
>       "  Commands leading to the command that caused the error were:",
>       "  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name",
>       "  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------",
>       "  25 00 00 00 01 00 00 00 00 00 00 40 00     00:08:36.585  READ DMA EXT",
>       "  ec 00 00 00 00 00 00 00 00 00 00 00 00     00:08:31.545  IDENTIFY DEVICE",
>       "  b0 00 da 00 00 00 00 00 c2 4f 00 00 00     00:08:31.542  SMART RETURN STATUS",
>       "  b0 00 d2 00 f1 00 00 00 c2 4f 00 00 00     00:08:31.541  SMART ENABLE/DISABLE ATTRIBUTE AUTOSAVE",
>       "  ec 00 00 00 00 00 00 00 00 00 00 00 00     00:08:31.541  IDENTIFY DEVICE",
>       "",
>       "SMART Extended Self-test Log Version: 1 (1 sectors)",
>       "Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error",
>       "# 1  Extended offline    Completed without error       00%      6756         -",
>       "# 2  Extended offline    Completed without error       00%      6573         -",
>       "# 3  Extended offline    Completed without error       00%       102         -",
>       "# 4  Short offline       Completed without error       00%        96         -",
>       "",
>
>
> So I did the obvious: I failed and removed the drive from the array. The
> problem still showed up, but with fewer failures in the same data set.
>
> I have since added the drive back to the array, and am testing the
> array now.
>
> mdadm --monitor --test --oneshot /dev/md0
>
> I begin to wonder if I have a bad motherboard.
>
> --
> Does anybody read signatures any more?
>
> https://charlescurley.com
> https://charlescurley.com/blog/
>
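
On the mdadm command quoted above: going by the mdadm man page, --test
in monitor mode generates a TestMessage alert for each array and
--oneshot checks the arrays only once, so that invocation appears to
exercise the alert path rather than verify the data on the array. A
consistency check can be requested through the md sysfs interface
instead. A minimal sketch, assuming the array is /dev/md0; the function
names are made up for illustration:

```shell
#!/bin/sh
# Sketch: drive an md consistency check through sysfs.
# The directory argument defaults to the sysfs directory for /dev/md0
# (an assumption; pass another directory for a different array).

# Ask md to read the whole array and count mismatches (needs root).
md_start_check() {
    md_dir=${1:-/sys/block/md0/md}
    echo check > "$md_dir/sync_action"
}

# Report the current action and the mismatch count from the last check.
md_check_status() {
    md_dir=${1:-/sys/block/md0/md}
    action=$(cat "$md_dir/sync_action") || return 1
    mismatches=$(cat "$md_dir/mismatch_cnt") || return 1
    printf 'sync_action=%s mismatch_cnt=%s\n' "$action" "$mismatches"
}

# Example:
#   md_start_check      # progress shows up in /proc/mdstat
#   md_check_status     # read once sync_action is back to "idle"
```

A non-zero mismatch_cnt after the check finishes would be one more data
point on whether the corruption is reaching the disks.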


-- 
- Andrew "lathama" Latham -
