I hate to suggest another tangent, but re-seat all the connectors and maybe run a power supply test. A brown-out would cause issues like this.

On Mon, Jun 1, 2026 at 3:15 PM Charles Curley <[email protected]> wrote:
>
> On Fri, 22 May 2026 09:53:17 -0600
> Charles Curley <[email protected]> wrote:
>
> > To be thorough, I have run extended SMART tests on the hard drives,
> > kicked mdadm into testing the RAID array, and fscked the LVM
> > partitions on the RAID array. Only fsck turned up issues, and that
> > has not stopped.
>
> Some additional testing.
>
> Suspecting a bad hard drive, I ran more extended tests on all four
> members of the RAID array. One showed problems:
>
> "Error 1 [0] occurred at disk power-on lifetime: 6777 hours (282 days + 9 hours)",
> " When the command that caused the error occurred, the device was active or idle.",
> "",
> " After command completion occurred, registers were:",
> " ER -- ST COUNT LBA_48 LH LM LL DV DC",
> " -- -- -- == -- == == == -- -- -- -- --",
> " 40 -- 51 00 01 00 00 00 00 00 00 40 00 Error: UNC 1 sectors at LBA = 0x00000000 = 0",
> "",
> " Commands leading to the command that caused the error were:",
> " CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name",
> " -- == -- == -- == == == -- -- -- -- -- --------------- --------------------",
> " 25 00 00 00 01 00 00 00 00 00 00 40 00 00:08:36.585 READ DMA EXT",
> " ec 00 00 00 00 00 00 00 00 00 00 00 00 00:08:31.545 IDENTIFY DEVICE",
> " b0 00 da 00 00 00 00 00 c2 4f 00 00 00 00:08:31.542 SMART RETURN STATUS",
> " b0 00 d2 00 f1 00 00 00 c2 4f 00 00 00 00:08:31.541 SMART ENABLE/DISABLE ATTRIBUTE AUTOSAVE",
> " ec 00 00 00 00 00 00 00 00 00 00 00 00 00:08:31.541 IDENTIFY DEVICE",
> "",
> "SMART Extended Self-test Log Version: 1 (1 sectors)",
> "Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error",
> "# 1 Extended offline Completed without error 00% 6756 -",
> "# 2 Extended offline Completed without error 00% 6573 -",
> "# 3 Extended offline Completed without error 00% 102 -",
> "# 4 Short offline Completed without error 00% 96 -",
> "",
>
> So I did the obvious: I failed and removed the drive from the array. The
> problem still showed up, but with fewer failures in the same data set.
>
> I have since added the drive back to the array, and am testing the
> array now.
>
> mdadm --monitor --test --oneshot /dev/md0
>
> I begin to wonder if I have a bad motherboard.
>
> --
> Does anybody read signatures any more?
>
> https://charlescurley.com
> https://charlescurley.com/blog/

--
- Andrew "lathama" Latham -

