On 10/3/22 09:23, piorunz wrote:
> On 02/10/2022 21:33, David Christensen wrote:
>> On 10/2/22 06:19, Marcelo Laia wrote:
>>> # cat /etc/debian_version ; uname -a
>>> bookworm/sid
>>> Linux marcelo 5.19.0-2-amd64 #1 SMP PREEMPT_DYNAMIC Debian 5.19.11-1 (2022-09-24) x86_64 GNU/Linux
>> Please install Debian Stable.
> Why would he?
> I have exactly the same SSD (two of them) in my machine, on Debian
> Testing, drives in BTRFS Raid1 mode, everything works perfect. But I
> have good SATA cables.
> OS version has nothing to do with cabling errors in SSD drive SMART log.
> He may as well be using DOS, Windows, FreeBSD, any Linux - cabling errors
> must never happen.
> uname -a
> Linux ryzen 5.19.0-2-amd64 #1 SMP PREEMPT_DYNAMIC Debian 5.19.11-1 (2022-09-24) x86_64 GNU/Linux
> $ sudo smartctl /dev/sda --all | grep "Device Model\|SATA_Interfac\|DMA_CRC_Error"
> Device Model: CT1000MX500SSD1
> 183 SATA_Interfac_Downshift 0x0032 100 100 000 Old_age Always - 0
> 199 UDMA_CRC_Error_Count 0x0032 100 100 000 Old_age Always - 0
> $ sudo smartctl /dev/sdb --all | grep "Device Model\|SATA_Interfac\|DMA_CRC_Error"
> Device Model: CT1000MX500SSD1
> 183 SATA_Interfac_Downshift 0x0032 100 100 000 Old_age Always - 0
> 199 UDMA_CRC_Error_Count 0x0032 100 100 000 Old_age Always - 0
Even if you and the OP ran identical OS instances (e.g. clones), I do
not believe you two have the same make and model computers. Therefore,
different code paths will be executed -- e.g. device drivers. So, the
OP's computer may be hitting a bug that your computer does not.
I am applying a trouble-shooting strategy -- change one variable, apply
a stimulus, and measure the result. If the result is the same as it was
before, then the result is unlikely to be related to the variable and/or
change. But if the result is different, then the result is likely to be
related to the variable and/or change.
Of course, this is all premised upon devising a stimulus that reliably
reproduces the result. When my HDDs/SSDs were having SATA cable
and/or drive rack problems, reading 10 GB from them typically produced
at least one error.
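
The "measure the result" step can be made mechanical by comparing the raw UDMA_CRC_Error_Count value before and after the stimulus. The following is a minimal sketch, not a tested tool: it assumes the text was captured from `smartctl --all` and that attribute rows have the usual ten columns with the raw value in the last one, as in the rows quoted above. The helper names `raw_attribute_value` and `new_errors` and the "after" sample are hypothetical.

```python
def raw_attribute_value(smartctl_output: str, attribute_name: str) -> int:
    """Return the raw value (tenth column) of a SMART attribute row."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows look like:
        # ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] == attribute_name:
            return int(fields[9])
    raise ValueError(f"attribute {attribute_name!r} not found")


def new_errors(before: str, after: str,
               attribute_name: str = "UDMA_CRC_Error_Count") -> int:
    """Number of errors logged between two smartctl captures."""
    return (raw_attribute_value(after, attribute_name)
            - raw_attribute_value(before, attribute_name))


# "before" is the row quoted in this thread; "after" is a made-up example.
before = "199 UDMA_CRC_Error_Count 0x0032 100 100 000 Old_age Always - 0"
after = "199 UDMA_CRC_Error_Count 0x0032 100 100 000 Old_age Always - 2"

print(new_errors(before, after))
```

A non-zero difference after the stimulus means the change of variable did not make the errors go away; a zero difference is only meaningful if the stimulus is large enough to trigger the error reliably.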
When the OP read 10 GB of the SSD using the d-i rescue shell, he was
applying a stimulus after changing the variable "OS instance". The
result was different. Therefore, the SATA UDMA CRC errors are related
to changing the OS instance.
But, the above experiment has significant flaws (here are a few; I expect
there are more):
1. We cannot reproduce the OP's hardware and software.
2. We do not know what Debian installer the OP used (but we could
obtain it if he told us).
3. The stimulus only read from the SSD. The UDMA CRC errors may only occur
during writes.
4. The SMART reports indicate 38 UDMA CRC errors for 1296000877 Logical
Sectors Written and 801097450 Logical Sectors Read. So, an average of 1
error per 5.52E+7 sectors. The test read 2.05E+7 sectors. That might
be too few sectors.
5. Similarly, for Number of Read Commands -- 1 error per 4.43E+5
commands vs. 1.02E+4 test commands.
6. The Debian installer rescue shell is single-user (single-process?),
but the UDMA errors were seen during multi-user operation (SMP). If the
SATA UDMA errors are caused by concurrency/parallel execution, the d-i
rescue shell environment may not be capable of reproducing the error.
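
The "too few sectors" concern can be put into numbers with a short calculation. This is a rough sketch using only the figures quoted above. It assumes errors occur independently at a constant average rate (a Poisson model), which is my simplification and not something the SMART data establishes. The function names are hypothetical.

```python
import math

# Figures quoted from the OP's SMART report and the test description.
SECTORS_WRITTEN = 1296000877
SECTORS_READ = 801097450
CRC_ERRORS = 38
TEST_SECTORS = 2.05e7          # sectors read by the 10 GB test
COMMANDS_PER_ERROR = 4.43e5    # 1 error per this many read commands
TEST_COMMANDS = 1.02e4         # read commands issued by the test


def units_per_error(total_units: float, errors: int) -> float:
    """Average number of sectors (or commands) per logged error."""
    return total_units / errors


def chance_of_no_error(test_units: float, per_error: float) -> float:
    """P(zero errors) in a test of the given size, assuming a Poisson model."""
    expected_errors = test_units / per_error
    return math.exp(-expected_errors)


sectors_per_error = units_per_error(SECTORS_WRITTEN + SECTORS_READ, CRC_ERRORS)
p_clean_by_sectors = chance_of_no_error(TEST_SECTORS, sectors_per_error)
p_clean_by_commands = chance_of_no_error(TEST_COMMANDS, COMMANDS_PER_ERROR)

print(f"sectors per error:            {sectors_per_error:.3g}")
print(f"P(no error), sector basis:    {p_clean_by_sectors:.2f}")
print(f"P(no error), command basis:   {p_clean_by_commands:.2f}")
```

Under that assumption the test would be expected to hit about 0.37 errors on a per-sector basis, so a clean run would happen roughly 69% of the time even if nothing had changed. On a per-command basis a clean run is nearly certain (about 98%). Either way, one clean 10 GB read is weak evidence.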
If the OP installs Debian Stable on the SSD, runs the 10 GB sequential
read test, uses the system interactively, and the SATA UDMA errors are
not seen for some period of time (a week?), then I would be reasonably
confident the problem was the SSD Debian Testing instance. But if the
errors persist, then we will have to think up another hypothesis and
experiment.
David