It could be the Kingston eMMC simply failing, not necessarily wearing out.

On Tue, Sep 24, 2019 at 5:04 AM Jeremy Trimble <
[email protected]> wrote:

> Hi All,
>
> I've seen the following strange behavior on a number of BeagleBone Black
> devices:  The eMMC reads back different data each time I read it (and it is
> not being written to by anything else).
>
> The devices had been operating just fine 24/7 for months.  Then, suddenly,
> after a power cycle, they would not boot any more.
>
> 1. The blue "power" LED indicated that the BBB was getting power.
> 2. No other LEDs illuminated.
> 3. No output whatsoever on serial console after power was applied.
>
> It was clear that U-Boot was not able to run.  I was starting to think the
> BBB was dead.
>
> On a whim, I inserted a "flasher" SD card and held down the boot switch
> (S2) while applying power.  The board booted just fine off of the SD card!
>
> I was suspicious of the contents of the eMMC, so I broke into the U-Boot
> prompt, set cmdline=init=/bin/bash and booted into Linux.
>
> Since I'd set the "init=" command-line argument, my bash process was the
> only userspace process running, and I verified that /dev/mmcblk1 (the eMMC
> device) was not mounted.
>
> I then did a raw read of the entire eMMC 4 times and compared the results
> byte-by-byte with a python script.  Each of the 4 reads produced a
> different result, with differences between each of the 4 files distributed
> mostly uniformly across the eMMC!  There were no error codes, kernel debug
> messages, or complaints of any kind that I was able to observe during the
> reading of the eMMC.
>
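For anyone who wants to repeat the byte-by-byte comparison described above, here is a minimal sketch of one way to do it. It is not the script from the original post; the function names and the 1 MiB chunk size are assumptions chosen for illustration. It reports which chunks differ between each pair of dump files, which shows whether the differences are spread across the device or clustered.

```python
import itertools

CHUNK = 1 << 20  # assumed chunk size: compare in 1 MiB pieces to limit memory use


def differing_chunks(path_a, path_b, chunk_size=CHUNK):
    """Return the indices of the chunks at which two files differ."""
    bad = []
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        index = 0
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if not a and not b:  # both files exhausted
                break
            if a != b:  # also true when one file is shorter
                bad.append(index)
            index += 1
    return bad


def compare_all(paths, chunk_size=CHUNK):
    """Compare every pair of dumps; map (a, b) -> differing chunk indices."""
    return {
        (str(a), str(b)): differing_chunks(a, b, chunk_size)
        for a, b in itertools.combinations(paths, 2)
    }
```

With four dumps (hypothetical names read1.img to read4.img), compare_all(["read1.img", "read2.img", "read3.img", "read4.img"]) returns six pairwise results. An empty list for a pair means those two reads were identical.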
> I have since seen the same behavior on at least 3 other units.  Rather
> than read back the entire 4GB eMMC device and compare, I've been able to
> test for the same result by doing something like the following:
>
> root@(none):/# dd if=/dev/mmcblk1p3 bs=8M count=3 | md5sum
> 49438591914268785da79c8569b3b571  -
> 3+0 records in
> 3+0 records out
> 25165824 bytes (25 MB) copied, 11.4388 s, 2.2 MB/s
>
> root@(none):/# dd if=/dev/mmcblk1p3 bs=8M count=3 | md5sum
> 21ec205a55605c44c5097d2c07b73029  -
> 3+0 records in
> 3+0 records out
> 25165824 bytes (25 MB) copied, 11.3298 s, 2.2 MB/s
>
> root@(none):/# dd if=/dev/mmcblk1p3 bs=8M count=3 | md5sum
> 4f9cbb7698b9aecc88ff8da69aa21178  -
> 3+0 records in
> 3+0 records out
> 25165824 bytes (25 MB) copied, 11.3444 s, 2.2 MB/s
>
> Note how the md5sum changes each time!  I know that I had previously
> written valid data to /dev/mmcblk1p3 (and even if I hadn't, it shouldn't
> read back as different data each time).
> Again, I am pretty sure that nothing else is writing to the eMMC in the
> cases above (I have booted from the mmcblk0 SD card, my bash shell is the
> only userspace process, and the mmcblk1 eMMC device is not mounted).
>
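The repeated dd | md5sum check above could also be scripted so that it is easy to run on several boards. The following is only a sketch under stated assumptions: the helper names are made up, and it reads through the normal page cache (buffering=0 only disables Python's own buffer), so on a real device you may still want dd with iflag=direct as in the later commands.

```python
import hashlib


def read_digest(path, length, offset=0, block_size=1 << 20):
    """Return the MD5 hex digest of `length` bytes of `path` from `offset`."""
    md5 = hashlib.md5()
    remaining = length
    # buffering=0 disables Python-level buffering only, not the kernel cache.
    with open(path, "rb", buffering=0) as f:
        f.seek(offset)
        while remaining > 0:
            block = f.read(min(block_size, remaining))
            if not block:  # reached the end of the device or file early
                break
            md5.update(block)
            remaining -= len(block)
    return md5.hexdigest()


def is_stable(path, length, passes=3, offset=0):
    """Read the same region `passes` times; return (stable, digests)."""
    digests = [read_digest(path, length, offset) for _ in range(passes)]
    return len(set(digests)) == 1, digests
```

For the region tested above, the call would be is_stable("/dev/mmcblk1p3", 3 * 8 * 1024 * 1024). A healthy device should give identical digests on every pass.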
> If I write zeros to the same location, then the readbacks become reliable
> (and also take much less time):
>
> root@(none):/# dd if=/dev/zero bs=8M count=3 of=/dev/mmcblk1p3
> 3+0 records in
> 3+0 records out
> 25165824 bytes (25 MB) copied, 15.0492 s, 1.7 MB/s
>
> root@(none):/# dd if=/dev/mmcblk1p3 bs=8M count=3 iflag=direct | md5sum
> 3+0 records in
> 3+0 records out
> 25165824 bytes (25 MB) copied, 0.983744 s, 25.6 MB/s
> 77377273b0a4b61febdbf7bbf52b9db9  -
>
> root@(none):/# dd if=/dev/mmcblk1p3 bs=8M count=3 iflag=direct | md5sum
> 3+0 records in
> 3+0 records out
> 25165824 bytes (25 MB) copied, 0.986371 s, 25.5 MB/s
> 77377273b0a4b61febdbf7bbf52b9db9  -
>
> I also power-cycled the board after this and verified that I still get the
> same result (77377273b0a4b61febdbf7bbf52b9db9, reads back in about 1 second
> instead of 11 seconds) after the power-cycle.
>
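One way to confirm that the stable readback really is the zeros that were written is to compute the MD5 of the same number of zero bytes and compare it with the digest reported above. This is a small sketch; the function name is an assumption, and the region size simply mirrors bs=8M count=3.

```python
import hashlib

REGION_BYTES = 3 * 8 * 1024 * 1024  # bs=8M count=3 -> 25165824 bytes


def md5_of_zeros(n_bytes, block_size=1 << 20):
    """Return the MD5 hex digest of `n_bytes` zero bytes, hashed in blocks."""
    md5 = hashlib.md5()
    zero_block = bytes(block_size)
    full, rest = divmod(n_bytes, block_size)
    for _ in range(full):
        md5.update(zero_block)
    md5.update(bytes(rest))  # remaining partial block (may be empty)
    return md5.hexdigest()


# Compare this value with the digest printed by dd | md5sum after zeroing.
print(md5_of_zeros(REGION_BYTES))
```

If the printed value matches the digest from the readback, the partition is returning exactly the zeros that were written.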
>
> The first device I discovered this issue on had been exposed to high
> temperatures, so I initially suspected heat was the cause.  However,
> I've since seen this happen to three other BBBs which never left my
> air-conditioned lab.
>
> I've seen the same behavior on at least 4 BBBs now.  So far, all of the
> devices on which I've seen this issue appear to have been populated with
> the Kingston KE4CN2H5A eMMC, judging by the eMMC size (3825205248 bytes).
>
> Does anyone have any ideas what might be causing this?  Could it just be
> the eMMC device wearing out?  It seems that this eMMC device ( eMMC 4.5
> spec ) does not provide any access to health monitoring statistics.
>
> Any help is greatly appreciated.
>
> -Jeremy Trimble
>
> --
> For more options, visit http://beagleboard.org/discuss
> ---
> You received this message because you are subscribed to the Google Groups
> "BeagleBoard" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/beagleboard/fea7bb5c-f39f-458e-a359-8752f8b6c143%40googlegroups.com
>


-- 
Yiling Cao
http://ariaboard.com/
http://shanghainovotech.com/

