On 1/21/24 02:45, Tim Woodall wrote:
On Sat, 20 Jan 2024, David Christensen wrote:
On 1/20/24 08:25, Tim Woodall wrote:
Some time ago I wrote about a data corruption issue. I've still not
managed to track it down ...

Please post a console session that demonstrates, or at least documents, the data corruption.

Console session is difficult - this is a script that takes around 6
hours to run - but a typical example of corruption is something like
this:

Preparing to unpack .../03-libperl5.34_5.34.0-5_arm64.deb ...
Unpacking libperl5.34:arm64 (5.34.0-5) ...
dpkg-deb (subprocess): decompressing archive '/tmp/apt-dpkg-install-zqY3js/03-libperl5.34_5.34.0-5_arm64.deb' (size=4015516) member 'data.tar': lzma error: compressed data is corrupt
dpkg-deb: error: <decompress> subprocess returned error exit status 2
dpkg: error processing archive /tmp/apt-dpkg-install-zqY3js/03-libperl5.34_5.34.0-5_arm64.deb (--unpack):  cannot copy extracted data for './usr/lib/aarch64-linux-gnu/libperl.so.5.34.0' to '/usr/lib/aarch64-linux-gnu/libperl.so.5.34.0.dpkg-new': unexpected end of file or stream

The checksum will have been verified by apt during the download, but when
it comes to reading the downloaded deb to unpack and install it, it doesn't
get the same data. The corruption can happen at both the writing (the file
on disk is corrupted) and the reading (the file on disk has the correct
checksum) stages.
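
One way to tell the two cases apart might be to hash the same file several times and compare each result with the checksum apt verified. The following is only a sketch: the function name check_reread and its arguments are hypothetical, and it assumes sha256sum from coreutils is available.

```sh
#!/bin/sh
# Hypothetical helper, not part of apt or dpkg.
# Usage: check_reread FILE EXPECTED_SHA256 [READS]
# Hashes FILE READS times (default 3) and compares each hash with
# EXPECTED_SHA256. Returns 0 if every read matches, 1 on the first mismatch.
check_reread() {
    file=$1
    expected=$2
    reads=${3:-3}
    i=1
    while [ "$i" -le "$reads" ]; do
        actual=$(sha256sum "$file" | cut -d' ' -f1)
        if [ "$actual" != "$expected" ]; then
            # Report which read failed so intermittent errors stand out.
            printf 'read %s of %s: got %s, expected %s\n' \
                "$i" "$file" "$actual" "$expected" >&2
            return 1
        fi
        i=$((i + 1))
    done
    return 0
}
```

If every read fails with the same wrong hash, the file on disk is probably bad (write-side corruption); if the hashes vary between reads, the read path is suspect. Note that reads after the first are normally served from the page cache, so to exercise the disk again the cache would have to be dropped between reads (as root: sync; echo 3 > /proc/sys/vm/drop_caches).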


Suggestions:

1. Use the -e (errexit), -u (nounset), and/or -x (xtrace) options; or their equivalents, if you are not using Bourne shell.

2. Add printf's to dump progress and debugging information to a file while the script runs.

3. Add assertions to check for disk corruption, performance problems, and anything else that concerns you, now or in the past. If any assertion fails, the assertion should identify itself, halt the script, and dump the relevant debugging information.

4. Refactor your code into a hierarchy (directed acyclic graph). Start your debugging/validation at the bottom (leaf nodes; functions, commands) and work your way up (root node; the 6 hour script).

5. Make the script idempotent, so that when it fails and you run it again the script will skip over previously completed steps.
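
Suggestions 1, 2, 3, and 5 could be sketched in POSIX sh roughly as follows. This is a minimal illustration, not the actual build script: every name here (LOG_FILE, STATE_DIR, log, assert, run_step) is hypothetical.

```sh
#!/bin/sh
set -eu    # suggestion 1: errexit and nounset; add -x for xtrace

# Hypothetical locations for the progress log and the completion markers.
LOG_FILE=${LOG_FILE:-./build.log}
STATE_DIR=${STATE_DIR:-./build-state}

# Suggestion 2: append a timestamped progress message to the log file.
log() {
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" >> "$LOG_FILE"
}

# Suggestion 3: run a check; on failure, name the assertion, log it, halt.
# Usage: assert NAME COMMAND [ARGS...]
assert() {
    assert_name=$1
    shift
    if ! "$@"; then
        log "ASSERTION FAILED: $assert_name ($*)"
        printf 'assertion failed: %s\n' "$assert_name" >&2
        exit 1
    fi
}

# Suggestion 5: run a step once; a marker file records success, so a
# rerun skips steps that already completed.
# Usage: run_step NAME COMMAND [ARGS...]
# Variables are global in POSIX sh, so steps should not nest run_step calls.
run_step() {
    step=$1
    shift
    mkdir -p "$STATE_DIR"
    if [ -e "$STATE_DIR/$step.done" ]; then
        log "skip $step (already done)"
        return 0
    fi
    log "start $step"
    if "$@"; then
        : > "$STATE_DIR/$step.done"
        log "done $step"
    else
        log "FAILED $step"
        return 1
    fi
}
```

A step might then be invoked as run_step unpack-perl some_command, followed by something like assert "deb is readable" test -r "$deb", where some_command and $deb stand in for whatever the real script uses. Removing a marker file from STATE_DIR forces that step to run again.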


David
