[issue44262] tarfile: some content different output
New submission from Vasco Gervasi:

Hi,
I am seeing some irregularities in the tar files created using Python. Consider the attached script. This is the output from the script:

```
# gz
b'0f2eb7b3cac63267b1cf51d2bd5e3144f53cc5b172bbad3dccd5adf4ffb2d220 /tmp/py.gz\n'
9bde8fdb44d98c5a838a9fedaff6e66cd536d91022f8a64a6ecc514f38ce01af
b'e37c3d30ae3c12e872c6aade55ac0a40da8b3f357ce8ed77287bc9f8f024e587 /tmp/py.gz\n'
7ac976e3c94b90abff3c4138a2d153e9be9cc87e2b5a97baf2be95ca04029936
# bz2
b'd04678e749491e4de1065d3f72ba66395d6bd8ffba3d6360ed9ca2c514586fd3 /tmp/py.bz2\n'
9aa293624df8c40f47614045602af41cc603ca92c97c94926296ef38396d6e3f
b'd04678e749491e4de1065d3f72ba66395d6bd8ffba3d6360ed9ca2c514586fd3 /tmp/py.bz2\n'
9aa293624df8c40f47614045602af41cc603ca92c97c94926296ef38396d6e3f
# xz
b'a050baa1ab765fa037524ff061d59f62ad37bc6d1bacf98f9bff2f4b4c312fab /tmp/py.xz\n'
ca39f034d7812d2420573218c69313ac31fd516ffebe1a57f4e41a32e1e840b9
b'a050baa1ab765fa037524ff061d59f62ad37bc6d1bacf98f9bff2f4b4c312fab /tmp/py.xz\n'
ca39f034d7812d2420573218c69313ac31fd516ffebe1a57f4e41a32e1e840b9
b'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 /tmp/tar_a0.tgz\n'
b'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 /tmp/tar_a1.tgz\n'
b'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 /tmp/gzp_a0.tgz\n'
b'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 /tmp/gzp_a1.tgz\n'
```

As you can see, the archives generated using the `tar` command are always the same, whereas the ones generated using Python are not. Am I missing some arguments?
Thanks

--
components: Library (Lib)
files: compress.py
messages: 394666
nosy: yellowhat
priority: normal
severity: normal
status: open
title: tarfile: some content different output
type: behavior
Added file: https://bugs.python.org/file50070/compress.py

___
Python tracker <https://bugs.python.org/issue44262>
___
___
Python-bugs-list mailing list
Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
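In the output above only the gz results change between runs, while bz2 and xz stay the same. One plausible explanation, offered here as an assumption because the attached script is not reproduced in this thread, is that the gzip container stores a modification time in its header (the MTIME field of RFC 1952), whereas bz2 and xz streams carry no timestamp. The sketch below uses the hypothetical helper `gzip_bytes` to show that the same payload compresses to different bytes when only the header timestamp changes, and to identical bytes when it is pinned.

```python
import gzip
import io


def gzip_bytes(data, mtime):
    """Compress data in memory, writing the given mtime into the gzip header."""
    buf = io.BytesIO()
    # filename="" keeps any file name out of the header as well.
    with gzip.GzipFile(filename="", fileobj=buf, mode="wb", mtime=mtime) as gz:
        gz.write(data)
    return buf.getvalue()


payload = b"hello\n"
a = gzip_bytes(payload, mtime=1)
b = gzip_bytes(payload, mtime=2)
c = gzip_bytes(payload, mtime=2)

# Bytes 4..7 of a gzip stream hold MTIME as a little-endian integer.
stamp_a = int.from_bytes(a[4:8], "little")
stamp_b = int.from_bytes(b[4:8], "little")

differs_with_other_mtime = a != b  # same payload, different header timestamp
stable_with_same_mtime = b == c    # same payload, same header timestamp
```

If `mtime` is left as `None`, `gzip.GzipFile` falls back to the current time when writing, which would make two runs a second apart produce different bytes.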
[issue44262] tarfile: some content different output
Vasco Gervasi added the comment:

Dear Filipe,
thanks for your answer. Following your suggestion, I have tried the attached file. The output is:

$ python /data/compress.py
b'68963e137ced6ee2aa5a93e155b201a3c172e2683d4b15a0eab7c1d8d43e48b4 /tmp/py_gzip.tgz\n'
b'68963e137ced6ee2aa5a93e155b201a3c172e2683d4b15a0eab7c1d8d43e48b4 /tmp/py_gzip.tgz\n'
$ rm -rf a/
$ mv py_gzip.tgz py_gzip.tgz0
$ python /data/compress.py
b'9c897d82c332f0d5443fe66112abe5f318bf6e6574e44c5c3c385f398784ac35 /tmp/py_gzip.tgz\n'
b'9c897d82c332f0d5443fe66112abe5f318bf6e6574e44c5c3c385f398784ac35 /tmp/py_gzip.tgz\n'
$ diffoscope py_gzip.tgz0 py_gzip.tgz
--- py_gzip.tgz0
+++ py_gzip.tgz
│ --- py_gzip.tgz0-content
├── +++ py_gzip.tgz-content
│ ├── file list
│ │ @@ -1,4 +1,4 @@
│ │ -drwxr-xr-x 0 root (0) root (0)0 2021-05-30 15:32:56.566535 a/
│ │ --rw-r--r-- 0 root (0) root (0)6 2021-05-30 15:32:56.566535 a/eph0
│ │ --rw-r--r-- 0 root (0) root (0)6 2021-05-30 15:32:56.566535 a/eph1
│ │ --rw-r--r-- 0 root (0) root (0)6 2021-05-30 15:32:56.566535 a/eph2
│ │ +drwxr-xr-x 0 root (0) root (0)0 2021-05-30 15:33:16.956535 a/
│ │ +-rw-r--r-- 0 root (0) root (0)6 2021-05-30 15:33:16.956535 a/eph0
│ │ +-rw-r--r-- 0 root (0) root (0)6 2021-05-30 15:33:16.956535 a/eph1
│ │ +-rw-r--r-- 0 root (0) root (0)6 2021-05-30 15:33:16.966535 a/eph2

Even though I am specifying an mtime, it is not applied to the archive members: their timestamps still differ between the two runs.

Thanks

--
Added file: https://bugs.python.org/file50073/compress.py
[issue44262] tarfile: some content different output
Vasco Gervasi added the comment:

Dear Filipe,
sorry, I did not explain the use case; obviously this is a toy example to show my problem.

I have a pipeline that generates a tar file from a repository using a Python script; if the hash of the tar file changes, it triggers other steps. As you can imagine, each time the pipeline runs the content is the same (for the same commit), but the file timestamps are different, and so the tar file is different.

Thanks for pointing out those examples; I will check them and let you know.

Thanks
[issue44262] tarfile: some content different output
Vasco Gervasi added the comment:

Yes, you can close it.

For future reference:

    import tarfile

    tar_reset = "/tmp/py_tar_reset.tar"

    def reset(tarinfo):
        # Normalize owner and timestamp so they do not vary between runs.
        tarinfo.uid = tarinfo.gid = 0
        tarinfo.uname = tarinfo.gname = "root"
        tarinfo.mtime = 1
        return tarinfo

    # The filter is applied to every member added under /tmp/a.
    with tarfile.open(tar_reset, "w:xz") as tar_obj:
        tar_obj.add("/tmp/a", arcname="a", filter=reset)
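The snippet above writes an xz archive. For a gzip-compressed archive the same filter is probably not enough on its own, since the gzip header carries its own timestamp. The following is a minimal sketch of one way to combine the two, not the approach taken in the attached scripts: the names `normalize`, `build_tgz` and `demo` are hypothetical, and the temporary directory layout only imitates the `a/eph0` to `a/eph2` files from the diffoscope listing.

```python
import gzip
import hashlib
import io
import os
import tarfile
import tempfile


def normalize(tarinfo):
    """Same idea as reset() above: fix owner and timestamp metadata."""
    tarinfo.uid = tarinfo.gid = 0
    tarinfo.uname = tarinfo.gname = "root"
    tarinfo.mtime = 1
    return tarinfo


def build_tgz(src_dir, arcname):
    """Return the bytes of a gzip-compressed tar of src_dir."""
    buf = io.BytesIO()
    # mtime=0 pins the timestamp stored in the gzip header;
    # filename="" keeps a file name out of the header.
    with gzip.GzipFile(filename="", fileobj=buf, mode="wb", mtime=0) as gz:
        # Plain "w" mode: compression is handled by the GzipFile wrapper.
        with tarfile.open(fileobj=gz, mode="w") as tar:
            tar.add(src_dir, arcname=arcname, filter=normalize)
    return buf.getvalue()


def demo():
    """Build the archive twice with different file timestamps; return both hashes."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "a")
        os.mkdir(src)
        for i in range(3):
            with open(os.path.join(src, f"eph{i}"), "w") as fh:
                fh.write("hello\n")
        first = build_tgz(src, "a")

        # Simulate a fresh checkout: same content, different timestamps.
        for name in os.listdir(src):
            os.utime(os.path.join(src, name), (1_000_000, 1_000_000))
        os.utime(src, (1_000_000, 1_000_000))
        second = build_tgz(src, "a")

    return hashlib.sha256(first).hexdigest(), hashlib.sha256(second).hexdigest()


hash_first, hash_second = demo()
```

With both timestamps pinned, the two hashes should match even though the files on disk were touched in between. File modes are left unchanged here, so a pipeline whose checkouts can differ in permissions would need to normalize `tarinfo.mode` as well.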