[issue44262] tarfile: some content different output

2021-05-28 Thread Vasco Gervasi


New submission from Vasco Gervasi:

Hi,
I am seeing some irregularities in the tar files created using Python.

Consider the attached script.
This is the output of the script:
```
  # gz
b'0f2eb7b3cac63267b1cf51d2bd5e3144f53cc5b172bbad3dccd5adf4ffb2d220  /tmp/py.gz\n'
9bde8fdb44d98c5a838a9fedaff6e66cd536d91022f8a64a6ecc514f38ce01af
b'e37c3d30ae3c12e872c6aade55ac0a40da8b3f357ce8ed77287bc9f8f024e587  /tmp/py.gz\n'
7ac976e3c94b90abff3c4138a2d153e9be9cc87e2b5a97baf2be95ca04029936

  # bz2
b'd04678e749491e4de1065d3f72ba66395d6bd8ffba3d6360ed9ca2c514586fd3  /tmp/py.bz2\n'
9aa293624df8c40f47614045602af41cc603ca92c97c94926296ef38396d6e3f
b'd04678e749491e4de1065d3f72ba66395d6bd8ffba3d6360ed9ca2c514586fd3  /tmp/py.bz2\n'
9aa293624df8c40f47614045602af41cc603ca92c97c94926296ef38396d6e3f

  # xz
b'a050baa1ab765fa037524ff061d59f62ad37bc6d1bacf98f9bff2f4b4c312fab  /tmp/py.xz\n'
ca39f034d7812d2420573218c69313ac31fd516ffebe1a57f4e41a32e1e840b9
b'a050baa1ab765fa037524ff061d59f62ad37bc6d1bacf98f9bff2f4b4c312fab  /tmp/py.xz\n'
ca39f034d7812d2420573218c69313ac31fd516ffebe1a57f4e41a32e1e840b9

b'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855  /tmp/tar_a0.tgz\n'
b'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855  /tmp/tar_a1.tgz\n'
b'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855  /tmp/gzp_a0.tgz\n'
b'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855  /tmp/gzp_a1.tgz\n'
```

As you can see, the archives generated with the `tar` command are always
identical, whereas the ones generated with Python are not.

Am I missing some arguments?

Thanks
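
A note on the pattern in the output above, where the gz hashes change between runs while the bz2 and xz hashes do not: the gzip container stores a modification time in its header, and as far as I know `tarfile` leaves that field at the current time when opened in `w:gz` mode, whereas the bz2 and xz containers carry no timestamp. The sketch below only illustrates that container-level difference on an in-memory payload; it is not the attached script, and `payload`, `gz_a` and `gz_b` are made-up names (the `mtime` argument of `gzip.compress` needs Python 3.8 or later).

```python
import bz2
import gzip
import lzma
import struct

payload = b"hello\n" * 3

# gzip: bytes 4..7 of the header hold a little-endian MTIME field,
# so the same payload stamped with different times compresses differently.
gz_a = gzip.compress(payload, mtime=1)
gz_b = gzip.compress(payload, mtime=2)
print(gz_a == gz_b)                        # False
print(struct.unpack("<I", gz_a[4:8])[0])   # 1

# bz2 and xz: no timestamp in the container, so repeated runs match.
print(bz2.compress(payload) == bz2.compress(payload))    # True
print(lzma.compress(payload) == lzma.compress(payload))  # True
```

This covers only the compression layer; timestamps and ownership stored in the tar members themselves are a separate source of variation.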

--
components: Library (Lib)
files: compress.py
messages: 394666
nosy: yellowhat
priority: normal
severity: normal
status: open
title: tarfile: some content different output
type: behavior
Added file: https://bugs.python.org/file50070/compress.py

___
Python tracker 
<https://bugs.python.org/issue44262>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue44262] tarfile: some content different output

2021-05-30 Thread Vasco Gervasi

Vasco Gervasi added the comment:

Dear Filipe,
thanks for your answer.
Following your suggestion, I have tried the attached file.

The output is:
$ python /data/compress.py
b'68963e137ced6ee2aa5a93e155b201a3c172e2683d4b15a0eab7c1d8d43e48b4  /tmp/py_gzip.tgz\n'
b'68963e137ced6ee2aa5a93e155b201a3c172e2683d4b15a0eab7c1d8d43e48b4  /tmp/py_gzip.tgz\n'
$ rm -rf a/
$ mv py_gzip.tgz py_gzip.tgz0
$ python /data/compress.py
b'9c897d82c332f0d5443fe66112abe5f318bf6e6574e44c5c3c385f398784ac35  /tmp/py_gzip.tgz\n'
b'9c897d82c332f0d5443fe66112abe5f318bf6e6574e44c5c3c385f398784ac35  /tmp/py_gzip.tgz\n'
$ diffoscope py_gzip.tgz0 py_gzip.tgz
--- py_gzip.tgz0
+++ py_gzip.tgz
│   --- py_gzip.tgz0-content
├── +++ py_gzip.tgz-content
│ ├── file list
│ │ @@ -1,4 +1,4 @@
│ │ -drwxr-xr-x   0 root (0) root (0) 0 2021-05-30 15:32:56.566535 a/
│ │ --rw-r--r--   0 root (0) root (0) 6 2021-05-30 15:32:56.566535 a/eph0
│ │ --rw-r--r--   0 root (0) root (0) 6 2021-05-30 15:32:56.566535 a/eph1
│ │ --rw-r--r--   0 root (0) root (0) 6 2021-05-30 15:32:56.566535 a/eph2
│ │ +drwxr-xr-x   0 root (0) root (0) 0 2021-05-30 15:33:16.956535 a/
│ │ +-rw-r--r--   0 root (0) root (0) 6 2021-05-30 15:33:16.956535 a/eph0
│ │ +-rw-r--r--   0 root (0) root (0) 6 2021-05-30 15:33:16.956535 a/eph1
│ │ +-rw-r--r--   0 root (0) root (0) 6 2021-05-30 15:33:16.966535 a/eph2

Even though I am specifying an mtime, it is not applied correctly.

Thanks

--
Added file: https://bugs.python.org/file50073/compress.py




[issue44262] tarfile: some content different output

2021-05-30 Thread Vasco Gervasi


Vasco Gervasi added the comment:

Dear Filipe,
sorry, I did not explain the use case; obviously this is a toy example to show
my problem.
I have a pipeline that generates a tar file from a repository using a Python
script; if the hash of the tar file changes, it triggers other steps.
As you can imagine, each time the pipeline runs the content is the same (for the
same commit), but the file timestamps differ, and so the tar file differs.

Thanks for pointing out those examples; I will check and let you know.

Thanks




[issue44262] tarfile: some content different output

2021-05-31 Thread Vasco Gervasi


Vasco Gervasi added the comment:

Yes, you can close it.

For future reference:

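import tarfile

tar_reset = "/tmp/py_tar_reset.tar"

def reset(tarinfo):
    # Normalise ownership and timestamp so they do not vary between runs.
    tarinfo.uid = tarinfo.gid = 0
    tarinfo.uname = tarinfo.gname = "root"
    tarinfo.mtime = 1
    return tarinfo

# The filter is applied to every member added under /tmp/a.
with tarfile.open(tar_reset, "w:xz") as tar_obj:
    tar_obj.add("/tmp/a", arcname="a", filter=reset)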
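
The snippet above writes xz. For gzip output the same filter is probably not enough on its own, since the gzip header carries its own timestamp. A minimal sketch of one way around that, assuming it is acceptable to build the archive in memory: wrap a `gzip.GzipFile` opened with `mtime=0` around a plain `w` tar stream. `build_tgz` and `sha256_hex` are hypothetical helper names, not part of `tarfile`.

```python
import gzip
import hashlib
import io
import tarfile


def reset(tarinfo):
    """Normalise the member metadata that otherwise varies between runs."""
    tarinfo.uid = tarinfo.gid = 0
    tarinfo.uname = tarinfo.gname = "root"
    tarinfo.mtime = 1
    return tarinfo


def build_tgz(src_dir, arcname):
    """Return the bytes of a gzip-compressed tar of src_dir (hypothetical helper)."""
    buf = io.BytesIO()
    # mtime=0 pins the gzip header timestamp; filename="" keeps any
    # file name out of the gzip header.
    with gzip.GzipFile(filename="", fileobj=buf, mode="wb", mtime=0) as gz:
        # Plain "w" mode: compression is handled by the GzipFile wrapper.
        with tarfile.open(fileobj=gz, mode="w") as tar:
            tar.add(src_dir, arcname=arcname, filter=reset)
    return buf.getvalue()


def sha256_hex(data):
    """Hex SHA-256 digest of a bytes object (hypothetical helper)."""
    return hashlib.sha256(data).hexdigest()
```

Calling `sha256_hex(build_tgz("/tmp/a", "a"))` in two separate runs should then give the same digest as long as the file contents, names and permission bits are unchanged. This relies on `TarFile.add` visiting directory entries in sorted order, which it does since Python 3.7.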
