On 28/12/2018 07.53, Johannes Schauer wrote:
> I don't think that's likely because:
>
> - the hashes are reproducible across multiple runs if the same method was
>   used
If you run all your tests on the same filesystem, you will get the same readdir order every time. That can make results look reproducible when they are not. Try running on different ext4 filesystems with dir_index enabled (dir_index is on by default). Running in different subdirectories of one ext4 filesystem does not suffice, because they all share the same hash seed.

I looked at the diffoscope output, and there are definitely ordering issues visible near the beginning, around "link" and "element", and later around "cdata".

Further down you also see many small 1-byte diffs, similar to what I had observed with python-3.6, so these could be variations in Python's internal refcounts.

Might not matter: we run all our builds with

  export PYTHONHASHSEED=0

so that we never get ordering issues from randomized Python hashes.
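To illustrate the two sources of nondeterminism mentioned above, here is a minimal Python sketch. It is not how any particular build tool handles this. The helper names `deterministic_listing` and `set_order_with_seed`, and the `SNIPPET` string, are hypothetical and exist only for illustration. It assumes Python 3.7+ for `capture_output`. The first helper sorts `os.listdir()` output, because the raw order is whatever the filesystem's readdir returns. The second runs a child interpreter with a fixed `PYTHONHASHSEED` to show that set iteration order for strings becomes repeatable when the seed is pinned.

```python
import os
import subprocess
import sys
import tempfile


def deterministic_listing(path):
    """Return directory entries in a stable order (hypothetical helper).

    os.listdir() yields entries in the filesystem's readdir order; on ext4
    with dir_index that order depends on a per-filesystem hash seed, so we
    sort explicitly before the result can leak into build output.
    """
    return sorted(os.listdir(path))


# Hypothetical snippet: prints the iteration order of a set of strings,
# which depends on string hashing and therefore on PYTHONHASHSEED.
SNIPPET = "print(list({'link', 'element', 'cdata', 'comment', 'pi'}))"


def set_order_with_seed(seed):
    """Run SNIPPET in a fresh interpreter with the given PYTHONHASHSEED."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", SNIPPET],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for name in ("link", "element", "cdata"):
            open(os.path.join(d, name), "w").close()
        print("raw readdir order:", os.listdir(d))
        print("sorted order:     ", deterministic_listing(d))

    # With a fixed seed (0 disables hash randomization) the order repeats.
    print(set_order_with_seed(0) == set_order_with_seed(0))
    # With different seeds the order may differ; this is not guaranteed
    # for any particular pair of seeds.
    print(set_order_with_seed(1))
    print(set_order_with_seed(2))
```

Sorting at the point where directory contents are consumed fixes the readdir issue regardless of filesystem. Pinning `PYTHONHASHSEED` only removes hash-randomization effects and does nothing for readdir order or refcount-related differences.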