Raymond Hettinger <[email protected]> added the comment:
Messages (3)
msg309956 - Author: Johnny Dude (JohnnyD) Date: 2018-01-15 01:08
When seeding with a tuple that includes a string, the results are not
consistent across new interpreters or processes.
For example, executing the following on a Linux machine yields a different
result on each run:
python3.6 -c 'import random; random.seed(("a", 1)); print(random.random())'
Please note that the docstring of random.seed states: "Initialize internal
state from hashable object."
The Python documentation does not say this.
(https://docs.python.org/3.6/library/random.html#random.seed)
This is very confusing. I hope you can fix the behavior, not the docstring.
msg309957 - Author: STINNER Victor (vstinner) * (Python committer)
Date: 2018-01-15 01:13
random.seed(str) uses:

    if version == 2 and isinstance(a, (str, bytes, bytearray)):
        if isinstance(a, str):
            a = a.encode()
        a += _sha512(a).digest()
        a = int.from_bytes(a, 'big')

Whereas for other types, random.seed(obj) uses hash(obj), and hash is
randomized by default in Python 3.
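
The string branch quoted above can be reproduced outside the random module. The following is a minimal sketch, assuming hashlib.sha512 stands in for the module's private _sha512 name; the helper name str_seed_to_int is hypothetical and is not part of the random module.

```python
import hashlib

def str_seed_to_int(a):
    """Mirror the version-2 str/bytes/bytearray branch quoted above."""
    if isinstance(a, str):
        a = a.encode()
    # Append the SHA-512 digest of the seed bytes to the seed bytes themselves.
    # bytes(a) copies the input so a caller's bytearray is left untouched.
    a = bytes(a) + hashlib.sha512(a).digest()
    # Interpret the combined bytes as one big-endian integer.
    return int.from_bytes(a, 'big')

# A str and its encoded bytes map to the same integer.
print(str_seed_to_int("a") == str_seed_to_int(b"a"))
```

Because this path never calls hash(), its result should not depend on hash randomization.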
Yeah, the random.seed() documentation should describe the implementation and
explain that hash(obj) is used and that the hash function is randomized by
default:
https://docs.python.org/dev/library/random.html#random.seed
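
The effect of hash randomization on the tuple from the report can be observed without calling random.seed() at all. The following is a rough sketch, assuming the PYTHONHASHSEED environment variable is used to pin the hash seed of a child interpreter; the helper name tuple_hash_in_child is hypothetical.

```python
import os
import subprocess
import sys

def tuple_hash_in_child(hash_seed):
    """Return hash(("a", 1)) as computed by a fresh interpreter."""
    # PYTHONHASHSEED fixes the seed used for str hashing in the child process.
    env = dict(os.environ, PYTHONHASHSEED=str(hash_seed))
    result = subprocess.run(
        [sys.executable, "-c", 'print(hash(("a", 1)))'],
        env=env, stdout=subprocess.PIPE, check=True,
    )
    return int(result.stdout.decode().strip())

# Same hash seed: same hash. Different hash seeds: almost certainly different.
print(tuple_hash_in_child(1) == tuple_hash_in_child(1))
print(tuple_hash_in_child(1) == tuple_hash_in_child(2))
```

When PYTHONHASHSEED is unset, each new interpreter picks its own seed, which is consistent with the varying results described in the report.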
msg310006 - Author: Raymond Hettinger (rhettinger) * (Python committer)
Date: 2018-01-15 10:41
I'm getting a nice improvement in dispersion statistics by shuffling in higher
bits right at the end:

      /* Disperse patterns arising in nested frozensets */
    + hash ^= (hash >> 11) ^ (~hash >> 25);
      hash = hash * 69069U + 907133923UL;

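
For experimenting with the added line outside of C, the two statements can be emulated in Python. This is only a sketch, assuming the hash is a 64-bit unsigned value (the width is an assumption here, and the name disperse is hypothetical); Python integers are unbounded, so the wraparound has to be applied by hand.

```python
MASK = (1 << 64) - 1  # assumption: 64-bit unsigned hash

def disperse(h):
    """Emulate the final two statements of the patch on a 64-bit value."""
    h &= MASK
    # Fold higher bits into lower ones; ~h must be masked to stay unsigned.
    h ^= (h >> 11) ^ ((~h & MASK) >> 25)
    # Linear congruential step, truncated to 64 bits as C arithmetic would.
    return (h * 69069 + 907133923) & MASK

print(hex(disperse(0)), hex(disperse(1)))
```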
Results for range() check:

                      range       range
                      baseline    new
    1st percentile    35.06%      40.63%
    1st decile        48.03%      51.34%
    mean              61.47%      63.24%
    median            63.24%      65.58%

Results for the letter_range() test:

                      letter      letter
                      baseline    new
    1st percentile    39.59%      40.14%
    1st decile        50.90%      51.07%
    mean              63.02%      63.04%
    median            65.21%      65.23%

Test code:

    import itertools
    import string

    def letter_range(n):
        return string.ascii_letters[:n]

    def powerset(s):
        for i in range(len(s)+1):
            yield from map(frozenset, itertools.combinations(s, i))

    # range() check
    for i in range(10000):
        for n in range(5, 19):
            t = 2 ** n        # table size
            mask = t - 1      # keep only the low n bits of each hash
            # u = number of distinct table slots hit by the 2**n subsets
            u = len({h & mask for h in map(hash, powerset(range(i, i+n)))})
            print(u/t*100)

    # letter_range() check needs to be restarted (reseeded on every run)
    for n in range(5, 19):
        t = 2 ** n
        mask = t - 1
        u = len({h & mask for h in map(hash, powerset(letter_range(n)))})
        print(u/t*100)
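
The message does not say how the summary rows in the tables were computed from the printed values. One possible way, shown here only as a sketch using the standard statistics module (the function name summarize is hypothetical, and the quantile method is an assumption):

```python
import statistics

def summarize(percentages):
    """Reduce a list of occupancy percentages to the four table rows."""
    # quantiles() returns the n-1 cut points; the first one is the
    # 1st percentile (n=100) or the 1st decile (n=10).
    centiles = statistics.quantiles(percentages, n=100)
    deciles = statistics.quantiles(percentages, n=10)
    return {
        "1st percentile": centiles[0],
        "1st decile": deciles[0],
        "mean": statistics.mean(percentages),
        "median": statistics.median(percentages),
    }

# Made-up input purely to show the call; not data from the message.
for name, value in summarize([float(x) for x in range(1, 101)]).items():
    print(f"{name:15s} {value:6.2f}%")
```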
----------
_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue26163>
_______________________________________