Tim Peters added the comment:
Raymond, I share your concerns. There's no reason at all to make gratuitous
changes (like dropping the "post-addition of a constant and incorporating
length signature"), apart from that there's no apparent reason for them
existing to begin
Tim Peters added the comment:
Oh, I don't agree that it's "broken" either. There's still no real-world test
case here demonstrating catastrophic behavior, nor even a contrived test
case demonstrating that, nor a coherent characterization of what "the proble
Tim Peters added the comment:
Has anyone figured out the real source of the degeneration when mixing in
negative integers? I have not. XOR always permutes the hash range - it's
one-to-one. No possible outputs are lost, and XOR with a negative int isn't
"obviously degener
Tim Peters added the comment:
[Raymond, on boosting the multiplier on 64-bit boxes]
> Yes, that would be perfectly reasonable (though to some
> extent the objects in the tuple also share some of the
> responsibility for getting all bits into play).
It's of value independent of
Tim Peters added the comment:
FYI, using this for the guts of the tuple hash works well on everything we've
discussed. In particular, no collisions in the current test_tuple hash test,
and none either in the cases mixing negative and positive little ints. This
all remains so usin
Tim Peters added the comment:
BTW, those tests were all done under a 64-bit build. Some differences in a
32-bit build:
1. The test_tuple hash test started with 6 collisions. With the change, it
went down to 4. Also changing to the FNV-1a 32-bit multiplier boosted it to 8.
The test
Tim Peters added the comment:
> when you do t ^= t << 7, then you are not changing
> the lower 7 bits at all.
I want to leave low-order hash bits alone. That's deliberate.
The most important tuple component types, for tuples that are hashable, are
strings and contiguous ra
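For concreteness, a minimal standalone check of that property (uint64_t standing in for Py_uhash_t; the sample values are arbitrary): since t << 7 has zeros in its low 7 bits, XOR-ing it into t cannot change those bits.
    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        /* (t << 7) has zeros in its low 7 bits, so XOR-ing it in cannot
           change those bits of t. */
        uint64_t samples[] = {0, 1, 0x7f, 0x80, 0xdeadbeefcafef00dULL};
        for (size_t i = 0; i < sizeof(samples) / sizeof(samples[0]); ++i) {
            uint64_t t = samples[i];
            printf("%d\n", ((t ^ (t << 7)) & 0x7f) == (t & 0x7f));  /* always prints 1 */
        }
        return 0;
    }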
Tim Peters added the comment:
Jeroen, I understood the part about -2 from your initial report ;-) That's why
the last code I posted didn't use -2 at all (neither -1, which hashes to -2).
None of the very many colliding tuples contained -2 in any form. For example,
these 8 tuple
Tim Peters added the comment:
> advantage of my approach is that high-order bits become more
> important:
I don't much care about high-order bits, beyond that we don't systematically
_lose_ them. The dict and set lookup routines have their own strategies for
incorporating
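For background on that last point, CPython's dict and set lookups fold the otherwise-unused high bits of the hash into the probe sequence through a shifting "perturb" term. The sketch below is a simplified illustration of that recurrence (names and table size are illustrative), not the actual dictobject.c code.
    #include <stdint.h>
    #include <stdio.h>

    #define PERTURB_SHIFT 5

    /* Sketch of CPython-style open-addressing probes: the first slot uses
       only the low bits of the hash, but every retry shifts more of the
       high-order bits into play through `perturb`. */
    static void probe_sequence(uint64_t hash, uint64_t mask, int nprobes)
    {
        uint64_t perturb = hash;
        uint64_t i = hash & mask;
        for (int n = 0; n < nprobes; ++n) {
            printf("probe %d -> slot %llu\n", n, (unsigned long long)i);
            perturb >>= PERTURB_SHIFT;
            i = (i * 5 + perturb + 1) & mask;
        }
    }

    int main(void)
    {
        probe_sequence(0x123456789abcdef0ULL, 7, 5);   /* 8-slot table */
        return 0;
    }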
Tim Peters added the comment:
Just noting that this Bernstein-like variant appears to work as well as the
FNV-1a version in all the goofy ;-) endcase tests I've accumulated:
    while (--len >= 0) {
        y = PyObject_Hash(*p++);
        if (y == -1)
            return -1;
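The snippet above is cut off, so here is a rough, self-contained reconstruction of what a Bernstein-style combining loop looks like. It is a sketch only (uint64_t standing in for Py_uhash_t, a placeholder multiplier and start value, and a plain array of precomputed item hashes instead of PyObject_Hash calls), not Tim's exact code.
    #include <stdint.h>
    #include <stddef.h>

    /* Bernstein-style tuple-hash core: fold each item's hash into the
       accumulator with multiply-and-add; the "33A" flavor uses xor in
       place of the add.  Multiplier and start value are placeholders. */
    static int64_t hash_items(const int64_t *hashes, size_t len)
    {
        uint64_t x = 0x345678UL;
        const uint64_t mult = 0xf4243ULL;        /* placeholder multiplier */
        for (size_t i = 0; i < len; ++i) {
            int64_t y = hashes[i];               /* stands in for PyObject_Hash(*p++) */
            if (y == -1)
                return -1;                       /* an item's hash failed */
            x = x * mult + (uint64_t)y;          /* 33A variant: x = x * mult ^ (uint64_t)y */
        }
        return (int64_t)x == -1 ? -2 : (int64_t)x;   /* -1 is reserved for errors */
    }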
Tim Peters added the comment:
And one more:
x = (x * mult) ^ t;
also appears to work equally well. So, way back when, it appears we _could_
have wormed around the disaster du jour just by applying a shift-xor
permutation to the raw hash results.
Note the implication: if we
Tim Peters added the comment:
> Suppose that there is a hash collision, say hash((3, 3)) ==
> hash((-3, -3)) and you change the hashing algorithm to fix
> this collision.
There are _two_ hash functions at play in that collision: the tuple hash
function, and the integer hash
Tim Peters added the comment:
>> j is even implies (j ^ -3) == -(j ^ 3)
> This follows from what I posted before: if j is even, then
> j ^ 3 is odd, so we can apply the rule x ^ -2 = -x to x = j ^ 3
> ...
Thanks! That helps a lot. I had a blind spot there. This kind of th
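A quick standalone check of the identity under discussion (two's-complement arithmetic assumed): -3 == 3 ^ -2, so j ^ -3 == (j ^ 3) ^ -2, and the rule x ^ -2 == -x applies to x = j ^ 3 whenever j is even (which makes j ^ 3 odd).
    #include <stdio.h>

    int main(void)
    {
        /* Verify: j even  implies  (j ^ -3) == -(j ^ 3). */
        int ok = 1;
        for (long j = -1000; j <= 1000; j += 2)      /* even j only */
            if ((j ^ -3L) != -(j ^ 3L))
                ok = 0;
        printf("%s\n", ok ? "identity holds" : "identity fails");
        return 0;
    }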
Tim Peters added the comment:
High-order bit: please restore the original tuple hash test. You have the
worst case of "but I didn't write it" I've ever encountered ;-) Your new test
is valuable, but I've seen several cases now where it fails to detect any
problem
Tim Peters added the comment:
>> The two-liner above with the xor in the second line is
>> exactly Bernstein 33A, followed by a permutation
>> of 33A's _output_ space.
> Not output space, but internal state
? 33A's output _is_ its internal state at the e
Tim Peters added the comment:
I should have spelled this out before: these are all permutations, so in
general permuting the result space of `x * mult + y` (or any other permutation
involving x and y) is exactly the same as not permuting it but applying a
different permutation to y instead
Tim Peters added the comment:
Perhaps worth noting that FNV-1a works great if used as _intended_: on a
stream of unsigned bytes. All tests except the new tuple hash test suffer no
collisions; the new test suffers 14. Nothing is needed to try to worm around
nested tuple catastrophes, or
Tim Peters added the comment:
Also worth noting: other projects need to combine hashes too. Here's a 64-bit
version of the highly regarded C++ Boost[1] library's hash_combine() function
(I replaced its 32-bit int literal with "a random" 64-bit one):
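The code itself is cut off above; for reference, the widely published Boost recipe, widened to 64 bits, looks like the sketch below. The 64-bit constant shown is illustrative only (not necessarily the literal Tim substituted), and uint64_t stands in for Py_uhash_t.
    #include <stdint.h>

    /* Boost-style hash_combine.  The classic 32-bit version uses the
       constant 0x9e3779b9; the 64-bit literal here is an arbitrary
       stand-in for "a random" 64-bit constant. */
    static inline void hash_combine(uint64_t *seed, uint64_t value)
    {
        *seed ^= value + 0x9e3779b97f4a7c15ULL + (*seed << 6) + (*seed >> 2);
    }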
Tim Peters added the comment:
[Tim]
> Perhaps worth noting that FNV-1a works great if used as
> _intended_: on a stream of unsigned bytes.
> ...
>
> Py_uhash_t t = (Py_uhash_t)y;
> for (int i = 0; i < sizeof(t); ++i) {
>     x = (x ^ (t & 0xff)) *
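The quoted loop is truncated; a completed sketch of that byte-at-a-time FNV-1a folding (uint64_t in place of Py_uhash_t; the multiplier is the standard 64-bit FNV prime) might look like this.
    #include <stdint.h>
    #include <stddef.h>

    /* Feed the bytes of one item's hash code through FNV-1a, least
       significant byte first.  x would start at the 64-bit FNV offset
       basis, 0xcbf29ce484222325. */
    static uint64_t fnv1a_fold(uint64_t x, int64_t y)
    {
        uint64_t t = (uint64_t)y;
        for (size_t i = 0; i < sizeof(t); ++i) {
            x = (x ^ (t & 0xff)) * 0x100000001b3ULL;   /* 64-bit FNV prime */
            t >>= 8;
        }
        return x;
    }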
Tim Peters added the comment:
Jeroen, thanks for helping us fly slightly less blind! ;-) It's a lot of work.
I'd say you may as well pick a prime. It's folklore, but "a reason" is that
you've discovered that regular bit patterns in multipliers can hurt, and
Tim Peters added the comment:
An "aha!" moment for me: I noted before that replacing the tuple hash loop with
Py_uhash_t t = (Py_uhash_t)y;
t ^= t << 1;
x = (x * mult) + t;
does OK (64-bit build): most tests had no collisions, but the original tuple
test
Tim Peters added the comment:
Noting that the failure of high-order bits to propagate is one form of
"avalanche" failure. A recent 64-bit hash, SeaHash, takes this approach which
is said to have provably "perfect" avalanche behavior:
Py_uhash_t t = (Py_uhash_t)y;
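The snippet breaks off right after the first line; SeaHash's published diffusion step, roughly as it would read here (uint64_t standing in for Py_uhash_t), is sketched below.
    #include <stdint.h>

    /* SeaHash-style diffusion: multiply by an odd 64-bit constant, apply a
       data-dependent xor-shift, multiply again. */
    static inline uint64_t seahash_diffuse(uint64_t x)
    {
        x *= 0x6eed0e9da4d94a4fULL;
        x ^= (x >> 32) >> (x >> 60);
        x *= 0x6eed0e9da4d94a4fULL;
        return x;
    }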
Tim Peters added the comment:
If SeaHash is interesting to us (it is to me), I suggest dropping our DJB/FNV
combining part entirely and using a purer form, like so:
Py_uhash_t t = (Py_uhash_t)y;
x ^= t ^ (t << 1);
x *= (Py_uhash_t)0x6eed0e9da4d94a4fULL;
x ^= (x >> 32) >> (x >> 60);
Tim Peters added the comment:
>Py_uhash_t t = (Py_uhash_t)y;
>x ^= t ^ (t << 1);
>x *= (Py_uhash_t)0x6eed0e9da4d94a4fULL;
>x ^= (x >> 32) >> (x >> 60);
>x *= (Py_uhash_t)0x6eed0e9da4d94a4fULL;
Comment out either one of the multip
Tim Peters added the comment:
A meta-comment: a great deal of work has been done on non-crypto hashes in
recent years. I'm not interested in rolling our own if one of the "winners"
from that process can be adapted. For that reason, I've only been looking at
those
Tim Peters added the comment:
>> the author wants this transformation to be easily
>> invertible, so a prime is necessary
> A multiplication by any odd number modulo 2**64 is
> invertible.
Right! My mistake.
> As I argued before, the concept of primes is
> mea
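As an aside on the invertibility point: the inverse of an odd multiplier modulo 2**64 can be found with a few Newton steps, since each step doubles the number of correct low-order bits. A small standalone check (unsigned 64-bit arithmetic is implicitly modulo 2**64; the constant is the SeaHash multiplier used earlier):
    #include <stdint.h>
    #include <stdio.h>

    /* Inverse of odd a modulo 2**64 by Newton iteration: x <- x*(2 - a*x).
       The initial guess is correct to 3 bits for any odd a, and each step
       doubles that, so 6 steps are more than enough for 64 bits. */
    static uint64_t inverse_mod_2_64(uint64_t a)
    {
        uint64_t x = a;
        for (int i = 0; i < 6; ++i)
            x *= 2 - a * x;
        return x;
    }

    int main(void)
    {
        uint64_t a = 0x6eed0e9da4d94a4fULL;
        printf("%d\n", a * inverse_mod_2_64(a) == 1);   /* prints 1 */
        return 0;
    }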
Tim Peters added the comment:
> So my suggestion remains
>
> for y in INPUT:
>     t = hash(y)
>     t ^= t * SOME_LARGE_EVEN_NUMBER
>     h ^= t
>     h *= MULTIPLIER
On the face of it, I'd be amazed if that passed SMHasher, because it only
propagates bits "to th
Tim Peters added the comment:
> If we, e.g., tested tuples of little floats instead ...
Speaking of which:
>>> from itertools import product
>>> len(set(map(hash, product([0.5, 0.25], repeat=20))))
32
32 hash codes out of 1048576 distinct twenty-tuples isn't exa
Tim Peters added the comment:
>> I've noted before, e.g., that sticking to a prime
>> eliminates a world of regular bit patterns in the
>> multiplier.
> Why do you think this? 0x1fff is prime :-)
Point taken ;-) But "a world of" is not the s
Tim Peters added the comment:
>> For that reason, I've only been looking at those that
>> scored 10 (best possible) on Appleby's SMHasher[1] test suite
> Do you have a list of such hash functions?
A world of searching. The ones rated "Excellent" here (pre
New submission from Tim McDonough :
I found an odd behavior that seems to be limited to Python 3.6.3.
Python 3.6.3 command scripts seem to prefer wrapping in double quotes instead
of single quotes.
Here is an example of the error.
$ echo -n '{"A":"a"}' |
Tim Peters added the comment:
> >>> from itertools import product
> >>> len(set(map(hash, product([0.5, 0.25], repeat=20))))
> 32
> Good catch! Would you like me to add this to the testsuite?
It's in mine already ;-) I've added all the "bad ex
Tim Peters added the comment:
> So it seems that this SMHasher test suite doesn't
> catch the problem that we're seeing with negative integers.
Seems to be so, but I've never run SMHasher myself. I believe it's focused on
statistical properties, like aval
Tim Peters added the comment:
FYI, this appears to be a good overview of what SMHasher is looking for:
https://github.com/aappleby/smhasher/wiki/SMHasher
Some of everything: performance, correctness, statistical measures, and
specific kinds of keysets that have proved problematic for other
Tim Peters added the comment:
> SeaHash seems to be designed for 64 bits.
Absolutely.
> I'm guessing that replacing the shifts by
>
> x ^= ((x >> 16) >> (x >> 29))
>
> would be what you'd do for a 32-bit hash.
My guess too. But "the pri
New submission from Tim Hutt :
See this issue: https://bugs.llvm.org/show_bug.cgi?id=38974
Basically if you have a copy of Python 2 installed on OSX from brew (as well as
the system Python 2), then when you run lldb (which links to the system Python
2), at some point Python gets confused and
Tim Peters added the comment:
Thanks for looking at xxHash! An advantage is that it already comes in 32- and
64-bit versions.
> A (simplified and slightly modified version of) xxHash
> seems to work very well, much better than SeaHash.
? I've posted several SeaHash cores tha
Tim Peters added the comment:
Here's a complete xxHash-based implementation via throwing away C++-isms from
https://github.com/RedSpah/xxhash_cpp/blob/master/xxhash/xxhash.hpp
In the 64-bit build there are no collisions across my tests except for 11 in
the new tuple test.
The 32-bit
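For readers not following the link, the heart of it is the xxHash64 accumulation round sketched below. The two constants are xxHash64's published 64-bit primes, but how the tuple hash seeds and finalizes the accumulator is omitted, so this is an illustration rather than the posted implementation.
    #include <stdint.h>

    #define XXPRIME_1 0x9E3779B185EBCA87ULL
    #define XXPRIME_2 0xC2B2AE3D27D4EB4FULL

    static inline uint64_t rotl64(uint64_t v, int s)
    {
        return (v << s) | (v >> (64 - s));
    }

    /* One xxHash64-style round: scale the incoming lane, rotate the
       accumulator, scale again. */
    static inline uint64_t xxhash_round(uint64_t acc, uint64_t lane)
    {
        acc += lane * XXPRIME_2;
        acc = rotl64(acc, 31);
        acc *= XXPRIME_1;
        return acc;
    }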
Tim McDonough added the comment:
Yes, there are wrapper scripts on my system. My system was updated from 3.3 to
3.6. The 3.3 and 3.6 wrappers are equivalent and similar to the python2.7
wrapper as shown:
exec -a `dirname $realpath`/python2.7 `dirname $realpath`/python2.7.real "$@"
Tim Peters added the comment:
> Note that I'm always considering parametrized
> versions of the hash functions that I'm testing.
> I'm replacing the fixed multiplier (all algorithms
> mentioned here have such a thing) by a random
> multiplier which is 3 mod 8.
Tim Peters added the comment:
>> In the 64-bit build there are no collisions across my
>> tests except for 11 in the new tuple test.
> That's pretty bad actually. With 64 bits, you statistically
> expect something in the order of 10**-8 collisions. So
> what yo
Change by Tim Peters :
--
components: +Interpreter Core -XML
Python tracker <https://bugs.python.org/issue34751>
Tim Peters added the comment:
>> people already wrote substantial test suites dedicated
>> to that sole purpose, and we should aim to be "mere
>> consumers" of functions that pass _those_ tests.
> There are hash functions that pass those tests which
> are
Tim Peters added the comment:
Add a comment along the lines you (Terry) suggested. Some people need to write
doctests that run under many versions of Python, so the info is still supremely
relevant to them.
--
nosy: +tim.peters
Tim Peters added the comment:
Stephane, it's not deep. People who need to write doctests that work across N
versions of Python shouldn't need to read N versions of the documentation.
This is hardly unique to doctest. We routinely add "Changed in version m.n"
blurb
Tim Peters added the comment:
As a sanity check, I tried the 32-bit version of MurmurHash2 (version 3 doesn't
have a 32-bit version). All of my Python tests had collisions, and while the
new tuple test dropped to 15, product([0.5, 0.25], repeat=20) skyrocketed from
141 (32-bit xxHas
Tim Peters added the comment:
Thinking about the way xxHash works prompted me to try this initialization
change:
x = 0x345678UL + (Py_uhash_t)len; // adding in the length is new
That was a pure win in my tests, slashing the number of collisions in the new
tuple test, 32-bit build, from
Tim Peters added the comment:
Attaching htest.py so we have a common way to compare what various things do on
the tests that have been suggested. unittest sucks for that. doctest too.
Here's current code output from a 32-bit build; "ideally" we want "got" val
Tim Peters added the comment:
Here's htest.py output from
git pull https://github.com/jdemeyer/cpython.git bpo34751
and a variation. The number of collisions in the variation appear in square
brackets at the end of each line.
32-bit build:
range(100) by 3; 32-bit hash codes; mean 1
Tim Peters added the comment:
We need to worry about timing too :-( I'm using this as a test. It's very
heavy on using 3-tuples of little ints as dict keys. Getting hash codes for
ints is relatively cheap, and there's not much else going on, so this is
intended to be ve
Tim Peters added the comment:
>> Changes initialization to add in the length:
> What's the rationale for that change? You always asked
> me to stay as close as possible to the "official" hash function
> which adds in the length at the end. Is there an actual
Tim Burgess added the comment:
Hi! Just wondering if there is anything I can do to move this along?
Tim Peters added the comment:
This comes up every few years, but that's about it. Here's the iteration from
2 years ago:
https://mail.python.org/pipermail/python-ideas/2016-October/043039.html
Follow the thread. It contains easy-to-use wrappers for both "do it in
multipl
Tim Peters added the comment:
This won't be changed. The effect on the iteration order of adding and/or
removing keys from a dict while iterating over it has never been defined in
Python, and never will be. The Python 3 docs for dict views say this clearly:
"""
I
Tim Peters added the comment:
Questions about Python should be asked, e.g., on the Python mailing list.
The short course is that it's desired that iterating over dicts be as fast as
possible, and nobody knows a way to make iteration robust in the face of
mutations that wouldn
Tim Peters added the comment:
Not without more expense. Which is why it hasn't been done. But this is a
wrong place to talk about it. If you want Python to change, go to the
python-ideas mailing list?
https://mail.python.org/mailman/listinfo/python-
Tim Graham added the comment:
I think this caused a behavior change:
Before (Python 3.6.6):
>>> from os.path import abspath
>>> abspath('/abc/')
'C:\\abc'
After (Python 3.6.7):
>>> abspath('/abc/')
'C:\\abc\\'
This causes a
Change by Tim Graham :
--
pull_requests: +9415
Python tracker <https://bugs.python.org/issue33899>
Change by Tim Graham :
--
pull_requests: +9423
stage: needs patch -> patch review
Python tracker <https://bugs.python.org/issue31047>
Change by Tim Graham :
--
pull_requests: +9424
Python tracker <https://bugs.python.org/issue33899>
Tim Peters added the comment:
Sorry, I find this too hard to follow. At the end:
> ['TA', 'CA'] # Missing the substring 'CA'
the comment claims it's missing 'CA', while the output plainly shows it's _not_
missing 'CA'.
Tim Peters added the comment:
I don't object to spelling it out more, and your (Terry's) suggestions are
fine. On the other hand, this module has been around for a long time now,
and this is the first instance I've seen of someone surprised that it doesn't
produce
Tim Peters added the comment:
This doesn't actually matter - the code can never trigger. It would be fine to
replace it with an assert to that effect (see below for a specific suggestion).
The reason: The indices in this code are into vectors of PyObject*. These
vectors can'
Tim Peters added the comment:
I left the code in because it was harmless (a 100%-predictable branch), and it
was easier to show that overflow was _considered_ than to explain in full why
it was impossible. In the context of CPython. For example, the Java port of
this code couldn't re
Tim Golden added the comment:
I'm afraid you'll have to use English in this forum so that all current and
future readers have the best chance of understanding the situation. Thank you
very much for making the effort this far.
If anyone on this issue knows of a Chinese-language f
Tim Golden added the comment:
My initial reaction is that, whether the 2.7 behaviour is faulty or not, I
can't reproduce the "correct" behaviour on any version of Windows going back to
2.4. Take the attached Python file issue18040.py and run
"c:\pythonxx\python.exe -i is
Tim Golden added the comment:
Correction: I see the desired behaviour in 3.3/3.4 which is where the
overhaul to Ctrl-C handling on Windows was applied. I still can't see it
in 2.6 or in 3.1/3.2 on Windows.
The problem lies in the fact that PyOS_InterruptOccurred and
PyErr_CheckSignals
Tim Golden added the comment:
Personally, I'm +0 at best on this change. It would achieve consistency with
Linux but I'm not sure what you'd do with such functionality.
Adding Richard Oudkerk who did the rework of the interrupt signal for 3.3.
Richard, any opinion on this?
Tim Golden added the comment:
Thanks for the feedback, David. Closing as won't fix.
--
resolution: -> wont fix
stage: patch review -> committed/rejected
status: open -> closed
Tim Golden added the comment:
Glancing back, it isn't perhaps clear to the casual reader what's being
proposed here, and why. The idea is that a pip-style installer become part of
core Python. For Windows users, any standalone scripts from an installed
package would be placed in scr
Tim Peters added the comment:
As section 2.4.1. (String literals) of the Python reference manual says,
"... (even a raw string cannot end in an odd number of backslashes).
Specifically, a raw string cannot end in a single backslash (since the
backslash would escape the following
Tim Golden added the comment:
Really this should be a wont-fix: the fact that it's possible to import
WindowsError from shutil (on Linux) is an accident of its internal
implementation. It's certainly not documented as such.
Saurabh: WindowsError is a builtin on Windows. If you wan
Tim Golden added the comment:
The Ctrl-C handling in Python on Windows is a bit strange in places. I'll add
this to my list of things to look at. If you'd care to walk through the code to
produce a patch or at least to point to suspect code, that would make it more
likely that i
Tim Golden added the comment:
I put a bit of work in on this this morning, following Mark's suggestion
(msg138197) since that's the "canonical" approach. Unfortunately, it completely
fails to work for the most common case: the root folder of a drive! The
documentati
Tim Golden added the comment:
Thanks for doing the investigation. Yes, that comment was added by me
as part of the fix for issue1677. I'll try to have a look at the
codepath you describe to see if we can add a similar workaround. The
Ctrl-C / SIGINT handling on Windows is less than ide
Tim Golden added the comment:
issue9035.2.patch is an updated version of Atsuo's patch.
Known issues:
* I haven't reworked it for the new memory-management API
* There's no test for a non-root mount point (which is really the basis for
this issue). It's difficult to see
Tim Golden added the comment:
issue9035.3.patch has switched to the new memory management API and has
tweaked the tests slightly for robustness.
This approach does introduce a behavioural change: the root of a SUBSTed
drive (essentially a symlink into the Dos namespace) will raise an
OSError
Changes by Tim Golden :
Removed file: http://bugs.python.org/file31092/issue9035.3.patch
Python tracker <http://bugs.python.org/issue9035>
Changes by Tim Golden :
Removed file: http://bugs.python.org/file31087/issue9035.2.patch
Python tracker <http://bugs.python.org/issue9035>
Tim Golden added the comment:
4th and hopefully final patch. Added tests for byte paths. Reworked the ismount
so it uses the original detection approach first (which is wholly lexical) and
drops back to the volume path technique only if the path doesn't appear to be a
drive or a share
Changes by Tim Golden :
--
assignee: tim.golden ->
Python tracker <http://bugs.python.org/issue4708>
Tim Golden added the comment:
This one seems to have been fixed by the importlib rebuild. I haven't bothered
to trace the code path, but certainly "import nul" returns the expected
"ImportError: No module named 'nul'" in both Debug & Release builds.
Changes by Tim Golden :
--
assignee: tim.golden ->
Python tracker <http://bugs.python.org/issue2889>
Changes by Tim Golden :
--
resolution: -> fixed
status: open -> closed
Python tracker <http://bugs.python.org/issue16921>
Tim Golden added the comment:
This has been covered off by work done with the test.support package including
context managers for temporary files / directories, and a waitfor mechanism
which waits some time if a file can't be accessed.
--
resolution: -> works for me
stat
Tim Peters added the comment:
Looks fine to me. The third one uses native size and alignment (see the "Byte
Order, Size, and Alignment" section of the struct docs). After the first 5
B's, a pad byte is inserted so that the next H is properly aligned (to a 2-byte
boundary)
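A small C illustration of the same point, since native mode in the struct module follows the platform C compiler's layout rules. The exact numbers are platform-dependent, but on mainstream ABIs one pad byte appears between the five bytes and the short.
    #include <stdio.h>
    #include <stddef.h>

    /* Mirrors a native-mode format of five unsigned bytes followed by an
       unsigned short ("5BH"-ish): most ABIs pad one byte after b[4] so h
       lands on a 2-byte boundary. */
    struct five_bytes_then_short {
        unsigned char b[5];
        unsigned short h;
    };

    int main(void)
    {
        printf("offset of h: %zu\n", offsetof(struct five_bytes_then_short, h));  /* typically 6 */
        printf("sizeof:      %zu\n", sizeof(struct five_bytes_then_short));       /* typically 8 */
        return 0;
    }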
Tim Golden added the comment:
I propose to close this one: using Python 3.3 on Win7 I can successfully stat
NTFS Junctions. Is there any remaining issue?
Tim Golden added the comment:
Fixed. Thanks for the patch
--
resolution: -> fixed
stage: needs patch -> committed/rejected
status: open -> closed
Tim Peters added the comment:
Well, a timedelta is a duration. timedelta // n is as close as possible to one
n'th of that duration, but "rounding down" (if necessary) so that the result is
representable as a timedelta. In the same way, if i and j are integers, i // j
is as cl
Changes by Tim Peters :
--
nosy: +tim_one
Python tracker <http://bugs.python.org/issue18647>
Tim Peters added the comment:
Serhiy, I don't see the regexp '(?:.*$\n?)*' anywhere in doctest.py. Are you
talking about the _EXAMPLE_RE regexp? That's the closest I see.
If that's the case, the "nothing to repeat" error is incorrect: _EXAMPLE_RE
Tim Peters added the comment:
Serhiy, I'm asking you to be very explicit about which regexp in doctest.py
you're talking about. If you're talking about the _EXAMPLE_RE regexp, I
already explained what's going on with that. If you're talking about some
other regexp,
Tim Peters added the comment:
I'm afraid it's just too tricky for the code to deduce that a negative
lookahead assertion can imply that a later match can't be empty. But I don't
know how smart the re compilation code already is ;-)
It occurs to me now that the docte
Tim Peters added the comment:
Matching an empty string an unbounded number of times isn't a case of
exponential runtime, it's a case of infinite runtime, unless the regexp
internals are smart enough to cut the search off (I don't know enough about
re's internals to kno
Tim Peters added the comment:
FWIW, I like this. It would be a nice addition to the itertools module. +1
The `key` argument should be renamed to `pred`, as others have said.
As to the name, I like "first_true". Does what it says. Plain "first" is
misleading, an
Tim Peters added the comment:
Matthew, yes, I agree that a regexp engine can be coded to be robust against
unboundedly repeated matching of an empty string. I don't know whether
Python's current engine is so coded. It's easy to trick the 2.7 engine into
accepting regexps
Tim Peters added the comment:
> skipping the underscore ("firsttrue" or "nexttrue")
> would arguably be more consistent with other itertools names.
Like izip_longest and combinations_with_replacement? ;-)
I care about readability here more than consistency with th
Tim Peters added the comment:
> Python's current regex engine isn't so coded. That's
> the reason for the up-front check.
It's peculiar then that nobody noticed before now that the check was so badly
broken ;-)
Offhand, do you have an example that displays bad beh
Tim Golden added the comment:
Here's an updated patch against trunk with tests & doc changes
--
status: languishing -> open
Added file: http://bugs.python.org/file31165/issue2528.2.patch
Tim Golden added the comment:
... and to answer Amaury's question in msg109871 it creates a reasonable
consistency between the results of os.access and the user's actual ability to
read / write a file. eg, you might have no permissions whatsoever on the file
but as long as it wasn