[issue41972] bytes.find consistently hangs in a particular scenario

2020-10-09 Thread Tim Peters
Tim Peters added the comment: The attached fastsearch.diff removes the speed hit in the original test case and in the constructed one. I don't know whether it "should be" applied, and really can't make time to dig into it. The rationale: when the last characters of th

[issue41972] bytes.find consistently hangs in a particular scenario

2020-10-10 Thread Tim Peters
Tim Peters added the comment: Ya, the case for the diff is at best marginal. Note that while it may be theoretically provable that the extra test would make the worst cases slower, that's almost certainly not measurable. The extra test would almost never be executed in the worst case

[issue41972] bytes.find consistently hangs in a particular scenario

2020-10-11 Thread Tim Peters
Tim Peters added the comment: Impressive, Dennis! Nice work. FYI, on the OP's original test data, your fastsearch() completes each search in under 20 seconds using CPython, and in well under 1 second using PyPy. Unfortunately, that's so promising it can't just be dismis

[issue41972] bytes.find consistently hangs in a particular scenario

2020-10-13 Thread Tim Peters
Tim Peters added the comment: > For a bunch of cases it's slower, for some others it's faster. I have scant real idea what you're doing, but in the output you showed 4 output lines are labelled "slower" but 18 are labelled "faster". What you wrote just

[issue41972] bytes.find consistently hangs in a particular scenario

2020-10-13 Thread Tim Peters
Tim Peters added the comment: Dennis, would it be possible to isolate some of the cases with more extreme results and run them repeatedly under the same timing framework, as a test of how trustworthy the _framework_ is? From decades of bitter experience, most benchmarking efforts end up

[issue41972] bytes.find consistently hangs in a particular scenario

2020-10-13 Thread Tim Peters
Tim Peters added the comment: Dennis, I'm delighted that the timing harness pointed out an actual glitch, and that it was so (seemingly) straightforward to identify the algorithmic cause. This gives me increased confidence that this project can be pushed to adoption, and your name wi

[issue41972] bytes.find consistently hangs in a particular scenario

2020-10-13 Thread Tim Peters
Tim Peters added the comment: > There's no discomfort at all to me if, e.g., it stored > 32-bit counts and is indexed by the last 6 bits of the > character. That's a measly 256 bytes in all. Or, for the same space, 16-bit counts indexed by the last 7 bits. Then there'

[issue41972] bytes.find consistently hangs in a particular scenario

2020-10-14 Thread Tim Peters
Tim Peters added the comment: For completeness, a link to the python-dev thread about this: https://mail.python.org/archives/list/python-...@python.org/thread/ECAZN35JCEE67ZVYHALRXDP4FILGR53Y/#4IEDAS5QAHF53IV5G3MRWPQAYBIOCWJ5 -- ___ Python tracker

[issue41972] bytes.find consistently hangs in a particular scenario

2020-10-16 Thread Tim Peters
Tim Peters added the comment: I don't think Rabin-Karp is worth trying here. The PR is provably worst-case linear time, and the constant factor is already reasonably small. Its only real weakness I can see is that it can be significantly (but seemingly not dramatically) slower tha

[issue41972] bytes.find consistently hangs in a particular scenario

2020-10-16 Thread Tim Peters
Tim Peters added the comment: Removed 3.8 from the Versions list. The code was functioning as designed, and the O(n*m) worst case bounds were always known to be possible, so there's no actual bug here. -- versions: -Python 3.8 ___ Python tr

[issue41972] bytes.find consistently hangs in a particular scenario

2020-10-16 Thread Tim Peters
Tim Peters added the comment: And changed the "Type" field to "performance", because speed is the only issue here. -- type: behavior -> performance ___ Python tracker <https://

[issue41972] bytes.find consistently hangs in a particular scenario

2020-10-16 Thread Tim Peters
Tim Peters added the comment: BTW, in the old post of Fredrick's I linked to, he referred to a "stringbench.py" program he used for timings, but the link is dead. I was surprised to find that it still lives on, under the Tools directory. Or did - I'm on Windows now a

[issue41972] bytes.find consistently hangs in a particular scenario

2020-10-17 Thread Tim Peters
Tim Peters added the comment: When I ran stringbench yesterday (or the day before - don't remember), almost all the benefit seemed to come from the "late match, 100 characters" tests. Seems similar for your run. Here are your results for just that batch, interleaving the t

[issue41972] bytes.find consistently hangs in a particular scenario

2020-10-17 Thread Tim Peters
Tim Peters added the comment: I don't think we have any idea how the OP stumbled into this. Looks like it "just happened". The case you construted is quadratic-time, but not quite as bad: BaB BB Fails at once, because 'a' doesn't match the

[issue41972] bytes.find consistently hangs in a particular scenario

2020-10-17 Thread Tim Peters
Tim Peters added the comment: Yup, they act essentially the same, but yours jumps into the quicksand earlier ;-) I'm fond of this one: """ HUGE = 10**7 BIG = 10**6 bigxs = 'x' * BIG haystack = 'x' * HUGE needle = bigxs + 'y' + bigxs "

[issue41972] bytes.find consistently hangs in a particular scenario

2020-10-18 Thread Tim Peters
Tim Peters added the comment: Dennis, I think that's expected, right? Two-way on its own can exploit nothing about individual characters - it only preprocesses the needle to break the possibility for quadratic-time behavior due to periods in the needle. It sounds like you switche

[issue41972] bytes.find consistently hangs in a particular scenario

2020-10-18 Thread Tim Peters
Tim Peters added the comment: I confess I _assumed_ all along that you were generalizing the current code's Sunday trick to 7-bit equivalence classes (up from 32 bits total) and 64K possible shift counts (up from just 2 total possibilities: 1 or len(needle)+1). The Sunday trick cou

[issue41972] bytes.find consistently hangs in a particular scenario

2020-10-22 Thread Tim Peters
Tim Peters added the comment: Note that Sunday doesn't care (at all) where mismatches occur. The "natural" way to add Sunday: follow pure C-P unless/until it finds a mismatching position. Pure C-P then computes a specific shift. Nothing about that changes. But something

[issue41972] bytes.find consistently hangs in a particular scenario

2020-11-06 Thread Tim Peters
Tim Peters added the comment: I'm sorry I haven't been able to give more time to this. I love what's been done, but am just overwhelmed :-( The main thing here is to end quadratic-time disasters, without doing significant damage in other cases. Toward that end it would be fin

[issue41972] bytes.find consistently hangs in a particular scenario

2020-11-06 Thread Tim Peters
Tim Peters added the comment: But that also suggests a different approach: start with the current code, but add introspection to switch to your enhancement of C&P if the current code is drifting out of linear-time territory. -- ___ Python tra

[issue42336] Make PCbuild/build.bat build x64 by default

2020-11-12 Thread Tim Peters
Tim Peters added the comment: +1. If you're feeling more ambitious, it would also be good to change build.bat and rt.bat to use the same "which platform?" spellings and with the same defaults. -- nosy: +tim.peters ___ Python

[issue42456] Logical Error

2020-11-24 Thread Tim Peters
Tim Peters added the comment: There's no bug here. "&" is the bitwise Boolean logical-and operator on integers. For example, >>> 1 & 2 0 >>> 1 & 3 1 It binds more tightly than the "==" equality-testing operator. To get the result you w

[issue13936] datetime.time(0, 0, 0) evaluates to False despite being a valid time

2012-02-03 Thread Tim Peters
Tim Peters added the comment: >From the docs, at: http://docs.python.org/library/datetime.html#time-objects """ in Boolean contexts, a time object is considered to be true if and only if, after converting it to minutes and subtracting utcoffset() (or 0 if that’s None),

[issue13936] datetime.time(0, 0, 0) evaluates to False despite being a valid time

2012-02-03 Thread Tim Peters
Tim Peters added the comment: It is odd, but really no odder than "zero values" of other types evaluating to false in Boolean contexts ;-) Closing as "invalid". -- resolution: -> invalid status: open -> closed ___

[issue37000] _randbelow_with_getrandbits function inefficient with powers of two

2019-05-22 Thread Tim Peters
Tim Peters added the comment: I believe the thrust of Mark's suggestion was that it would allow using `k = (n-1).bit_length()` even when n == 1, without special-casing n == 1. But you'd still be adding a new "subtract 1" operation, and would still change results in so

[issue37061] The strangest glitch I have ever seen - incorrect indenterror, even on commented out code.

2019-05-26 Thread Tim Peters
Tim Peters added the comment: As Steven said, there's an obvious indentation error in the file you actually attached. So nobody can _guess_ what your problem is. Please upload a file showing your actual problem. If I increase the indentation of the `print` Steven identified to matc

[issue37061] The strangest glitch I have ever seen - incorrect indenterror, even on commented out code.

2019-05-26 Thread Tim Peters
Tim Peters added the comment: Same kind of problem with the new upload, Bob. Line 38 is: print(Fore.GREEN,"Installed file ",e,Fore.WHITE) indented 8 spaces. Lines 39 and 40 are empty. Then line 41 is: if x>=len(files): # if x is more than or equal to the number of files in t

[issue37029] PyObject_Free is O(N) where N = # of arenas

2019-05-27 Thread Tim Peters
Change by Tim Peters : -- keywords: +patch pull_requests: +13516 stage: -> patch review pull_request: https://github.com/python/cpython/pull/13612 ___ Python tracker <https://bugs.python.org/issu

[issue37029] PyObject_Free is O(N) where N = # of arenas

2019-05-27 Thread Tim Peters
Tim Peters added the comment: Created a PR and assigned myself to this bug. -- assignee: -> tim.peters ___ Python tracker <https://bugs.python.org/issu

[issue37029] PyObject_Free is O(N) where N = # of arenas

2019-05-30 Thread Tim Peters
Tim Peters added the comment: Added file arena.py. This adds some code to the OP's original test, to print out build time and teardown time, and display obmalloc stats. You'll need at least 80GB of RAM to run it - I don't have that much. Building the tree may take on the o

[issue37029] PyObject_Free is O(N) where N = # of arenas

2019-05-31 Thread Tim Peters
Tim Peters added the comment: Thank you so much, Inada! That's very good to hear :-) -- ___ Python tracker <https://bugs.python.org/issue37029> ___ ___

[issue37029] PyObject_Free is O(N) where N = # of arenas

2019-05-31 Thread Tim Peters
Tim Peters added the comment: New changeset 1c263e39c4ed28225a7dc8ca1f24953225ac48ca by Tim Peters in branch 'master': bpo-37029: keep usable_arenas in sorted order without searching (#13612) https://github.com/python/cpython/commit/1c263e39c4ed28225a7dc8ca1f2495

[issue37029] PyObject_Free is O(N) where N = # of arenas

2019-05-31 Thread Tim Peters
Change by Tim Peters : -- resolution: -> fixed stage: patch review -> resolved status: open -> closed ___ Python tracker <https://bugs.python.or

[issue29882] Add an efficient popcount method for integers

2019-06-01 Thread Tim Peters
Tim Peters added the comment: I prefer that a negative int raise ValueError, but am OK with it using the absolute value instead (i.e., what it does now). -- ___ Python tracker <https://bugs.python.org/issue29

[issue35431] Add a function for computing binomial coefficients to the math module

2019-06-01 Thread Tim Peters
Tim Peters added the comment: Strongly prefer requiring 0 <= k <= n at first. This is a programming language: it will be applied to real problems, not to abstract proofs where some slop can be helpful in reducing the number of cases that need to be considered. The Twitter fel

[issue35431] Add a function for computing binomial coefficients to the math module

2019-06-02 Thread Tim Peters
Tim Peters added the comment: I'm not convinced, although I agree relaxing k <= n is less damaging than relaxing k >= 0. Python isn't aimed at mathematicians (although some 3rd-party packages certainly are, and they're free to define things however they like).

[issue35431] Add a function for computing binomial coefficients to the math module

2019-06-02 Thread Tim Peters
Tim Peters added the comment: I'm going to repeat part of an earlier comment :-) """ Please resist pointless feature creep. The original report was about comb(n, k) for integer n and k with 0 <= k <= n and that's all. Everyone who commented appeared t

[issue35431] Add a function for computing binomial coefficients to the math module

2019-06-02 Thread Tim Peters
Tim Peters added the comment: I'm not fatally opposed to relaxing k <= n. David makes some good points about it, and - as Raymond already noted - "0" is consistent with the behavior of itertools.combinations(). The docs would need to change, though, because the factorial

[issue37132] Add a module for integer related math functions

2019-06-02 Thread Tim Peters
Tim Peters added the comment: Ya, I'm mostly with Raymond. `math` was originally a very thin wrapper around C's libm, but we've been moving away from that more and more for decades, and it's no longer the case anyway that the vast bulk of new Python programmers are inti

[issue35431] Add a function for computing binomial coefficients to the math module

2019-06-03 Thread Tim Peters
Tim Peters added the comment: Python needs a benevolent dictator ;-) I'd be happy for Mark, Raymond, or me to play that role here. If it were me, I'd drop the k <= n requirement: both arguments must be non-negative integers. Because a combinatorial (not factorial-, and

[issue37134] [EASY] Use PEP570 syntax in the documentation

2019-06-03 Thread Tim Peters
Tim Peters added the comment: I haven't looked at anything in this PR, so just popping in to confirm that the first time I saw stuff like: len(obj, /) in the docs I had no idea at all what it was trying to say. I thought it was a typo. Also the second, third, fourth, ..., times

[issue37029] PyObject_Free is O(N) where N = # of arenas

2019-06-03 Thread Tim Peters
Tim Peters added the comment: Thank you, Friedl! I appreciate the extra confirmation - and especially on even bigger test cases :-) -- ___ Python tracker <https://bugs.python.org/issue37

[issue37168] Decimal divisions sometimes 10x or 100x too large

2019-06-05 Thread Tim Peters
Tim Peters added the comment: Also basic: run hardware CPU and memory stress diagnostics, and/or try running the same thing on a different machine. Hardware isn't infallible, and can fail in nearly arbitrary ways. For example, perhaps a smidgen of silicon has gone flaky, so that one

[issue37178] One argument form of math.perm()

2019-06-06 Thread Tim Peters
Tim Peters added the comment: I agree: perm(n) should return factorial(n). -- ___ Python tracker <https://bugs.python.org/issue37178> ___ ___ Python-bugs-list m

[issue37211] obmalloc: eliminate limit on pool size

2019-06-09 Thread Tim Peters
New submission from Tim Peters : On 64-bit Python, many object sizes essentially doubled over 32-bit Python, because Python objects are so heavy with pointers. More recently, forcing alignment to 16 bytes on 64-bit boxes boosted the memory requirements more modestly. But obmalloc's 25

[issue37211] obmalloc: eliminate limit on pool size

2019-06-09 Thread Tim Peters
Change by Tim Peters : -- keywords: +patch pull_requests: +13801 stage: -> patch review pull_request: https://github.com/python/cpython/pull/13934 ___ Python tracker <https://bugs.python.org/issu

[issue37211] obmalloc: eliminate limit on pool size

2019-06-09 Thread Tim Peters
Change by Tim Peters : -- assignee: -> tim.peters ___ Python tracker <https://bugs.python.org/issue37211> ___ ___ Python-bugs-list mailing list Unsubscrib

[issue37257] obmalloc: stop simple arena thrashing

2019-06-12 Thread Tim Peters
New submission from Tim Peters : Scenario: all arenas are fully used. A program then runs a loop like: while whatever: p = malloc(n) ... free(p) At the top, a new arena has to be created, and a single object is taken out of a single pool. At the bottom, that object is

[issue37257] obmalloc: stop simple arena thrashing

2019-06-12 Thread Tim Peters
Change by Tim Peters : -- type: -> performance ___ Python tracker <https://bugs.python.org/issue37257> ___ ___ Python-bugs-list mailing list Unsubscrib

[issue37257] obmalloc: stop simple arena thrashing

2019-06-12 Thread Tim Peters
Change by Tim Peters : -- keywords: +patch pull_requests: +13903 stage: test needed -> patch review pull_request: https://github.com/python/cpython/pull/14039 ___ Python tracker <https://bugs.python.org/issu

[issue37257] obmalloc: stop simple arena thrashing

2019-06-12 Thread Tim Peters
Tim Peters added the comment: New changeset d1c85a27ea9fe70163cad3443d5e534d94f08284 by Tim Peters in branch 'master': bpo-37257: obmalloc: stop simple arena thrashing (#14039) https://github.com/python/cpython/commit/d1c85a27ea9fe70163cad3443d5e53

[issue37257] obmalloc: stop simple arena thrashing

2019-06-12 Thread Tim Peters
Change by Tim Peters : -- resolution: -> fixed stage: patch review -> resolved status: open -> closed ___ Python tracker <https://bugs.python.or

[issue32846] Deletion of large sets of strings is extra slow

2019-06-14 Thread Tim Peters
Tim Peters added the comment: Looks likely that the _major_ cause of the quadratic-time delete behavior was due to that obmalloc used a linear-time method to keep its linked list of usable arenas sorted in order of number of free pools. When a pool became unused, its arena's count of

[issue32846] Deletion of large sets of strings is extra slow

2019-06-14 Thread Tim Peters
Change by Tim Peters : -- stage: resolved -> commit review ___ Python tracker <https://bugs.python.org/issue32846> ___ ___ Python-bugs-list mailing list Un

[issue32846] Deletion of large sets of strings is extra slow

2019-06-14 Thread Tim Peters
Tim Peters added the comment: Raymond, please read my very recent comment one above yours. A (overall) quadratic-time algorithm (O(A**2) where A is the number of arenas) in obmalloc.c is (to my eyes) probably the _primary_ cause of the sloth here. That's been fixed for 3.8, but I

[issue32846] Deletion of large sets of strings is extra slow

2019-06-15 Thread Tim Peters
Tim Peters added the comment: Thanks, Terry! Based on your latest results, "quadratic time" isn't plausible here anymore, so I'm closing this. Nasty cache effects certainly played a role, but they were just a flea on the dog ;-) -- resolution: -> fix

[issue37295] Possible optimizations for math.comb()

2019-06-17 Thread Tim Peters
Tim Peters added the comment: In real life, I expect 99.999%+ of calls will be made with small arguments, so (1) is worth it. I like Mark's suggestion to use uint64_t so the acceptable range doesn't depend on platform. At least in the world I live in, 32-bit boxes are all b

[issue37448] obmalloc: radix tree for tracking arena address ranges

2019-06-29 Thread Tim Peters
Change by Tim Peters : -- nosy: +tim.peters ___ Python tracker <https://bugs.python.org/issue37448> ___ ___ Python-bugs-list mailing list Unsubscribe:

[issue37434] Segfault in typeobject.c at _PyObject_GC_UNTRACK(type)

2019-06-29 Thread Tim Peters
Tim Peters added the comment: I haven't used protobuf, but it's _generally_ true that crashes that occur for the first time in the presence of C or C++ extension modules are due to subtle (or not so subtle) mistakes in using the sometimes-delicate Python C API. So it's t

[issue37454] Clarify docs for math.log1p()

2019-07-01 Thread Tim Peters
Tim Peters added the comment: Mark's analysis is spot-on - good eye :-) Here under 3.7.3 [MSC v.1916 64 bit (AMD64)] on win32, in the original script it makes no difference at all for negative "small x" (where, as Mark said, `1 - random.random()` is exactly representable):

[issue37500] 3.8.0b2 no longer optimizes away "if 0:" ?

2019-07-05 Thread Tim Peters
Tim Peters added the comment: I hate this change :-( The code generated for something like this today: def f(): if 0: x = 1 elif 0: x = 2 elif 1: x = 3 elif 0: x = 4 else: x = 5 print(x) is the same as for: def f(): x = 3

[issue37500] 3.8.0b2 no longer optimizes away "if 0:" ?

2019-07-05 Thread Tim Peters
Tim Peters added the comment: There's "correctness" that matters and "correctness" that's merely pedantic ;-) CPython has acted the current way for about 15 years (since 2.4 was released), and this is the first time anyone has raised an objection. That'

[issue37500] 3.8.0b2 no longer optimizes away "if 0:" ?

2019-07-05 Thread Tim Peters
Tim Peters added the comment: > This is the expected result of fixing a bug that has been > open since 2008 It's the expected result of fixing a bug _by_ eliminating the optimization entirely. It's not an expected result of merely fixing the bug. It's quite obvious

[issue37500] 3.8.0b2 no longer optimizes away "if 0:" ?

2019-07-05 Thread Tim Peters
Tim Peters added the comment: > Using jumps is not removing the optimization > entirely, is just a weaker and more incomplete > way of doing the same. Sorry, I'm afraid I have no idea what that means. The generated code before and after was wildly different, as shown in

[issue37500] 3.8.0b2 no longer optimizes away "if 0:" ?

2019-07-05 Thread Tim Peters
Tim Peters added the comment: > we could say that it does not matter if > > def f(): > if 0: > yield > > should be or not a generator Slippery slope arguments play better if they're made _before_ a decade has passed after the slope was fully greased. There&

[issue37543] Optimize pymalloc for non PGO build

2019-07-10 Thread Tim Peters
Change by Tim Peters : -- nosy: +tim.peters ___ Python tracker <https://bugs.python.org/issue37543> ___ ___ Python-bugs-list mailing list Unsubscribe:

[issue37537] Compute allocated blocks in _Py_GetAllocatedBlocks()

2019-07-10 Thread Tim Peters
Change by Tim Peters : -- nosy: +tim.peters ___ Python tracker <https://bugs.python.org/issue37537> ___ ___ Python-bugs-list mailing list Unsubscribe:

[issue37727] error past 15 places

2019-07-30 Thread Tim Peters
Tim Peters added the comment: You'll see much the same in every programming language that supports your computer's floating-point hardware. Start by reading this gentle introduction: https://docs.python.org/3/tutorial/floatingpoint.html This bug tracker isn't a place for tu

[issue37787] Minimum denormal or ** bug

2019-08-07 Thread Tim Peters
Tim Peters added the comment: Python delegates exponentiation with a Python float result to the platform C's double precision `pow()` function. So this is just what the Windows C pow(2.0, -1075.0) returns. All native floating point operations are subject various kinds of error, and

[issue37004] SequenceMatcher.ratio() noncommutativity not well-documented

2019-08-07 Thread Tim Peters
Change by Tim Peters : -- resolution: -> fixed stage: patch review -> resolved status: open -> closed versions: +Python 3.7, Python 3.8, Python 3.9 ___ Python tracker <https://bugs.python.or

[issue37787] Minimum denormal or ** bug

2019-08-07 Thread Tim Peters
Tim Peters added the comment: Since this depends on the platform libm implementation of pow(), I'm closing this as "won't fix". Steve, on the chance you're serious ;-) , there are implementations of the "constructive reals", which indeed act like infinit

[issue37807] Make hash() return a non-negative number

2019-08-10 Thread Tim Peters
Tim Peters added the comment: Well, I have no code that would benefit from this change. What's the point? Sure, I use _parts_ of hash codes at times, but, e.g., index = the_hash_code & SOME_LOW_BIT_MASK in Python couldn't care less about the sign of `the_hash_code`

[issue37807] Make hash() return a non-negative number

2019-08-11 Thread Tim Peters
Tim Peters added the comment: I agree: we "shouldn't have" documented anything about hash codes beyond the invariants needed to guarantee they work for their intended purpose, chiefly that x == y implies hash(x) == hash(y). Which leads to your other question ;-) That inva

[issue37831] bool(~True) == True

2019-08-12 Thread Tim Peters
Tim Peters added the comment: Mark, isn't `int()` the obvious way "to convert an integer-like thing to an actual int"? >>> int(True) 1 >>> int(False) 0 For the rest, I'm -True on making ~ do something magical for bools inconsistent with what

[issue37831] bool(~True) == True

2019-08-12 Thread Tim Peters
Tim Peters added the comment: I don't agree that "~" doesn't "work". If people are reading it as "not", they're in error. The Python docs say ~x means the bits of x inverted and that's what it does. There's no sense it whic

[issue37831] bool(~True) == True

2019-08-12 Thread Tim Peters
Tim Peters added the comment: BTW, I should clarify that I think the real "sin" here was making bool a subclass of int to begin with. For example, there's no sane reason at all for bools to support division, and no reason for a distinct type not to define "~&quo

[issue30671] dict: simplify and improve lookup function

2017-06-15 Thread Tim Peters
Tim Peters added the comment: Whatever the iteration, accept that it's necessary it reach every value in range(2**i) eventually. The current scheme does that, for reasons already documented. You seem to be overlooking the importance of this part: """ Note that because

[issue30671] dict: improve lookup function

2017-06-16 Thread Tim Peters
Tim Peters added the comment: Yes, any scheme whatsoever that guarantees to visit every int in range(2**i) meets the "correctness" part. But I concur with Inada: "head arguments" aren't compelling here, not even if I understood what they were saying ;-) If you imp

[issue30671] dict: improve lookup function

2017-06-17 Thread Tim Peters
Tim Peters added the comment: The attached hashsim.py pretty much convinces me there's nothing left to be gained here. It shows that the current scheme essentially achieves the performance of theoretical "uniform hashing" for string keys, for both successful and unsuccessf

[issue30671] dict: improve lookup function

2017-06-18 Thread Tim Peters
Tim Peters added the comment: I don't see a reason to keep this open, but I haven't been able to follow the OP's line of argument. My best _guess_ is that they're chasing illusions based on not understanding (or grossly undervaluing) that the primary point of the

[issue30671] dict: improve lookup function

2017-06-18 Thread Tim Peters
Tim Peters added the comment: Dmitry, I suggest you spend more of the time you give to thinking to writing code instead ;-) But, really, that's the easiest & surest way to discover for yourself where your analysis is going off in the weeds. For example, issue 28201 was both simpler

[issue30671] dict: improve lookup function

2017-06-19 Thread Tim Peters
Tim Peters added the comment: BTW, I should have spelled this out earlier: recall that x and y collide on the first probe if and only if x = y (mod 2**k) [1] The second probe never happens unless they do collide on the first probe, so when looking at the second probe we can assume

[issue30671] dict: improve lookup function

2017-06-24 Thread Tim Peters
Tim Peters added the comment: Actually, there is something to be gained here, for smaller tables. The simple formulas for the expected number of probes under uniform hashing are upper bounds, and are significantly overstated when the load factor is very high (not a concern for Python) or the

[issue29304] dict: simplify lookup functions

2017-06-25 Thread Tim Peters
Tim Peters added the comment: I suggest reading the thread I started here[1] before pursuing this: it looks very likely that the entire collision resolution scheme should be replaced with one of the "double hashing" ones given there, a bona fide algorithmic improvement for small

[issue29304] dict: simplify lookup functions

2017-06-25 Thread Tim Peters
Tim Peters added the comment: Oops! I undercounted the shifts in the current scheme: there's an additional shift for "perturb". That doesn't exist in the "double hashing" alternatives. -- ___ Python tracker <

[issue30880] PCG random number generator

2017-07-08 Thread Tim Peters
Tim Peters added the comment: I agree closing was appropriate at this time. I quite like PCG, but as Raymond said it's more a template for creating PRNGs than a specific generator. So even if a compelling case could be made, there's still a long way to having specific code in min

[issue30907] speed up comparisons to self for built-in containers

2017-07-12 Thread Tim Peters
Tim Peters added the comment: Just noting that every special code path "at the start" doesn't just speed the case it's aiming at, it also _slows_ every case that doesn't pass the new tests. It takes time to fail the new tests. So it usually makes things slower over

[issue30920] Sequence Matcher from diff lib is not implementing longest common substring problem correctly

2017-07-13 Thread Tim Peters
Tim Peters added the comment: This is an unfortunate consequence of the default "autojunk" feature. You can turn that off by passing `autojunk=False`, like so: match = SequenceMatcher(None, string1, string2, auto

[issue30907] speed up comparisons to self for built-in containers

2017-07-13 Thread Tim Peters
Tim Peters added the comment: Measuring in isolation (like with, e.g., timeit) isn't really interesting. Then everything is in L1 cache, branches are 100% predictable (because they take the same branch over & over for the duration of the test), and second-order effects on _other_

[issue30907] speed up comparisons to self for built-in containers

2017-07-14 Thread Tim Peters
Tim Peters added the comment: Ya, prior to now ;-) there's generally been some cost-benefit thought given to these things. Strings and ints are immutable and some values of each are uniquely interned, so the case of identity is more than just plausible. It happens often. I'd gues

[issue30965] Unexpected behavior of operator "in"

2017-07-18 Thread Tim Peters
Tim Peters added the comment: Not a bug. For an explanation, I just answered a very similar question on StackOverflow: https://stackoverflow.com/questions/45180899/unexpected-result-from-in-operator/45180967#45180899 -- nosy: +tim.peters

[issue30907] speed up comparisons to self for built-in containers

2017-07-21 Thread Tim Peters
Tim Peters added the comment: Victor, this part of the docs explains what you're seeing; scroll down to the """ In enforcing reflexivity of elements, the comparison of collections assumes that for a collection element x, x == x is always true ... """ part.

[issue31063] List Comprehension Bug

2017-07-27 Thread Tim Peters
Tim Peters added the comment: This isn't a bug. 84 appears twice in the list, the first time at index 9. The .index() method finds the first (leftmost; smallest index) occurrence. Since 9 isn't even, the `if` test isn't satisfied, so 84 does not appear in the result.

[issue31136] raw strings cannot end with a backslash character r'\'

2017-08-07 Thread Tim Peters
Tim Peters added the comment: Yes, I'm closing as not-a-bug. It's been this way (and documented) forever. More specifically, as the docs say, a raw string can't end with an _odd_ number of backslashes: """ String quotes can be escaped with a backslash, bu

[issue31186] Support heapfix() and heapremove() APIs in heapq module

2017-08-13 Thread Tim Peters
Tim Peters added the comment: @Rajath, I suspect Raymond may have been bamboozled because your suggested `heapfix()` and `heapremove()` didn't appear to take any arguments. If the index is a required argument, then these are O(log N) operations. I thought about adding them years ago

[issue14976] queue.Queue() is not reentrant, so signals and GC can cause deadlocks

2017-09-02 Thread Tim Peters
Tim Peters added the comment: [Guido] > Why was task management ever added? Raymond published a "joinable" queue class as a recipe here: http://code.activestate.com/recipes/475160-taskqueue/ and later folded it into the standard Python queue. So the usual answer applies: &q

[issue31517] MainThread association logic is fragile

2017-09-19 Thread Tim Peters
Tim Peters added the comment: Is there a problem here? I haven't heard of anyone even wondering about this before. threading.py worries about Threads created _by_ threading.py, plus the magical (from its point of view) thread that first imports threading.py. Users mix in `_thread` th

[issue31516] current_thread() becomes "dummy" thread during shutdown

2017-09-20 Thread Tim Peters
Tim Peters added the comment: Ya, it's clearly best for `current_thread()` to deliver consistent results. -- ___ Python tracker <https://bugs.python.org/is

[issue24746] doctest 'fancy diff' formats incorrectly strip trailing whitespace

2018-11-21 Thread Tim Peters
Tim Peters added the comment: To include trailing whitespace on a line in a doctest, _don't_ use raw strings. Use a regular string, and include add a (one or more) trailing \x20 instead of a space (for example). For example: r""" >>> print("a ") a &

[issue35369] List sorting makes duplicate comparisons

2018-11-30 Thread Tim Peters
Tim Peters added the comment: Yup, it can do some redundant comparisons; more on that here: https://mail.python.org/pipermail/python-dev/2018-August/155020.html I'm not inclined to make this already-difficult code even harder to understand for something that's quite likely not

<    1   2   3   4   5   6   7   8   9   10   >