Tim Peters added the comment:
The attached fastsearch.diff removes the speed hit in the original test case
and in the constructed one.
I don't know whether it "should be" applied, and really can't make time to dig
into it.
The rationale: when the last characters of th
Tim Peters added the comment:
Ya, the case for the diff is at best marginal. Note that while it may be
theoretically provable that the extra test would make the worst cases slower,
that's almost certainly not measurable. The extra test would almost never be
executed in the worst case
Tim Peters added the comment:
Impressive, Dennis! Nice work.
FYI, on the OP's original test data, your fastsearch() completes each search in
under 20 seconds using CPython, and in well under 1 second using PyPy.
Unfortunately, that's so promising it can't just be dismis
Tim Peters added the comment:
> For a bunch of cases it's slower, for some others it's faster.
I have scant real idea what you're doing, but in the output you showed 4 output
lines are labelled "slower" but 18 are labelled "faster".
What you wrote just
Tim Peters added the comment:
Dennis, would it be possible to isolate some of the cases with more extreme
results and run them repeatedly under the same timing framework, as a test of
how trustworthy the _framework_ is? From decades of bitter experience, most
benchmarking efforts end up
Tim Peters added the comment:
Dennis, I'm delighted that the timing harness pointed out an actual glitch, and
that it was so (seemingly) straightforward to identify the algorithmic cause.
This gives me increased confidence that this project can be pushed to adoption,
and your name wi
Tim Peters added the comment:
> There's no discomfort at all to me if, e.g., it stored
> 32-bit counts and is indexed by the last 6 bits of the
> character. That's a measly 256 bytes in all.
Or, for the same space, 16-bit counts indexed by the last 7 bits. Then there'
Tim Peters added the comment:
For completeness, a link to the python-dev thread about this:
https://mail.python.org/archives/list/python-...@python.org/thread/ECAZN35JCEE67ZVYHALRXDP4FILGR53Y/#4IEDAS5QAHF53IV5G3MRWPQAYBIOCWJ5
--
___
Python tracker
Tim Peters added the comment:
I don't think Rabin-Karp is worth trying here. The PR is provably worst-case
linear time, and the constant factor is already reasonably small. Its only real
weakness I can see is that it can be significantly (but seemingly not
dramatically) slower tha
Tim Peters added the comment:
Removed 3.8 from the Versions list. The code was functioning as designed, and
the O(n*m) worst case bounds were always known to be possible, so there's no
actual bug here.
--
versions: -Python 3.8
___
Python tr
Tim Peters added the comment:
And changed the "Type" field to "performance", because speed is the only issue
here.
--
type: behavior -> performance
___
Python tracker
<https://
Tim Peters added the comment:
BTW, in the old post of Fredrick's I linked to, he referred to a
"stringbench.py" program he used for timings, but the link is dead.
I was surprised to find that it still lives on, under the Tools directory. Or
did - I'm on Windows now a
Tim Peters added the comment:
When I ran stringbench yesterday (or the day before - don't remember), almost
all the benefit seemed to come from the "late match, 100 characters" tests.
Seems similar for your run. Here are your results for just that batch,
interleaving the t
Tim Peters added the comment:
I don't think we have any idea how the OP stumbled into this. Looks like it
"just happened".
The case you construted is quadratic-time, but not quite as bad:
BaB
BB
Fails at once, because 'a' doesn't match the
Tim Peters added the comment:
Yup, they act essentially the same, but yours jumps into the quicksand earlier
;-)
I'm fond of this one:
"""
HUGE = 10**7
BIG = 10**6
bigxs = 'x' * BIG
haystack = 'x' * HUGE
needle = bigxs + 'y' + bigxs
"
Tim Peters added the comment:
Dennis, I think that's expected, right? Two-way on its own can exploit nothing
about individual characters - it only preprocesses the needle to break the
possibility for quadratic-time behavior due to periods in the needle.
It sounds like you switche
Tim Peters added the comment:
I confess I _assumed_ all along that you were generalizing the current code's
Sunday trick to 7-bit equivalence classes (up from 32 bits total) and 64K
possible shift counts (up from just 2 total possibilities: 1 or len(needle)+1).
The Sunday trick cou
Tim Peters added the comment:
Note that Sunday doesn't care (at all) where mismatches occur. The "natural"
way to add Sunday: follow pure C-P unless/until it finds a mismatching
position. Pure C-P then computes a specific shift. Nothing about that changes.
But something
Tim Peters added the comment:
I'm sorry I haven't been able to give more time to this. I love what's been
done, but am just overwhelmed :-(
The main thing here is to end quadratic-time disasters, without doing
significant damage in other cases. Toward that end it would be fin
Tim Peters added the comment:
But that also suggests a different approach: start with the current code, but
add introspection to switch to your enhancement of C&P if the current code is
drifting out of linear-time territory.
--
___
Python tra
Tim Peters added the comment:
+1. If you're feeling more ambitious, it would also be good to change build.bat
and rt.bat to use the same "which platform?" spellings and with the same
defaults.
--
nosy: +tim.peters
___
Python
Tim Peters added the comment:
There's no bug here.
"&" is the bitwise Boolean logical-and operator on integers. For example,
>>> 1 & 2
0
>>> 1 & 3
1
It binds more tightly than the "==" equality-testing operator. To get the
result you w
Tim Peters added the comment:
>From the docs, at:
http://docs.python.org/library/datetime.html#time-objects
"""
in Boolean contexts, a time object is considered to be true if and only if,
after converting it to minutes and subtracting utcoffset() (or 0 if that’s
None),
Tim Peters added the comment:
It is odd, but really no odder than "zero values" of other types evaluating to
false in Boolean contexts ;-) Closing as "invalid".
--
resolution: -> invalid
status: open -> closed
___
Tim Peters added the comment:
I believe the thrust of Mark's suggestion was that it would allow using `k =
(n-1).bit_length()` even when n == 1, without special-casing n == 1. But you'd
still be adding a new "subtract 1" operation, and would still change results in
so
Tim Peters added the comment:
As Steven said, there's an obvious indentation error in the file you actually
attached. So nobody can _guess_ what your problem is. Please upload a file
showing your actual problem.
If I increase the indentation of the `print` Steven identified to matc
Tim Peters added the comment:
Same kind of problem with the new upload, Bob.
Line 38 is:
print(Fore.GREEN,"Installed file ",e,Fore.WHITE)
indented 8 spaces. Lines 39 and 40 are empty. Then line 41 is:
if x>=len(files): # if x is more than or equal to the number of files in t
Change by Tim Peters :
--
keywords: +patch
pull_requests: +13516
stage: -> patch review
pull_request: https://github.com/python/cpython/pull/13612
___
Python tracker
<https://bugs.python.org/issu
Tim Peters added the comment:
Created a PR and assigned myself to this bug.
--
assignee: -> tim.peters
___
Python tracker
<https://bugs.python.org/issu
Tim Peters added the comment:
Added file arena.py. This adds some code to the OP's original test, to print
out build time and teardown time, and display obmalloc stats.
You'll need at least 80GB of RAM to run it - I don't have that much. Building
the tree may take on the o
Tim Peters added the comment:
Thank you so much, Inada! That's very good to hear :-)
--
___
Python tracker
<https://bugs.python.org/issue37029>
___
___
Tim Peters added the comment:
New changeset 1c263e39c4ed28225a7dc8ca1f24953225ac48ca by Tim Peters in branch
'master':
bpo-37029: keep usable_arenas in sorted order without searching (#13612)
https://github.com/python/cpython/commit/1c263e39c4ed28225a7dc8ca1f2495
Change by Tim Peters :
--
resolution: -> fixed
stage: patch review -> resolved
status: open -> closed
___
Python tracker
<https://bugs.python.or
Tim Peters added the comment:
I prefer that a negative int raise ValueError, but am OK with it using the
absolute value instead (i.e., what it does now).
--
___
Python tracker
<https://bugs.python.org/issue29
Tim Peters added the comment:
Strongly prefer requiring 0 <= k <= n at first. This is a programming
language: it will be applied to real problems, not to abstract proofs where
some slop can be helpful in reducing the number of cases that need to be
considered.
The Twitter fel
Tim Peters added the comment:
I'm not convinced, although I agree relaxing k <= n is less damaging than
relaxing k >= 0.
Python isn't aimed at mathematicians (although some 3rd-party packages
certainly are, and they're free to define things however they like).
Tim Peters added the comment:
I'm going to repeat part of an earlier comment :-)
"""
Please resist pointless feature creep. The original report was about comb(n,
k) for integer n and k with 0 <= k <= n and that's all. Everyone who commented
appeared t
Tim Peters added the comment:
I'm not fatally opposed to relaxing k <= n. David makes some good points about
it, and - as Raymond already noted - "0" is consistent with the behavior of
itertools.combinations().
The docs would need to change, though, because the factorial
Tim Peters added the comment:
Ya, I'm mostly with Raymond. `math` was originally a very thin wrapper around
C's libm, but we've been moving away from that more and more for decades, and
it's no longer the case anyway that the vast bulk of new Python programmers are
inti
Tim Peters added the comment:
Python needs a benevolent dictator ;-)
I'd be happy for Mark, Raymond, or me to play that role here. If it were me,
I'd drop the k <= n requirement: both arguments must be non-negative integers.
Because a combinatorial (not factorial-, and
Tim Peters added the comment:
I haven't looked at anything in this PR, so just popping in to confirm that the
first time I saw stuff like:
len(obj, /)
in the docs I had no idea at all what it was trying to say. I thought it was a
typo.
Also the second, third, fourth, ..., times
Tim Peters added the comment:
Thank you, Friedl! I appreciate the extra confirmation - and especially on
even bigger test cases :-)
--
___
Python tracker
<https://bugs.python.org/issue37
Tim Peters added the comment:
Also basic: run hardware CPU and memory stress diagnostics, and/or try running
the same thing on a different machine. Hardware isn't infallible, and can fail
in nearly arbitrary ways. For example, perhaps a smidgen of silicon has gone
flaky, so that one
Tim Peters added the comment:
I agree: perm(n) should return factorial(n).
--
___
Python tracker
<https://bugs.python.org/issue37178>
___
___
Python-bugs-list m
New submission from Tim Peters :
On 64-bit Python, many object sizes essentially doubled over 32-bit Python,
because Python objects are so heavy with pointers. More recently, forcing
alignment to 16 bytes on 64-bit boxes boosted the memory requirements more
modestly.
But obmalloc's 25
Change by Tim Peters :
--
keywords: +patch
pull_requests: +13801
stage: -> patch review
pull_request: https://github.com/python/cpython/pull/13934
___
Python tracker
<https://bugs.python.org/issu
Change by Tim Peters :
--
assignee: -> tim.peters
___
Python tracker
<https://bugs.python.org/issue37211>
___
___
Python-bugs-list mailing list
Unsubscrib
New submission from Tim Peters :
Scenario: all arenas are fully used. A program then runs a loop like:
while whatever:
p = malloc(n)
...
free(p)
At the top, a new arena has to be created, and a single object is taken out of
a single pool. At the bottom, that object is
Change by Tim Peters :
--
type: -> performance
___
Python tracker
<https://bugs.python.org/issue37257>
___
___
Python-bugs-list mailing list
Unsubscrib
Change by Tim Peters :
--
keywords: +patch
pull_requests: +13903
stage: test needed -> patch review
pull_request: https://github.com/python/cpython/pull/14039
___
Python tracker
<https://bugs.python.org/issu
Tim Peters added the comment:
New changeset d1c85a27ea9fe70163cad3443d5e534d94f08284 by Tim Peters in branch
'master':
bpo-37257: obmalloc: stop simple arena thrashing (#14039)
https://github.com/python/cpython/commit/d1c85a27ea9fe70163cad3443d5e53
Change by Tim Peters :
--
resolution: -> fixed
stage: patch review -> resolved
status: open -> closed
___
Python tracker
<https://bugs.python.or
Tim Peters added the comment:
Looks likely that the _major_ cause of the quadratic-time delete behavior was
due to that obmalloc used a linear-time method to keep its linked list of
usable arenas sorted in order of number of free pools. When a pool became
unused, its arena's count of
Change by Tim Peters :
--
stage: resolved -> commit review
___
Python tracker
<https://bugs.python.org/issue32846>
___
___
Python-bugs-list mailing list
Un
Tim Peters added the comment:
Raymond, please read my very recent comment one above yours. A (overall)
quadratic-time algorithm (O(A**2) where A is the number of arenas) in
obmalloc.c is (to my eyes) probably the _primary_ cause of the sloth here.
That's been fixed for 3.8, but I
Tim Peters added the comment:
Thanks, Terry! Based on your latest results, "quadratic time" isn't plausible
here anymore, so I'm closing this. Nasty cache effects certainly played a
role, but they were just a flea on the dog ;-)
--
resolution: -> fix
Tim Peters added the comment:
In real life, I expect 99.999%+ of calls will be made with small arguments, so
(1) is worth it. I like Mark's suggestion to use uint64_t so the acceptable
range doesn't depend on platform. At least in the world I live in, 32-bit
boxes are all b
Change by Tim Peters :
--
nosy: +tim.peters
___
Python tracker
<https://bugs.python.org/issue37448>
___
___
Python-bugs-list mailing list
Unsubscribe:
Tim Peters added the comment:
I haven't used protobuf, but it's _generally_ true that crashes that occur for
the first time in the presence of C or C++ extension modules are due to subtle
(or not so subtle) mistakes in using the sometimes-delicate Python C API. So
it's t
Tim Peters added the comment:
Mark's analysis is spot-on - good eye :-)
Here under 3.7.3 [MSC v.1916 64 bit (AMD64)] on win32, in the original script
it makes no difference at all for negative "small x" (where, as Mark said, `1 -
random.random()` is exactly representable):
Tim Peters added the comment:
I hate this change :-( The code generated for something like this today:
def f():
if 0:
x = 1
elif 0:
x = 2
elif 1:
x = 3
elif 0:
x = 4
else:
x = 5
print(x)
is the same as for:
def f():
x = 3
Tim Peters added the comment:
There's "correctness" that matters and "correctness" that's merely pedantic ;-)
CPython has acted the current way for about 15 years (since 2.4 was released),
and this is the first time anyone has raised an objection. That'
Tim Peters added the comment:
> This is the expected result of fixing a bug that has been
> open since 2008
It's the expected result of fixing a bug _by_ eliminating the optimization
entirely. It's not an expected result of merely fixing the bug. It's quite
obvious
Tim Peters added the comment:
> Using jumps is not removing the optimization
> entirely, is just a weaker and more incomplete
> way of doing the same.
Sorry, I'm afraid I have no idea what that means. The generated code before
and after was wildly different, as shown in
Tim Peters added the comment:
> we could say that it does not matter if
>
> def f():
> if 0:
> yield
>
> should be or not a generator
Slippery slope arguments play better if they're made _before_ a decade has
passed after the slope was fully greased.
There&
Change by Tim Peters :
--
nosy: +tim.peters
___
Python tracker
<https://bugs.python.org/issue37543>
___
___
Python-bugs-list mailing list
Unsubscribe:
Change by Tim Peters :
--
nosy: +tim.peters
___
Python tracker
<https://bugs.python.org/issue37537>
___
___
Python-bugs-list mailing list
Unsubscribe:
Tim Peters added the comment:
You'll see much the same in every programming language that supports your
computer's floating-point hardware. Start by reading this gentle introduction:
https://docs.python.org/3/tutorial/floatingpoint.html
This bug tracker isn't a place for tu
Tim Peters added the comment:
Python delegates exponentiation with a Python float result to the platform C's
double precision `pow()` function. So this is just what the Windows C pow(2.0,
-1075.0) returns. All native floating point operations are subject various
kinds of error, and
Change by Tim Peters :
--
resolution: -> fixed
stage: patch review -> resolved
status: open -> closed
versions: +Python 3.7, Python 3.8, Python 3.9
___
Python tracker
<https://bugs.python.or
Tim Peters added the comment:
Since this depends on the platform libm implementation of pow(), I'm closing
this as "won't fix".
Steve, on the chance you're serious ;-) , there are implementations of the
"constructive reals", which indeed act like infinit
Tim Peters added the comment:
Well, I have no code that would benefit from this change. What's the point?
Sure, I use _parts_ of hash codes at times, but, e.g.,
index = the_hash_code & SOME_LOW_BIT_MASK
in Python couldn't care less about the sign of `the_hash_code`
Tim Peters added the comment:
I agree: we "shouldn't have" documented anything about hash codes beyond the
invariants needed to guarantee they work for their intended purpose, chiefly
that x == y implies hash(x) == hash(y).
Which leads to your other question ;-) That inva
Tim Peters added the comment:
Mark, isn't `int()` the obvious way "to convert an integer-like thing to an
actual int"?
>>> int(True)
1
>>> int(False)
0
For the rest, I'm -True on making ~ do something magical for bools inconsistent
with what
Tim Peters added the comment:
I don't agree that "~" doesn't "work". If people are reading it as "not",
they're in error. The Python docs say ~x
means
the bits of x inverted
and that's what it does. There's no sense it whic
Tim Peters added the comment:
BTW, I should clarify that I think the real "sin" here was making bool a
subclass of int to begin with. For example, there's no sane reason at all for
bools to support division, and no reason for a distinct type not to define "~&quo
Tim Peters added the comment:
Whatever the iteration, accept that it's necessary it reach every value in
range(2**i) eventually. The current scheme does that, for reasons already
documented. You seem to be overlooking the importance of this part:
"""
Note that because
Tim Peters added the comment:
Yes, any scheme whatsoever that guarantees to visit every int in range(2**i)
meets the "correctness" part.
But I concur with Inada: "head arguments" aren't compelling here, not even if
I understood what they were saying ;-) If you imp
Tim Peters added the comment:
The attached hashsim.py pretty much convinces me there's nothing left to be
gained here. It shows that the current scheme essentially achieves the
performance of theoretical "uniform hashing" for string keys, for both
successful and unsuccessf
Tim Peters added the comment:
I don't see a reason to keep this open, but I haven't been able to follow the
OP's line of argument. My best _guess_ is that they're chasing illusions based
on not understanding (or grossly undervaluing) that the primary point of the
Tim Peters added the comment:
Dmitry, I suggest you spend more of the time you give to thinking to writing
code instead ;-) But, really, that's the easiest & surest way to discover for
yourself where your analysis is going off in the weeds.
For example, issue 28201 was both simpler
Tim Peters added the comment:
BTW, I should have spelled this out earlier: recall that x and y collide on
the first probe if and only if
x = y (mod 2**k) [1]
The second probe never happens unless they do collide on the first probe, so
when looking at the second probe we can assume
Tim Peters added the comment:
Actually, there is something to be gained here, for smaller tables. The simple
formulas for the expected number of probes under uniform hashing are upper
bounds, and are significantly overstated when the load factor is very high (not
a concern for Python) or the
Tim Peters added the comment:
I suggest reading the thread I started here[1] before pursuing this: it looks
very likely that the entire collision resolution scheme should be replaced with
one of the "double hashing" ones given there, a bona fide algorithmic
improvement for small
Tim Peters added the comment:
Oops! I undercounted the shifts in the current scheme: there's an additional
shift for "perturb". That doesn't exist in the "double hashing" alternatives.
--
___
Python tracker
<
Tim Peters added the comment:
I agree closing was appropriate at this time. I quite like PCG, but as Raymond
said it's more a template for creating PRNGs than a specific generator. So
even if a compelling case could be made, there's still a long way to having
specific code in min
Tim Peters added the comment:
Just noting that every special code path "at the start" doesn't just speed the
case it's aiming at, it also _slows_ every case that doesn't pass the new
tests. It takes time to fail the new tests. So it usually makes things slower
over
Tim Peters added the comment:
This is an unfortunate consequence of the default "autojunk" feature. You can
turn that off by passing `autojunk=False`, like so:
match = SequenceMatcher(None, string1, string2, auto
Tim Peters added the comment:
Measuring in isolation (like with, e.g., timeit) isn't really interesting.
Then everything is in L1 cache, branches are 100% predictable (because they
take the same branch over & over for the duration of the test), and
second-order effects on _other_
Tim Peters added the comment:
Ya, prior to now ;-) there's generally been some cost-benefit thought given to
these things. Strings and ints are immutable and some values of each are
uniquely interned, so the case of identity is more than just plausible. It
happens often.
I'd gues
Tim Peters added the comment:
Not a bug. For an explanation, I just answered a very similar question on
StackOverflow:
https://stackoverflow.com/questions/45180899/unexpected-result-from-in-operator/45180967#45180899
--
nosy: +tim.peters
Tim Peters added the comment:
Victor, this part of the docs explains what you're seeing; scroll down to the
"""
In enforcing reflexivity of elements, the comparison of collections assumes
that for a collection element x, x == x is always true ...
"""
part.
Tim Peters added the comment:
This isn't a bug. 84 appears twice in the list, the first time at index 9.
The .index() method finds the first (leftmost; smallest index) occurrence.
Since 9 isn't even, the `if` test isn't satisfied, so 84 does not appear in the
result.
Tim Peters added the comment:
Yes, I'm closing as not-a-bug. It's been this way (and documented) forever.
More specifically, as the docs say, a raw string can't end with an _odd_ number
of backslashes:
"""
String quotes can be escaped with a backslash, bu
Tim Peters added the comment:
@Rajath, I suspect Raymond may have been bamboozled because your suggested
`heapfix()` and `heapremove()` didn't appear to take any arguments. If the
index is a required argument, then these are O(log N) operations.
I thought about adding them years ago
Tim Peters added the comment:
[Guido]
> Why was task management ever added?
Raymond published a "joinable" queue class as a recipe here:
http://code.activestate.com/recipes/475160-taskqueue/
and later folded it into the standard Python queue. So the usual answer
applies: &q
Tim Peters added the comment:
Is there a problem here? I haven't heard of anyone even wondering about this
before. threading.py worries about Threads created _by_ threading.py, plus the
magical (from its point of view) thread that first imports threading.py. Users
mix in `_thread` th
Tim Peters added the comment:
Ya, it's clearly best for `current_thread()` to deliver consistent results.
--
___
Python tracker
<https://bugs.python.org/is
Tim Peters added the comment:
To include trailing whitespace on a line in a doctest, _don't_ use raw strings.
Use a regular string, and include add a (one or more) trailing \x20 instead of
a space (for example). For example:
r"""
>>> print("a ")
a
&
Tim Peters added the comment:
Yup, it can do some redundant comparisons; more on that here:
https://mail.python.org/pipermail/python-dev/2018-August/155020.html
I'm not inclined to make this already-difficult code even harder to understand
for something that's quite likely not
501 - 600 of 1332 matches
Mail list logo