Re: [Python-Dev] Type hints -- a mediocre programmer's reaction

2015-04-24 Thread Kevin Modzelewski
On Fri, Apr 24, 2015 at 6:05 PM, Ronan Lamy  wrote:

> Le 24/04/15 19:45, Paul Sokolovsky a écrit :
>
>> Hello,
>>
>> On Fri, 24 Apr 2015 18:27:29 +0100
>> Ronan Lamy  wrote:
>>
>>>>> PyPy's FAQ
>>>>> has an explanation of why type hints are not for performance.
>>>>>
>>>>> http://pypy.readthedocs.org/en/latest/faq.html#would-type-annotations-help-pypy-s-performance
>>>>>
>>>>
>>>> You probably intended to write "why type hints are not for *PyPy's*
>>>> performance". There are many other language implementations and
>>>> modules for which it may be useful; please don't limit your
>>>> imagination to a single case.
>>>>
>>>
>>> Those points apply to basically any compliant implementation of
>>> Python relying on speculative optimisation. Python is simply too
>>> dynamic for PEP484-style hints to provide any useful performance
>>> improvement targets.
>>>
>>
>> What's your point - saying that type annotations alone are not enough
>> to achieve the best ("C-like") performance, which is true, or saying
>> that if they alone are not enough, then they are not needed at all,
>> which is... strange?
>>
>
> My point is that the arguments in the PyPy FAQ aren't actually specific to
> PyPy, and therefore that the conclusion, that hints are almost entirely
> useless if you’re looking at performance, holds in general.
> So let me restate these arguments in terms of a generic,
> performance-minded implementation of the full Python language spec:
>
> * Hints have no run-time effect. The interpreter cannot assume that they
> are obeyed.
> * PEP484 hints are too high-level. Replacing an 'int' object with a single
> machine word would be useful, but an 'int' annotation gives no guarantee
> that it's correct (because Python 3 ints can have arbitrary size and
> because subclasses of 'int' can override any operation to invoke arbitrary
> code).
> * A lot more information is needed to produce good code (e.g. “this f()
> called here really means this function there, and will never be
> monkey-patched” – same with len() or list(), btw).
> * Most of this information cannot easily be expressed as a type
> * If the interpreter gathers all that information, it'll probably have
> gathered a superset of what PEP484 can provide anyway.


I'm with the PyPy folks here -- I don't see any use for PEP 484 type hints
from a code generation perspective.  Even if the hints were guaranteed to
be correct, the PEP 484 type system doesn't follow substitutability.  I
don't mean that as a critique -- I think it's a decision that makes the
system more useful by keeping it in line with the majority of type usage
in Python -- but it means that even if the hints are correct, they don't
end up providing any guarantees to the JIT.
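
To make the code-generation point concrete, here is a small illustrative sketch (the class and function names are hypothetical, not from any real codebase): even a *correct* `int` annotation leaves a JIT unable to assume machine-word arithmetic, because an `int` subclass can override any operation to run arbitrary code.

```python
calls = []

class Tracked(int):
    """A perfectly valid `int` under PEP 484, with an overridden operation."""
    def __add__(self, other):
        calls.append("add")            # arbitrary user code runs here
        return int(self) + int(other)

def double(x: int) -> int:
    return x + x  # must still dispatch dynamically despite the hint

assert double(21) == 42           # plain int: a fast path is imaginable...
assert double(Tracked(21)) == 42  # ...but the subclass runs its override
assert calls == ["add"]
```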


>
>
>>>> And speaking of PyPy, it really should think about how to improve its
>>>> performance - not of generated programs, but of generation itself.
>>>> If compilation of a trivial program on beefy hardware takes 5
>>>> minutes and gigabytes of RAM and disk space, few people will use it
>>>> for other purposes beyond curiosity. There's something very
>>>> un-Pythonic in waiting 5 mins just to run a 10-line script. Type
>>>> hints can help here too ;-) (by not wasting resources propagating
>>>> types through the same old standard library, for example).

>>>
>>> Sorry, but that's nonsense. PyPy would be a seriously useless
>>> interpreter if running a 10-line script required such a lengthy
>>> compilation, so, obviously, that's not what happens.
>>>
>>> You seem to misunderstand what PyPy is: it's an interpreter with a
>>> just-in-time compiler, not a static compiler. It doesn't generate
>>> programs in any meaningful sense. Instead, it interprets the program,
>>> and when it detects a hot code path, it compiles it to machine code
>>> based on the precise types it sees. No resources are wasted on code
>>> that isn't actually executed.
>>>
>>
>> Regardless of whether I understood that meta-meta stuff, I just
>> followed couple of tutorials, each of them warning of memory and disk
>> space issues, and both running long to get results. Everyone else
>> following tutorials will get the same message I did - PyPy is a
>> slow-to-work-with bloat.
>>
>
> Ah, I suppose you're talking about the RPython tool chain, which is used
> to build PyPy. Though it's an interesting topic in itself (and is pretty
> much comparable to Cython wrt. type hints), it has about as much relevance
> to PyPy users as the inner workings of GCC have to CPython users.
>
> Well, the thing is that people don't seem to want to write PyPy tutorials,
> because it's boring. However, I can give you the definitive 3-line version:
> 1. Download and install PyPy [http://pypy.org/download.html]
> 2. Launch the 'pypy' executable.
> 3. Go read https://docs.python.org/2/tutorial/
>
>> As for uber-meta stuff PyPy offers - I'm glad that's all done in
>> my favorite language, leaving all other languages behind. I'm saddened
>> there's no mundane JIT or static compiler usable and accepted by all
>>

Re: [Python-Dev] Python-versus-CPython question for __mul__ dispatch

2015-05-19 Thread Kevin Modzelewski
We have had a similar experience -- Pyston runs into this issue with
sqlalchemy (with "str() + foo" calling foo.__radd__ before str's sq_concat)
and we are working to match CPython's behavior.
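
The behavior in question can be reproduced in pure Python (a minimal sketch of the sqlalchemy-style pattern, with a hypothetical class name):

```python
class Foo:
    """Not a str subclass, yet its __radd__ wins over str concatenation."""
    def __radd__(self, other):
        return "from __radd__"

# CPython consults the right operand's reflected numeric slot before
# falling back to str's sq_concat, so this does not raise TypeError:
print("abc" + Foo())  # -> from __radd__
```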

On Tue, May 19, 2015 at 7:00 AM, Armin Rigo  wrote:

> Hi Nick,
>
> On 16 May 2015 at 10:31, Nick Coghlan  wrote:
> > Oh, that's rather annoying that the PyPy team implemented bug-for-bug
> > compatibility there, and didn't follow up on the operand precedence
> > bug report to say that they had done so.
>
> It's sadly not the only place, by far, where a behavior of CPython
> could be considered an implementation detail, but people rely on it
> and so we need to write a workaround.  We don't report all of them,
> particularly not the ones that are clearly of the kind "won't be
> changed in CPython 2.7".  Maybe we should?
>
> Another example where this same bug occurs is:
>
> class T(tuple):
>def __radd__(self, other):
>   return 42
>
> lst = [ ]
> lst += T()
>
> which calls T.__radd__ in contradiction to all the general rules.
> (Yes, if you print(lst) afterwards, you get 42.  And oops, trying this
> out on PyPy does not give 42; only "lst + T()" does.  Probably another
> corner case to fix...)
>
>
> A bientôt,
>
> Armin.
> ___
> Python-Dev mailing list
> Python-Dev@python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> https://mail.python.org/mailman/options/python-dev/kevmod%40gmail.com
>
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Intricacies of calling __eq__

2014-03-18 Thread Kevin Modzelewski
My 2 cents: it feels like a slippery slope to start guaranteeing the number
and ordering of calls to comparison functions -- doing that for the sort()
function, for instance, would lock in the sort implementation.  It feels
like the number and ordering of the calls should be "implementation-defined"
in the same way that dict iteration order is, or comparisons between
incomparable types; I don't think all of CPython's behavior should be
guaranteed as part of the language semantics, and this kind of change
wouldn't necessarily represent "changing the semantics" if the semantics
weren't considered guaranteed in the first place.  On the other hand, even
if it's theoretically not guaranteed that the sort() function calls __lt__
specifically, in practice it'd be a bad idea for an alternative
implementation to call __gt__ instead, so there's some judgment involved
in deciding which parts should be allowed to be CPython-defined (though
my personal take is that this doesn't apply to this particular case).
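
For illustration, an instrumented key class makes the point that the call counts are already implementation-defined (a hedged sketch; the exact totals depend on collisions and the implementation's internals):

```python
class Key:
    """Counts __hash__/__eq__ calls; exact totals are implementation details."""
    calls = {"hash": 0, "eq": 0}

    def __init__(self, v):
        self.v = v

    def __hash__(self):
        Key.calls["hash"] += 1
        return hash(self.v)

    def __eq__(self, other):
        Key.calls["eq"] += 1
        return isinstance(other, Key) and self.v == other.v

d = {Key(1): "a"}
k = Key(1)
if k in d:    # hashes k, then one or more __eq__ calls
    x = d[k]  # hashes k again, more __eq__ calls
print(Key.calls)  # totals vary with collisions and implementation
```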


On Tue, Mar 18, 2014 at 7:21 AM, Steven D'Aprano wrote:

> On Tue, Mar 18, 2014 at 01:21:05PM +0200, Maciej Fijalkowski wrote:
>
> > note that this is specifically about dicts, where __eq__ will be
> > called undecided number of times anyway (depending on collisions in
> > hash/buckets which is implementation specific to start with)
>
> Exactly. Using a __eq__ method with side-effects is a good way to find
> out how many collisions your dict has :-)
>
> But specifically with your example,
>
> if x in d:
> return d[x]
>
> my sense of this is that it falls into the same conceptual area as the
> identity optimization for checking list or set containment: slightly
> unclean, but justified. Provided d is an actual built-in dict, and it
> hasn't been modified between one call and the next, I think it would be
> okay to optimize the second lookup d[x].
>
> A question: how far away will this optimization apply?
>
> if x in d:
> do_this()
> do_that()
> do_something_else()
> spam = d[x]
>
> Assuming no modifications to d, will the second lookup still be
> optimized?
>
>
>
> --
> Steven


Re: [Python-Dev] Intricacies of calling __eq__

2014-03-18 Thread Kevin Modzelewski
I think in this case, though, if we say for the sake of argument that the
guaranteed semantics of a dictionary lookup are zero or more calls to
__hash__ plus zero or more calls to __eq__, then two back-to-back
dictionary lookups wouldn't have any observable differences from doing only
one, unless you start to make assumptions about the behavior of the
implementation.  To me there seems to be a bit of a gap between seeing a
dictionary lookup and knowing the exact sequence of user-functions that get
called, far more than for example something like "a < b".  I would feel
differently if the question was if it's ok to fold something like

x = a < b
y = a < b

into a single comparison, since I'd agree with the way you described it
that you look at this code and would expect __lt__ to be called twice.  I
guess maybe I just disagree about whether dictionaries are contractually
bound to call __hash__ and __eq__?  For instance, maybe the dict could have
a small cache of recently-looked-up elements to skip hashing / equality
tests if they get accessed again; I have no idea if that would be
profitable or not, but it seems like that would be a roughly-equivalent
change that's still "doing two dictionary lookups" except that the second
one simply wouldn't call back into the user's Python code.  Or maybe the
dictionary could be implemented as a simple list for small sizes and skip
calling __hash__ until it decides to switch to a hash-table strategy; again
I'd still say it's "doing the lookups", it just calls a different set of
Python functions.
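
The "a < b" folding case is directly observable with a side-effecting comparison (a minimal sketch):

```python
calls = []

class C:
    def __lt__(self, other):
        calls.append("lt")  # observable side effect
        return True

a, b = C(), C()
x = a < b
y = a < b
# Folding the two comparisons into one would change this observable result:
assert calls == ["lt", "lt"]
```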




> Although I have tentatively said I think this is okay, it is a change in
> actual semantics of Python code: what you write is no longer what gets
> run. That makes this *very* different from changing the implementation
> of sort -- by analogy, its more like changing the semantics of
>
> a = f(x) + f(x)
>
> to only call f(x) once. I don't think you would call that an
> implementation detail, would you? Even if justified -- f(x) is a pure,
> deterministic function with no side-effects -- it would still be a
> change to the high-level behaviour of the code.
>

To me, the optimization stage of a compiler's job is to transform a program
to an equivalent one that runs faster, where equivalence is defined w.r.t.
a certain set of rules defining the behavior of the language.  If f() can
really be proven to be a function that is deterministic, has no
side-effects, does no runtime introspection, and returns a type that
supports the identity "x + x == 2 * x" (quite a bit of work for a dynamic
language jit, but definitely possible!), then I'd say that I have a fairly
different understanding of the "high-level behavior" the runtime is
contracted to follow.  As a simpler example, I think the runtime should be
very free to condense "a = 1 + 1" into "a = 2" without doing the addition.
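
CPython itself already performs this particular folding at compile time, which is easy to check with the `dis` module:

```python
import dis

def f():
    return 1 + 1

# The compiler constant-folds 1 + 1, so no addition opcode remains
# and the folded constant 2 appears directly in the code object:
ops = [instr.opname for instr in dis.get_instructions(f)]
assert "BINARY_ADD" not in ops and "BINARY_OP" not in ops
assert 2 in f.__code__.co_consts
assert f() == 2
```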


Anyway, as I alluded to about the __lt__ / __gt__ usage in sort(), just
because I might want alternative implementations to have flexibility to do
something doesn't mean it's reasonable to say it's so.  I'm biased since
the implementation I'm working on uses std::unordered_map to implement
Python dictionaries, and I have no idea how that's actually implemented and
I'd rather not have to :)


Re: [Python-Dev] Intricacies of calling __eq__

2014-03-19 Thread Kevin Modzelewski
Sorry, I definitely didn't mean to imply that this kind of optimization is
valid on arbitrary subscript expressions; I thought we had restricted
ourselves to talking about builtin dicts.  If we do, I think this becomes a
discussion about what subset of the semantics of CPython's builtins are
language-specified vs implementation-dependent; my argument is that just
because something results in an observable behavioral difference doesn't
necessarily mean that it's a change in language semantics, if it's just a
change in the implementation-dependent behavior.


On Tue, Mar 18, 2014 at 9:54 PM, Stephen J. Turnbull wrote:

> Kevin Modzelewski writes:
>
>  > I think in this case, though, if we say for the sake of argument
>  > that the guaranteed semantics of a dictionary lookup are zero or
>
> I don't understand the point of that argument.  It's simply false that
> semantics are guaranteed, and all of the dunders might be user
> functions.
>
>  > more calls to __hash__ plus zero or more calls to __eq__, then two
>  > back-to-back dictionary lookups wouldn't have any observable
>  > differences from doing only one, unless you start to make
>  > assumptions about the behavior of the implementation.
>
> That's false.  The inverse is true: you should allow the possibility of
> observable differences, unless you make assumptions about the behavior
> (implying there are none).
>
>  > To me there seems to be a bit of a gap between seeing a dictionary
>  > lookup and knowing the exact sequence of user-functions that get
>  > called, far more than for example something like "a < b".
>
> The point here is that we *know* that there may be a user function
> (the dunder that implements []) being called, and it is very hard to
> determine that that function is pure.
>
> Your example of a caching hash is exactly the kind of impure function
> that one would expect, but who knows what might be called -- there
> could be a reference to a database on Mars involved (do we have a
> vehicle on Mars at the moment? anyway...), which calls a pile of
> Twisted code, and has latencies of many seconds.
>
> So Steven is precisely right -- in order to allow this optimization,
> it would have to be explicitly allowed.
>
> Like Steven, I have no strong feeling against it, but then, I don't
> have a program talking to a deep space vehicle in my near future.
> Darn it! :-(
>
>


[Python-Dev] Pyston: a Python JIT on LLVM

2014-04-03 Thread Kevin Modzelewski
Hi all,

I'm excited to announce Pyston, a Python JIT under development at Dropbox,
built on top of LLVM.  You can read more about it in the introductory blog
post, or check out the code on GitHub.

Since it's the question that I think most people will inevitably (and
rightly) ask, why do we think there's a place for Pyston when there's PyPy
and (previously) Unladen Swallow?

Compared to PyPy, Pyston makes a number of different technical choices,
such as using a method-at-a-time JIT vs PyPy's tracing JIT.  The
method-at-a-time approach seems to have been validated in the JavaScript
world, so there's reason to think there might be room to improve over PyPy,
though their extremely impressive performance makes that an admittedly
very, very high bar.  We also think that extension module support is a
first-class issue and have some ideas on how to make that fast.

As for Unladen Swallow, there are some reasons to think that LLVM has
matured greatly in the past few years, particularly in the JIT engine,
which has been completely replaced.  I'm not sure if that's the only part
of the story; I'd be interested in talking with any of the people who were
involved in or knowledgeable about the project.


It's definitely a very tall order to try to stand out among the existing
implementations, but in time we think we can do it.

I'll be at the language summit and PyCon sprints, and hope to see some of
you there!

kmod


Re: [Python-Dev] Pyston: a Python JIT on LLVM

2014-04-04 Thread Kevin Modzelewski
Using optional type annotations is a really promising strategy and may
eventually be added to Pyston, but our primary target right now is
unmodified and untyped Python code.  I think there's room for both
approaches -- I think the "using type annotations to achieve near-native
performance" approach can be very useful, e.g. in a numerical computing
context, but might not apply as well to a "large web application" case.

On Thu, Apr 3, 2014 at 3:42 PM, Sturla Molden wrote:

> Kevin Modzelewski  wrote:
>
> > Since it's the question that I think most people will inevitably (and
> > rightly) ask, why do we think there's a place for Pyston when there's
> PyPy
> > and (previously) Unladen Swallow?
>
> Have you seen Numba, the Python JIT that integrates with NumPy?
>
> http://numba.pydata.org
>
> It uses LLVM to compile Python bytecode. When I have tried it I tend to get
> speed comparable to -O2 in C for numerical and algorithmic code.
>
> Here is an example, giving a 150 times speed boost to Python:
>
>
> http://stackoverflow.com/questions/21811381/how-to-shove-this-loop-into-numpy/21818591#21818591
>
>
> Sturla
>


Re: [Python-Dev] Pyston: a Python JIT on LLVM

2014-04-04 Thread Kevin Modzelewski
On Fri, Apr 4, 2014 at 1:59 AM, Antoine Pitrou  wrote:

>
> I'm a bit surprised by the approach. Why don't you simply process CPython
> bytecode, rather than strive to reimplement Python fully?
>

The original choice to operate on Python AST rather than bytecode was made
somewhat arbitrarily, but I think I still support that decision since it
avoids having to do a potentially performance-degrading translation between
a stack language and a register language.  It means we lose the ability to
execute pyc-only distributions, but I suppose that support for that could
be added if it becomes important.

As for why we're building our own runtime as part of all of this (which I
think is what you're getting at), I think that a lot of the performance of
an implementation is caught up in the runtime and isn't just about the AST
or bytecode execution.  There are a couple of systems out there that will
compile Python to C modules, where all the behavior is implemented using
calls back through the C API.  I haven't tried them, but I'd suspect that
without type information, the gains from doing this aren't that great,
since while you can get rid of bytecode dispatch and perhaps get a better
view of control flow, it doesn't address anything about the dynamic nature
of Python.  For example, in Pyston the fast path for an instance attribute
lookup (with no __getattribute__) will be just two loads: one to lookup the
attribute array and one to load the appropriate offset.  I'd say that it's
necessary to have a different runtime to support that, because it has to be
cooperative and 1) use a different object representation everywhere and 2)
know how to backpatch attribute-lookups to fully take advantage of it.
 That said, we definitely try to not reimplement something if we don't need
to.
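
The attribute fast path described above can be sketched in Python (purely illustrative names; the real implementation lives in the runtime and the JIT-emitted machine code):

```python
# Illustrative hidden-class sketch: instances share a map from attribute
# name to slot index and store values in a flat array, so a specialized
# lookup is two loads: the attrs array, then the cached offset.
class HiddenClass:
    def __init__(self, layout):
        self.layout = layout              # name -> slot index

class Instance:
    def __init__(self, hidden_class, values):
        self.hidden_class = hidden_class
        self.attrs = values               # flat attribute storage

point_shape = HiddenClass({"x": 0, "y": 1})
p = Instance(point_shape, [3, 4])

# Slow path: consult the layout map once...
slot = p.hidden_class.layout["x"]
# ...then a JIT can backpatch the call site so later lookups are just:
value = p.attrs[slot]  # load #1: attrs array; load #2: the offset
assert value == 3
```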


> Also, I wonder if it's worthwhile to use a conservative GC, rather than
> reuse the original refcounting scheme (especially since you want to support
> existing extension modules).


I wonder that too :)  The only way to know for sure will be to get it
working on real code, but I feel comfortable with the approach since I
trust that everyone else using GCs is happy for a reason, and I think
it's possible any GC-related advantages can mask the related
extension-compatibility cost.

I was pretty happy when we switched from refcounting to a tracing GC; the
refcounting was already somewhat optimized (didn't emit any
obviously-duplicate increfs/decrefs), but removing the refcounting
operations opened up a number of other optimizations.  As a simple example,
when refcounting you typically can't do tail call elimination because you
have to do some decrefs at the end of the function, and those decrefs
will also typically keep the variables live even if they didn't otherwise
need to be.  It was also very hard to tell that certain operations were
no-ops, since even if something is a no-op at the Python level, it can
still do a bunch of refcounting.  You can (and we almost did) write an
optimizer to try to match up all the increfs and decrefs, but then you
have to start proving that certain variables remain the same after a
function call, etc.  I'm sure it's possible, but using a GC instead made
all of these optimizations much more natural.



Pyston is definitely far on one side of the effort-vs-potential-payoff
spectrum, and it's certainly fair to say that there are other approaches
that would be less work to implement.  I think that with the wealth of very
impressive existing options, though, it makes sense to take the risky path
and to shoot very high, and I'm fortunate to be in a situation where we can
make a risky bet like that.


[Python-Dev] Re: Speeding up CPython

2020-10-20 Thread Kevin Modzelewski
I'd love to hear more about what workloads you're targeting and how you
came up with the anticipated numbers for the improvements.  For comparison,
our new jit provides a single-digit-percentage speedup on our django and
flask benchmarks.

On Tue, Oct 20, 2020 at 9:03 AM Mark Shannon  wrote:

> Hi everyone,
>
> CPython is slow. We all know that, yet little is done to fix it.
>
> I'd like to change that.
> I have a plan to speed up CPython by a factor of five over the next few
> years. But it needs funding.
>
> I am aware that there have been several promised speed ups in the past
> that have failed. You might wonder why this is different.
>
> Here are three reasons:
> 1. I already have working code for the first stage.
> 2. I'm not promising a silver bullet. I recognize that this is a
> substantial amount of work and needs funding.
> 3. I have extensive experience in VM implementation, not to mention a
> PhD in the subject.
>
> My ideas for possible funding, as well as the actual plan of
> development, can be found here:
>
> https://github.com/markshannon/faster-cpython
>
> I'd love to hear your thoughts on this.
>
> Cheers,
> Mark.
> ___
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-le...@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-dev@python.org/message/RDXLCH22T2EZDRCBM6ZYYIUTBWQVVVWH/
> Code of Conduct: http://python.org/psf/codeofconduct/
>


[Python-Dev] Re: "immortal" objects and how they would help per-interpreter GIL

2021-12-17 Thread Kevin Modzelewski
fwiw we added immortal objects to Pyston and haven't run into any issues
with them. The goal is a bit different: to eliminate common refcount
operations for performance, which we can do a bit more of because we have a
jit. And we don't mind if unaware code ends up changing the refcounts of
immortal objects since it's no worse for us than before.

So anyway maybe it's not super comparable for the issues discussed here,
but at least we haven't run into any issues of extension modules being
confused by very large reference counts. The one issue we do run into is
that quite a few projects will test in debug mode that their c extension
doesn't leak reference counts, and that no longer works for us because we
don't update Py_RefTotal for immortal objects.

kmod

On Wed, Dec 15, 2021 at 2:02 PM Eric Snow 
wrote:

> On Tue, Dec 14, 2021 at 11:19 AM Eric Snow 
> wrote:
> > There is one solution that would help both of the above in a nice way:
> > "immortal" objects.
>
> FYI, here are some observations that came up during some discussions
> with the "faster-cpython" team today:
>
> * immortal objects should probably only be immutable ones (other than
> ob_refcnt, of course)
> * GC concerns are less of an issue if a really high ref count (bit) is
> used to identify immortal objects
> * ob_refcnt is part of the public API (sadly), so using it to mark
> immortal objects may be sensitive to interference
> * ob_refcnt is part of the stable ABI (even more sadly), affecting any
> solution using ref counts
> * using the ref count isn't the only viable approach; another would be
> checking the pointer itself
>+ put the object in a specific section of static data and compare
> the pointer against the bounds
>+ this avoids loading the actual object data if it is immortal
>+ for objects that are mostly treated as markers (e.g. None), this
> could have a meaningful impact
>+ not compatible with dynamically allocated objects
>
> -eric


[Python-Dev] Re: PEP 683: "Immortal Objects, Using a Fixed Refcount"

2022-02-16 Thread Kevin Modzelewski
fwiw Pyston has immortal objects, though with a slightly different goal and
thus design [1]. I'm not necessarily advocating for our design (it makes
most sense if there is a JIT involved), but just writing to report our
experience of making a change like this and the compatibility effects.

Importantly, our system allows for the reference count of immortal objects
to change, as long as it doesn't go below half of the original very-high
value. So extension code with no concept of immortality will still update
the reference counts of immortal objects, but this is fine. Because of this
we haven't seen any issues with extension modules.

The small amount of compatibility challenges we've run into have been in
testing code that checks for memory leaks. For example this code breaks on
Pyston:

def test():
    starting_refcount = sys.getrefcount(1)
    doABunchOfStuff()
    assert sys.getrefcount(1) == starting_refcount

This might work with this PEP, but we've also seen code that asserts that
the refcount increases by a specific value, which I believe wouldn't.

For Pyston we've simply disabled these tests, figuring that our users still
have CPython to test on. Personally I consider this breakage to be small,
but I hadn't seen anyone mention the potential usage of sys.getrefcount()
so I thought I'd bring it up.

- kmod

[1] Our goal is to entirely remove refcounting operations when we can prove
we are operating on an immortal object. We can prove it in a couple cases:
sometimes simply, such as in Py_RETURN_NONE, but mostly because our JIT will
often know the immortality of objects it embeds into the code. So if we can prove
statically that an object is immortal then we elide the incref/decrefs, and
if we can't then we use an unmodified Py_INCREF/Py_DECREF. This means that
our reference counts on immortal objects will change, so we detect
immortality by checking if the reference count is at least half of the
original very-high value.
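
That check can be sketched as follows (the constants are illustrative, not Pyston's actual values):

```python
# Illustrative sketch of the Pyston-style immortality test. Unaware
# extension code may still incref/decref an immortal object, so we test
# against half of the initial value rather than for exact equality.
IMMORTAL_INITIAL_REFCOUNT = 1 << 62

def is_immortal(refcount):
    return refcount >= IMMORTAL_INITIAL_REFCOUNT // 2

assert is_immortal(IMMORTAL_INITIAL_REFCOUNT)
assert is_immortal(IMMORTAL_INITIAL_REFCOUNT - 10_000)  # drifted down a bit
assert not is_immortal(42)
```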

On Tue, Feb 15, 2022 at 7:13 PM Eric Snow 
wrote:

> Eddie and I would appreciate your feedback on this proposal to support
> treating some objects as "immortal".  The fundamental characteristic
> of the approach is that we would provide stronger guarantees about
> immutability for some objects.
>
> A few things to note:
>
> * this is essentially an internal-only change:  there are no
> user-facing changes (aside from affecting any 3rd party code that
> directly relies on specific refcounts)
> * the naive implementation shows a 4% slowdown
> * we have a number of strategies that should reduce that penalty
> * without immortal objects, the implementation for per-interpreter GIL
> will require a number of non-trivial workarounds
>
> That last one is particularly meaningful to me since it means we would
> definitely miss the 3.11 feature freeze.  With immortal objects, 3.11
> would still be in reach.
>
> -eric
>
> ---
>
> PEP: 683
> Title: Immortal Objects, Using a Fixed Refcount
> Author: Eric Snow , Eddie Elizondo
> 
> Discussions-To: python-dev@python.org
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 10-Feb-2022
> Python-Version: 3.11
> Post-History:
> Resolution:
>
>
> Abstract
> 
>
> Under this proposal, any object may be marked as immortal.
> "Immortal" means the object will never be cleaned up (at least until
> runtime finalization).  Specifically, the `refcount`_ for an immortal
> object is set to a sentinel value, and that refcount is never changed
> by ``Py_INCREF()``, ``Py_DECREF()``, or ``Py_SET_REFCNT()``.
> For immortal containers, the ``PyGC_Head`` is never
> changed by the garbage collector.
>
> Avoiding changes to the refcount is an essential part of this
> proposal.  For what we call "immutable" objects, it makes them
> truly immutable.  As described further below, this allows us
> to avoid performance penalties in scenarios that
> would otherwise be prohibitive.
>
> This proposal is CPython-specific and, effectively, describes
> internal implementation details.
>
> .. _refcount:
> https://docs.python.org/3.11/c-api/intro.html#reference-counts
>
>
> Motivation
> ==
>
> Without immortal objects, all objects are effectively mutable.  That
> includes "immutable" objects like ``None`` and ``str`` instances.
> This is because every object's refcount is frequently modified
> as it is used during execution.  In addition, for containers
> the runtime may modify the object's ``PyGC_Head``.  This
> runtime-internal state currently prevents
> full immutability.
>
> This has a concrete impact on active projects in the Python community.
> Below we describe several ways in which refcount modification has
> a real negative effect on those projects.  None of that would
> happen for objects that are truly immutable.
>
> Reducing Cache Invalidation
> ---------------------------
>
> Every modification of a refcount causes the corresponding cache
> line to be invalidated.  This has a number of effects.
>
> For one, the write must be p
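The sentinel-refcount mechanism the abstract above describes can be modeled in a few lines of Python (a toy model only -- the real change lives in CPython's C-level ``Py_INCREF``/``Py_DECREF``, and the names and sentinel value here are invented for illustration):

```python
# Toy model of the proposal: a sentinel refcount marks an object immortal,
# and incref/decref leave it untouched.  The sentinel value and names are
# made up; they are not CPython's actual implementation.
IMMORTAL_REFCNT = 0x7FFFFFFF

class Obj:
    def __init__(self, immortal=False):
        self.refcnt = IMMORTAL_REFCNT if immortal else 1

def incref(o):
    if o.refcnt != IMMORTAL_REFCNT:  # immortal: never written to
        o.refcnt += 1

def decref(o):
    if o.refcnt != IMMORTAL_REFCNT:
        o.refcnt -= 1
        if o.refcnt == 0:
            print("deallocated")

none_like = Obj(immortal=True)  # stands in for e.g. None
incref(none_like)
decref(none_like)
print(none_like.refcnt == IMMORTAL_REFCNT)  # True: the object was never mutated
```

Because the immortal object's memory is never written, its cache line is never dirtied by refcounting, which is the property the "Reducing Cache Invalidation" section relies on.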

[Python-Dev] Contributing the Pyston jit?

2023-02-23 Thread Kevin Modzelewski
Hello all, we on the Pyston team would like to propose the contribution of
our JIT
<https://github.com/pyston/pyston/blob/pyston_main/Python/aot_ceval_jit.c> into
CPython main. We're interested in some initial feedback on this idea before
putting in the work to rebase the jit to 3.12 for a PEP and more formal
discussion.

Our jit is designed to be simple and to generate code quickly, so we
believe it's a good point on the design tradeoff curve for potential
inclusion. The runtime behavior is intentionally kept almost completely the
same as the interpreter, just lowered to machine code and with
optimizations applied.

Our jit currently targets Python 3.7-3.10, and on 3.8 it achieves a 10%
speedup on macrobenchmarks (similar to 3.11). It's hard to estimate the
potential speedup of our jit rebased onto 3.12 because there is overlap
between what our jit does and the optimizations that have gone into the
interpreter since 3.8, but there are several optimizations that would be
additive with the current performance work:
- Eliminating bytecode dispatch overhead
- Mostly-eliminating stack management overhead
- Reducing the number of reference count operations in the interpreter
- Faster function calls, particularly of C functions
- More specialization opportunities, both because a jit is not constrained
by the bytecode format and because it is able to do dynamic specializations
that are not possible in an interpreter context
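As a toy sketch of the first bullet -- eliminating per-opcode dispatch -- here is what "lowering" a tiny bytecode program to straight-line code looks like (illustrative only; Pyston's actual jit emits machine code via DynASM, not Python source):

```python
# Illustrative "template compilation": remove the per-opcode dispatch
# branch by emitting the handler bodies as straight-line code.
def interp(code, x):
    for op in code:            # the interpreter pays a dispatch branch here
        if op == "INC":
            x += 1
        elif op == "DBL":
            x *= 2
    return x

TEMPLATES = {"INC": "x += 1", "DBL": "x *= 2"}

def compile_straightline(code):
    # "Lower" the bytecode to straight-line Python: no loop, no dispatch.
    src = "def jitted(x):\n"
    src += "".join(f"    {TEMPLATES[op]}\n" for op in code)
    src += "    return x\n"
    ns = {}
    exec(src, ns)
    return ns["jitted"]

prog = ["INC", "DBL", "INC"]
f = compile_straightline(prog)
print(interp(prog, 3), f(3))  # both compute ((3 + 1) * 2) + 1 = 9
```

The jitted version runs the same operations in the same order; only the dispatch overhead disappears, which is why this kind of jit keeps runtime behavior "almost completely the same as the interpreter."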

There is also room for more optimizations -- in Pyston we've co-optimized
the interpreter+jit combination such as by doing more extensive profiling
in the interpreter. Our plan would be to submit an initial version that
does not contain these optimizations in order to minimize the diff, and add
them later.

Our jit uses the DynASM assembler library (part of LuaJIT) to generate
machine code. Our jit currently supports Mac and Linux, 64-bit ARM and
x86_64. Now that we have two architectures supported, adding additional
ones is not too much work.

We think that our jit fits nicely in the technical roadmap of the Faster
CPython project, but conflicts with their plan to build a new custom
tracing jit.


As mentioned, we'd love to get feedback about the overall appetite for
including a jit in CPython!

kmod
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/7HTSM36GBTP4H5HEUV4JMCDSVYVFFGGV/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Contributing the Pyston jit?

2023-02-23 Thread Kevin Modzelewski
Ah ok thanks for the tip, I re-posted this as
https://discuss.python.org/t/contributing-the-pyston-jit/24195

On Thu, Feb 23, 2023 at 6:02 PM Brett Cannon  wrote:

> FYI you will probably get more engagement if you posted this to
> discuss.python.org .
>
> On Thu, Feb 23, 2023, 10:18 Kevin Modzelewski  wrote:
> [...]
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/TUTSBGG7D7HW6MFHVX46IQDWAF3MJJLS/
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-Dev] Stop using timeit, use perf.timeit!

2016-06-10 Thread Kevin Modzelewski via Python-Dev
Hi all, I wrote a blog post about this.
http://blog.kevmod.com/2016/06/benchmarking-minimum-vs-average/

We can rule out any argument that one (minimum or average) is strictly
better than the other, since there are cases that make either one better.
It comes down to our expectation of the underlying distribution.
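A minimal simulation of that point (my own illustration, not taken from the blog post): if timing noise is strictly additive and non-negative, the minimum tracks the noiseless cost while the mean is biased upward; a distribution that could also produce unrepresentative fast outliers would instead favor the average:

```python
import random

random.seed(0)
TRUE_COST = 10.0  # the "noiseless" time we are trying to estimate

# Additive, strictly non-negative noise (e.g. OS preemption, cache misses):
# every sample is the true cost plus some delay, never less.
samples = [TRUE_COST + random.expovariate(2.0) for _ in range(500)]

mn = min(samples)
avg = sum(samples) / len(samples)
print(f"min={mn:.3f}  avg={avg:.3f}")  # min sits near 10.0, avg is biased up
```

Under a distribution where unusually *fast* samples are also possible (timer granularity, measurement error), the minimum instead latches onto an unrepresentative outlier, which is the concern raised later in this thread.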

Victor, if you could calculate the sample skewness
of your results, I
think that would be very interesting!
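For reference, the adjusted Fisher-Pearson sample skewness being requested here can be computed directly from the samples (standard textbook formula, not code from this thread):

```python
import math

def sample_skewness(xs):
    """Adjusted Fisher-Pearson sample skewness (G1)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in xs) / n  # third central moment
    g1 = m3 / m2 ** 1.5
    return math.sqrt(n * (n - 1)) / (n - 2) * g1

# A long right tail (a few slow runs) gives a positive skewness:
print(round(sample_skewness([26.2, 26.3, 26.4, 26.5, 26.5, 26.6, 27.5]), 2))
```

A clearly positive value would support the "truncated on the left, long tail on the right" picture discussed below.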

kmod

On Fri, Jun 10, 2016 at 10:04 AM, Steven D'Aprano 
wrote:

> On Fri, Jun 10, 2016 at 05:07:18PM +0200, Victor Stinner wrote:
> > I started to work on visualisation. IMHO it helps to understand the
> problem.
> >
> > Let's create a large dataset: 500 samples (100 processes x 5 samples):
> > ---
> > $ python3 telco.py --json-file=telco.json -p 100 -n 5
> > ---
> >
> > The attached plot.py script creates a histogram:
> > ---
> > avg: 26.7 ms +- 0.2 ms; min = 26.2 ms
> >
> > 26.1 ms:   1 #
> > 26.2 ms:  12 ###
> > 26.3 ms:  34 ##########
> > 26.4 ms:  44 #############
> > 26.5 ms: 109 ###############################
> > 26.6 ms: 117 #################################
> > 26.7 ms:  86 #########################
> > 26.8 ms:  50 ##############
> > 26.9 ms:  32 #########
> > 27.0 ms:  10 ###
> > 27.1 ms:   3 #
> > 27.2 ms:   1 #
> > 27.3 ms:   1 #
> >
> > minimum 26.1 ms: 0.2% (1) of 500 samples
> > ---
> [...]
> > The distribution looks like a Gaussian curve:
> > https://en.wikipedia.org/wiki/Gaussian_function
>
> Lots of distributions look a bit Gaussian, but they can be skewed, or
> truncated, or both. E.g. the average life-span of a lightbulb is
> approximately Gaussian with a central peak at some value (let's say 5000
> hours), but while it is conceivable that you might be really lucky and
> find a bulb that lasts 15000 hours, it isn't possible to find one that
> lasts -1 hours. The distribution is truncated on the left.
>
> To me, your graph looks like the distribution is skewed: the right-hand
> tail (shown at the bottom) is longer than the left-hand tail, six
> buckets compared to five buckets. There are actual statistical tests for
> detecting deviation from Gaussian curves, but I'd have to look them up.
> But as a really quick and dirty test, we can count the number of samples
> on either side of the central peak (the mode):
>
> left: 109+44+34+12+1 = 200
> centre: 117
> right: 500 - 200 - 117 = 183
>
> It certainly looks *close* to Gaussian, but with the crude tests we are
> using, we can't be sure. If you took more and more samples, I would
> expect that the right-hand tail would get longer and longer, but the
> left-hand tail would not.
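Steven's quick left/right count can be reproduced from the bucket counts in the histogram above (same arithmetic, just in code):

```python
# Bucket counts transcribed from Victor's histogram.
counts = {26.1: 1, 26.2: 12, 26.3: 34, 26.4: 44, 26.5: 109, 26.6: 117,
          26.7: 86, 26.8: 50, 26.9: 32, 27.0: 10, 27.1: 3, 27.2: 1, 27.3: 1}
mode = max(counts, key=counts.get)  # the bucket with the central peak
left = sum(v for k, v in counts.items() if k < mode)
right = sum(v for k, v in counts.items() if k > mode)
print(mode, left, counts[mode], right)  # 26.6 200 117 183
```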
>
>
> > The interesting thing is that only 1 sample on 500 are in the minimum
> > bucket (26.1 ms). If you say that the performance is 26.1 ms, only
> > 0.2% of your users will be able to reproduce this timing.
>
> Hmmm. Okay, that is a good point. In this case, you're not so much
> reporting your estimate of what the "true speed" of the code snippet
> would be in the absence of all noise, but your estimate of what your
> users should expect to experience "most of the time".
>
> Assuming they have exactly the same hardware, operating system, and load
> on their system as you have.
>
>
> > The average and std dev are 26.7 ms +- 0.2 ms, so numbers 26.5 ms ..
> > 26.9 ms: we got 109+117+86+50+32 samples in this range which gives us
> > 394/500 = 79%.
> >
> > IMHO saying "26.7 ms +- 0.2 ms" (79% of samples) is less a lie than
> > 26.1 ms (0.2%).
>
> I think I understand the point you are making. I'll have to think about
> it some more to decide if I agree with you.
>
> But either way, I think the work you have done on perf is fantastic and
> I think this will be a great tool. I really love the histogram. Can you
> draw a histogram of two functions side-by-side, for comparison?
>
>
> --
> Steve
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] [Python-checkins] Daily reference leaks (b78574cb00ab): sum=1120

2016-11-19 Thread Kevin Modzelewski via Python-Dev
Hi Yury, you may be interested in some leak-finding code that I wrote for
Pyston.  It uses the GC infrastructure to show you objects that were
directly leaked, ignoring indirect leaks -- i.e., objects that were only
leaked because they were referenced by a leaked object.  It can often give
you a very small list of objects to look into (depending on how many non-GC
objects were leaked).  If you're interested, I can try porting it to CPython.

https://github.com/dropbox/pyston/blob/master/from_cpython/Modules/gcmodule.c#L894
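The idea behind that leak finder can be sketched with the stdlib ``gc`` module (a simplified illustration with my own names, not the linked Pyston implementation):

```python
import gc

def direct_leaks(leaked):
    """Given a list of leaked objects, drop the ones that are only alive
    because another leaked object references them (simplified sketch)."""
    leaked_ids = {id(o) for o in leaked}
    indirect = set()
    for o in leaked:
        for ref in gc.get_referents(o):
            if id(ref) in leaked_ids:
                indirect.add(id(ref))
    return [o for o in leaked if id(o) not in indirect]

# Toy example: 'inner' only leaks because 'outer' holds a reference to it.
inner = ["payload"]
outer = {"child": inner}
roots = direct_leaks([outer, inner])
print(len(roots))  # 1 -- only 'outer' is a direct leak
```

Fixing the direct leaks (the roots) typically makes the indirect ones disappear for free, which is why filtering them out shortens the list so much.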

kmod

On Wed, Nov 9, 2016 at 7:16 AM, Yury Selivanov 
wrote:

> I'm trying to fix refleaks in 3.6.  So far:
>
> On 2016-11-09 4:02 AM, solip...@pitrou.net wrote:
>
> results for b78574cb00ab on branch "default"
>> ============================================
>>
>> test_ast leaked [98, 98, 98] references, sum=294
>> test_ast leaked [98, 98, 98] memory blocks, sum=294
>> test_asyncio leaked [3, 0, 0] memory blocks, sum=3
>> test_code leaked [2, 2, 2] references, sum=6
>> test_code leaked [2, 2, 2] memory blocks, sum=6
>> test_functools leaked [0, 3, 1] memory blocks, sum=4
>> test_pydoc leaked [106, 106, 106] references, sum=318
>> test_pydoc leaked [42, 42, 42] memory blocks, sum=126
>> test_trace leaked [12, 12, 12] references, sum=36
>> test_trace leaked [11, 11, 11] memory blocks, sum=33
>>
>>
>>
> test_ast, test_code and test_trace were fixed by
> https://hg.python.org/cpython/rev/2c6825c9ecfd
>
> test_pydoc leaks in test_typing_pydoc. I tried git bisect and it looks
> like the first commit that introduced the refleak was the one that
> added test_typing_pydoc!
>
> 62127e60e7b0 doesn't modify any CPython internals, so it looks like
> test_typing_pydoc exposed some bug that existed before it. Any help
> tracking that down is welcome :)
>
> Yury
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com