Re: [Python-Dev] Python startup time

2017-07-23 Thread Michel Desmoulin


> Optimizing startup time is incredibly valuable, 

I've been reading that since the beginning of this thread, but I've been
using Python since 2.4 and have never felt the burden of the startup time.

I'm guessing a lot of people are like me; they just don't speak up
because "better startup time can't be bad, so let's not put a
barrier on this".

I'm not against it, but since the need for a faster Python in
general has been debated for years and is only now catching up with
the work of Victor Stinner, can somebody explain to me what the deal is
with startup time?

I understand where it can improve your lives. I just don't get why it's
suddenly such an explosion of expectations and needs.
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Python startup time

2017-07-23 Thread Antoine Pitrou
On Sat, 22 Jul 2017 16:35:31 -0700
Steve Dower  wrote:
> 
> Yes, I’m aware of that, which is why I don’t have any specific suggestions 
> off-hand. But given the differences in file systems between Windows and other 
> OSs, it wouldn’t surprise me if there were a more optimal approach for NTFS 
> to amortize calls better. Perhaps not, but it is still the most expensive 
> part of startup that we have any ability to change, so it’s worth 
> investigating.

Can you expand on it being "the most expensive part of startup that we
have any ability to change"?

For example, how do Nick's benchmarks above fare on Windows?

Regards

Antoine.




Re: [Python-Dev] startup time repeated? why not daemon

2017-07-23 Thread Victor Stinner
We already did that. See _bootlocale for example. (Maybe also
_collections_abc?)

Victor

On 22 Jul 2017 07:20, "Chris Jerdonek" wrote:

> On Fri, Jul 21, 2017 at 9:52 AM, Brett Cannon wrote:
> > On Thu, 20 Jul 2017 at 22:11 Chris Jerdonek wrote:
> >> On Thu, Jul 20, 2017 at 8:49 PM, Nick Coghlan wrote:
> >> > ...
> >> > * Lazy loading can have a significant impact on startup time, as it
> >> > means you don't have to pay for the cost of finding and loading
> >> > modules that you don't actually end up using on that particular run
> >
> > It should be mentioned that I have started designing an API to make using
> > lazy loading much easier in Python 3.7 (i.e. "calling a single function"
> > easier), but I still have to write the tests and such before I propose a
> > patch and it will still be mainly for apps that know what they are doing
> > since lazy loading makes debugging import errors harder.
> > ...
> >> > However, if we're going to recommend them as good practices for 3rd
> >> > party developers looking to optimise the startup time of their Python
> >> > applications, then it makes sense for us to embrace them for the
> >> > standard library as well, rather than having our first reaction be to
> >> > write more hand-crafted C code.
> >>
> >> Are there any good write-ups of best practices and techniques in this
> >> area for applications (other than obvious things like avoiding
> >> unnecessary imports)? I'm thinking of things like how to structure
> >> your project, things to look for, developer tools that might help, and
> >> perhaps third-party runtime libraries?
> >
> > Nothing beyond "profile your application" and "don't do stuff during
> > import as a side-effect" that I'm aware of.
>
> One "project structure" idea of the sort I had in mind is to move
> frequently used functions in a module into their own module. This way
> the most common paths of execution don't load unneeded functions.
> Following this line of reasoning could lead to grouping functions in
> an application by when they're needed instead of by what they do,
> which is different from what we normally see. I don't recall seeing
> advice like this anywhere, so maybe the trade-offs aren't worth it.
> Thoughts?
>
> --Chris
>
>
> >
> > -Brett
> >
> >>
> >>
> >> --Chris
> >>
> >>
> >>
> >> >
> >> > On that last point, it's also worth keeping in mind that we have a
> >> > much harder time finding new C-level contributors than we do new
> >> > Python-level ones, and have every reason to expect that problem to get
> >> > worse over time rather than better (since writing and maintaining
> >> > handcrafted C code is likely to go the way of writing and maintaining
> >> > handcrafted assembly code as a skillset: while it will still be
> >> > genuinely necessary in some contexts, it will also be an increasingly
> >> > niche technical specialty).
> >> >
> >> > Starting to migrate to using Cython for our acceleration modules
> >> > instead of plain C should thus prove to be a win for everyone:
> >> >
> >> > - Cython structurally avoids a lot of typical bugs that arise in
> >> > hand-coded extensions (e.g. refcount bugs)
> >> > - by design, it's much easier to mentally switch between Python &
> >> > Cython than it is between Python & C
> >> > - Cython accelerated modules are easier to adapt to other interpreter
> >> > implementations than handcrafted C modules
> >> > - keeping Python modules and their C accelerated counterparts in sync
> >> > will be easier, as they'll mostly be using the same code
> >> > - we'd be able to start writing C API test cases in Cython rather than
> >> > in handcrafted C (which currently mostly translates to only testing
> >> > them indirectly)
> >> > - CPython's own test suite would naturally help test Cython
> >> > compatibility with any C API updates
> >> > - we'd have an inherent incentive to help enhance Cython to take
> >> > advantage of new C API features
> >> >
> >> > There are some genuine downsides in increasing the complexity of
> >> > bootstrapping CPython when all you're starting with is a VCS clone and
> >> > a C compiler, but those complications are ultimately no worse than
> >> > those we already have with Argument Clinic, and hence amenable to the
> >> > same solution: if we need to, we can check in the generated C files in
> >> > order to make bootstrapping easier.
> >> >
> >> > Cheers,
> >> > Nick.
> >> >
> >> > --
> >> > Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia

Re: [Python-Dev] Python startup time

2017-07-23 Thread Brett Cannon
On Sun, Jul 23, 2017, 00:53 Michel Desmoulin wrote:

>
>
> > Optimizing startup time is incredibly valuable,
>
> I've been reading that since the beginning of this thread, but I've been
> using Python since 2.4 and have never felt the burden of the startup time.
>
> I'm guessing a lot of people are like me; they just don't speak up
> because "better startup time can't be bad, so let's not put a
> barrier on this".
>
> I'm not against it, but since the need for a faster Python in
> general has been debated for years and is only now catching up with
> the work of Victor Stinner, can somebody explain to me what the deal is
> with startup time?
>
> I understand where it can improve your lives. I just don't get why it's
> suddenly such an explosion of expectations and needs.
>

It's actually always been something we have tried to improve; it just comes
in waves. For instance, we occasionally re-examine which modules get pulled
in during startup. Importlib was optimized to help with startup. This just
happens to be the latest round of trying to improve the situation.

As for why we care, every command-line app wants to at least appear faster
if not be faster because just getting to the point of being able to e.g.
print a version number is dominated by Python and app start-up. And this is
not guessing; I work with a team that puts out a command line app and one
of the biggest complaints they get is the startup time.
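
One way to put a number on this is to time a do-nothing interpreter run
from the outside. The following is only a minimal sketch: the helper name
startup_seconds is hypothetical, and it assumes sys.executable is the
interpreter you care about. The -S option (skip importing site) is a
standard CPython flag, used here just to show how much of the time comes
from site initialisation.

```python
import subprocess
import sys
import time


def startup_seconds(args=("-c", "pass"), runs=5):
    """Return the best wall-clock time, in seconds, needed to start the
    current interpreter with the given arguments and let it exit."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run([sys.executable, *args], check=True)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    default = startup_seconds()
    no_site = startup_seconds(("-S", "-c", "pass"))
    print(f"default startup:   {default * 1000:.1f} ms")
    print(f"without site (-S): {no_site * 1000:.1f} ms")
```

The figures include process creation by the operating system, so they vary
by OS and machine and are best compared only against each other.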

-brett



Re: [Python-Dev] Python startup time

2017-07-23 Thread Michel Desmoulin


On 23/07/2017 at 19:36, Brett Cannon wrote:
> It's actually always been something we have tried to improve, it just
> comes in waves. For instance we occasionally re-examine what modules get
> pulled in during startup. Importlib was optimized to help with startup.
> This just happens to be the latest round of trying to improve the situation.
> 
> As for why we care, every command-line app wants to at least appear
> faster if not be faster because just getting to the point of being able
> to e.g. print a version number is dominated by Python and app start-up.


Fair enough.

> And this is not guessing; I work with a team that puts out a command
> line app and one of the biggest complaints they get is the startup time.

This I don't get. When I run any command-line utility written in Python
(grin, ffind, pyped, django-admin.py...), they execute in a split second.

I can't even SEE the difference between:

python3 -c "import os; [print(x) for x in os.listdir('.')]"

and

ls .

I'm having a hard time understanding how the Python VM startup time can
be perceived as a barrier here. I can understand it if you have an
application firing up Python 1000 times a second, like a CGI service or
some kind of code exec service. But scripting?

Now I can imagine that a given Python program can be slow to start up
because it imports a lot of things. But not the VM itself.
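
To separate the cost of a program's imports from the interpreter itself,
one option is the -X importtime option, assuming Python 3.7 or later where
it is available. The sketch below runs a statement in a fresh interpreter
and parses the report written to stderr; the helper name import_costs is
hypothetical, and the parsing assumes the documented
"import time: self | cumulative | name" line layout.

```python
import subprocess
import sys


def import_costs(statement="import json"):
    """Run `statement` in a fresh interpreter with -X importtime and return
    a list of (cumulative_microseconds, module_name), slowest first."""
    result = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", statement],
        stderr=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    costs = []
    prefix = "import time:"
    for line in result.stderr.splitlines():
        if not line.startswith(prefix):
            continue
        fields = line[len(prefix):].split("|")
        if len(fields) != 3:
            continue
        try:
            cumulative = int(fields[1])
        except ValueError:
            continue  # the header line has column titles, not numbers
        costs.append((cumulative, fields[2].strip()))
    return sorted(costs, reverse=True)


if __name__ == "__main__":
    for microseconds, name in import_costs()[:10]:
        print(f"{microseconds:>8} us  {name}")
```

The report also covers modules imported during interpreter startup, so the
same output shows both the fixed startup imports and the program's own.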




Re: [Python-Dev] Python startup time

2017-07-23 Thread Brett Cannon
On Sun, Jul 23, 2017, 10:52 Michel Desmoulin wrote:

>
>
> On 23/07/2017 at 19:36, Brett Cannon wrote:
> > And this is not guessing; I work with a team that puts out a command
> > line app and one of the biggest complaints they get is the startup time.
>
> This I don't get. When I run any command-line utility written in Python
> (grin, ffind, pyped, django-admin.py...), they execute in a split second.
>
> I can't even SEE the difference between:
>
> python3 -c "import os; [print(x) for x in os.listdir('.')]"
>
> and
>
> ls .
>
> I'm having a hard time understanding how the Python VM startup time can
> be perceived as a barrier here. I can understand it if you have an
> application firing up Python 1000 times a second, like a CGI service or
> some kind of code exec service. But scripting?
>

So you're viewing it from a single OS and single machine perspective. Stuff
varies so much that you can't compare something like this based on a single
experience.

I also said "appear" on purpose. 😉 Some people just compare Python against
other languages based on benchmarks like startup when choosing a language
so part of this is optics. This also applies when people compare Python 2
to 3.


> Now I can imagine that a given Python program can be slow to start up,
> because it imports a lot of things. But not the VM itself.
>

There's also the fact that some things we might do to speed up Python's own
startup will propagate to user code and so have a bigger effect, e.g.
making namedtuple cheaper reaches into user code that uses namedtuple.
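
As a rough illustration of the kind of cost in question, the sketch below
times how long it takes to define a namedtuple class compared with a small
hand-written class. The function names are hypothetical and the comparison
is only indicative: the result depends heavily on the Python version, and
no particular outcome is claimed here.

```python
import timeit
from collections import namedtuple


def define_namedtuple():
    """Create a new namedtuple class, as a module would at import time."""
    return namedtuple("Point", ["x", "y"])


def define_plain_class():
    """Create a roughly comparable hand-written class for reference."""
    class Point:
        __slots__ = ("x", "y")

        def __init__(self, x, y):
            self.x = x
            self.y = y

    return Point


if __name__ == "__main__":
    runs = 1000
    nt_time = timeit.timeit(define_namedtuple, number=runs)
    plain_time = timeit.timeit(define_plain_class, number=runs)
    print(f"namedtuple definition:  {nt_time / runs * 1e6:.1f} us each")
    print(f"plain class definition: {plain_time / runs * 1e6:.1f} us each")
```

A module that defines many namedtuples at import time pays the first cost
once per definition on every run, which is why a cheaper namedtuple would
help user code as well as the standard library.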

IOW based on experience this is worth the time to look into.




Re: [Python-Dev] startup time repeated? why not daemon

2017-07-23 Thread Chris Jerdonek
On Sun, Jul 23, 2017 at 5:57 AM, Victor Stinner wrote:
> We already did that. See _bootlocale for example. (Maybe also
> _collections_abc?)

I was asking more in the context of recommended practices for
third-party developers, as Nick mentioned earlier, because it's not a
strategy I've ever seen mentioned (and common practice is to group
only by functionality).
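
A related and simpler technique than splitting modules is to defer an
import into the function that needs it, so the common path never pays for
it. This is a swapped-in illustration rather than the module-splitting idea
itself, and export_report is a hypothetical function name.

```python
def export_report(rows):
    """Serialise rows to CSV text.

    csv and io are imported here rather than at module level, so a run
    that never exports a report does not load them on behalf of this
    module.
    """
    import csv
    import io

    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(export_report([["name", "ms"], ["startup", 20]]), end="")
```

The trade-off is the one already mentioned for lazy loading: an import
error surfaces on first use instead of at startup.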

It's good to know re: locale and collections though. Incidentally,
from the issue thread it doesn't look like _bootlocale was motivated
primarily by startup time, but _collections_abc was:

locale: http://bugs.python.org/issue9548
collections.abc: http://bugs.python.org/issue19218

--Chris


Re: [Python-Dev] Python startup time

2017-07-23 Thread Nick Coghlan
On 23 July 2017 at 09:35, Steve Dower  wrote:
> Yes, I’m aware of that, which is why I don’t have any specific suggestions
> off-hand. But given the differences in file systems between Windows and
> other OSs, it wouldn’t surprise me if there were a more optimal approach for
> NTFS to amortize calls better. Perhaps not, but it is still the most
> expensive part of startup that we have any ability to change, so it’s worth
> investigating.

That does remind me of a capability we haven't played with a lot recently:

$ python3 -m site
sys.path = [
    '/home/ncoghlan',
    '/usr/lib64/python36.zip',
    '/usr/lib64/python3.6',
    '/usr/lib64/python3.6/lib-dynload',
    '/home/ncoghlan/.local/lib/python3.6/site-packages',
    '/usr/lib64/python3.6/site-packages',
    '/usr/lib/python3.6/site-packages',
]
USER_BASE: '/home/ncoghlan/.local' (exists)
USER_SITE: '/home/ncoghlan/.local/lib/python3.6/site-packages' (exists)
ENABLE_USER_SITE: True

The interpreter puts a zip file ahead of the regular unpacked standard
library on sys.path because at one point in time that was a useful
optimisation technique for reducing import costs on application
startup. It was a potentially big win with the old "multiple stat
calls" import implementation, but I'm not aware of any more recent
benchmarks relative to the current listdir-caching based import
implementation.

So I think some interesting experiments to try measuring might be:

- pushing the "always imported" modules into a dedicated zip archive
- having the interpreter pre-seed sys.modules with the contents of
that dedicated archive
- freezing those modules and building them into the interpreter that way
- compiling the standalone top-level modules with Cython, and loading
them as extension modules
- compiling in the Cython generated modules as builtins (not currently
an option for packages & submodules due to [1])

The nice thing about those kinds of approaches is that they're all
fairly general purpose, and relate primarily to how the Python
interpreter is put together, rather than how the individual modules
are written in the first place.

(I'm not volunteering to run those experiments, though - just pointing
out some of the technical options we have available to us that don't
involve adding more handcrafted C extension modules to CPython)
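
The first experiment in the list above can be tried at small scale with the
standard zipfile module and the interpreter's existing support for zip
archives on sys.path. This is only a sketch of the mechanism, not a
benchmark: the helper names and the module name hypothetical_startup_mod
are made up for illustration.

```python
import importlib
import os
import sys
import tempfile
import zipfile


def build_module_archive(archive_path, modules):
    """Write {module_name: source_text} pairs into a zip archive as
    top-level .py files."""
    with zipfile.ZipFile(archive_path, "w") as archive:
        for name, source in modules.items():
            archive.writestr(f"{name}.py", source)


def import_from_archive(archive_path, name):
    """Put the archive at the front of sys.path and import `name`."""
    if archive_path not in sys.path:
        sys.path.insert(0, archive_path)
    importlib.invalidate_caches()
    return importlib.import_module(name)


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    archive_path = os.path.join(workdir, "bundle.zip")
    build_module_archive(
        archive_path, {"hypothetical_startup_mod": "VALUE = 42\n"}
    )
    module = import_from_archive(archive_path, "hypothetical_startup_mod")
    print(module.VALUE, type(module.__spec__.loader).__name__)
```

Timing imports from such an archive against the unpacked files would be
the actual experiment; this only shows that the plumbing already exists.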

[1] https://bugs.python.org/issue1644818

Cheers,
Nick.

P.S. Checking the current list of source modules implicitly loaded at
startup, I get:

>>> import sys
>>> sorted(k for k, m in sys.modules.items() if m.__spec__ is not None and
...        type(m.__spec__.loader).__name__ == "SourceFileLoader")
['_collections_abc', '_sitebuiltins', '_weakrefset', 'abc', 'codecs',
'encodings', 'encodings.aliases', 'encodings.latin_1',
'encodings.utf_8', 'genericpath', 'io', 'os', 'os.path', 'posixpath',
'rlcompleter', 'site', 'stat']


-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia