Re: [Python-Dev] Python startup time

2017-07-21 Thread David Mertz
How implausible is it to write out the actual memory image of a loaded
Python process? I.e. on a specific machine, OS, Python version, etc? This
can only be overhead initially, of course, but on subsequent runs it's just
one memory map, which is the cheapest possible operation.

E.g.

$ python3.7 --write-image "import typing, re, os, numpy"

I imagine this creating a file like:

/tmp/__python__/python37-typing-re-os-numpy.mem

Then just terminating as if just that line had run, however long it takes
(but snapshotting before exit).

Then subsequent invocations would only restore the image to memory. Maybe:

$ pyrunner --load-image python37-typing-re-os-numpy myscript.py

The last line could be aliased, of course. I suppose we'd need to check if
the relevant file exists, and if not fall back to just ignoring the
'--load-image' flag and running plain old Python.

This helps not at all for something like AWS Lambda where each instance is
spun up fresh. But for the use-case of running many Python shell commands
at an interactive shell on one machine, it seems like that could be very
fast.

In my hypothetical I'm assuming we pre-load some collection of modules in the
image. Of course, the script may need to load others, and it may not use
some in the image. But users could decide their typical needed modules
themselves under this idea.
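
The fallback logic described above could be sketched roughly as follows. This is only an illustration under stated assumptions: neither `--load-image` nor `pyrunner` exists, and `image_path` and `build_command` are hypothetical helpers that just show the naming scheme and the fall back to plain Python.

```python
import os
import sys

IMAGE_DIR = "/tmp/__python__"  # directory taken from the example path above


def image_path(modules, version="37"):
    """Build the image file name from the preloaded module names."""
    return "%s/python%s-%s.mem" % (IMAGE_DIR, version, "-".join(modules))


def build_command(modules, script, version="37"):
    """Return the argv to run: use the image if present, else plain Python."""
    path = image_path(modules, version)
    if os.path.exists(path):
        # Hypothetical runner and flag; no such tool exists today.
        return ["pyrunner", "--load-image", path, script]
    # Fall back to ignoring the image and running plain old Python.
    return [sys.executable, script]
```

With no image on disk, `build_command(["typing", "re"], "myscript.py")` simply returns the ordinary interpreter invocation.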

On Jul 20, 2017 11:27 PM, "Nick Coghlan"  wrote:

> On 21 July 2017 at 15:30, Cesare Di Mauro 
> wrote:
>
>>
>>
>> 2017-07-21 4:52 GMT+02:00 Nick Coghlan :
>>
>>> On 21 July 2017 at 12:44, Nick Coghlan  wrote:
>>> > We can separately measure the cost of unmarshalling the code object:
>>> >
>>> > $ python3 -m perf timeit -s "import typing; from marshal import loads;
>>> from
>>> > importlib.util import cache_from_source; cache =
>>> > cache_from_source(typing.__file__); data = open(cache,
>>> 'rb').read()[12:]"
>>> > "loads(data)"
>>> > .
>>> > Mean +- std dev: 286 us +- 4 us
>>>
>>> Slight adjustment here, as the cost of locating the cached bytecode
>>> and reading it from disk should really be accounted for in each
>>> iteration:
>>>
>>> $ python3 -m perf timeit -s "import typing; from marshal import loads;
>>> from importlib.util import cache_from_source" "cache =
>>> cache_from_source(typing.__spec__.origin); data = open(cache,
>>> 'rb').read()[12:]; loads(data)"
>>> .
>>> Mean +- std dev: 337 us +- 8 us
>>>
>>> That will have a bigger impact when loading from spinning disk or a
>>> network drive, but it's fairly negligible when loading from a local
>>> SSD or an already primed filesystem cache.
>>>
>>> Cheers,
>>> Nick.
>>>
>>> --
>>> Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
>>>
>> Thanks for your tests, Nick. It's quite evident that the marshal code
>> cannot improve the situation, so I withdraw my proposal.
>>
>
> It was still a good suggestion, since it made me realise I *hadn't*
> actually measured the relative timings lately, so it was technically an
> untested assumption that module level code execution still dominated the
> overall import time.
>
> typing is also a particularly large & complex module, and bytecode
> unmarshalling represents a larger fraction of the import time for simpler
> modules like abc:
>
> $ python3 -m perf timeit -s "import abc; from marshal import loads; from
> importlib.util import cache_from_source" "cache =
> cache_from_source(abc.__spec__.origin); data = open(cache,
> 'rb').read()[12:]; loads(data)"
> .
> Mean +- std dev: 45.2 us +- 1.1 us
>
> $ python3 -m perf timeit -s "import abc; loader_exec =
> abc.__spec__.loader.exec_module" "loader_exec(abc)"
> .
> Mean +- std dev: 172 us +- 5 us
>
> $ python3 -m perf timeit -s "import abc; from importlib import reload"
> "reload(abc)"
> .
> Mean +- std dev: 280 us +- 14 us
>
> And _weakrefset:
>
> $ python3 -m perf timeit -s "import _weakrefset; from marshal import
> loads; from importlib.util import cache_from_source" "cache =
> cache_from_source(_weakrefset.__spec__.origin); data = open(cache,
> 'rb').read()[12:]; loads(data)"
> .
> Mean +- std dev: 57.7 us +- 1.3 us
>
> $ python3 -m perf timeit -s "import _weakrefset; loader_exec =
> _weakrefset.__spec__.loader.exec_module" "loader_exec(_weakrefset)"
> .
> Mean +- std dev: 129 us +- 6 us
>
> $ python3 -m perf timeit -s "import _weakrefset; from importlib import
> reload" "reload(_weakrefset)"
> .
> Mean +- std dev: 226 us +- 4 us
>
> The conclusion still holds (the absolute numbers here are likely still too
> small for the extra complexity of parallelising bytecode loading to pay off
> in any significant way), but it also helps us set reasonable expectations
> around how much of a gain we're likely to be able to get just from
> precompilation with Cython.
>
> That does actually raise a small microbenchmarking problem: for source and
> bytecode imports, we can force the impo

Re: [Python-Dev] Python startup time

2017-07-21 Thread Antoine Pitrou
On Fri, 21 Jul 2017 00:12:20 -0700
David Mertz  wrote:
> How implausible is it to write out the actual memory image of a loaded
> Python process? I.e. on a specific machine, OS, Python version, etc? This
> can only be overhead initially, of course, but on subsequent runs it's just
> one memory map, which is the cheapest possible operation.

You can't rely on the file being remapped at the same address when you
reload it.  So you'd have to write a relocation routine that's able to
find and fix *all* pointers inside the Python object tree and CPython's
internal structures (fixing the pointers is not necessarily difficult,
finding them without missing any is the difficult part).
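
A toy sketch of the relocation step, assuming a flat image of 64-bit slots and a table of pointer offsets that is already known. The `relocate` helper and the image layout are made up for illustration; producing `pointer_offsets` reliably for a real CPython heap is exactly the hard part described above.

```python
import struct

WORD = struct.Struct("<Q")  # one 64-bit little-endian slot


def relocate(image, pointer_offsets, old_base, new_base):
    """Return a copy of image with every listed pointer slot rebased."""
    fixed = bytearray(image)
    delta = new_base - old_base
    for offset in pointer_offsets:
        (value,) = WORD.unpack_from(fixed, offset)
        WORD.pack_into(fixed, offset, value + delta)
    return bytes(fixed)


# Toy image: slot 0 points at slot 1 (old_base + 8); slot 1 is plain data.
old_base, new_base = 0x1000, 0x7000
image = WORD.pack(old_base + 8) + WORD.pack(42)
moved = relocate(image, [0], old_base, new_base)
```

Only the slot listed in the table is rewritten; a pointer missing from the table would silently keep its stale address.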

Regards

Antoine.


___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Python startup time

2017-07-21 Thread INADA Naoki
On Fri, Jul 21, 2017 at 4:12 PM, David Mertz  wrote:
> How implausible is it to write out the actual memory image of a loaded
> Python process? I.e. on a specific machine, OS, Python version, etc? This
> can only be overhead initially, of course, but on subsequent runs it's just
> one memory map, which is the cheapest possible operation.

FYI, you may be interested in a very recent Node.js security issue:
https://nodejs.org/en/blog/vulnerability/july-2017-security-releases/#node-js-specific-security-flaws


[Python-Dev] Need help to fix urllib(.parse) vulnerabilities

2017-07-21 Thread Victor Stinner
Hi,

Recently, two security vulnerabilities were reported in the urllib module:

https://bugs.python.org/issue30500
http://python-security.readthedocs.io/vuln/bpo-30500_urllib_connects_to_a_wrong_host.html#bpo-30500-urllib-connects-to-a-wrong-host
=> already fixed in Python 3.6.2

https://bugs.python.org/issue29606
http://python-security.readthedocs.io/vuln/urllib_ftp_protocol_stream_injection.html#urllib-ftp-protocol-stream-injection
=> not fixed yet

I also proposed a more general protection: "Reject newline character
(U+000A) in URLs in urllib.parse":
http://bugs.python.org/issue30713

The problem with the urllib module is how we handle invalid URLs. Right
now, we return the URL unmodified if we cannot parse it. Should we
raise an exception if a URL contains a newline, for example?

It's very hard to harden the urllib module without breaking backward
compatibility. That's why it took 3 weeks to fix "urllib connects to a
wrong host": we had to find a way to fix the vulnerability without
breaking backward compatibility.

Another proposed approach is to reject invalid data earlier or later,
but not in urllib...

So if you understand URLs, HTTP, etc.: please join these issues and
help us fix them!

Victor


Re: [Python-Dev] Need help to fix urllib(.parse) vulnerabilities

2017-07-21 Thread Victor Stinner
2017-07-21 12:02 GMT+02:00 Victor Stinner :
> https://bugs.python.org/issue29606
> http://python-security.readthedocs.io/vuln/urllib_ftp_protocol_stream_injection.html#urllib-ftp-protocol-stream-injection
> => not fixed yet

OK, here is a more concrete problem. To fix the "urllib FTP" bug, we have to
find a balance between security (reject any URL looking like an
attempt to counter the security protections) and backward
compatibility (accept filenames containing newlines).

Maybe we need to only reject a URL which contains a newline in the
"host" part, but accept them in the "path" part of the URL? The
question is whether the code correctly splits the "host" and "path" parts when
the URL contains a newline. My bet is that no, it behaves badly :-)
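
A minimal sketch of the "reject in host, accept in path" idea. It deliberately does its own naive splitting rather than calling `urllib.parse.urlsplit`, whose handling of such input is exactly what is in question. `newline_in_host` is a hypothetical helper, not an existing urllib function.

```python
from urllib.parse import unquote


def newline_in_host(url):
    """Return True if the authority part contains CR or LF, raw or percent-encoded."""
    _scheme, sep, rest = url.partition("://")
    if not sep:
        rest = url  # no scheme: treat the whole string as host[/path]
    host = unquote(rest.split("/", 1)[0])
    return "\n" in host or "\r" in host
```

Under this rule `ftp://example.org%0aPORT/file` would be rejected, while `ftp://example.org/dir%0aname/file` would still be accepted.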

Victor


Re: [Python-Dev] Python startup time

2017-07-21 Thread Nikolaus Rath
On Jul 21 2017, David Mertz  wrote:
> How implausible is it to write out the actual memory image of a loaded
> Python process?

That is what Emacs does, and it causes them a lot of trouble. They're
trying to move away from it at the moment, but the direction is not yet
clear. The keyword is "unexec", and it wreaks havoc with malloc.

Best,
-Nikolaus
-- 
GPG Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

 »Time flies like an arrow, fruit flies like a Banana.«


Re: [Python-Dev] Need help to fix urllib(.parse) vulnerabilities

2017-07-21 Thread Giampaolo Rodola'
On Fri, Jul 21, 2017 at 12:45 PM, Victor Stinner 
wrote:

> 2017-07-21 12:02 GMT+02:00 Victor Stinner :
> > https://bugs.python.org/issue29606
> > http://python-security.readthedocs.io/vuln/urllib_ftp_protocol_stream_injection.html#urllib-ftp-protocol-stream-injection
> > => not fixed yet
>
> OK, here is a more concrete problem. To fix the "urllib FTP" bug, we have to
> find a balance between security (reject any URL looking like an
> attempt to counter the security protections) and backward
> compatibility (accept filenames containing newlines).
>
> Maybe we need to only reject a URL which contains a newline in the
> "host" part, but accept them in the "path" part of the URL? The
> question is whether the code correctly splits the "host" and "path" parts when
> the URL contains a newline. My bet is that no, it behaves badly :-)
>
> Victor
>

It took me a while to understand the security implications of this
FTP-related bug, but I believe I got the gist of it here (I can elaborate
further if it's not clear):
https://github.com/python/cpython/pull/1214#issuecomment-298393169
My proposal is to fix ftplib.py and guard against malicious strings
involving the *PORT command only*. This way we fix the issue *and* maintain
backward compatibility by allowing users to specify "\n" in their paths and
username / password pairs. Java took a different approach and disallowed
"\n" completely.
To my understanding fixing ftplib would automatically mean fixing urllib as
well.
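
A rough sketch of what a PORT-only guard might look like. This is an assumption-laden illustration, not ftplib's actual code: `check_argument` is a hypothetical helper, and the pattern only looks for a CR or LF followed by a `PORT` command inside a single argument.

```python
import re

# A CR or LF followed by a PORT command smuggled into a single argument.
_INJECTED_PORT = re.compile(r"[\r\n]\s*PORT\b", re.IGNORECASE)


def check_argument(value):
    """Raise ValueError if value embeds a newline-injected PORT command."""
    if _INJECTED_PORT.search(value):
        raise ValueError("embedded PORT command in FTP argument: %r" % value)
    return value
```

A path such as `"dir\nname"` passes through unchanged, which is the backward compatibility being argued for, while `"x\r\nPORT 127,0,0,1,4,0"` is refused.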

-- 
Giampaolo - http://grodola.blogspot.com


Re: [Python-Dev] Need help to fix urllib(.parse) vulnerabilities

2017-07-21 Thread Random832
On Fri, Jul 21, 2017, at 08:43, Giampaolo Rodola' wrote:
> It took me a while to understand the security implications of this
> FTP-related bug, but I believe I got the gist of it here (I can
> elaborate further if it's not clear):
> https://github.com/python/cpython/pull/1214#issuecomment-298393169
> My proposal is to fix ftplib.py and guard against malicious
> strings involving the *PORT command only*. This way we fix the
> issue *and* maintain backward compatibility by allowing users to
> specify "\n" in their paths and username / password pairs. Java
> took a different approach and disallowed "\n" completely. To my
> understanding fixing ftplib would automatically mean fixing urllib
> as well. 

What would a \n in a path mean? What commands would you send over FTP to
successfully retrieve a file (or enter a username or password)
containing a newline in the name? In other words, what exactly are we
being backward compatible *with*?


[Python-Dev] Summary of Python tracker Issues

2017-07-21 Thread Python tracker

ACTIVITY SUMMARY (2017-07-14 - 2017-07-21)
Python tracker at http://bugs.python.org/

To view or respond to any of the issues listed below, click on the issue.
Do NOT respond to this message.

Issues counts and deltas:
  open    6058 (+16)
  closed 36679 (+38)
  total  42737 (+54)

Open issues with patches: 2343 


Issues opened (37)
==================

#19896: Exposing "q" and "Q" to multiprocessing.sharedctypes
http://bugs.python.org/issue19896  reopened by haypo

#30450: Pull Windows dependencies from GitHub rather than svn.python.o
http://bugs.python.org/issue30450  reopened by steve.dower

#30931: Race condition in asyncore may access the wrong dispatcher
http://bugs.python.org/issue30931  opened by walkhour

#30934: Document how to run coverage for repository idlelib files.
http://bugs.python.org/issue30934  opened by terry.reedy

#30935: document the new behavior of get_event_loop() in Python 3.6
http://bugs.python.org/issue30935  opened by chris.jerdonek

#30937: csv module examples miss newline='' when opening files
http://bugs.python.org/issue30937  opened by Pavel

#30938: pdb lacks debugger command to list and show all user-defined v
http://bugs.python.org/issue30938  opened by David Rieger

#30939: Sphinx 1.6.3 deprecation warning for sphinx.util.compat.Direct
http://bugs.python.org/issue30939  opened by ned.deily

#30940: Documentation for round() is incorrect.
http://bugs.python.org/issue30940  opened by George K

#30944: Python 32 bit install fails on Windows - BitDefender false pos
http://bugs.python.org/issue30944  opened by Arie van Wingerden

#30945: loop.create_server does not detect if the interface is IPv6 en
http://bugs.python.org/issue30945  opened by cecton

#30947: Update embeded copy of libexpat to 2.2.2
http://bugs.python.org/issue30947  opened by haypo

#30949: Provide assertion functions in unittest.mock
http://bugs.python.org/issue30949  opened by odd_bloke

#30950: Convert round() to Arument Clinic
http://bugs.python.org/issue30950  opened by serhiy.storchaka

#30951: Documentation error in inspect module
http://bugs.python.org/issue30951  opened by jalexvig

#30952: include Math extension in SQlite
http://bugs.python.org/issue30952  opened by Big Stone

#30953: Fatal python error when jumping into except clause
http://bugs.python.org/issue30953  opened by ppperry

#30956: ftplib behaves oddly if socket timeout is greater than the def
http://bugs.python.org/issue30956  opened by arloclarke

#30959: Constructor signature is duplicated in the help of namedtuples
http://bugs.python.org/issue30959  opened by serhiy.storchaka

#30962: Add caching to logging.Logger.isEnabledFor()
http://bugs.python.org/issue30962  opened by aviso

#30963: xxlimited.c XxoObject_Check should be XxoObject_CheckExact
http://bugs.python.org/issue30963  opened by Jim.Jewett

#30964: Mention ensurepip in package installation docs
http://bugs.python.org/issue30964  opened by ncoghlan

#30966: multiprocessing.queues.SimpleQueue leaks 2 fds
http://bugs.python.org/issue30966  opened by arigo

#30967: Crash in PyThread_ReInitTLS() in the child process after os.fo
http://bugs.python.org/issue30967  opened by Thomas Mortensson

#30969: Docs should say that `x is z or x == z` is used for `x in y` i
http://bugs.python.org/issue30969  opened by ztane

#30971: Improve code readability of json.tool
http://bugs.python.org/issue30971  opened by dhimmel

#30972: Event loop incorrectly inherited in child processes.
http://bugs.python.org/issue30972  opened by Elvis.Pranskevichus

#30974: Update os.samefile docstring to match documentation
http://bugs.python.org/issue30974  opened by eMPee584

#30975: multiprocessing.Condition.notify_all() blocks indefinitely if 
http://bugs.python.org/issue30975  opened by mickp

#30977: reduce uuid.UUID() memory footprint
http://bugs.python.org/issue30977  opened by wbolster

#30978: str.format_map() silences exceptions in __getitem__
http://bugs.python.org/issue30978  opened by Akuli

#30979: the winapi fails to run shortcuts (because considers a shortcu
http://bugs.python.org/issue30979  opened by Bernát Gábor

#30980: Calling asyncore.file_wrapper.close twice may close unrelated 
http://bugs.python.org/issue30980  opened by Nir Soffer

#30981: IDLE: Add config dialog font page tests
http://bugs.python.org/issue30981  opened by terry.reedy

#30982: AMD64 Windows8.1 Refleaks 3.x: compilation error, cannot open 
http://bugs.python.org/issue30982  opened by haypo

#30983: eval frame rename in pep 0523 broke gdb's python extension
http://bugs.python.org/issue30983  opened by bcap

#30984: traceback.print_exc return value documentation
http://bugs.python.org/issue30984  opened by Jelle Zijlstra



Most recent 15 issues with no replies (15)
==========================================

#30984: traceback.print_exc return value documentation
http://bugs.python.org/issue30984

#30983: eval frame rename in pep 0523 broke gdb's python extension
http://bugs.python.org/issue30983

#30980: Calli

Re: [Python-Dev] startup time repeated? why not daemon

2017-07-21 Thread Brett Cannon
On Thu, 20 Jul 2017 at 22:11 Chris Jerdonek 
wrote:

> On Thu, Jul 20, 2017 at 8:49 PM, Nick Coghlan  wrote:
> > ...
> > * Lazy loading can have a significant impact on startup time, as it
> > means you don't have to pay for the cost of finding and loading
> > modules that you don't actually end up using on that particular run
>

It should be mentioned that I have started designing an API to make using
lazy loading much easier in Python 3.7 (i.e. "calling a single function"
easier), but I still have to write the tests and such before I propose a
patch and it will still be mainly for apps that know what they are doing
since lazy loading makes debugging import errors harder.
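
For reference, lazy loading can already be done by hand with the existing `importlib.util.LazyLoader`. The sketch below follows the recipe in the importlib documentation; it is not the easier API under design mentioned above, and `lazy_import` is just an illustrative name.

```python
import importlib.util
import sys


def lazy_import(name):
    """Return a module object whose code only runs on first attribute access."""
    spec = importlib.util.find_spec(name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # defers the real execution
    return module


colorsys = lazy_import("colorsys")  # nothing executed yet
```

The module body runs when an attribute such as `colorsys.rgb_to_hsv` is first touched, which is also why an import error can surface far from the import statement.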


> >
> > We've historically resisted adopting these techniques for the standard
> > library because they *do* make things more complicated *and* harder to
> > debug relative to plain old eagerly imported dynamic Python code.
> > However, if we're going to recommend them as good practices for 3rd
> > party developers looking to optimise the startup time of their Python
> > applications, then it makes sense for us to embrace them for the
> > standard library as well, rather than having our first reaction be to
> > write more hand-crafted C code.
>
> Are there any good write-ups of best practices and techniques in this
> area for applications (other than obvious things like avoiding
> unnecessary imports)? I'm thinking of things like how to structure
> your project, things to look for, developer tools that might help, and
> perhaps third-party runtime libraries?
>

Nothing beyond "profile your application" and "don't do stuff during import
as a side-effect" that I'm aware of.

-Brett


>
> --Chris
>
>
>
> >
> > On that last point, it's also worth keeping in mind that we have a
> > much harder time finding new C-level contributors than we do new
> > Python-level ones, and have every reason to expect that problem to get
> > worse over time rather than better (since writing and maintaining
> > handcrafted C code is likely to go the way of writing and maintaining
> > handcrafted assembly code as a skillset: while it will still be
> > genuinely necessary in some contexts, it will also be an increasingly
> > niche technical specialty).
> >
> > Starting to migrate to using Cython for our acceleration modules
> > instead of plain C should thus prove to be a win for everyone:
> >
> > - Cython structurally avoids a lot of typical bugs that arise in
> > hand-coded extensions (e.g. refcount bugs)
> > - by design, it's much easier to mentally switch between Python &
> > Cython than it is between Python & C
> > - Cython accelerated modules are easier to adapt to other interpreter
> > implementations than handcrafted C modules
> > - keeping Python modules and their C accelerated counterparts in sync
> > will be easier, as they'll mostly be using the same code
> > - we'd be able to start writing C API test cases in Cython rather than
> > in handcrafted C (which currently mostly translates to only testing
> > them indirectly)
> > - CPython's own test suite would naturally help test Cython
> > compatibility with any C API updates
> > - we'd have an inherent incentive to help enhance Cython to take
> > advantage of new C API features
> >
> > There are some genuine downsides in increasing the complexity of
> > bootstrapping CPython when all you're starting with is a VCS clone and
> > a C compiler, but those complications are ultimately no worse than
> > those we already have with Argument Clinic, and hence amenable to the
> > same solution: if we need to, we can check in the generated C files in
> > order to make bootstrapping easier.
> >
> > Cheers,
> > Nick.
> >
> > --
> > Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
>


Re: [Python-Dev] Need help to fix urllib(.parse) vulnerabilities

2017-07-21 Thread Raymond Hettinger

> On Jul 21, 2017, at 3:45 AM, Victor Stinner  wrote:
> 
> OK, here is a more concrete problem. To fix the "urllib FTP" bug, we have to
> find a balance between security (reject any URL looking like an
> attempt to counter the security protections) and backward
> compatibility (accept filenames containing newlines).

For this case, the balance should probably tilt more towards security than 
backwards compatibility.   I would be very concerned about such odd URLs.  

That said, if backwards compatibility is going to be broken, consider giving 
users a temporary, clean way to opt out of the additional protections (don't 
want to leave them high and dry if they happen to have a legitimate use case).


Raymond


[Python-Dev] Cython compiled stdlib modules - Re: Python startup time

2017-07-21 Thread Stefan Behnel
Nick Coghlan schrieb am 21.07.2017 um 08:23:
> I'll also note that in these cases where the import overhead is
> proportionally significant for always-imported modules, we may want to look
> at the benefits of freezing them (if they otherwise remain as pure Python
> modules), or compiling them as builtin modules (if we switch them over to
> Cython), in addition to looking at ways to make the modules themselves
> faster.

Just for the sake of it, I gave the Cython compilation a try. I had to
apply the attached hack to Lib/typing.py to get the test passing, because
it uses frame call offsets in some places and Cython functions do not
create frames when being called (they only create them for exception
traces). I also had to disable the import of "abc" in the Cython generated
module to remove the circular self dependency at startup when the "abc"
module is compiled. That shouldn't have an impact on the runtime
performance, though.

Note that this is otherwise using the unmodified Python code, as provided
in the current modules, constructing and using normal Python classes for
everything, no extension types etc. Only two stdlib Python modules were
compiled into shared libraries, and not statically linked into the CPython
core.


I used the "python_startup" benchmark in the "performance" project to
measure the overall startup times of a clean non-debug non-pgo build of
CPython 3.7 (rev d0969d6) against the same build with a compiled typing.py
and abc.py. To compile these modules, I used the following command (plus
the attached patch)

$ cythonize -3 -X binding=True -i Lib/typing.py Lib/abc.py

I modified the startup benchmark to run "python -c 'import typing'" etc.
instead of just executing "pass".
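
A much cruder stand-in for that measurement, for readers who want to try it without the "performance" project. This is only a sketch: `measure_startup` is a hypothetical helper that times whole subprocess runs with the wall clock and does none of the calibration or warm-up that the real benchmark does.

```python
import statistics
import subprocess
import sys
import time


def measure_startup(statement="pass", runs=10):
    """Return (mean, stdev) wall-clock seconds for `python -c statement`."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run([sys.executable, "-c", statement], check=True)
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples), statistics.stdev(samples)
```

Comparing `measure_startup("pass")` with `measure_startup("import typing")` gives a rough idea of the import's share of startup on a given machine.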


- stock CPython starting up and running "pass":
Mean +- std dev: 14.7 ms +- 0.3 ms


- stock CPython starting up and running "import abc":
Mean +- std dev: 14.8 ms +- 0.3 ms

- with compiled abc.py:
Mean +- std dev: 14.9 ms +- 0.3 ms


- stock CPython starting up and running "import typing":
Mean +- std dev: 34.6 ms +- 1.0 ms

- with compiled abc.py
Mean +- std dev: 34.4 ms +- 0.6 ms

- with compiled typing.py:
Mean +- std dev: 33.5 ms +- 0.7 ms

- with both compiled:
Mean +- std dev: 33.1 ms +- 0.4 ms

That's only a 4% improvement in the overall startup time on my machine, and
about a 7% faster overall runtime of "import typing" compared to "pass".
Note also that compiling abc.py leads to a slightly *increased* startup
time in the "import abc" case, which might be due to the larger file size
of the abc.so file compared to the abc.pyc file. This is amortised by the
decreased runtime in the "import typing" case (I guess).
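
The two percentages can be re-derived from the means reported above; the variable names below are just labels for those figures.

```python
pass_ms = 14.7            # stock CPython, "pass"
typing_stock_ms = 34.6    # stock CPython, "import typing"
typing_both_ms = 33.1     # abc.py and typing.py both compiled

saved_ms = typing_stock_ms - typing_both_ms
# Share of the whole startup that was saved.
overall_pct = 100 * saved_ms / typing_stock_ms
# Share of the time attributable to "import typing" alone (startup minus "pass").
import_only_pct = 100 * saved_ms / (typing_stock_ms - pass_ms)
print(f"overall: {overall_pct:.1f}%, import only: {import_only_pct:.1f}%")
```

This works out to roughly 4.3% and 7.5%, in line with the rounded figures quoted above.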


I then ran the test suites for both modules for lack of a better
post-startup runtime benchmark. The improvement for abc.py is in the order
of 1-2%, but test_typing.py has many more tests and wins about 13% overall:

- stock CPython executing essentially "runner.run(deepcopy(suite))" in
"test_typing.py" (the deepcopy() takes about 6 ms):
Mean +- std dev: 68.6 ms +- 0.8 ms

- compiled abc.py and typing.py:
Mean +- std dev: 60.7 ms +- 0.7 ms


One more thing to note: the compiled modules are quite large. I get these
file sizes:

   8658  Lib/abc.py
   7525  Lib/__pycache__/abc.cpython-37.pyc
 369930  Lib/abc.c
 122048  Lib/abc.cpython-37m-x86_64-linux-gnu.so

  80290  Lib/typing.py
  73921  Lib/__pycache__/typing.cpython-37.pyc
2951893  Lib/typing.c
1182632  Lib/typing.cpython-37m-x86_64-linux-gnu.so

The .so files are about 16x as large as the .pyc files. The typing.so file
weighs in with about 40% of the size of the stripped python binary:

2889136  python
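
The size ratios follow directly from the byte counts listed above; the dictionary keys are shorthand for those files.

```python
sizes = {
    "abc.pyc": 7525,
    "abc.so": 122048,
    "typing.pyc": 73921,
    "typing.so": 1182632,
    "python": 2889136,
}

abc_ratio = sizes["abc.so"] / sizes["abc.pyc"]            # .so vs .pyc for abc
typing_ratio = sizes["typing.so"] / sizes["typing.pyc"]   # .so vs .pyc for typing
typing_share = sizes["typing.so"] / sizes["python"]       # typing.so vs stripped binary
```

Both ratios come out at about 16, and `typing_share` at roughly 0.41.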


As it stands, the gain is probably not worth the increase in library file
size, which also translates to a higher bottom line for the memory
consumption. At least not for these two modules. Manually optimising the
files would likely also reduce the .so file size in addition to giving
better speedups, though, because the generated code would become less generic.

Stefan
diff --git a/Lib/typing.py b/Lib/typing.py
index c487afc..19a73c3 100644
--- a/Lib/typing.py
+++ b/Lib/typing.py
@@ -19,6 +19,13 @@ except ImportError:
     MethodWrapperType = type(object().__str__)
     MethodDescriptorType = type(str.join)
 
+try:
+    import cython
+except ImportError:
+    _FRAME_OFFSET = 0
+else:
+    _FRAME_OFFSET = -1 if cython.compiled else 0
+
 
 # Please keep __all__ alphabetized within each category.
 __all__ = [
@@ -1165,7 +1172,7 @@ class GenericMeta(TypingMeta, abc.ABCMeta):
 
     def __subclasscheck__(self, cls):
         if self.__origin__ is not None:
-            if sys._getframe(1).f_globals['__name__'] not in ['abc', 'functools']:
+            if sys._getframe(1 + _FRAME_OFFSET).f_globals['__name__'] not in ['abc', 'functools']:
                 raise TypeError("Parameterized generics cannot be used with class "
                                 "or instance checks")
             return False
@@ -2124,7 +2131,7 @@ def _make_nmtuple(name, typ

[Python-Dev] Appending a link back to bugs.python.org in GitHub PRs

2017-07-21 Thread Brett Cannon
Thanks to Kushal Das we now have one of the most requested features since
the transition: a link in PRs back to bugs.python.org (in a more
discoverable way since we have had them since Bedevere launched :) . When a
pull request comes in with an issue number in the title (or one gets
added), a link to bugs.python.org will be appended to the PR's body (the
message you fill out when creating a PR). There's no logic to remove the
link if the issue number is removed from the title, changed, or for
multiple issue numbers since basically those cases are all rare and it was
easier to launch without that kind of support.

P.S.: Berker Peksag is working on providing commit emails with diffs in
them which is the other most requested feature since the transition.


Re: [Python-Dev] Python startup time

2017-07-21 Thread Barry Warsaw
On Jul 21, 2017, at 01:25 PM, Nikolaus Rath wrote:

>That is what Emacs does, and it causes them a lot of trouble. They're
>trying to move away from it at the moment, but the direction is not yet
>clear. The keyword is "unexec", and it wreaks havoc with malloc.

Emacs has been unexec'ing for as long as I can remember (which is longer than
I can remember Python :).  I know that it's been problematic and there have
been many efforts over the years to replace it, but I think it's been a fairly
successful technique in practice, at least on platforms that support it.
That's another problem with the approach of course; it's not universally
possible to implement.

-Barry


Re: [Python-Dev] startup time repeated? why not daemon

2017-07-21 Thread Barry Warsaw
On Jul 20, 2017, at 03:16 PM, Eric Snow wrote:

>Relatedly, at PyCon this year Barry and I were talking about the idea of
>bootstrapping the interpreter from a memory snapshot on disk, rather than
>from scratch (thus drastically reducing the number of IO events).

The TPI (Terrible Python Idea) I had at Pycon was some kind of (local)
memcached of imported Python modules, which would theoretically allow avoiding
loading the modules from the file system on start up.

There would be all kinds of problems with this (i.e. putting the "terrible" in
TPI), such as having to deal with module import side-effects, but perhaps
those could be handled by enough APIs and engineering.

Cheers,
-Barry




Re: [Python-Dev] Python startup time

2017-07-21 Thread Skip Montanaro
> Emacs has been unexec'ing for as long as I can remember (which is longer than
> I can remember Python :).  I know that it's been problematic and there have
> been many efforts over the years to replace it, but I think it's been a fairly
> successful technique in practice, at least on platforms that support it.


I've been using Emacs far longer than Python. I remember having to invoke
temacs on something. Still, if I didn't know better, I could be convinced
you were referring to the GIL. :-)

Skip


Re: [Python-Dev] Python startup time

2017-07-21 Thread David Mertz
I would guess that Windows users don't tend to run lots of command line
tools where startup time dominates, as *nix users do.

On Fri, Jul 21, 2017 at 3:21 PM, Barry Warsaw  wrote:

> On Jul 21, 2017, at 01:25 PM, Nikolaus Rath wrote:
>
> >That is what Emacs does, and it causes them a lot of trouble. They're
> >trying to move away from it at the moment, but the direction is not yet
> >clear. The keyword is "unexec", and it wreaks havoc with malloc.
>
> Emacs has been unexec'ing for as long as I can remember (which is longer than
> I can remember Python :).  I know that it's been problematic and there have
> been many efforts over the years to replace it, but I think it's been a fairly
> successful technique in practice, at least on platforms that support it.
> That's another problem with the approach of course; it's not universally
> possible to implement.
>
> -Barry



-- 
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.


Re: [Python-Dev] startup time repeated? why not daemon

2017-07-21 Thread Larry Hastings


On 07/21/2017 03:28 PM, Barry Warsaw wrote:

> The TPI (Terrible Python Idea) I had at Pycon was some kind of (local)
> memcached of imported Python modules, which would theoretically allow avoiding
> loading the modules from the file system on start up.
>
> There would be all kinds of problems with this (i.e. putting the "terrible" in
> TPI), such as having to deal with module import side-effects, but perhaps
> those could be handled by enough APIs and engineering.


This would be taking a page out of PHP's book.

PHP--or at least, PHP ten years ago--doesn't have the equivalent of .pyc 
files.  If you have mod_php running inside Apache with no other 
extensions, it literally tokenizes each .php script every time it's invoked.


To solve this performance problem, someone wrote the "Alternative PHP 
Cache" (or "APC"), which runs /in Apache/.  (Yep, it's not usable 
outside Apache!)  APC stores the tokenized versions of PHP scripts in 
something approximating their actual in-memory representation.  To use 
something stored in the cache, you'd iterate over all the 
variables/functions, copy each one into your local interpreter instance, 
and perform fixups on all the pointers to convert them from relative to 
absolute addresses.
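
As a rough illustration of that fixup step, here is a toy Python sketch. It is
not APC's actual code: the record layout, the names, and the idea of storing
offsets as plain integers are all invented for the example.

```python
# Toy model of "relative to absolute" pointer fixups (hypothetical layout).
# The cache stores each entry's references as offsets from the start of the
# cached blob.  Loading the blob at some base address means adding that base
# to every stored offset.

def fixup(records, base):
    """Return copies of the records with relative offsets made absolute."""
    fixed = []
    for name, rel_offsets in records:
        fixed.append((name, [base + off for off in rel_offsets]))
    return fixed


# Two cached entries whose references are relative to the blob start.
cached = [("func_a", [0, 16]), ("var_b", [32])]

# Pretend the blob was mapped at address 0x1000 in this process.
loaded = fixup(cached, base=0x1000)
```

The point is only that every consumer pays a per-entry copy and fixup pass, so
the cost scales with the number of cached objects rather than being a single
memory map.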


   http://php.net/manual/en/book.apc.php


I note that the introduction to APC says:

   Warning
   This extension is considered unmaintained and dead. However, the
   source code for this extension is still available within PECL.

So perhaps the PHP folks have moved on from this technique.


//arry/


Re: [Python-Dev] startup time repeated? why not daemon

2017-07-21 Thread Chris Jerdonek
On Fri, Jul 21, 2017 at 9:52 AM, Brett Cannon  wrote:
> On Thu, 20 Jul 2017 at 22:11 Chris Jerdonek 
> wrote:
>> On Thu, Jul 20, 2017 at 8:49 PM, Nick Coghlan  wrote:
>> > ...
>> > * Lazy loading can have a significant impact on startup time, as it
>> > means you don't have to pay for the cost of finding and loading
>> > modules that you don't actually end up using on that particular run
>
> It should be mentioned that I have started designing an API to make using
> lazy loading much easier in Python 3.7 (i.e. "calling a single function"
> easier), but I still have to write the tests and such before I propose a
> patch and it will still be mainly for apps that know what they are doing
> since lazy loading makes debugging import errors harder.
> ...
>> > However, if we're going to recommend them as good practices for 3rd
>> > party developers looking to optimise the startup time of their Python
>> > applications, then it makes sense for us to embrace them for the
>> > standard library as well, rather than having our first reaction be to
>> > write more hand-crafted C code.
>>
>> Are there any good write-ups of best practices and techniques in this
>> area for applications (other than obvious things like avoiding
>> unnecessary imports)? I'm thinking of things like how to structure
>> your project, things to look for, developer tools that might help, and
>> perhaps third-party runtime libraries?
>
> Nothing beyond "profile your application" and "don't do stuff during import
> as a side-effect" that I'm aware of.

One "project structure" idea of the sort I had in mind is to move
frequently used functions in a module into their own module. This way
the most common paths of execution don't load unneeded functions.
Following this line of reasoning could lead to grouping functions in
an application by when they're needed instead of by what they do,
which is different from what we normally see. I don't recall seeing
advice like this anywhere, so maybe the trade-offs aren't worth it.
Thoughts?
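
For the lazy-loading side of this, the standard library already provides
importlib.util.LazyLoader. A minimal sketch, closely following the recipe in
the importlib documentation (the helper name lazy_import and the choice of
colorsys as the example module are just for illustration):

```python
import importlib.util
import sys


def lazy_import(name):
    """Return a module whose code only runs on first attribute access."""
    if name in sys.modules:
        # Already imported, so there is nothing left to defer.
        return sys.modules[name]
    spec = importlib.util.find_spec(name)
    if spec is None:
        raise ImportError("no module named %r" % name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    # With LazyLoader, exec_module() only arranges for the deferred load.
    loader.exec_module(module)
    return module


colorsys = lazy_import("colorsys")
# The module body is executed here, on first attribute access.
converter = colorsys.rgb_to_hls
```

The trade-off Brett mentions still applies: an import error surfaces at the
first attribute access rather than at the import statement.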

--Chris


>
> -Brett
>
>>
>>
>> --Chris
>>
>>
>>
>> >
>> > On that last point, it's also worth keeping in mind that we have a
>> > much harder time finding new C-level contributors than we do new
>> > Python-level ones, and have every reason to expect that problem to get
>> > worse over time rather than better (since writing and maintaining
>> > handcrafted C code is likely to go the way of writing and maintaining
>> > handcrafted assembly code as a skillset: while it will still be
>> > genuinely necessary in some contexts, it will also be an increasingly
>> > niche technical specialty).
>> >
>> > Starting to migrate to using Cython for our acceleration modules
>> > instead of plain C should thus prove to be a win for everyone:
>> >
>> > - Cython structurally avoids a lot of typical bugs that arise in
>> > hand-coded extensions (e.g. refcount bugs)
>> > - by design, it's much easier to mentally switch between Python &
>> > Cython than it is between Python & C
>> > - Cython accelerated modules are easier to adapt to other interpreter
>> > implementations than handcrafted C modules
>> > - keeping Python modules and their C accelerated counterparts in sync
>> > will be easier, as they'll mostly be using the same code
>> > - we'd be able to start writing C API test cases in Cython rather than
>> > in handcrafted C (which currently mostly translates to only testing
>> > them indirectly)
>> > - CPython's own test suite would naturally help test Cython
>> > compatibility with any C API updates
>> > - we'd have an inherent incentive to help enhance Cython to take
>> > advantage of new C API features
>> >
>> > There are some genuine downsides in increasing the complexity of
>> > bootstrapping CPython when all you're starting with is a VCS clone and
>> > a C compiler, but those complications are ultimately no worse than
>> > those we already have with Argument Clinic, and hence amenable to the
>> > same solution: if we need to, we can check in the generated C files in
>> > order to make bootstrapping easier.
>> >
>> > Cheers,
>> > Nick.
>> >
>> > --
>> > Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Need help to fix urllib(.parse) vulnerabilities

2017-07-21 Thread Serhiy Storchaka

21.07.17 13:02, Victor Stinner wrote:

> Recently, two security vulnerabilities were reported in the urllib module:
>
> https://bugs.python.org/issue30500
> http://python-security.readthedocs.io/vuln/bpo-30500_urllib_connects_to_a_wrong_host.html#bpo-30500-urllib-connects-to-a-wrong-host
> => already fixed in Python 3.6.2
>
> https://bugs.python.org/issue29606
> http://python-security.readthedocs.io/vuln/urllib_ftp_protocol_stream_injection.html#urllib-ftp-protocol-stream-injection
> => not fixed yet
>
> I also proposed a more general protection: "Reject newline character
> (U+000A) in URLs in urllib.parse":
> http://bugs.python.org/issue30713
>
> The problem with the urllib module is how we handle invalid URLs. Right
> now, we return the URL unmodified if we cannot parse it. Should we
> raise an exception if a URL contains a newline, for example?
>
> It's very hard to harden the urllib module without breaking backward
> compatibility. That's why it took 3 weeks to fix "urllib connects to a
> wrong host": we had to find a way to fix the vulnerability without
> breaking backward compatibility.
>
> Another proposed approach is to reject invalid data earlier or later,
> but not in urllib...


Checking a URL in urllib.parse is too early and not enough. The urllib
module is general, and different protocols have different limitations. 
There are other ways besides urllib to pass invalid parameters to 
low-level protocol implementations.


I think the only reliable way of fixing the vulnerability is rejecting 
or escaping (as specified in RFC 2640) CR and LF inside sent lines. 
Adding the support of RFC 2640 is a new feature and can be added only in 
3.7. And this feature should be optional since not all servers support 
RFC 2640. https://github.com/python/cpython/pull/1214 does the right thing.


The other way of hardening the Python stdlib implementation of the FTP
server is to make it accept only CRLF as a line delimiter, not a lone CR
or LF.
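
To make the CRLF-only idea concrete, here is a small sketch of a receive-side
line splitter. It is an illustration only, not code from the stdlib, and the
function name split_crlf is made up for the example.

```python
def split_crlf(buffer):
    """Split received bytes into complete CRLF-terminated lines.

    Returns (lines, rest), where rest is the unterminated tail that should
    be kept until more data arrives.  A lone CR or LF inside a complete
    line is treated as an error instead of as a line break.
    """
    *lines, rest = buffer.split(b"\r\n")
    for line in lines:
        if b"\r" in line or b"\n" in line:
            raise ValueError("bare CR or LF inside line: %r" % line)
    return lines, rest


lines, rest = split_crlf(b"USER anonymous\r\nPASS guest\r\nQU")
```

The unterminated tail is deliberately not checked, since it may end with a CR
whose LF has not arrived yet.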


Additional sanity checks can be added in FTP.login() to detect bad input
earlier and raise more specific errors.


Every protocol (FTP, HTTP, telnet, SMTP, POP3, IMAP, etc) should be
fixed separately. If a protocol allows escaping special characters, its
implementation should escape them. Otherwise they should be rejected.
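
A minimal sketch of the "reject" variant on the sending side, for a
line-oriented protocol. The helper names check_line and send_command are
hypothetical and this is not the code from the pull request mentioned above.

```python
def check_line(line):
    """Return line unchanged, or raise ValueError if it contains CR or LF."""
    if "\r" in line or "\n" in line:
        raise ValueError("illegal newline character in line: %r" % line)
    return line


def send_command(sock, line):
    """Validate a single command line, then send it terminated by CRLF."""
    sock.sendall(check_line(line).encode("ascii") + b"\r\n")


ok = check_line("USER anonymous")
```

Doing the check at the point where the line is written to the socket covers
every caller, not only those that arrive through urllib.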

