Re: [Python-Dev] Python startup time
How implausible is it to write out the actual memory image of a loaded Python process? I.e. on a specific machine, OS, Python version, etc.? This can only be overhead initially, of course, but on subsequent runs it's just one memory map, which is the cheapest possible operation.

E.g.

$ python3.7 --write-image "import typing, re, os, numpy"

I imagine this creating a file like:

/tmp/__python__/python37-typing-re-os-numpy.mem

Then just terminating as if just that line had run, however long it takes (but snapshotting before exit).

Then subsequent invocations would only restore the image to memory. Maybe:

$ pyrunner --load-image python37-typing-re-os-numpy myscript.py

The last line could be aliased of course. I suppose we'd need to check if the relevant file exists, and if not fall back to just ignoring the '--load-image' flag and running plain old Python.

This helps not at all for something like AWS Lambda where each instance is spun up fresh. But for the use-case of running many Python shell commands at an interactive shell on one machine, it seems like that could be very fast.

In my hypothetical I'm supposing we pre-load some collection of modules in the image. Of course, the script may need to load others, and it may not use some in the image. But users could decide their typical needed modules themselves under this idea.

On Jul 20, 2017 11:27 PM, "Nick Coghlan" wrote:

> On 21 July 2017 at 15:30, Cesare Di Mauro wrote:
>
>> 2017-07-21 4:52 GMT+02:00 Nick Coghlan :
>>
>>> On 21 July 2017 at 12:44, Nick Coghlan wrote:
>>> > We can separately measure the cost of unmarshalling the code object:
>>> >
>>> > $ python3 -m perf timeit -s "import typing; from marshal import loads; from importlib.util import cache_from_source; cache = cache_from_source(typing.__file__); data = open(cache, 'rb').read()[12:]" "loads(data)"
>>> > .
>>> > Mean +- std dev: 286 us +- 4 us
>>>
>>> Slight adjustment here, as the cost of locating the cached bytecode
>>> and reading it from disk should really be accounted for in each
>>> iteration:
>>>
>>> $ python3 -m perf timeit -s "import typing; from marshal import loads; from importlib.util import cache_from_source" "cache = cache_from_source(typing.__spec__.origin); data = open(cache, 'rb').read()[12:]; loads(data)"
>>> .
>>> Mean +- std dev: 337 us +- 8 us
>>>
>>> That will have a bigger impact when loading from spinning disk or a
>>> network drive, but it's fairly negligible when loading from a local
>>> SSD or an already primed filesystem cache.
>>>
>>> Cheers,
>>> Nick.
>>>
>>> --
>>> Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
>>
>> Thanks for your tests, Nick. It's quite evident that the marshal code
>> cannot improve the situation, so I regret from my proposal.
>
> It was still a good suggestion, since it made me realise I *hadn't*
> actually measured the relative timings lately, so it was technically an
> untested assumption that module level code execution still dominated the
> overall import time.
>
> typing is also a particularly large & complex module, and bytecode
> unmarshalling represents a larger fraction of the import time for simpler
> modules like abc:
>
> $ python3 -m perf timeit -s "import abc; from marshal import loads; from importlib.util import cache_from_source" "cache = cache_from_source(abc.__spec__.origin); data = open(cache, 'rb').read()[12:]; loads(data)"
> .
> Mean +- std dev: 45.2 us +- 1.1 us
>
> $ python3 -m perf timeit -s "import abc; loader_exec = abc.__spec__.loader.exec_module" "loader_exec(abc)"
> .
> Mean +- std dev: 172 us +- 5 us
>
> $ python3 -m perf timeit -s "import abc; from importlib import reload" "reload(abc)"
> .
> Mean +- std dev: 280 us +- 14 us
>
> And _weakrefset:
>
> $ python3 -m perf timeit -s "import _weakrefset; from marshal import loads; from importlib.util import cache_from_source" "cache = cache_from_source(_weakrefset.__spec__.origin); data = open(cache, 'rb').read()[12:]; loads(data)"
> .
> Mean +- std dev: 57.7 us +- 1.3 us
>
> $ python3 -m perf timeit -s "import _weakrefset; loader_exec = _weakrefset.__spec__.loader.exec_module" "loader_exec(_weakrefset)"
> .
> Mean +- std dev: 129 us +- 6 us
>
> $ python3 -m perf timeit -s "import _weakrefset; from importlib import reload" "reload(_weakrefset)"
> .
> Mean +- std dev: 226 us +- 4 us
>
> The conclusion still holds (the absolute numbers here are likely still too
> small for the extra complexity of parallelising bytecode loading to pay off
> in any significant way), but it also helps us set reasonable expectations
> around how much of a gain we're likely to be able to get just from
> precompilation with Cython.
>
> That does actually raise a small microbenchmarking problem: for source and
> bytecode imports, we can force the impo
Re: [Python-Dev] Python startup time
On Fri, 21 Jul 2017 00:12:20 -0700
David Mertz wrote:
> How implausible is it to write out the actual memory image of a loaded
> Python process? I.e. on a specific machine, OS, Python version, etc? This
> can only be overhead initially, of course, but on subsequent runs it's just
> one memory map, which is the cheapest possible operation.

You can't rely on the file being remapped at the same address when you reload it. So you'd have to write a relocation routine that's able to find and fix *all* pointers inside the Python object tree and CPython's internal structures (fixing the pointers is not necessarily difficult, finding them without missing any is the difficult part).

Regards

Antoine.
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
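To make the relocation point concrete, here is a toy model in Python. It is only a sketch under loud assumptions: the "memory image" is a plain list of integers, and `relocate` and `pointer_slots` are hypothetical names invented for this illustration, not anything CPython provides. It shows why fixing a pointer is trivial once you know where it is, and why a single missed slot leaves a stale address behind.

```python
def relocate(image, old_base, new_base, pointer_slots):
    """Return a copy of `image` with every listed pointer slot rebased.

    `image` is a toy heap (list of ints), `pointer_slots` lists the indexes
    that hold absolute addresses. All names here are hypothetical.
    """
    delta = new_base - old_base
    fixed = list(image)
    for slot in pointer_slots:
        fixed[slot] += delta  # the easy part: shift by the base difference
    return fixed


# A 4-word heap saved while mapped at address 1000.
# Words 1 and 3 hold absolute addresses; words 0 and 2 are plain data.
saved = [42, 1003, 7, 1000]

# Reloaded at address 5000 with a complete list of pointer slots.
restored = relocate(saved, old_base=1000, new_base=5000, pointer_slots=[1, 3])

# Same reload, but slot 3 was missed: it still points into the old mapping.
broken = relocate(saved, old_base=1000, new_base=5000, pointer_slots=[1])
```

In a real interpreter the hard part is producing a complete `pointer_slots` equivalent for every object and internal structure, which is exactly the difficulty described above.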
Re: [Python-Dev] Python startup time
On Fri, Jul 21, 2017 at 4:12 PM, David Mertz wrote:
> How implausible is it to write out the actual memory image of a loaded
> Python process? I.e. on a specific machine, OS, Python version, etc? This
> can only be overhead initially, of course, but on subsequent runs it's just
> one memory map, which is the cheapest possible operation.

FYI, you may be interested in a very recent Node.js security issue:

https://nodejs.org/en/blog/vulnerability/july-2017-security-releases/#node-js-specific-security-flaws
[Python-Dev] Need help to fix urllib(.parse) vulnerabilities
Hi,

Recently, two security vulnerabilities were reported in the urllib module:

https://bugs.python.org/issue30500
http://python-security.readthedocs.io/vuln/bpo-30500_urllib_connects_to_a_wrong_host.html#bpo-30500-urllib-connects-to-a-wrong-host
=> already fixed in Python 3.6.2

https://bugs.python.org/issue29606
http://python-security.readthedocs.io/vuln/urllib_ftp_protocol_stream_injection.html#urllib-ftp-protocol-stream-injection
=> not fixed yet

I also proposed a more general protection: "Reject newline character (U+000A) in URLs in urllib.parse":
http://bugs.python.org/issue30713

The problem with the urllib module is how we handle invalid URLs. Right now, we return the URL unmodified if we cannot parse it. Should we raise an exception if a URL contains a newline, for example?

It's very hard to harden the urllib module without breaking backward compatibility. That's why it took 3 weeks to fix "urllib connects to a wrong host": we had to find a way to fix the vulnerability without breaking backward compatibility.

Another proposed approach is to reject invalid data earlier or later, but not in urllib...

So if you understand URLs, HTTP, etc.: please join these issues to help us fix them!

Victor
Re: [Python-Dev] Need help to fix urllib(.parse) vulnerabilities
2017-07-21 12:02 GMT+02:00 Victor Stinner :
> https://bugs.python.org/issue29606
> http://python-security.readthedocs.io/vuln/urllib_ftp_protocol_stream_injection.html#urllib-ftp-protocol-stream-injection
> => not fixed yet

OK, a more concrete problem. To fix the "urllib FTP" bug, we have to find a balance between security (reject any URL looking like an attempt to counter the security protections) and backward compatibility (accept filenames containing newlines).

Maybe we need to only reject a URL which contains a newline in the "host" part, but accept newlines in the "path" part of the URL? The question is whether the code correctly splits the "host" and "path" parts when the URL contains a newline. My bet is that no, it behaves badly :-)

Victor
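The "reject newlines in the host, accept them in the path" idea can be sketched as follows. This is a hedged illustration only: `split_host_path` and `check_ftp_url` are hypothetical helpers written for this note, the splitting is deliberately naive (`scheme://host/path` only) so that its behaviour does not depend on how a given Python version's `urllib.parse` treats control characters, and it is not the fix that was applied to urllib.

```python
from urllib.parse import unquote


def split_host_path(url):
    """Naively split 'scheme://host/path' and percent-decode both parts.

    Hypothetical helper for illustration; real URL parsing has many more cases
    (userinfo, ports, IPv6 literals, ...).
    """
    scheme, sep, rest = url.partition("://")
    if not sep:
        raise ValueError("expected a URL of the form scheme://host/path")
    host, _, path = rest.partition("/")
    return unquote(host), unquote("/" + path)


def check_ftp_url(url):
    """Reject CR/LF in the host part only; leave the path alone."""
    host, path = split_host_path(url)
    if "\r" in host or "\n" in host:
        raise ValueError("newline in the host part of the URL")
    return host, path
```

With this sketch, `check_ftp_url("ftp://example.com/dir/file%0Aname")` is accepted and returns a path containing a newline, while `check_ftp_url("ftp://example.com%0APORT/x")` raises `ValueError`. Whether a newline in the path is still exploitable further down the stack is a separate question.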
Re: [Python-Dev] Python startup time
On Jul 21 2017, David Mertz wrote:
> How implausible is it to write out the actual memory image of a loaded
> Python process?

That is what Emacs does, and it causes them a lot of trouble. They're trying to move away from it at the moment, but the direction is not yet clear. The keyword is "unexec", and it wreaks havoc with malloc.

Best,
-Nikolaus

--
GPG Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

»Time flies like an arrow, fruit flies like a Banana.«
Re: [Python-Dev] Need help to fix urllib(.parse) vulnerabilities
On Fri, Jul 21, 2017 at 12:45 PM, Victor Stinner wrote:

> 2017-07-21 12:02 GMT+02:00 Victor Stinner :
> > https://bugs.python.org/issue29606
> > http://python-security.readthedocs.io/vuln/urllib_ftp_protocol_stream_injection.html#urllib-ftp-protocol-stream-injection
> > => not fixed yet
>
> OK, a more concrete problem. To fix the "urllib FTP" bug, we have to
> find a balance between security (reject any URL looking like an
> attempt to counter the security protections) and backward
> compatibility (accept filenames containing newlines).
>
> Maybe we need to only reject a URL which contains a newline in the
> "host" part, but accept them in the "path" part of the URL? The
> question is if the code splits correctly "host" and "path" parts when
> the URL contains a newline. My bet is that no, it behaves badly :-)
>
> Victor

It took me a while to understand the security implications of this FTP-related bug, but I believe I got the gist of it here (I can elaborate further if it's not clear):
https://github.com/python/cpython/pull/1214#issuecomment-298393169

My proposal is to fix ftplib.py and guard against malicious strings involving the *PORT command only*. This way we fix the issue *and* maintain backward compatibility by allowing users to specify "\n" in their paths and username / password pairs. Java took a different approach and disallowed "\n" completely.

To my understanding, fixing ftplib would automatically mean fixing urllib as well.

--
Giampaolo - http://grodola.blogspot.com
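One way to read "guard against malicious strings involving the PORT command only" is a check that flags an argument only when a line break is followed by something that looks like a smuggled PORT command. The sketch below is an assumption about what such a guard could look like, not the patch discussed in the linked pull request; `looks_like_port_injection` and the regular expression are hypothetical.

```python
import re

# Hypothetical pattern: a CR or LF, optional blanks, then the word PORT.
_INJECTED_PORT = re.compile(r"[\r\n][ \t]*PORT\b", re.IGNORECASE)


def looks_like_port_injection(argument):
    """Return True if `argument` appears to smuggle a PORT command.

    Newlines on their own are tolerated, which keeps paths and credentials
    containing "\n" working, as proposed in the message above.
    """
    return _INJECTED_PORT.search(argument) is not None
```

Under these assumptions, a path such as `"dir/file\nname.txt"` passes, while `"x\r\nPORT 127,0,0,1,0,80"` is flagged. A deny-list like this is only as good as the set of commands it knows about, which is the trade-off against rejecting "\n" outright.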
Re: [Python-Dev] Need help to fix urllib(.parse) vulnerabilities
On Fri, Jul 21, 2017, at 08:43, Giampaolo Rodola' wrote:
> It took me a while to understand the security implications of this
> FTP-related bug, but I believe I got the gist of it here (I can
> elaborate further if it's not clear):
> https://github.com/python/cpython/pull/1214#issuecomment-298393169
> My proposal is to fix ftplib.py and guard against malicious
> strings involving the *PORT command only*. This way we fix the
> issue *and* maintain backward compatibility by allowing users to
> specify "\n" in their paths and username / password pairs. Java
> took a different approach and disallowed "\n" completely. To my
> understanding fixing ftplib would automatically mean fixing urllib
> as well.

What would a \n in a path mean? What commands would you send over FTP to successfully retrieve a file (or enter a username or password) containing a newline in the name?

In other words, what exactly are we being backward compatible *with*?
[Python-Dev] Summary of Python tracker Issues
ACTIVITY SUMMARY (2017-07-14 - 2017-07-21)
Python tracker at http://bugs.python.org/

To view or respond to any of the issues listed below, click on the issue.
Do NOT respond to this message.

Issues counts and deltas:
  open    6058 (+16)
  closed 36679 (+38)
  total  42737 (+54)

Open issues with patches: 2343


Issues opened (37)
==================

#19896: Exposing "q" and "Q" to multiprocessing.sharedctypes
http://bugs.python.org/issue19896  reopened by haypo

#30450: Pull Windows dependencies from GitHub rather than svn.python.o
http://bugs.python.org/issue30450  reopened by steve.dower

#30931: Race condition in asyncore may access the wrong dispatcher
http://bugs.python.org/issue30931  opened by walkhour

#30934: Document how to run coverage for repository idlelib files.
http://bugs.python.org/issue30934  opened by terry.reedy

#30935: document the new behavior of get_event_loop() in Python 3.6
http://bugs.python.org/issue30935  opened by chris.jerdonek

#30937: csv module examples miss newline='' when opening files
http://bugs.python.org/issue30937  opened by Pavel

#30938: pdb lacks debugger command to list and show all user-defined v
http://bugs.python.org/issue30938  opened by David Rieger

#30939: Sphinx 1.6.3 deprecation warning for sphinx.util.compat.Direct
http://bugs.python.org/issue30939  opened by ned.deily

#30940: Documentation for round() is incorrect.
http://bugs.python.org/issue30940  opened by George K

#30944: Python 32 bit install fails on Windows - BitDefender false pos
http://bugs.python.org/issue30944  opened by Arie van Wingerden

#30945: loop.create_server does not detect if the interface is IPv6 en
http://bugs.python.org/issue30945  opened by cecton

#30947: Update embeded copy of libexpat to 2.2.2
http://bugs.python.org/issue30947  opened by haypo

#30949: Provide assertion functions in unittest.mock
http://bugs.python.org/issue30949  opened by odd_bloke

#30950: Convert round() to Arument Clinic
http://bugs.python.org/issue30950  opened by serhiy.storchaka

#30951: Documentation error in inspect module
http://bugs.python.org/issue30951  opened by jalexvig

#30952: include Math extension in SQlite
http://bugs.python.org/issue30952  opened by Big Stone

#30953: Fatal python error when jumping into except clause
http://bugs.python.org/issue30953  opened by ppperry

#30956: ftplib behaves oddly if socket timeout is greater than the def
http://bugs.python.org/issue30956  opened by arloclarke

#30959: Constructor signature is duplicated in the help of namedtuples
http://bugs.python.org/issue30959  opened by serhiy.storchaka

#30962: Add caching to logging.Logger.isEnabledFor()
http://bugs.python.org/issue30962  opened by aviso

#30963: xxlimited.c XxoObject_Check should be XxoObject_CheckExact
http://bugs.python.org/issue30963  opened by Jim.Jewett

#30964: Mention ensurepip in package installation docs
http://bugs.python.org/issue30964  opened by ncoghlan

#30966: multiprocessing.queues.SimpleQueue leaks 2 fds
http://bugs.python.org/issue30966  opened by arigo

#30967: Crash in PyThread_ReInitTLS() in the child process after os.fo
http://bugs.python.org/issue30967  opened by Thomas Mortensson

#30969: Docs should say that `x is z or x == z` is used for `x in y` i
http://bugs.python.org/issue30969  opened by ztane

#30971: Improve code readability of json.tool
http://bugs.python.org/issue30971  opened by dhimmel

#30972: Event loop incorrectly inherited in child processes.
http://bugs.python.org/issue30972  opened by Elvis.Pranskevichus

#30974: Update os.samefile docstring to match documentation
http://bugs.python.org/issue30974  opened by eMPee584

#30975: multiprocessing.Condition.notify_all() blocks indefinitely if
http://bugs.python.org/issue30975  opened by mickp

#30977: reduce uuid.UUID() memory footprint
http://bugs.python.org/issue30977  opened by wbolster

#30978: str.format_map() silences exceptions in __getitem__
http://bugs.python.org/issue30978  opened by Akuli

#30979: the winapi fails to run shortcuts (because considers a shortcu
http://bugs.python.org/issue30979  opened by Bernát Gábor

#30980: Calling asyncore.file_wrapper.close twice may close unrelated
http://bugs.python.org/issue30980  opened by Nir Soffer

#30981: IDLE: Add config dialog font page tests
http://bugs.python.org/issue30981  opened by terry.reedy

#30982: AMD64 Windows8.1 Refleaks 3.x: compilation error, cannot open
http://bugs.python.org/issue30982  opened by haypo

#30983: eval frame rename in pep 0523 broke gdb's python extension
http://bugs.python.org/issue30983  opened by bcap

#30984: traceback.print_exc return value documentation
http://bugs.python.org/issue30984  opened by Jelle Zijlstra


Most recent 15 issues with no replies (15)
==========================================

#30984: traceback.print_exc return value documentation
http://bugs.python.org/issue30984

#30983: eval frame rename in pep 0523 broke gdb's python extension
http://bugs.python.org/issue30983

#30980: Calli
Re: [Python-Dev] startup time repeated? why not daemon
On Thu, 20 Jul 2017 at 22:11 Chris Jerdonek wrote:

> On Thu, Jul 20, 2017 at 8:49 PM, Nick Coghlan wrote:
> > ...
> > * Lazy loading can have a significant impact on startup time, as it
> > means you don't have to pay for the cost of finding and loading
> > modules that you don't actually end up using on that particular run
>

It should be mentioned that I have started designing an API to make using lazy loading much easier in Python 3.7 (i.e. "calling a single function" easier), but I still have to write the tests and such before I propose a patch, and it will still be mainly for apps that know what they are doing, since lazy loading makes debugging import errors harder.

> >
> > We've historically resisted adopting these techniques for the standard
> > library because they *do* make things more complicated *and* harder to
> > debug relative to plain old eagerly imported dynamic Python code.
> > However, if we're going to recommend them as good practices for 3rd
> > party developers looking to optimise the startup time of their Python
> > applications, then it makes sense for us to embrace them for the
> > standard library as well, rather than having our first reaction be to
> > write more hand-crafted C code.
>
> Are there any good write-ups of best practices and techniques in this
> area for applications (other than obvious things like avoiding
> unnecessary imports)? I'm thinking of things like how to structure
> your project, things to look for, developer tools that might help, and
> perhaps third-party runtime libraries?
>

Nothing beyond "profile your application" and "don't do stuff during import as a side-effect" that I'm aware of.

-Brett

>
> --Chris
>
> >
> > On that last point, it's also worth keeping in mind that we have a
> > much harder time finding new C-level contributors than we do new
> > Python-level ones, and have every reason to expect that problem to get
> > worse over time rather than better (since writing and maintaining
> > handcrafted C code is likely to go the way of writing and maintaining
> > handcrafted assembly code as a skillset: while it will still be
> > genuinely necessary in some contexts, it will also be an increasingly
> > niche technical specialty).
> >
> > Starting to migrate to using Cython for our acceleration modules
> > instead of plain C should thus prove to be a win for everyone:
> >
> > - Cython structurally avoids a lot of typical bugs that arise in
> > hand-coded extensions (e.g. refcount bugs)
> > - by design, it's much easier to mentally switch between Python &
> > Cython than it is between Python & C
> > - Cython accelerated modules are easier to adapt to other interpreter
> > implementations than handcrafted C modules
> > - keeping Python modules and their C accelerated counterparts in sync
> > will be easier, as they'll mostly be using the same code
> > - we'd be able to start writing C API test cases in Cython rather than
> > in handcrafted C (which currently mostly translates to only testing
> > them indirectly)
> > - CPython's own test suite would naturally help test Cython
> > compatibility with any C API updates
> > - we'd have an inherent incentive to help enhance Cython to take
> > advantage of new C API features
> >
> > There are some genuine downsides in increasing the complexity of
> > bootstrapping CPython when all you're starting with is a VCS clone and
> > a C compiler, but those complications are ultimately no worse than
> > those we already have with Argument Clinic, and hence amenable to the
> > same solution: if we need to, we can check in the generated C files in
> > order to make bootstrapping easier.
> >
> > Cheers,
> > Nick.
> >
> > --
> > Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
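For readers wondering what lazy loading as "calling a single function" might look like, the following sketch uses `importlib.util.LazyLoader`, which already exists in the standard library (Python 3.5+). The `lazy_import` wrapper is a hypothetical helper following the recipe in the importlib documentation; it is not the API being designed in the message above.

```python
import importlib.util
import sys


def lazy_import(name):
    """Return a module object whose code only runs on first attribute access.

    Hypothetical convenience wrapper around importlib.util.LazyLoader.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        raise ImportError(f"no module named {name!r}")
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # with LazyLoader this only arms the lazy module
    return module
```

A caller would write something like `colorsys = lazy_import("colorsys")` and pay the execution cost only when an attribute such as `colorsys.rgb_to_hsv` is first touched. The debugging caveat from the thread applies: a failing import surfaces at that first attribute access, far from the `lazy_import` call.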
Re: [Python-Dev] Need help to fix urllib(.parse) vulnerabilities
> On Jul 21, 2017, at 3:45 AM, Victor Stinner wrote:
>
> Ok, I more concrete problem. To fix the "urllib FTP" bug, we have to
> find a balance between security (reject any URL looking like an
> attempt to counter the security protections) and backward
> compatibility (accept filenames containing newlines).

For this case, the balance should probably tilt more towards security than backwards compatibility. I would be very concerned about such odd URLs.

That said, if backwards compatibility is going to be broken, consider giving users a temporary, clean way to opt out of the additional protections (we don't want to leave them high and dry if they happen to have a legitimate use case).

Raymond
[Python-Dev] Cython compiled stdlib modules - Re: Python startup time
Nick Coghlan wrote on 21.07.2017 at 08:23:
> I'll also note that in these cases where the import overhead is
> proportionally significant for always-imported modules, we may want to look
> at the benefits of freezing them (if they otherwise remain as pure Python
> modules), or compiling them as builtin modules (if we switch them over to
> Cython), in addition to looking at ways to make the modules themselves
> faster.

Just for the sake of it, I gave the Cython compilation a try. I had to apply the attached hack to Lib/typing.py to get the tests passing, because it uses frame call offsets in some places and Cython functions do not create frames when being called (they only create them for exception traces). I also had to disable the import of "abc" in the Cython generated module to remove the circular self dependency at startup when the "abc" module is compiled. That shouldn't have an impact on the runtime performance, though.

Note that this is otherwise using the unmodified Python code, as provided in the current modules, constructing and using normal Python classes for everything, no extension types etc. Only two stdlib Python modules were compiled into shared libraries, and not statically linked into the CPython core.

I used the "python_startup" benchmark in the "performance" project to measure the overall startup times of a clean non-debug non-pgo build of CPython 3.7 (rev d0969d6) against the same build with a compiled typing.py and abc.py. To compile these modules, I used the following command (plus the attached patch):

$ cythonize -3 -X binding=True -i Lib/typing.py Lib/abc.py

I modified the startup benchmark to run "python -c 'import typing'" etc. instead of just executing "pass".
- stock CPython starting up and running "pass":
  Mean +- std dev: 14.7 ms +- 0.3 ms

- stock CPython starting up and running "import abc":
  Mean +- std dev: 14.8 ms +- 0.3 ms
- with compiled abc.py:
  Mean +- std dev: 14.9 ms +- 0.3 ms

- stock CPython starting up and running "import typing":
  Mean +- std dev: 34.6 ms +- 1.0 ms
- with compiled abc.py:
  Mean +- std dev: 34.4 ms +- 0.6 ms
- with compiled typing.py:
  Mean +- std dev: 33.5 ms +- 0.7 ms
- with both compiled:
  Mean +- std dev: 33.1 ms +- 0.4 ms

That's only a 4% improvement in the overall startup time on my machine, and about a 7% faster overall runtime of "import typing" compared to "pass". Note also that compiling abc.py leads to a slightly *increased* startup time in the "import abc" case, which might be due to the larger file size of the abc.so file compared to the abc.pyc file. This is amortised by the decreased runtime in the "import typing" case (I guess).

I then ran the test suites for both modules for lack of a better post-startup runtime benchmark. The improvement for abc.py is in the order of 1-2%, but test_typing.py has many more tests and wins about 13% overall:

- stock CPython executing essentially "runner.run(deepcopy(suite))" in "test_typing.py" (the deepcopy() takes about 6 ms):
  Mean +- std dev: 68.6 ms +- 0.8 ms
- compiled abc.py and typing.py:
  Mean +- std dev: 60.7 ms +- 0.7 ms

One more thing to note: the compiled modules are quite large. I get these file sizes:

   8658 Lib/abc.py
   7525 Lib/__pycache__/abc.cpython-37.pyc
 369930 Lib/abc.c
 122048 Lib/abc.cpython-37m-x86_64-linux-gnu.so

  80290 Lib/typing.py
  73921 Lib/__pycache__/typing.cpython-37.pyc
2951893 Lib/typing.c
1182632 Lib/typing.cpython-37m-x86_64-linux-gnu.so

The .so files are about 16x as large as the .pyc files.
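The percentages and the size ratio quoted above can be re-derived from the reported means and file sizes. The snippet below is just that arithmetic, written out as a sketch; the constant and function names are made up for this note, and the numbers are copied from the results above.

```python
# Means in milliseconds, copied from the benchmark results above.
PASS_MS = 14.7                   # stock CPython running "pass"
TYPING_STOCK_MS = 34.6           # stock CPython running "import typing"
TYPING_BOTH_COMPILED_MS = 33.1   # abc.py and typing.py both compiled


def percent_saved(before, after):
    """Relative time saved, as a percentage of `before`."""
    return 100.0 * (before - after) / before


# Whole-process view: interpreter startup plus the import.
overall = percent_saved(TYPING_STOCK_MS, TYPING_BOTH_COMPILED_MS)

# Import-only view: subtract the "pass" baseline from both measurements.
import_only = percent_saved(TYPING_STOCK_MS - PASS_MS,
                            TYPING_BOTH_COMPILED_MS - PASS_MS)

# Size of each shared library relative to its .pyc file (bytes from the listing).
abc_ratio = 122048 / 7525
typing_ratio = 1182632 / 73921

print(f"overall: {overall:.1f}%  import only: {import_only:.1f}%")
print(f".so/.pyc size ratio: abc {abc_ratio:.1f}x, typing {typing_ratio:.1f}x")
```

This reproduces the "4%" and "about 7%" figures and the roughly 16x size ratio.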
The typing.so file weighs in with about 40% of the size of the stripped python binary:

2889136 python

As it stands, the gain is probably not worth the increase in library file size, which also translates to a higher bottom line for the memory consumption. At least not for these two modules. Manually optimising the files would likely also reduce the .so file size in addition to giving better speedups, though, because the generated code would become less generic.

Stefan

diff --git a/Lib/typing.py b/Lib/typing.py
index c487afc..19a73c3 100644
--- a/Lib/typing.py
+++ b/Lib/typing.py
@@ -19,6 +19,13 @@ except ImportError:
     MethodWrapperType = type(object().__str__)
     MethodDescriptorType = type(str.join)
+try:
+    import cython
+except ImportError:
+    _FRAME_OFFSET = 0
+else:
+    _FRAME_OFFSET = -1 if cython.compiled else 0
+
 # Please keep __all__ alphabetized within each category.
 __all__ = [
@@ -1165,7 +1172,7 @@ class GenericMeta(TypingMeta, abc.ABCMeta):
     def __subclasscheck__(self, cls):
         if self.__origin__ is not None:
-            if sys._getframe(1).f_globals['__name__'] not in ['abc', 'functools']:
+            if sys._getframe(1 + _FRAME_OFFSET).f_globals['__name__'] not in ['abc', 'functools']:
                 raise TypeError("Parameterized generics cannot be used with class "
                                 "or instance checks")
             return False
@@ -2124,7 +2131,7 @@ def _make_nmtuple(name, typ
[Python-Dev] Appending a link back to bugs.python.org in GitHub PRs
Thanks to Kushal Das we now have one of the most requested features since the transition: a link in PRs back to bugs.python.org (in a more discoverable way, since we have had them since Bedevere launched :).

When a pull request comes in with an issue number in the title (or one gets added), a link to bugs.python.org will be appended to the PR's body (the message you fill out when creating a PR). There's no logic to remove or update the link if the issue number is removed from the title or changed, nor any handling of multiple issue numbers, since those cases are all rare and it was easier to launch without that kind of support.

P.S.: Berker Peksag is working on providing commit emails with diffs in them, which is the other most requested feature since the transition.
Re: [Python-Dev] Python startup time
On Jul 21, 2017, at 01:25 PM, Nikolaus Rath wrote:

>That is what Emacs does, and it causes them a lot of trouble. They're
>trying to move away from it at the moment, but the direction is not yet
>clear. The keyword is "unexec", and it wreaks havoc with malloc.

Emacs has been unexec'ing for as long as I can remember (which is longer than I can remember Python :). I know that it's been problematic and there have been many efforts over the years to replace it, but I think it's been a fairly successful technique in practice, at least on platforms that support it. That's another problem with the approach of course; it's not universally possible to implement.

-Barry
Re: [Python-Dev] startup time repeated? why not daemon
On Jul 20, 2017, at 03:16 PM, Eric Snow wrote:

>Relatedly, at PyCon this year Barry and I were talking about the idea of
>bootstrapping the interpreter from a memory snapshot on disk, rather than
>from scratch (thus drastically reducing the number of IO events).

The TPI (Terrible Python Idea) I had at PyCon was some kind of (local) memcached of imported Python modules, which would theoretically allow avoiding loading the modules from the file system on start up. There would be all kinds of problems with this (i.e. putting the "terrible" in TPI), such as having to deal with module import side-effects, but perhaps those could be handled by enough APIs and engineering.

Cheers,
-Barry
Re: [Python-Dev] Python startup time
> Emacs has been unexec'ing for as long as I can remember (which is longer than
> I can remember Python :). I know that it's been problematic and there have
> been many efforts over the years to replace it, but I think it's been a fairly
> successful technique in practice, at least on platforms that support it.

I've been using Emacs far longer than Python. I remember having to invoke temacs on something. Still, if I didn't know better, I could be convinced you were referring to the GIL. :-)

Skip
Re: [Python-Dev] Python startup time
I would guess that Windows users don't tend to run lots of command line tools where startup time dominates, as *nix users do.

On Fri, Jul 21, 2017 at 3:21 PM, Barry Warsaw wrote:

> On Jul 21, 2017, at 01:25 PM, Nikolaus Rath wrote:
>
> >That is what Emacs does, and it causes them a lot of trouble. They're
> >trying to move away from it at the moment, but the direction is not yet
> >clear. The keyword is "unexec", and it wreaks havoc with malloc.
>
> Emacs has been unexec'ing for as long as I can remember (which is longer than
> I can remember Python :). I know that it's been problematic and there have
> been many efforts over the years to replace it, but I think it's been a fairly
> successful technique in practice, at least on platforms that support it.
> That's another problem with the approach of course; it's not universally
> possible to implement.
>
> -Barry

--
Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.
Re: [Python-Dev] startup time repeated? why not daemon
On 07/21/2017 03:28 PM, Barry Warsaw wrote:

> The TPI (Terrible Python Idea) I had at Pycon was some kind of (local)
> memcached of imported Python modules, which would theoretically allow
> avoiding loading the modules from the file system on start up. There would
> be all kinds of problems with this (i.e. putting the "terrible" in TPI),
> such as having to deal with module import side-effects, but perhaps those
> could be handled by enough APIs and engineering.

This would be taking a page out of PHP's book. PHP--or at least, PHP ten years ago--doesn't have the equivalent of .pyc files. If you have mod_php running inside Apache with no other extensions, it literally tokenizes each .php script every time it's invoked.

To solve this performance problem, someone wrote the "Alternative PHP Cache" (or "APC"), which runs /in Apache/. (Yep, it's not usable outside Apache!) APC stores the tokenized versions of PHP scripts in something approximating their actual in-memory representation. To use something stored in the cache, you'd iterate over all the variables/functions, copy each one into your local interpreter instance, and perform fixups on all the pointers to convert them from relative to absolute addresses.

http://php.net/manual/en/book.apc.php

I note that the introduction to APC says:

    Warning: This extension is considered unmaintained and dead. However, the source code for this extension is still available within PECL.

So perhaps the PHP folks have moved on from this technique.

//arry/
Re: [Python-Dev] startup time repeated? why not daemon
On Fri, Jul 21, 2017 at 9:52 AM, Brett Cannon wrote:
> On Thu, 20 Jul 2017 at 22:11 Chris Jerdonek wrote:
>> On Thu, Jul 20, 2017 at 8:49 PM, Nick Coghlan wrote:
>> > ...
>> > * Lazy loading can have a significant impact on startup time, as it
>> > means you don't have to pay for the cost of finding and loading
>> > modules that you don't actually end up using on that particular run
>
> It should be mentioned that I have started designing an API to make using
> lazy loading much easier in Python 3.7 (i.e. "calling a single function"
> easier), but I still have to write the tests and such before I propose a
> patch and it will still be mainly for apps that know what they are doing
> since lazy loading makes debugging import errors harder.
>
> ...
>> > However, if we're going to recommend them as good practices for 3rd
>> > party developers looking to optimise the startup time of their Python
>> > applications, then it makes sense for us to embrace them for the
>> > standard library as well, rather than having our first reaction be to
>> > write more hand-crafted C code.
>>
>> Are there any good write-ups of best practices and techniques in this
>> area for applications (other than obvious things like avoiding
>> unnecessary imports)? I'm thinking of things like how to structure
>> your project, things to look for, developer tools that might help, and
>> perhaps third-party runtime libraries?
>
> Nothing beyond "profile your application" and "don't do stuff during import
> as a side-effect" that I'm aware of.

One "project structure" idea of the sort I had in mind is to move frequently used functions in a module into their own module. This way the most common paths of execution don't load unneeded functions. Following this line of reasoning could lead to grouping functions in an application by when they're needed instead of by what they do, which is different from what we normally see.

I don't recall seeing advice like this anywhere, so maybe the trade-offs aren't worth it. Thoughts?

--Chris

>
> -Brett
>
>>
>> --Chris
>>
>> >
>> > On that last point, it's also worth keeping in mind that we have a
>> > much harder time finding new C-level contributors than we do new
>> > Python-level ones, and have every reason to expect that problem to get
>> > worse over time rather than better (since writing and maintaining
>> > handcrafted C code is likely to go the way of writing and maintaining
>> > handcrafted assembly code as a skillset: while it will still be
>> > genuinely necessary in some contexts, it will also be an increasingly
>> > niche technical specialty).
>> >
>> > Starting to migrate to using Cython for our acceleration modules
>> > instead of plain C should thus prove to be a win for everyone:
>> >
>> > - Cython structurally avoids a lot of typical bugs that arise in
>> > hand-coded extensions (e.g. refcount bugs)
>> > - by design, it's much easier to mentally switch between Python &
>> > Cython than it is between Python & C
>> > - Cython accelerated modules are easier to adapt to other interpreter
>> > implementations than handcrafted C modules
>> > - keeping Python modules and their C accelerated counterparts in sync
>> > will be easier, as they'll mostly be using the same code
>> > - we'd be able to start writing C API test cases in Cython rather than
>> > in handcrafted C (which currently mostly translates to only testing
>> > them indirectly)
>> > - CPython's own test suite would naturally help test Cython
>> > compatibility with any C API updates
>> > - we'd have an inherent incentive to help enhance Cython to take
>> > advantage of new C API features
>> >
>> > There are some genuine downsides in increasing the complexity of
>> > bootstrapping CPython when all you're starting with is a VCS clone and
>> > a C compiler, but those complications are ultimately no worse than
>> > those we already have with Argument Clinic, and hence amenable to the
>> > same solution: if we need to, we can check in the generated C files in
>> > order to make bootstrapping easier.
>> >
>> > Cheers,
>> > Nick.
>> >
>> > --
>> > Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
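For anyone who wants to try lazy loading today, a rough sketch along the lines of the recipe in the importlib documentation follows. It relies on `importlib.util.LazyLoader`, which is already in the standard library; it is not the new, simpler API described above, and the function name `lazy_import` is just a label chosen for this example.

```python
import importlib.util
import sys

def lazy_import(name):
    """Return a module whose code runs only on first attribute access."""
    if name in sys.modules:
        # Already imported (eagerly or lazily); don't replace it.
        return sys.modules[name]
    spec = importlib.util.find_spec(name)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot find a loadable module named {name!r}")
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    # This only arms the lazy module; the real load is deferred.
    loader.exec_module(module)
    return module

colorsys = lazy_import("colorsys")          # module body has not run yet
print(colorsys.rgb_to_hsv(1.0, 0.0, 0.0))   # first attribute access triggers the load
```

Two caveats with this sketch: the "finding" step (`find_spec`) still happens eagerly, so only the loading and execution cost is deferred, and any import error surfaces at first attribute access rather than at the import site, which is the debugging difficulty mentioned above.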
Re: [Python-Dev] Need help to fix urllib(.parse) vulnerabilities
21.07.17 13:02, Victor Stinner wrote:

> Recently, two security vulnerabilities were reported in the urllib module:
>
> https://bugs.python.org/issue30500
> http://python-security.readthedocs.io/vuln/bpo-30500_urllib_connects_to_a_wrong_host.html#bpo-30500-urllib-connects-to-a-wrong-host
> => already fixed in Python 3.6.2
>
> https://bugs.python.org/issue29606
> http://python-security.readthedocs.io/vuln/urllib_ftp_protocol_stream_injection.html#urllib-ftp-protocol-stream-injection
> => not fixed yet
>
> I also proposed a more general protection: "Reject newline character
> (U+000A) in URLs in urllib.parse":
> http://bugs.python.org/issue30713
>
> The problem with the urllib module is how we handle invalid URLs. Right
> now, we return the URL unmodified if we cannot parse it. Should we raise
> an exception if a URL contains a newline, for example?
>
> It's very hard to harden the urllib module without breaking backward
> compatibility. That's why it took 3 weeks to fix "urllib connects to a
> wrong host": we had to find a way to fix the vulnerability without
> breaking backward compatibility.
>
> Another proposed approach is to reject invalid data earlier or later, but
> not in urllib...

Checking a URL in urllib.parse is too early and not enough. The urllib module is general, and different protocols have different limitations. There are other ways besides urllib to pass invalid parameters to low-level protocol implementations.

I think the only reliable way of fixing the vulnerability is rejecting or escaping (as specified in RFC 2640) CR and LF inside sent lines. Adding support for RFC 2640 is a new feature and can be added only in 3.7. And this feature should be optional, since not all servers support RFC 2640.

https://github.com/python/cpython/pull/1214 does the right thing.

The other way of hardening the Python stdlib implementation of the FTP server is making it accept only CRLF as a line delimiter, not a lone CR or LF.

Additional sanity checks can be added in FTP.login() to detect problems earlier and raise more specific errors.

Every protocol (FTP, HTTP, telnet, SMTP, POP3, IMAP, etc) should be fixed separately. If a protocol allows escaping special characters, the implementation should escape them. Otherwise such characters should be rejected.
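As an illustration of why the check belongs at the point where a line is sent rather than in URL parsing, here is a small sketch. The helper `check_command_line` and the example URL are hypothetical and written only for this message; this is not the code from the pull request, just the general shape of "reject CR and LF inside sent lines".

```python
from urllib.parse import unquote, urlsplit

def check_command_line(line):
    """Hypothetical helper: refuse CR or LF inside a single protocol line."""
    if "\r" in line or "\n" in line:
        raise ValueError("illegal newline character in command line")
    # A well-formed line is terminated by exactly one CRLF.
    return line + "\r\n"

# The URL itself contains no raw newline; CR and LF are percent-encoded
# and only appear after the path has been unquoted.
url = "ftp://example.com/dir%0D%0ADELE%20important.txt"
path = unquote(urlsplit(url).path)

try:
    check_command_line("CWD " + path)
except ValueError as exc:
    print("rejected:", exc)
```

In this sketch the URL is perfectly parseable, so a newline check on the raw URL string would not catch it. The injected second command only becomes visible once the path is decoded and about to be written to the control connection.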