Re: [Python-Dev] PEP 7 and braces { .... } on if
2017-05-31 19:27 GMT+02:00 Guido van Rossum:
> I interpret the PEP (...)

Right, the phrasing requires one to "interpret" it :-)

> (...) as saying that you should use braces everywhere but not
> to add them in code that you're not modifying otherwise. (I.e. don't go on a
> brace-adding rampage.) If author and reviewer of a PR disagree I would go
> with "add braces" since that's clearly the PEP's preference. This is C code.
> We should play it safe.

Would someone be nice enough to try to rephrase PEP 7 to explain that? Just to avoid further boring discussions on the C coding style...

Victor
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
[Python-Dev] The untuned tunable parameter ARENA_SIZE
When CPython's small block allocator was first merged in late February 2001, it allocated memory in gigantic chunks it called "arenas". These arenas were a massive 256 KILOBYTES apiece.

This tunable parameter has not been touched in the intervening 16 years. Yet CPython's memory consumption continues to grow. By the time a current "trunk" build of CPython reaches the REPL prompt it's already allocated 16 arenas.

I propose we make the arena size larger. By how much? I asked Victor to run some benchmarks with arenas of 1mb, 2mb, and 4mb. The results with 1mb and 2mb were mixed, but his benchmarks with a 4mb arena size showed measurable (>5%) speedups on ten benchmarks and no slowdowns.

What would be the result of making the arena size 4mb?

* CPython could no longer run on a computer where at startup it could not allocate at least one 4mb contiguous block of memory.
* CPython programs would die slightly sooner in out-of-memory conditions.
* CPython programs would use more memory. How much? Hard to say. It depends on their allocation strategy. I suspect most of the time it would be < 3mb additional memory. But for pathological allocation strategies the difference could be significant. (e.g.: lots of allocs, followed by lots of frees, but the occasional object lives forever, which means that the arena it's in can never be freed. If 1 out of every 16 256k arenas is kept alive this way, and the objects are spaced out precisely such that now it's 1 for every 4mb arena, max memory use would be the same but later stable memory use would hypothetically be 16x current.)
* Many programs would be slightly faster now and then, simply because we call malloc() 1/16 as often.

What say you? Vote for your favorite color of bikeshed now!

//arry/
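[Editor's note: the pathological case in the third bullet can be put in numbers with a toy model. All figures below are hypothetical, chosen to match the 16x estimate in the text; the model assumes one surviving object pins exactly one arena in both layouts.]

```python
# Toy arithmetic for the pathological fragmentation case described above:
# a long-lived object keeps its whole arena alive, and the survivors are
# spaced exactly one per arena in both the 256 KB and the 4 MB layouts.

KB = 1024
MB = 1024 * KB

def pinned_memory(total_allocated, arena_size, pinned_fraction):
    """Memory that can never be returned: each pinned arena costs arena_size."""
    n_arenas = total_allocated // arena_size
    n_pinned = int(n_arenas * pinned_fraction)
    return n_pinned * arena_size

# 64 MB allocated, 1 out of every 16 arenas kept alive by a surviving object.
total = 64 * MB
old = pinned_memory(total, 256 * KB, 1 / 16)  # 256 KB arenas
# With 4 MB arenas the same survivors land one per arena: every arena pinned.
new = pinned_memory(total, 4 * MB, 1.0)

print(old // MB, "MB pinned with 256 KB arenas")  # 4 MB
print(new // MB, "MB pinned with 4 MB arenas")    # 64 MB
print("ratio:", new // old)                       # 16x
```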
Re: [Python-Dev] The untuned tunable parameter ARENA_SIZE
On 06/01/2017 12:38 AM, Larry Hastings wrote:
> I propose we make the arena size larger. By how much? I asked Victor
> to run some benchmarks with arenas of 1mb, 2mb, and 4mb. The results
> with 1mb and 2mb were mixed, but his benchmarks with a 4mb arena size
> showed measurable (>5%) speedups on ten benchmarks and no slowdowns.

Oh, sorry! Meant to add: thanks, Victor, for running these benchmarks for me! Where are my manners?!

//arry/
Re: [Python-Dev] The untuned tunable parameter ARENA_SIZE
2017-06-01 9:38 GMT+02:00 Larry Hastings:
> When CPython's small block allocator was first merged in late February 2001,
> it allocated memory in gigantic chunks it called "arenas". These arenas
> were a massive 256 KILOBYTES apiece.

The arena size defines the strict minimum memory usage of Python. With 256 kB, it means that the smallest memory usage is 256 kB.

> What would be the result of making the arena size 4mb?

A minimum memory usage of 4 MB. It also means that if you allocate 4 MB + 1 byte, Python will take 8 MB from the operating system.

The GNU libc malloc uses a variable threshold to choose between sbrk() (heap memory) and mmap(). It starts at 128 kB or 256 kB, and is then adapted depending on the workload (I don't know exactly how).

I would prefer to have an adaptive arena size. For example, start at 256 kB and then double the arena size while the memory usage grows. The problem is that pymalloc is currently designed for a fixed arena size. I have no idea how hard it would be to make the size vary per allocated arena.

I have read that CPUs support "large pages" between 2 MB and 1 GB, instead of just 4 kB. Using large pages can have a significant impact on performance. I don't know if we can do something to help the Linux kernel use large pages for our memory? Nor do I know how we could do that :-) Maybe using mmap() sizes closer to large pages would help Linux to join them into a big page? (Linux has something magic to make applications use big pages transparently.)
More generally: I'm strongly in favor of making our memory allocator more efficient :-D

When I wrote my tracemalloc PEP 454, I counted that Python calls malloc(), realloc() or free() 270,000 times per second on average when running the Python test suite:
https://www.python.org/dev/peps/pep-0454/#log-calls-to-the-memory-allocator
(Now I don't recall if it was really "malloc" or PyObject_Malloc, but well, we do a lot of memory allocations and deallocations ;-))

When I analyzed the timeline of CPython master performance, I was surprised to see that my change making PyMem_Malloc() use pymalloc was one of the most significant "optimizations" of Python 3.6!
http://pyperformance.readthedocs.io/cpython_results_2017.html#pymalloc-allocator

CPython performance heavily depends on the performance of our memory allocator, at least on the performance of pymalloc (the specialized allocator for blocks <= 512 bytes).

By the way, Naoki INADA also proposed a different idea:

"Global freepool: Many types has it’s own freepool. Sharing freepool can increase memory and cache efficiency. Add PyMem_FastFree(void* ptr, size_t size) to store memory block to freepool, and PyMem_Malloc can check global freepool first."
http://faster-cpython.readthedocs.io/cpython37.html

IMHO it's worth investigating this change as well.

Victor
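[Editor's note: the "start at 256 kB and double it" policy Victor sketches could be modeled roughly as below. This is only a sketch — the class and its names are hypothetical, and pymalloc itself would need to track a size per arena, which is exactly the hard part he mentions.]

```python
# Sketch of an adaptive arena-sizing policy: each *new* arena doubles in
# size while memory usage keeps growing, up to a cap.  Hypothetical names;
# CPython's pymalloc currently hardcodes a single ARENA_SIZE constant.

MIN_ARENA = 256 * 1024       # 256 KB, today's fixed size
MAX_ARENA = 4 * 1024 * 1024  # 4 MB cap

class AdaptiveArenaPolicy:
    def __init__(self):
        self.next_size = MIN_ARENA
        self.arena_sizes = []        # sizes of arenas allocated so far

    def allocate_arena(self):
        size = self.next_size
        self.arena_sizes.append(size)
        # Allocating another arena means usage is growing: double the next one.
        if self.next_size < MAX_ARENA:
            self.next_size *= 2
        return size

policy = AdaptiveArenaPolicy()
sizes = [policy.allocate_arena() for _ in range(6)]
print([s // 1024 for s in sizes])  # [256, 512, 1024, 2048, 4096, 4096]
```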
Re: [Python-Dev] The untuned tunable parameter ARENA_SIZE
On 06/01/2017 12:57 AM, Victor Stinner wrote:
> I would prefer to have an adaptative arena size. For example start at
> 256 kB and then double the arena size while the memory usage grows.
> The problem is that pymalloc is currently designed for a fixed arena
> size. I have no idea how hard it would be to make the size per
> allocated arena.

It's not hard. The major pain point is that it'd make the address_in_range() inline function slightly more expensive. Currently that code has ARENA_SIZE hardcoded inside it; if the size were dynamic we'd have to look up the size of the arena every time. This function is called every time we free a pointer, so it's done hundreds of thousands of times per second (as you point out).

It's worth trying the experiment to see if dynamic arena sizes would make programs notably faster. However... why not both? Changing to 4mb arenas now is a one-line change, on first examination seems mostly harmless, and yields an easy (if tiny) performance win. If someone wants to experiment with dynamic arenas, they could go right ahead, and if it works well we could merge that too.

//arry/
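[Editor's note: the extra cost Larry describes can be seen in a toy model of the address_in_range() check. This is illustrative Python, not the real C code: with a fixed arena size the check is pure arithmetic against a compile-time constant, while a dynamic size adds a per-arena field load on every free.]

```python
# Toy model of the address_in_range() concern.  Real pymalloc works on C
# pointers and a hardcoded ARENA_SIZE; here arenas are (base, size) pairs.

ARENA_SIZE = 256 * 1024

arenas = [(0x100000, ARENA_SIZE), (0x200000, ARENA_SIZE)]

def in_range_fixed(addr, base):
    # Fixed size: one subtraction and one compare; ARENA_SIZE is a constant.
    return 0 <= addr - base < ARENA_SIZE

def in_range_dynamic(addr, arena):
    # Dynamic size: must first fetch the arena's size field (an extra load
    # on every free, hundreds of thousands of times per second).
    base, size = arena
    return 0 <= addr - base < size

assert in_range_fixed(0x100010, arenas[0][0])
assert in_range_dynamic(0x100010, arenas[0])
assert not in_range_dynamic(0x100000 + ARENA_SIZE, arenas[0])
```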
Re: [Python-Dev] The untuned tunable parameter ARENA_SIZE
On Thu, 1 Jun 2017 00:38:09 -0700, Larry Hastings wrote:
> * CPython programs would use more memory. How much? Hard to say. It
> depends on their allocation strategy. I suspect most of the time it
> would be < 3mb additional memory. But for pathological allocation
> strategies the difference could be significant.

Yes, this is the same kind of reason the default page size is still 4KB on many platforms today, despite typical memory sizes having grown by a huge amount. Apart from the cost of fragmentation as you mentioned, another issue is when many small Python processes are running on a machine: a 2MB overhead per process can compound to large numbers if you have many (e.g. hundreds of) such processes.

I would suggest we exert caution here. Small benchmarks generally have nice memory behaviour: not only do they not allocate a lot of memory, but often they will release it all at once after a single run. Perhaps some of those benchmarks would even be better off if we allocated 64MB up front and never released it :-)

Long-running applications can be less friendly than that, having various pieces of internal data with unpredictable lifetimes (especially when talking over the network with other peers which come and go). And long-running applications are typically where Python memory usage is a sensitive matter.

If you'd like to go that way anyway, I would suggest 1MB as a starting point in 3.7.

> * Many programs would be slightly faster now and then, simply because
> we call malloc() 1/16 as often.

malloc() you said? Arenas are allocated using mmap() nowadays, right?

Regards

Antoine.
Re: [Python-Dev] The untuned tunable parameter ARENA_SIZE
Hello.

AFAIK, allocating an arena doesn't eat real (physical) memory:

* On Windows, VirtualAlloc is used for arenas. A real memory page is assigned when the page is first used.
* On Linux and some other *nix, anonymous mmap is used. A real page is assigned on first touch, like Windows.

Arena size is more important for **freeing** memory. Python returns memory to the system when:

1. No block in a pool is used: the pool is returned to the arena.
2. No pool in an arena is used: the arena is returned to the system.

So a single live memory block can prevent the whole arena from being returned.

Some VMs (e.g. mono) use special APIs to return "real pages" from allocated space:

* On Windows, VirtualFree() + VirtualAlloc() can be used to unassign pages.
* On Linux, madvise(..., MADV_DONTNEED) can be used.
* On other *nix, madvise(..., MADV_DONTNEED) + madvise(..., MADV_FREE) can be used.

See also:
https://github.com/corngood/mono/blob/ef186403b5e95a5c95c38f1f19d0c8d061f2ac37/mono/utils/mono-mmap.c#L204-L208 (Windows)
https://github.com/corngood/mono/blob/ef186403b5e95a5c95c38f1f19d0c8d061f2ac37/mono/utils/mono-mmap.c#L410-L424 (Unix)

I think we can return not-recently-used free pools to the system the same way. So a larger arena size and better memory efficiency can be achieved at the same time. But I need more experiments.

Regards,
Re: [Python-Dev] The untuned tunable parameter ARENA_SIZE
2017-06-01 10:19 GMT+02:00 Antoine Pitrou:
> Yes, this is the same kind of reason the default page size is still 4KB
> on many platforms today, despite typical memory size having grown by a
> huge amount. Apart from the cost of fragmentation as you mentioned,
> another issue is when many small Python processes are running on a
> machine: a 2MB overhead per process can compound to large numbers if
> you have many (e.g. hundreds) such processes.
>
> I would suggest we exert caution here. Small benchmarks generally have
> a nice memory behaviour: not only they do not allocate a lot of memory,
> but often they will release it all at once after a single run. Perhaps
> some of those benchmarks would even be better off if we allocated 64MB
> up front and never released it :-)

By the way, the "performance" benchmark suite supports different ways to trace memory usage:

* using tracemalloc
* using /proc/pid/smaps
* using VmPeak of /proc/pid/status (max RSS memory)

I wrote the code but I didn't try it yet :-) Maybe we should check the memory usage before deciding to change the arena size?

Victor
Re: [Python-Dev] The untuned tunable parameter ARENA_SIZE
2017-06-01 10:23 GMT+02:00 INADA Naoki:
> AFAIK, allocating arena doesn't eat real (physical) memory.
>
> * On Windows, VirtualAlloc is used for arena. Real memory page is assigned
> when the page is used first time.
> * On Linux and some other *nix, anonymous mmap is used. Real page is
> assigned when first touch, like Windows.

Memory fragmentation is also a real problem in pymalloc. I don't think that pymalloc is designed to reduce memory fragmentation.

I know one worst case: the Python parser allocates small objects which will be freed when the parser completes, while other, longer-lived objects are created in the meantime:
https://github.com/haypo/misc/blob/master/memory/python_memleak.py

In a perfect world, the parser would use a different memory allocator for that. But currently, the Python API doesn't offer this level of granularity.

Victor
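[Editor's note: the parser worst case Victor links to can be sketched as a toy simulation — a burst of short-lived blocks with an occasional long-lived one, where a single survivor pins its whole arena. ARENA_BLOCKS and the exact layout are of course hypothetical, chosen only to make the pattern visible.]

```python
# Toy simulation of fragmentation: many short-lived "parser" blocks are
# allocated, interleaved with rare long-lived ones.  An arena can only be
# returned to the system once *every* block in it has been freed.

ARENA_BLOCKS = 16  # blocks per arena in this toy model

def simulate(n_blocks, long_lived_every):
    """Return how many arenas stay pinned after all short-lived blocks die."""
    arenas = [[] for _ in range(n_blocks // ARENA_BLOCKS)]
    for i in range(n_blocks):
        # True marks a long-lived block that survives the parser.
        arenas[i // ARENA_BLOCKS].append(i % long_lived_every == 0)
    # Free all short-lived blocks: an arena stays allocated if any
    # long-lived block landed in it.
    return sum(1 for arena in arenas if any(arena))

# 1024 blocks, one in every 64 lives forever:
pinned = simulate(1024, 64)
total = 1024 // ARENA_BLOCKS
print(f"{pinned}/{total} arenas still pinned after the parser finished")
# -> 16/64: a quarter of the memory can never be returned.
```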
[Python-Dev] "Global freepool"
On Thu, 1 Jun 2017 09:57:04 +0200, Victor Stinner wrote:
> By the way, Naoki INADA also proposed a different idea:
>
> "Global freepool: Many types has it’s own freepool. Sharing freepool
> can increase memory and cache efficiency. Add PyMem_FastFree(void*
> ptr, size_t size) to store memory block to freepool, and PyMem_Malloc
> can check global freepool first."

This is already exactly how PyObject_Malloc() works. Really, the fast path for PyObject_Malloc() is:

    size = (uint)(nbytes - 1) >> ALIGNMENT_SHIFT;
    pool = usedpools[size + size];
    if (pool != pool->nextpool) {
        /*
         * There is a used pool for this size class.
         * Pick up the head block of its free list.
         */
        ++pool->ref.count;
        bp = pool->freeblock;
        assert(bp != NULL);
        if ((pool->freeblock = *(block **)bp) != NULL) {
            UNLOCK();
            return (void *)bp;  /* <- fast path! */
        }

I don't think you can get much faster than that in a generic allocation routine (unless you have a compacting GC where allocating memory is basically bumping a single global pointer). IMHO the main thing the private freelists have is that they're *private* precisely, so they can avoid a couple of conditional branches.

Regards

Antoine.
Re: [Python-Dev] The untuned tunable parameter ARENA_SIZE
On 06/01/2017 01:19 AM, Antoine Pitrou wrote:
> If you'd like to go that way anyway, I would suggest 1MB as a starting
> point in 3.7.

I understand the desire for caution. But I was hoping maybe we could experiment with 4mb in trunk for a while? We could change it to 1mb--or even 256k--before beta 1 if we get anxious.

>> * Many programs would be slightly faster now and then, simply because
>> we call malloc() 1/16 as often.
>
> malloc() you said? Arenas are allocated using mmap() nowadays, right?

malloc() and free(). See _PyObject_ArenaMalloc (etc.) in Objects/obmalloc.c. On Windows Python uses VirtualAlloc(), and I don't know what the implications of that are.

//arry/
Re: [Python-Dev] The untuned tunable parameter ARENA_SIZE
On 06/01/2017 01:41 AM, Larry Hastings wrote:
> On 06/01/2017 01:19 AM, Antoine Pitrou wrote:
>> malloc() you said? Arenas are allocated using mmap() nowadays, right?
>
> malloc() and free(). See _PyObject_ArenaMalloc (etc) in Objects/obmalloc.c.

Oh, sorry, I forgot how to read. If ARENAS_USE_MMAP is on it uses mmap(). I can't figure out when or how MAP_ANONYMOUS gets set, but if I step into _PyObject_Arena.alloc() it indeed calls _PyObject_ArenaMmap(), which uses mmap(). So, huzzah!, we use mmap() to allocate our enormous 256kb arenas.

//arry/
Re: [Python-Dev] The untuned tunable parameter ARENA_SIZE
On Thu, 1 Jun 2017 01:41:15 -0700, Larry Hastings wrote:
> On 06/01/2017 01:19 AM, Antoine Pitrou wrote:
>> If you'd like to go that way anyway, I would suggest 1MB as a starting
>> point in 3.7.
>
> I understand the desire for caution. But I was hoping maybe we could
> experiment with 4mb in trunk for a while? We could change it to 1mb--or
> even 256k--before beta 1 if we get anxious.

Almost nobody tests "trunk" (or "master" :-)) on production systems. At best a couple of rare open source projects will run their test suite on the pre-release betas, but that's all. So we are unlikely to spot memory usage ballooning problems before the final release.

>>> * Many programs would be slightly faster now and then, simply because
>>> we call malloc() 1/16 as often.
>> malloc() you said? Arenas are allocated using mmap() nowadays, right?
>
> malloc() and free(). See _PyObject_ArenaMalloc (etc) in Objects/obmalloc.c.

_PyObject_ArenaMalloc should only be used if the OS doesn't support mmap() or MAP_ANONYMOUS (see ARENAS_USE_MMAP). Otherwise _PyObject_ArenaMmap is used.

Apparently OS X doesn't have MAP_ANONYMOUS but it has the synonymous MAP_ANON:
https://github.com/HaxeFoundation/hashlink/pull/12

Regards

Antoine.
Re: [Python-Dev] "Global freepool"
Hi,

As you said, I think PyObject_Malloc() is fast enough. But PyObject_Free() is somewhat complex.

Additionally, there are some freelists (e.g. tuple, dict, frame) and they improve performance significantly. My "global unified freelist" idea is to unify them. The merits would be:

* Unify _PyXxx_DebugMallocStats(). Some freelists provide it but some don't.
* Unify PyXxx_ClearFreeList(). Some freelists don't provide it, and that may disturb returning memory to the system.
* Potentially better CPU cache hit ratio by unifying LRU, if some freelists have the same memory block size.

This idea is partially implemented in https://github.com/methane/cpython/pull/3 but there is no significant difference in speed or memory usage so far.

Regards,

On Thu, Jun 1, 2017 at 5:40 PM, Antoine Pitrou wrote:
> This is already exactly how PyObject_Malloc() works. Really, the fast
> path for PyObject_Malloc() is: [...]
>
> I don't think you can get much faster than that in a generic allocation
> routine (unless you have a compacting GC where allocating memory is
> basically bumping a single global pointer). IMHO the main thing the
> private freelists have is that they're *private* precisely, so they can
> avoid a couple of conditional branches.
Re: [Python-Dev] RFC: Backport ssl.MemoryBIO and ssl.SSLObject to Python 2.7
On Wed, 31 May 2017 14:09:20 -0600, Jim Baker wrote:
> But I object to a completely new feature being added to 2.7 to support the
> implementation of event loop SSL usage. This feature cannot be construed as
> a security fix, and therefore does not qualify as a feature that can be
> added to CPython 2.7 at this point in its lifecycle.

I agree with this sentiment. Also see comments by Ben Darnell and others here:
https://github.com/python/peps/pull/272#pullrequestreview-41388700

Moreover I think that a 2.7 policy decision shouldn't depend on whatever future plans there are for Requests. The slippery slope of relaxing the maintenance policy on 2.7 has already reached absurd extremities. If Requests is to remain 2.7-compatible, it's up to Requests to do the necessary work to do so.

Regards

Antoine.
Re: [Python-Dev] The untuned tunable parameter ARENA_SIZE
2017-06-01 10:41 GMT+02:00 Larry Hastings:
> On 06/01/2017 01:19 AM, Antoine Pitrou wrote:
>> If you'd like to go that way anyway, I would suggest 1MB as a starting
>> point in 3.7.
>
> I understand the desire for caution. But I was hoping maybe we could
> experiment with 4mb in trunk for a while? We could change it to 1mb--or
> even 256k--before beta 1 if we get anxious.

While I can't fully explain why, I would prefer *not* to touch the default arena size at this point. We need more data, for example measurements of memory usage on different workloads using different arena sizes. It's really hard to tune a memory allocator for *all* use cases.

A simple enhancement would be to add an environment variable to change the arena size at Python startup. Example: PYTHONARENASIZE=1M. If you *know* that your application will allocate at least 2 GB, you may even want to try PYTHONARENASIZE=1G, which is more likely to use a single large page... Such a parameter cannot be used by default: it would make the default Python memory usage insane ;-)

Victor
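[Editor's note: PYTHONARENASIZE is only a proposal here and does not exist in CPython; a parser for such a variable, with the K/M/G suffixes Victor's examples imply, might look roughly like this.]

```python
# Sketch of parsing a hypothetical PYTHONARENASIZE environment variable.
# The variable name, suffixes, and default are assumptions from the
# proposal above, not an existing CPython feature.

import os

_SUFFIXES = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}

def parse_arena_size(text, default=256 * 1024):
    """Parse '256K', '1M', '1G' or a plain byte count; empty -> default."""
    text = text.strip().upper()
    if not text:
        return default
    factor = _SUFFIXES.get(text[-1], 1)
    digits = text[:-1] if text[-1] in _SUFFIXES else text
    return int(digits) * factor

size = parse_arena_size(os.environ.get("PYTHONARENASIZE", ""))
print(size)  # 262144 (today's 256 KB default) unless PYTHONARENASIZE is set

assert parse_arena_size("1M") == 1024 ** 2
assert parse_arena_size("256K") == 256 * 1024
assert parse_arena_size("1G") == 1024 ** 3
```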
Re: [Python-Dev] RFC: Backport ssl.MemoryBIO and ssl.SSLObject to Python 2.7
2017-06-01 10:57 GMT+02:00 Antoine Pitrou:
> If Requests is to remain 2.7-compatible, it's up to Requests to do the
> necessary work to do so.

In practice, CPython does include Requests in ensurepip. Because of that, Requests cannot use any C extension. CPython 2.7's ensurepip prevents evolutions of Requests on Python 3.7. Is my rationale broken somehow?

The root issue is to get a very secure TLS connection in pip to download packages from pypi.python.org. In CPython 3.6, we made multiple small steps to include more and more features in the stdlib ssl module, but I understand that the lack of root certificate authorities (CAs) on Windows and macOS is still a major blocker for pip. That's why pip uses Requests, which uses certifi (Mozilla's bundled root certificate authorities).

pip, and so Requests, are part of the current success of the Python community. I disagree that Requests' practical issues are not our problems.

--

Moreover, the PEP 546 rationale includes not only Requests, but also the important PEP 543, to make CPython 3.7 more secure in the long term. Do you also disagree on the need for the PEP 546 backport to make the PEP 543 (new TLS API) feasible in practice?

Victor
Re: [Python-Dev] The untuned tunable parameter ARENA_SIZE
> * On Linux, madvise(..., MADV_DONTNEED) can be used.

Recent Linux has MADV_FREE. It is faster than MADV_DONTNEED:
https://lwn.net/Articles/591214/
Re: [Python-Dev] "Global freepool"
2017-06-01 10:40 GMT+02:00 Antoine Pitrou:
> This is already exactly how PyObject_Malloc() works. (...)

Oh ok, good to know...

> IMHO the main thing the
> private freelists have is that they're *private* precisely, so they can
> avoid a couple of conditional branches.

I would like to understand how private free lists are "so much" faster. In fact, I don't recall if anyone ever *measured* the performance speedup of these free lists :-)

By the way, the Linux kernel uses a "SLAB" allocator for the most common object types like inode. I'm curious whether CPython would benefit from a similar allocator for our most common object types, for example types which already use a free list:
https://en.wikipedia.org/wiki/Slab_allocation

Victor
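[Editor's note: a minimal illustration of why a private free list is cheap. This is toy Python, not CPython's C implementation — the real freelists (e.g. for floats and frames) are C arrays of object pointers — but the point is the same: the type, and hence the block size, is known statically, so allocation is a single pop with no size-class computation.]

```python
# Toy per-type free list, the pattern CPython uses for several built-in
# types: freed objects are kept on a private list and reused on the next
# allocation instead of going back through the general allocator.

class Node:
    __slots__ = ("value", "next")

_free_list = []   # private free list for Node objects
_MAX_FREE = 80    # cap the list, like PyFloat's freelist does

def node_new(value):
    if _free_list:                 # fast path: no allocator call at all,
        obj = _free_list.pop()     # and no size-class lookup -- the size
        obj.value = value          # is implied by the type.
        return obj
    obj = Node()                   # slow path: real allocation
    obj.value = value
    return obj

def node_free(obj):
    if len(_free_list) < _MAX_FREE:
        _free_list.append(obj)     # keep the object for reuse

a = node_new(1)
node_free(a)
b = node_new(2)
assert a is b                      # the freed object was reused in place
```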
Re: [Python-Dev] The untuned tunable parameter ARENA_SIZE
Thanks for the detailed info. But I don't think it's a big problem. Arenas are returned to the system only by chance, so other processes shouldn't rely on it. And I don't propose to stop returning arenas to the system. I just mean that per-pool (part of an arena) MADV_DONTNEED can reduce RSS. If we use a very large arena, or stop returning arenas to the system, it can become a problem.

Regards,

On Thu, Jun 1, 2017 at 6:05 PM, Siddhesh Poyarekar wrote:
> On Thursday 01 June 2017 01:53 PM, INADA Naoki wrote:
>> * On Linux, madvise(..., MADV_DONTNEED) can be used.
>
> madvise does not reduce the commit charge in the Linux kernel, so in
> high consumption scenarios (and where memory overcommit is disabled or
> throttled) you'll see programs dying with OOM despite the MADV_DONTNEED.
> The way we solved it in glibc was to use mprotect to drop PROT_READ and
> PROT_WRITE in blocks that we don't need when we detect that the system
> is not configured to overcommit (using /proc/sys/vm/overcommit_memory).
> You'll need to fix the protection again though if you want to reuse the
> block.
>
> Siddhesh
Re: [Python-Dev] RFC: Backport ssl.MemoryBIO and ssl.SSLObject to Python 2.7
On 01/06/2017 at 11:13, Victor Stinner wrote:
> That's why pip uses Requests which uses certifi (Mozilla
> bundled root certificate authorities.)

pip could use certifi without using Requests. My guess is that Requests is used mostly because it eases coding.

> pip and so Requests are part of the current success of the Python
> community.

pip is, but I'm not convinced about Requests. If Requests didn't exist, people (including pip's developers) would use another HTTP-fetching library; they wouldn't switch to Go or Ruby.

> Do you also disagree on the need of the PEP 546
> (backport) to make the PEP 543 (new TLS API) feasible in practice?

Yes, I disagree. We needn't backport that new API to Python 2.7. Perhaps it's time to be reasonable: Python 2.7 has been in bugfix-only mode for a very long time. Python 3.6 is out. We should move on.

Regards

Antoine.
Re: [Python-Dev] The untuned tunable parameter ARENA_SIZE
On Thu, Jun 1, 2017 at 10:45 AM, Larry Hastings wrote:
> On 06/01/2017 01:41 AM, Larry Hastings wrote:
>> On 06/01/2017 01:19 AM, Antoine Pitrou wrote:
>>> malloc() you said? Arenas are allocated using mmap() nowadays, right?
>> malloc() and free(). See _PyObject_ArenaMalloc (etc) in
>> Objects/obmalloc.c.
>
> Oh, sorry, I forgot how to read. If ARENAS_USE_MMAP is on it uses
> mmap(). I can't figure out when or how MAP_ANONYMOUS gets set,

MAP_ANONYMOUS is set by sys/mman.h (where the system supports it), just like the other MAP_* defines.

> but if I step into the _PyObject_Arena.alloc() it indeed calls
> _PyObject_ArenaMmap() which uses mmap(). So, huzzah!, we use mmap() to
> allocate our enormous 256kb arenas.

--
Thomas Wouters

Hi! I'm an email virus! Think twice before sending your email to help me spread!
Re: [Python-Dev] The untuned tunable parameter ARENA_SIZE
x86's hugepage is 2MB. And some Linux enables "Transparent Huge Page" feature. Maybe, 2MB arena size is better for TLB efficiency. Especially, for servers having massive memory. On Thu, Jun 1, 2017 at 4:38 PM, Larry Hastings wrote: > > > When CPython's small block allocator was first merged in late February 2001, > it allocated memory in gigantic chunks it called "arenas". These arenas > were a massive 256 KILOBYTES apiece. > > This tunable parameter has not been touched in the intervening 16 years. > Yet CPython's memory consumption continues to grow. By the time a current > "trunk" build of CPython reaches the REPL prompt it's already allocated 16 > arenas. > > I propose we make the arena size larger. By how much? I asked Victor to > run some benchmarks with arenas of 1mb, 2mb, and 4mb. The results with 1mb > and 2mb were mixed, but his benchmarks with a 4mb arena size showed > measurable (>5%) speedups on ten benchmarks and no slowdowns. > > What would be the result of making the arena size 4mb? > > CPython could no longer run on a computer where at startup it could not > allocate at least one 4mb continguous block of memory. > CPython programs would die slightly sooner in out-of-memory conditions. > CPython programs would use more memory. How much? Hard to say. It depends > on their allocation strategy. I suspect most of the time it would be < 3mb > additional memory. But for pathological allocation strategies the > difference could be significant. (e.g: lots of allocs, followed by lots of > frees, but the occasional object lives forever, which means that the arena > it's in can never be freed. If 1 out of ever 16 256k arenas is kept alive > this way, and the objects are spaced out precisely such that now it's 1 for > every 4mb arena, max memory use would be the same but later stable memory > use would hypothetically be 16x current) > Many programs would be slightly faster now and then, simply because we call > malloc() 1/16 as often. > > > What say you? 
Vote for your favorite color of bikeshed now! > > > /arry ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
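For concreteness, the arithmetic behind Larry's numbers above (16 arenas at startup, a proposed 4mb size, malloc() called 1/16 as often) checks out in a few lines:

```python
KB = 1024
MB = 1024 * KB

old_arena = 256 * KB      # ARENA_SIZE since 2001
new_arena = 4 * MB        # proposed size
startup_arenas = 16       # arenas already allocated at the REPL prompt

# The 16 arenas allocated at startup happen to total exactly one new arena.
print(startup_arenas * old_arena == new_arena)   # True

# Each new arena replaces 16 old ones, so malloc() runs 1/16 as often --
# and one pinned long-lived object can now keep 16x as much memory alive.
print(new_arena // old_arena)                    # 16
```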
Re: [Python-Dev] The untuned tunable parameter ARENA_SIZE
For ARENA_SIZE, would it be better to set it via ./configure, rather than hard-coding it in the C files? 2017-06-01 17:37 GMT+08:00 INADA Naoki : > x86's hugepage is 2MB. > And some Linux enables "Transparent Huge Page" feature. > > Maybe, 2MB arena size is better for TLB efficiency. > Especially, for servers having massive memory. > > (...)
Re: [Python-Dev] The untuned tunable parameter ARENA_SIZE
On Thu, 1 Jun 2017 18:37:17 +0900 INADA Naoki wrote: > x86's hugepage is 2MB. > And some Linux enables "Transparent Huge Page" feature. > > Maybe, 2MB arena size is better for TLB efficiency. > Especially, for servers having massive memory. But, since Linux is able to merge pages transparently, we perhaps needn't allocate large pages explicitly. Another possible strategy is: allocate several arenas at once (using a larger mmap() call), and use MADV_DONTNEED to relinquish individual arenas. Regards Antoine.
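Antoine's suggested strategy -- reserve several arenas with one large mmap() call and hand individual arenas back to the OS with MADV_DONTNEED -- can be sketched with Python's own mmap module (the batching factor is hypothetical; madvise() support requires Python 3.8+, and the constant is guarded since it is platform-dependent):

```python
import mmap

ARENA_SIZE = 256 * 1024          # current CPython arena size
ARENAS_PER_CHUNK = 16            # hypothetical batching factor

# One large anonymous mapping, carved into ARENAS_PER_CHUNK arenas.
chunk = mmap.mmap(-1, ARENA_SIZE * ARENAS_PER_CHUNK)

def release_arena(index: int) -> None:
    """Give one arena's pages back to the OS without unmapping the range."""
    if hasattr(mmap, "MADV_DONTNEED"):   # Linux (and some BSDs); no-op elsewhere
        chunk.madvise(mmap.MADV_DONTNEED, index * ARENA_SIZE, ARENA_SIZE)

release_arena(3)   # arena 3's pages are relinquished; the address range stays valid
```

The offsets are page-aligned because ARENA_SIZE is a multiple of the page size; after MADV_DONTNEED an anonymous private mapping reads back as zeroes.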
Re: [Python-Dev] RFC: Backport ssl.MemoryBIO and ssl.SSLObject to Python 2.7
On Thu, Jun 1, 2017 at 7:23 PM, Antoine Pitrou wrote: >> Do you also disagree on the need of the PEP 546 >> (backport) to make the PEP 543 (new TLS API) feasible in practice? > > Yes, I disagree. We needn't backport that new API to Python 2.7. > Perhaps it's time to be reasonable: Python 2.7 has been in bugfix-only > mode for a very long time. Python 3.6 is out. We should move on. But it is in *security fix* mode for at least another three years (ish). Proper use of TLS certificates is a security question. How hard would it be for the primary codebase of Requests to be written to use a C extension, but with a fallback *for pip's own bootstrapping only* that provides one single certificate - the authority that signs pypi.python.org? The point of the new system is that back-ends can be switched out; a stub back-end that authorizes only one certificate would theoretically be possible, right? Or am I completely misreading which part needs C? ChrisA
Re: [Python-Dev] RFC: Backport ssl.MemoryBIO and ssl.SSLObject to Python 2.7
On Thu, 1 Jun 2017 19:50:22 +1000 Chris Angelico wrote: > On Thu, Jun 1, 2017 at 7:23 PM, Antoine Pitrou wrote: > >> Do you also disagree on the need of the PEP 546 > >> (backport) to make the PEP 543 (new TLS API) feasible in practice? > > > > Yes, I disagree. We needn't backport that new API to Python 2.7. > > Perhaps it's time to be reasonable: Python 2.7 has been in bugfix-only > > mode for a very long time. Python 3.6 is out. We should move on. > > But it is in *security fix* mode for at least another three years > (ish). Proper use of TLS certificates is a security question. Why are you bringing up "proper use of TLS certificates"? Python 2.7 doesn't need another backport for that. The certifi package is available for Python 2.7 and can be integrated simply with the existing ssl module. Regards Antoine.
Re: [Python-Dev] RFC: Backport ssl.MemoryBIO and ssl.SSLObject to Python 2.7
On Thu, Jun 1, 2017 at 8:01 PM, Antoine Pitrou wrote: > (...) > Why are you bringing up "proper use of TLS certificates"? Python 2.7 > doesn't need another backport for that. The certifi package is > available for Python 2.7 and can be integrated simply with the existing > ssl module. As stated in this thread, OS-provided certificates are not handled by that. For instance, if a local administrator distributes a self-signed cert for the intranet server, web browsers will use it, but pip will not. ChrisA
Re: [Python-Dev] "Global freepool"
I thought pymalloc was a SLAB allocator. What is the difference between SLAB and the pymalloc allocator? On Thu, Jun 1, 2017 at 6:20 PM, Victor Stinner wrote: > 2017-06-01 10:40 GMT+02:00 Antoine Pitrou : >> This is already exactly how PyObject_Malloc() works. (...) > > Oh ok, good to know... > >> IMHO the main thing the >> private freelists have is that they're *private* precisely, so they can >> avoid a couple of conditional branches. > > I would like to understand how private free lists are "so much" > faster. In fact, I don't recall if someone *measured* the performance > speedup of these free lists :-) > > By the way, the Linux kernel uses a "SLAB" allocator for the most > common object types like inode. I'm curious to know if CPython would > benefit from a similar allocator for our most common object types? For > example types which already use a free list. > > https://en.wikipedia.org/wiki/Slab_allocation > > Victor
Re: [Python-Dev] "Global freepool"
01.06.17 12:20, Victor Stinner wrote: > 2017-06-01 10:40 GMT+02:00 Antoine Pitrou : >> This is already exactly how PyObject_Malloc() works. (...) > > Oh ok, good to know... > >> IMHO the main thing the private freelists have is that they're *private* precisely, so they can avoid a couple of conditional branches. > > I would like to understand how private free lists are "so much" faster. In fact, I don't recall if someone *measured* the performance speedup of these free lists :-) I measured the performance boost of adding the free list for dict keys structures. [1] This proposition was withdrawn in favor of using PyObject_Malloc(). The latter solution is slightly slower, but simpler. But even private free lists are not fast enough. That is why some functions (zip, dict.items iterator, property getter, etc.) have private caches for tuples, and the FASTCALL protocol added so much speedup. In the end we have multiple levels of free lists and caches, and every level adds a good speedup (otherwise it wouldn't be used). I have also found that much time is spent in dealloc functions for tuples, called before placing an object back in a free list or memory pool. They use the trashcan mechanism to guard against stack overflow, and it is costly in comparison with clearing a 1-element tuple. [1] https://bugs.python.org/issue16465
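The private free lists under discussion live in C inside CPython, but the pattern itself is small; a pure-Python toy model (class and method names here are illustrative, not CPython's) shows why a hit skips the allocator entirely:

```python
class FloatLike:
    """Stand-in for a small, frequently allocated object type."""

class TypeFreeList:
    """Toy model of a CPython per-type free list (e.g. the float free list)."""

    def __init__(self, maxsize=100):
        self._free = []              # recycled objects, private to this type
        self._maxsize = maxsize

    def acquire(self):
        if self._free:               # fast path: one branch and a list pop,
            return self._free.pop()  # no allocator involvement at all
        return FloatLike()           # slow path: a real allocation

    def release(self, obj):
        if len(self._free) < self._maxsize:
            self._free.append(obj)   # park the object instead of freeing it

pool = TypeFreeList()
a = pool.acquire()
pool.release(a)
b = pool.acquire()
# b is a: the "freed" object was recycled with no new allocation
```

Because the list is private to one type, there is no size-class lookup or shared-state branching, which is the speedup Victor asks about.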
Re: [Python-Dev] RFC: Backport ssl.MemoryBIO and ssl.SSLObject to Python 2.7
On Thu, 1 Jun 2017 20:05:48 +1000 Chris Angelico wrote: > > As stated in this thread, OS-provided certificates are not handled by > that. For instance, if a local administrator distributes a self-signed > cert for the intranet server, web browsers will use it, but pip will > not. That's true. But: 1) pip could grow a config entry to set an alternative or additional CA path 2) it is not a "security fix", as not being able to recognize privately-signed certificates is not a security breach. It's a new feature Regards Antoine.
Re: [Python-Dev] RFC: Backport ssl.MemoryBIO and ssl.SSLObject to Python 2.7
> On 1 Jun 2017, at 10:23, Antoine Pitrou wrote: > > Yes, I disagree. We needn't backport that new API to Python 2.7. > Perhaps it's time to be reasonable: Python 2.7 has been in bugfix-only > mode for a very long time. Python 3.6 is out. We should move on. Who is the “we” that should move on? Python core dev? Or the Python ecosystem? Because if it’s the latter, then I’m going to tell you right now that the ecosystem did not get the memo. If you check the pip download numbers for Requests in the last month you’ll see that 80% of our downloads (9.4 million) come from Python 2. That is an enormous proportion: far too many to consider not supporting that user-base. So Requests is basically bound to support that userbase. Requests is stuck in a place from which it cannot move. We feel we cannot drop 2.7 support. We want to support as many TLS backends as possible. We want to enable the pip developers to focus on their features, rather than worrying about HTTP and TLS. And we want people to adopt the async/await keywords as much as possible. It turns out that we cannot satisfy all of those desires with the status quo, so we proposed an alternative that involves backporting MemoryBIO. So, to the notion of “we need to move on”, I say this: we’re trying. We really, genuinely, are. I don’t know how much stronger of a signal I can give about how much Requests cares about Python 3 than to signal that we’re trying to adopt async/await and be compatible with asyncio. I believe that Python 3 is the present and future of this language. But right now, we can’t properly adopt it because we have a userbase that you want to leave behind, and we don’t. I want to move on, but I want to bring that 80% of our userbase with us when we do. My reading of your post is that you would rather Requests not adopt the async/await paradigm than backport MemoryBIO: is my understanding correct? If so, fair enough. If not, I’d like to try to work with you to a place where we can all get what we want. 
Cory
Re: [Python-Dev] RFC: Backport ssl.MemoryBIO and ssl.SSLObject to Python 2.7
Le 01/06/2017 à 12:23, Cory Benfield a écrit : > > No it can’t. > > OpenSSL builds chains differently, and disregards some metadata that Windows > and macOS store, which means that cert validation will work differently than > in the system store. This can lead to pip accepting a cert marked as > “untrusted for SSL”, for example, which would be pretty bad. Are you claiming that OpenSSL certificate validation is insecure and shouldn't be used at all? I have never heard that claim before. Regards Antoine.
Re: [Python-Dev] RFC: Backport ssl.MemoryBIO and ssl.SSLObject to Python 2.7
> On 1 Jun 2017, at 11:18, Antoine Pitrou wrote: > > On Thu, 1 Jun 2017 20:05:48 +1000 > Chris Angelico wrote: >> >> As stated in this thread, OS-provided certificates are not handled by >> that. For instance, if a local administrator distributes a self-signed >> cert for the intranet server, web browsers will use it, but pip will >> not. > > That's true. But: > 1) pip could grow a config entry to set an alternative or additional CA > path No it can’t. Exporting the Windows or macOS security store to a big PEM file is a security vulnerability, because the macOS and Windows security stores expect to work with their own certificate chain building algorithms. OpenSSL builds chains differently, and disregards some metadata that Windows and macOS store, which means that cert validation will work differently than in the system store. This can lead to pip accepting a cert marked as “untrusted for SSL”, for example, which would be pretty bad. Cory
Re: [Python-Dev] RFC: Backport ssl.MemoryBIO and ssl.SSLObject to Python 2.7
Hi Cory, On Thu, Jun 01, 2017 at 11:22:21AM +0100, Cory Benfield wrote: > We want to support as many TLS backends as possible. Just a wild idea, but have you investigated a pure-Python fallback for 2.7 such as TLSlite? Of course the fallback need only be used during bootstrapping, and the solution would be compatible with every stable LTS Linux distribution release that was not shipping the latest and greatest 2.7. David
Re: [Python-Dev] RFC: Backport ssl.MemoryBIO and ssl.SSLObject to Python 2.7
On Thu, 1 Jun 2017 11:22:21 +0100 Cory Benfield wrote: > > Who is the “we” that should move on? Python core dev? Or the Python ecosystem? Sorry. Python core dev certainly. As for the rest of the ecosystem, it is moving on as well. > Requests is stuck in a place from which it cannot move. > We feel we cannot drop 2.7 support. We want to support as many TLS > backends as possible. Well, certain features could be 3.x-only, couldn't they? > We want to enable the pip developers to focus on > their features, rather than worrying about HTTP and TLS. And we want > people to adopt the async/await keywords as much as possible. I don't get what async/await keywords have to do with this. We're talking about backporting the ssl memory BIO object... (also, as much as I think asyncio is a good thing, I'm not sure it will do much for the problem of downloading packages from HTTP, even in parallel) > I want to move on, but I want to bring that 80% of our userbase with us when > we do. My reading of your post is that you would rather Requests not adopt > the async/await paradigm than backport MemoryBIO: is my understanding correct? Well you cannot use async/await on 2.7 in any case, and you cannot use asyncio on 2.7 (Trollius, which was maintained by Victor, has been abandoned AFAIK). If you want to use coroutines in 2.7, you need to use Tornado or Twisted. Twisted may not, but Tornado works fine with the stdlib ssl module. Regards Antoine.
Re: [Python-Dev] RFC: Backport ssl.MemoryBIO and ssl.SSLObject to Python 2.7
> On 1 Jun 2017, at 11:28, Antoine Pitrou wrote: > > > Le 01/06/2017 à 12:23, Cory Benfield a écrit : >> >> No it can’t. >> >> OpenSSL builds chains differently, and disregards some metadata that Windows >> and macOS store, which means that cert validation will work differently than >> in the system store. This can lead to pip accepting a cert marked as >> “untrusted for SSL”, for example, which would be pretty bad. > > Are you claiming that OpenSSL certificate validation is insecure and > shouldn't be used at all? I have never heard that claim before. Of course I’m not. I am claiming that using OpenSSL certificate validation with root stores that are not intended for OpenSSL can be. This is because trust of a certificate is non-binary. For example, consider WoSign. The Windows TLS implementation will distrust certificates that chain up to WoSign as a root certificate that were issued after October 21 2016. This is not something that can currently be represented as a PEM file. Therefore, the person exporting the certs needs to choose: should that be exported or not? If it is, then OpenSSL will happily trust it even in situations where the system trust store would not. More generally, macOS allows the administrator to configure graduated trust: that is, to override whether or not a root should be trusted for certificate validation in some circumstances. Again, exporting this to a PEM does not persist this information. Cory
Re: [Python-Dev] RFC: Backport ssl.MemoryBIO and ssl.SSLObject to Python 2.7
> On 1 Jun 2017, at 11:39, David Wilson wrote: > > Hi Cory, > > On Thu, Jun 01, 2017 at 11:22:21AM +0100, Cory Benfield wrote: > >> We want to support as many TLS backends as possible. > > Just a wild idea, but have you investigated a pure-Python fallback for > 2.7 such as TLSlite? Of course the fallback need only be used during > bootstrapping, and the solution would be compatible with every stable > LTS Linux distribution release that was not shipping the latest and > greatest 2.7. I have, but discarded the idea. There are no pure-Python TLS implementations that are both feature-complete and actively maintained. Additionally, doing crypto operations in pure-Python is a bad idea, so any implementation that did crypto in Python code would be ruled out immediately (which rules out TLSlite), so I’d need what amounts to a custom library: pure-Python TLS with crypto from OpenSSL, which is not currently exposed by any Python module. Ultimately it’s just not a winner. Cory
Re: [Python-Dev] RFC: Backport ssl.MemoryBIO and ssl.SSLObject to Python 2.7
On Thu, 1 Jun 2017 11:45:14 +0100 Cory Benfield wrote: > > I am claiming that using OpenSSL certificate validation with root stores that > are not intended for OpenSSL can be. This is because trust of a certificate > is non-binary. For example, consider WoSign. The Windows TLS implementation > will distrust certificates that chain up to WoSign as a root certificate that > were issued after October 21 2016. This is not something that can currently > be represented as a PEM file. Therefore, the person exporting the certs needs > to choose: should that be exported or not? If it is, then OpenSSL will > happily trust it even in situations where the system trust store would not. I was not talking about exporting the whole system CA as a PEM file, I was talking about adding an option for system administrators to configure an extra CA certificate to be recognized by pip. > More generally, macOS allows the administrator to configure graduated trust: > that is, to override whether or not a root should be trusted for certificate > validation in some circumstances. Again, exporting this to a PEM does not > persist this information. How much of this is relevant to pip? Regards Antoine.
Re: [Python-Dev] RFC: Backport ssl.MemoryBIO and ssl.SSLObject to Python 2.7
> On 1 Jun 2017, at 11:39, Antoine Pitrou wrote: > > On Thu, 1 Jun 2017 11:22:21 +0100 > Cory Benfield wrote: >> >> Who is the “we” that should move on? Python core dev? Or the Python >> ecosystem? > > Sorry. Python core dev certainly. As for the rest of the ecosystem, it > is moving on as well. Moving, sure, but slowly. Again, I point to the 80% download number. >> Requests is stuck in a place from which it cannot move. >> We feel we cannot drop 2.7 support. We want to support as many TLS >> backends as possible. > > Well, certain features could be 3.x-only, couldn't they? In principle, sure. In practice, that means most of our users don’t use those features and so we don’t get any feedback on whether they’re good solutions to the problem. This is not great. Ideally we want features to be available across as wide a deploy base as possible, otherwise we risk shipping features that don’t solve the actual problem very well. Good software comes, in part, from getting user feedback. >> We want to enable the pip developers to focus on >> their features, rather than worrying about HTTP and TLS. And we want >> people to adopt the async/await keywords as much as possible. > > I don't get what async/await keywords have to do with this. We're > talking about backporting the ssl memory BIO object… All of this is related. I wrote a very, very long email initially and deleted it all because it was just too long to expect any normal human being to read it, but the TL;DR here is that we also want to support async/await, and doing so requires a memory BIO object. >> I want to move on, but I want to bring that 80% of our userbase with us when >> we do. My reading of your post is that you would rather Requests not adopt >> the async/await paradigm than backport MemoryBIO: is my understanding >> correct? > > Well you cannot use async/await on 2.7 in any case, and you cannot use > asyncio on 2.7 (Trollius, which was maintained by Victor, has been > abandoned AFAIK). 
> If you want to use coroutines in 2.7, you need to > use Tornado or Twisted. Twisted may not, but Tornado works fine with > the stdlib ssl module. I can use Twisted on 2.7, and Twisted has great integration with async/await and asyncio when they are available. Great and getting greater, in fact, thanks to the work of the Twisted and asyncio teams. As to Tornado, the biggest concern there is that there is no support for composing the TLS over non-TCP sockets as far as I am aware. The wrapped socket approach is not suitable for some kinds of stream-based I/O that users really should be able to use with Requests (e.g. UNIX pipes). Not a complete non-starter, but also not something I’d like to forego. Cory
Re: [Python-Dev] RFC: Backport ssl.MemoryBIO and ssl.SSLObject to Python 2.7
> On 1 Jun 2017, at 11:51, Antoine Pitrou wrote: > > On Thu, 1 Jun 2017 11:45:14 +0100 > Cory Benfield wrote: >> >> I am claiming that using OpenSSL certificate validation with root stores >> that are not intended for OpenSSL can be. This is because trust of a >> certificate is non-binary. For example, consider WoSign. The Windows TLS >> implementation will distrust certificates that chain up to WoSign as a root >> certificate that were issued after October 21 2016. This is not something >> that can currently be represented as a PEM file. Therefore, the person >> exporting the certs needs to choose: should that be exported or not? If it >> is, then OpenSSL will happily trust it even in situations where the system >> trust store would not. > > I was not talking about exporting the whole system CA as a PEM file, I > was talking about adding an option for system administrators to > configure an extra CA certificate to be recognized by pip. Generally speaking system administrators aren’t wild about this option, as it means that they can only add to the trust store, not remove from it. So, while possible, it’s not a complete solution to this issue. I say this because the option *already* exists, at least in part, via the REQUESTS_CA_BUNDLE environment variable, and we nonetheless still get many complaints from system administrators. >> More generally, macOS allows the administrator to configure graduated trust: >> that is, to override whether or not a root should be trusted for certificate >> validation in some circumstances. Again, exporting this to a PEM does not >> persist this information. > > How much of this is relevant to pip? Depends. If the design goal is “pip respects the system administrator”, then the answer is “all of it”. An administrator wants to be able to configure their system trust settings. Ideally they want to do this once, and once only, such that all applications on their system respect it. 
Cory
Re: [Python-Dev] RFC: Backport ssl.MemoryBIO and ssl.SSLObject to Python 2.7
On Thu, Jun 01, 2017 at 11:47:31AM +0100, Cory Benfield wrote: > I have, but discarded the idea. I'm glad to hear it was given sufficient thought. :) I have one final 'crazy' idea, and actually it does not seem too bad at all: can't you just fork a subprocess or spawn threads to handle the blocking SSL APIs? Sure it wouldn't be beautiful, but it is more appealing than forcing an upgrade on all 2.7 users just so they can continue to use pip. (Which, ironically, seems to resonate strongly with the motivation behind all of this work -- allowing users to continue with their old environments without forcing an upgrade to 3.x!) David
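David's "fork a subprocess or spawn threads" idea amounts to pushing each blocking SSL call onto a worker and exposing a future to the caller. A minimal sketch of the thread variant, with a stand-in for the blocking call (names here are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

def blocking_tls_request(payload):
    # Stand-in for a blocking wrap_socket()/do_handshake()/recv() sequence;
    # the worker thread blocks here so the caller's event loop does not.
    return payload.upper()

with ThreadPoolExecutor(max_workers=2) as pool:
    future = pool.submit(blocking_tls_request, "hello")
    result = future.result()   # an async framework would await this instead

print(result)   # HELLO
```

This is the "Python-2-only code path" Cory objects to below: it works, but the concurrency machinery runs under the library user's feet.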
Re: [Python-Dev] RFC: Backport ssl.MemoryBIO and ssl.SSLObject to Python 2.7
> On 1 Jun 2017, at 12:09, David Wilson wrote: > > On Thu, Jun 01, 2017 at 11:47:31AM +0100, Cory Benfield wrote: > >> I have, but discarded the idea. > > I'm glad to hear it was given sufficient thought. :) > > I have one final 'crazy' idea, and actually it does not seem too bad at > all: can't you just fork a subprocess or spawn threads to handle the > blocking SSL APIs? > > Sure it wouldn't be beautiful, but it is more appealing than forcing an > upgrade on all 2.7 users just so they can continue to use pip. (Which, > ironically, seems to resonate strongly with the motivation behind all of > this work -- allowing users to continue with their old environments > without forcing an upgrade to 3.x!) So, this will work, but at a performance and code cleanliness cost. This essentially becomes a Python-2-only code-path, and a very large and complex one at that. This has the combined unfortunate effects of meaning a) a proportionally small fraction of our users get access to the code path we want to take forward into the future, and b) the majority of our users get an inferior experience of having a library either spawn threads or processes under their feet, which in Python has a tendency to get nasty fast (I for one have experienced the joy of having to ctrl+c multiple times to get a program using paramiko to actually die). Again, it’s worth noting that this change will not just affect pip but also the millions of Python 2 applications using Requests. I am ok with giving those users access to only part of the functionality that the Python 3 users get, but I’m not ok with that smaller part also being objectively worse than what we do today. Cory
Re: [Python-Dev] RFC: Backport ssl.MemoryBIO and ssl.SSLObject to Python 2.7
On Thu, 1 Jun 2017 12:01:41 +0100 Cory Benfield wrote: > In principle, sure. In practice, that means most of our users don’t use those > features and so we don’t get any feedback on whether they’re good solutions > to the problem. On bugs.python.org we get plenty of feedback from people using Python 3's features, and we have been for years. Your concern would have been very valid in the Python 3.2 timeframe, but I don't think it is anymore. > All of this is related. I wrote a very, very long email initially and deleted > it all because it was just too long to expect any normal human being to read > it, but the TL;DR here is that we also want to support async/await, and doing > so requires a memory BIO object. async/await doesn't require a memory BIO object. For example, Tornado supports async/await (*) even though it doesn't use a memory BIO object for its SSL layer. And asyncio started with a non-memory BIO SSL implementation while still using "yield from". (*) Despite the fact that Tornado's own coroutines are yield-based generators. > As to Tornado, the biggest concern there is that there is no support for > composing the TLS over non-TCP sockets as far as I am aware. The wrapped > socket approach is not suitable for some kinds of stream-based I/O that users > really should be able to use with Requests (e.g. UNIX pipes). Hmm, why would you use TLS on UNIX pipes except as an academic experiment? Tornado is far from a full-fledged networking package like Twisted, but its HTTP(S) support should be very sufficient (understandably, since it is the core use case for it). Regards Antoine.
Re: [Python-Dev] RFC: Backport ssl.MemoryBIO and ssl.SSLObject to Python 2.7
> On 1 Jun 2017, at 12:20, Antoine Pitrou wrote: > > On Thu, 1 Jun 2017 12:01:41 +0100 > Cory Benfield wrote: >> In principle, sure. In practice, that means most of our users don’t use >> those features and so we don’t get any feedback on whether they’re good >> solutions to the problem. > > On bugs.python.org we get plenty of feedback from people using Python > 3's features, and we have been for years. > > Your concern would have been very valid in the Python 3.2 timeframe, > but I don't think it is anymore. Ok? I guess? I don’t know what to do with that answer, really. I gave you some data (80%+ of requests downloads over the last month were Python 2), and you responded with “it doesn’t cause us problems”. That’s good for you, I suppose, and well done, but it doesn’t seem immediately applicable to the concern I have. >> All of this is related. I wrote a very, very long email initially and >> deleted it all because it was just too long to expect any normal human being >> to read it, but the TL;DR here is that we also want to support async/await, >> and doing so requires a memory BIO object. > > async/await doesn't require a memory BIO object. For example, Tornado > supports async/await (*) even though it doesn't use a memory BIO object > for its SSL layer. And asyncio started with a non-memory BIO SSL > implementation while still using "yield from". > > (*) Despite the fact that Tornado's own coroutines are yield-based > generators. You are right, sorry. I should not have used the word “require”. Allow me to rephrase. MemoryBIO objects are vastly, vastly more predictable and tractable than wrapped sockets when combined with non-blocking I/O. Using wrapped sockets and select/poll/epoll/kqueue, while possible, requires extremely subtle code that is easy to get wrong, and can nonetheless still have awkward bugs in it. I would be extremely loath to use such an implementation, but you are correct, such an implementation can exist.
>> As to Tornado, the biggest concern there is that there is no support for >> composing the TLS over non-TCP sockets as far as I am aware. The wrapped >> socket approach is not suitable for some kinds of stream-based I/O that >> users really should be able to use with Requests (e.g. UNIX pipes). > > Hmm, why would you use TLS on UNIX pipes except as an academic > experiment? Tornado is far from a full-fledged networking package like > Twisted, but its HTTP(S) support should be very sufficient > (understandably, since it is the core use case for it). Let me be clear that there is no intention to use either Tornado or Twisted’s HTTP/1.1 parsers or engines. With all due respect to both projects, I have concerns about both their client implementations. Tornado’s default is definitely not suitable for use in Requests, and the curl backend is but, surprise surprise, requires a C extension and oh god we’re back here again. I have similar concerns about Twisted’s default HTTP/1.1 client. Tornado’s HTTP/1.1 server is certainly sufficient, but also not of much use to Requests. Requests very much intends to use our own HTTP logic, not least because we’re sick of relying on someone else’s. Literally what we want is to have an event loop backing us that we can integrate with async/await and that requires us to reinvent as few wheels as possible while giving an overall better end-user experience. If I were to use Tornado, because I would want to integrate PEP 543 support into Tornado I’d ultimately have to rewrite Tornado’s TLS implementation *anyway* to replace it with a PEP 543 version. If I’m doing that, I’d much rather do it with MemoryBIO than wrapped sockets, for all of the reasons above. As a final note, because I think we’re getting into the weeds here: this is not *necessary*. None of this is *necessary*. Requests exists, and works today. We’ll get Windows TLS support regardless of anything that’s done here, because I’ll just shim it into urllib3 like we did for macOS. 
What I am pushing for with PEP 543 is an improvement that would benefit the whole ecosystem: all I want to do is to make it possible for me to actually use it and ship it to users in the tools I maintain. It is reasonable and coherent for python-dev to say “well, good luck, but no backports to help you out”. The result of that is that I put PEP 543 on the backburner (because it doesn’t solve Requests/urllib3’s problems, and ultimately my day job is about resolving those issues), and probably that we shutter the async discussion for Requests until we drop Python 2 support. That’s fine, Python is your project, not mine. But I don’t see that there’s any reason for us not to ask for this backport. After all, the worst you can do is say no, and my problems remain the same. Cory
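As an aside for readers following the MemoryBIO discussion above: the "vastly more predictable" model Cory describes can be sketched with Python 3.6+'s `ssl` module. TLS state lives entirely in two in-memory buffers, and the application (or event loop) ferries bytes between them and the transport. This is only an illustrative sketch, not anything from the proposed backport itself; certificate checking is disabled here purely so the snippet runs offline.

```python
import ssl

# Sketch of the MemoryBIO model: the SSLObject never touches a socket.
ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
ctx.check_hostname = False      # offline sketch only: no real peer
ctx.verify_mode = ssl.CERT_NONE

incoming = ssl.MemoryBIO()      # bytes received from the transport
outgoing = ssl.MemoryBIO()      # bytes to be sent over the transport
tls = ctx.wrap_bio(incoming, outgoing)

try:
    tls.do_handshake()
except ssl.SSLWantReadError:
    # Expected: the engine wrote a ClientHello into `outgoing` and now
    # needs the server's reply to be fed into `incoming`.
    pass

client_hello = outgoing.read()
print(len(client_hello) > 0, hex(client_hello[0]))  # 0x16 = TLS handshake record
```

The event loop's only job is then to write `client_hello` to whatever transport it likes (TCP socket, pipe, IOCP-managed handle) and feed replies into `incoming`, which is exactly the flexibility being argued for.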
Re: [Python-Dev] RFC: Backport ssl.MemoryBIO and ssl.SSLObject to Python 2.7
Le 01/06/2017 à 15:12, Cory Benfield a écrit : > > I don’t know what to do with that answer, really. I gave you some data (80%+ > of requests downloads over the last month were Python 2), and you responded > with “it doesn’t cause us problems”. And indeed it doesn't. Unless the target user base for pip is widely different than Python's, it shouldn't cause you any problems either. > As a final note, because I think we’re getting into the weeds here: this is > not *necessary*. None of this is *necessary*. Requests exists, and works > today. And pip could even bundle a frozen 2.7-compatible version of Requests if it wanted/needed to... > Let me be clear that there is no intention to use either Tornado or Twisted’s HTTP/1.1 parsers or engines. [...] Requests very much intends to use our own HTTP logic, not least because we’re sick of relying on someone else’s. Then the PEP is really wrong or misleading in the way it states its own motivations. Regards Antoine. ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] RFC: Backport ssl.MemoryBIO and ssl.SSLObject to Python 2.7
> On 1 Jun 2017, at 14:21, Antoine Pitrou wrote: > > > Le 01/06/2017 à 15:12, Cory Benfield a écrit : >> >> I don’t know what to do with that answer, really. I gave you some data (80%+ >> of requests downloads over the last month were Python 2), and you responded >> with “it doesn’t cause us problems”. > > And indeed it doesn't. Unless the target user base for pip is widely > different than Python's, it shouldn't cause you any problems either. Maybe not now, but I think it’s fair to say that it did, right? As I recall, Python spent a long time with two fully supported Python versions, and then an even longer time with a version that was getting bugfixes. Tell me, which did you get more feedback on during that time? Generally speaking it is fair to say that at this point *every line of code in Requests* is exercised or depended on by one of our users. If we write new code available to a small fraction of them, and it is in any way sizeable, then that stops being true. Again, we should look at the fact that most libraries that successfully support Python 2 and Python 3 do so through having codebases that share as much code as possible between the two implementations. Each line of code that is exercised in only one implementation becomes a vector for a long, lingering bug. Anyway, all I know is that the last big project to do this kind of hard cut was Python, and while many of us are glad that Python 3 is real and glad that we pushed through the pain, I don’t think anyone would argue that the move was painless. A lesson can be learned there, especially for Requests which is not currently nursing a problem as fundamental to it as Python was. >> As a final note, because I think we’re getting into the weeds here: this is >> not *necessary*. None of this is *necessary*. Requests exists, and works >> today. 
> > And pip could even bundle a frozen 2.7-compatible version of Requests if > it wanted/needed to… Sure, if pip wants to internalise supporting and maintaining that version. One of the advantages of the pip/Requests relationship is that pip gets to stop worrying about HTTP: if there’s a HTTP problem, that’s on someone else to fix. Bundling that would remove that advantage. > >> Let me be clear that there is no intention to use either Tornado or > Twisted’s HTTP/1.1 parsers or engines. [...] Requests very much intends > to use our own HTTP logic, not least because we’re sick of relying on > someone else’s. > > Then the PEP is really wrong or misleading in the way it states its own > motivations. How so? TLS is not a part of the HTTP parser. It’s an intermediary layer between the transport (resolutely owned by the network layer in Twisted/Tornado) and the parsing layer (resolutely owned by Requests). Ideally we would not roll our own. Cory ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] RFC: Backport ssl.MemoryBIO and ssl.SSLObject to Python 2.7
On Thu, 1 Jun 2017 14:37:55 +0100 Cory Benfield wrote: > > > > And indeed it doesn't. Unless the target user base for pip is widely > > different than Python's, it shouldn't cause you any problems either. > > Maybe not now, but I think it’s fair to say that it did, right? Until Python 3.2 and perhaps 3.3, yes. Since 3.4, definitely not. For example asyncio quickly grew a sizable community around it, even though it had established Python 2-compatible competitors. > > Then the PEP is really wrong or misleading in the way it states its own > > motivations. > > How so? In the sentence "There are plans afoot to look at moving Requests to a more event-loop-y model, and doing so basically mandates a MemoryBIO", and also in the general feeling it gives that the backport is motivated by security reasons primarily. I understand that some users would like more features in Python 2.7. That has been the case since it was decided that feature development in the 2.x line would end in favour of Python 3 development. But our maintenance policy has been and is to develop new features on Python 3 (which some people have described as a "carrot" for migrating, which is certainly true). Regards Antoine. ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] RFC: Backport ssl.MemoryBIO and ssl.SSLObject to Python 2.7
> On 1 Jun 2017, at 14:53, Antoine Pitrou wrote: > > On Thu, 1 Jun 2017 14:37:55 +0100 > Cory Benfield wrote: >>> >>> And indeed it doesn't. Unless the target user base for pip is widely >>> different than Python's, it shouldn't cause you any problems either. >> >> Maybe not now, but I think it’s fair to say that it did, right? > > Until Python 3.2 and perhaps 3.3, yes. Since 3.4, definitely not. For > example asyncio quickly grew a sizable community around it, even though > it had established Python 2-compatible competitors. Sure, but “until 3.2” covers a long enough time to take us from now to “deprecation of Python 2”. Given that the Requests team is 4 people, unlike python-dev’s much larger number, I suspect we’d have at least as much pain proportionally as Python did. I’m not wild about signing up for that. >>> Then the PEP is really wrong or misleading in the way it states its own >>> motivations. >> >> How so? > > In the sentence "There are plans afoot to look at moving Requests to a > more event-loop-y model, and doing so basically mandates a MemoryBIO", > and also in the general feeling it gives that the backport is motivated > by security reasons primarily. Ok, let’s address those together. There are security reasons to do the backport, but they are “it helps us build a pathway to PEP 543”. Right now there are a lot of people interested in seeing PEP 543 happen, but vastly fewer in a position to do the work. I am, but only if I can actually use it for the things that are in my job. If I can’t, then PEP 543 becomes an “evenings and weekends” activity for me *at best*, and something I have to drop entirely at worst. Adopting PEP 543 *would* be a security benefit, so while this PEP itself is not directly in and of itself a security benefit, it builds a pathway to something that is. 
As to the plans to move Requests to a more event loop-y model, I think that it does stand in the way of this, but only insomuch as, again, we want our event loopy model to be as bug-free as possible. But I can concede that rewording on that point would be valuable. *However*, it’s my understanding that even if I did that rewording, you’d still be against it. Is that correct? Cory ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] RFC: Backport ssl.MemoryBIO and ssl.SSLObject to Python 2.7
On Thu, Jun 01, 2017 at 12:18:48PM +0100, Cory Benfield wrote: > So, this will work, but at a performance and code cleanliness cost. > This essentially becomes a Python-2-only code-path, and a very large > and complex one at that. "Doctor, it hurts when I do this .." Fine, then how about rather than exporting pip's problems on to the rest of the world (which an API change to a security module in a stable branch most certainly is, 18 months later when every other Python library starts depending on it), just drop SSL entirely, it will almost certainly cost less pain in the long run, and you can even arrange for the same code to run in both major versions. Drop SSL? But that's madness! Serve the assets over plain HTTP and tack a signature somewhere alongside it, either side-by-side in a file, embedded in a URL query string, or whatever. Here[0] is 1000 lines of pure Python that can validate a public key signature over a hash of the asset as it's downloaded. Embed the 32 byte public key in the pip source and hey presto. [0] https://github.com/jfindlay/pure_pynacl/blob/master/pure_pynacl/tweetnacl.py Finding someone to audit the signature checking capabilities of [0] will have vastly lower net cost than getting the world into a situation where pip no longer runs on the >1e6 EC2 instances that will be running Ubuntu 14.04/16.04 LTS until the turn of the next decade. Requests can't be installed without a working SSL implementation? Then drop requests, it's not like it does much for pip anyway. Downloads worldwide get a huge speedup due to lack of TLS handshake latency, a million Squid caching reverse proxies worldwide jump into action caching tarballs they previously couldn't see, pip's _vendor directory drops by 4.2MB, and Python package security depends on 1k lines of memory-safe code rather than possibly *the* worst example of security-unconscious C to come into existence since the birth of our industry. Sounds like a win to me.
Maybe set a standard rather than blindly follow everyone else, at the cost of.. everyone else. David ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
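A note on the proposal above: David is describing Ed25519 *signature* verification (so pip need only embed a single 32-byte public key), and a full Ed25519 verifier is far too long to inline here. As a simpler runnable illustration of the same download-integrity idea, here is the hash-pinning variant, where the client pins an expected SHA-256 digest per asset instead of a key. Everything in this sketch is hypothetical; it is not pip's actual mechanism.

```python
import hashlib
import hmac

def verify_download(data: bytes, expected_sha256: str) -> bool:
    """Check a downloaded asset against a pinned SHA-256 digest."""
    digest = hashlib.sha256(data).hexdigest()
    # Constant-time comparison, so timing does not leak how many
    # leading digest characters matched.
    return hmac.compare_digest(digest, expected_sha256)

# SHA-256 of b"hello", pinned ahead of time by the client:
pinned = "2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824"
print(verify_download(b"hello", pinned))     # True
print(verify_download(b"tampered", pinned))  # False
```

The trade-off David's signature scheme addresses is that hash pinning requires distributing a fresh digest per asset, whereas a signature over the asset's hash lets one embedded public key cover everything the index ever publishes.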
Re: [Python-Dev] RFC: Backport ssl.MemoryBIO and ssl.SSLObject to Python 2.7
Trying to transfer github comments from https://github.com/python/peps/pull/272#pullrequestreview-41388700: I said: > Tornado has been doing TLS in an event-loop model in python 2.5+ with just wrap_socket, no MemoryBIO necessary. What am I missing? MemoryBIO certainly gives some extra flexibility, but nothing I can see that's strictly required for an HTTP client. (Maybe it comes up in some proxy scenarios that Tornado hasn't implemented?) There were three main responses: - MemoryBIO is necessary to support TLS on windows with IOCP. Tornado's approach requires the less-efficient select() interface. This is valid and IMHO the biggest argument against using Tornado instead of Twisted in requests. Even if requests is willing to accept the limitation of not being able to use IOCP on Python 2, it may be tricky to arrange things so it can support both Tornado's select-based event loop on Python 2 and the IOCP-based interfaces in Python 3's asyncio (I'd volunteer to help with this if the requests team is interested in pursuing it, though). - wrap_socket is difficult to use correctly with an event loop; Twisted was happy to move away from it to the MemoryBIO model. My response: MemoryBIO is certainly a *better* solution for this problem, but it's not a *requirement*. Twisted prefers to do as little buffering as possible, which contributes to the difficulty of using wrap_socket. The buffering in Tornado's SSLIOStream simplifies this. Glyph reports that there are still some difficult-to-reproduce bugs; that may be but I haven't heard any other reports of this. I believe that whatever bugs might remain in this area are resolvable. - MemoryBIO supports a wider variety of transports, including pipes. There's a question about unix domain sockets - Tornado supports these generally but I haven't tried them with TLS. I would expect it to work. 
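For readers unfamiliar with the wrap_socket subtlety Ben and Glyph are debating: with a wrapped socket on a non-blocking transport, *any* SSL operation can demand more I/O in either direction at any point, so every call needs a retry loop around the want-read/want-write exceptions. A minimal sketch of that loop (not Tornado's actual code) looks like this; `sock` is assumed to be the socket underlying the wrapped SSL object.

```python
import select
import socket
import ssl

def run_tls_op(sock, op, *args):
    """Retry an SSL operation until OpenSSL's I/O demands are satisfied."""
    while True:
        try:
            return op(*args)
        except ssl.SSLWantReadError:
            select.select([sock], [], [])   # wait until readable, retry
        except ssl.SSLWantWriteError:
            select.select([], [sock], [])   # wait until writable, retry

# Exercise the retry path with a stub operation and a socketpair that
# is already readable, so select() returns immediately:
a, b = socket.socketpair()
b.send(b"x")
state = {"calls": 0}

def flaky_read():
    state["calls"] += 1
    if state["calls"] == 1:
        raise ssl.SSLWantReadError()
    return b"done"

print(run_tls_op(a, flaky_read))  # b'done' after one retry
```

The hard part in a real event loop is that this loop must be inverted into callback or coroutine form, and a "read" at the application layer can require the socket to become *writable* first (renegotiation), which is exactly why the MemoryBIO model is considered more tractable.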
Re: [Python-Dev] The untuned tunable parameter ARENA_SIZE
On Thursday 01 June 2017 01:53 PM, INADA Naoki wrote: > * On Linux, madvice(..., MADV_DONTNEED) can be used. madvise does not reduce the commit charge in the Linux kernel, so in high consumption scenarios (and where memory overcommit is disabled or throttled) you'll see programs dying with OOM despite the MADV_DONTNEED. The way we solved it in glibc was to use mprotect to drop PROT_READ and PROT_WRITE in blocks that we don't need when we detect that the system is not configured to overcommit (using /proc/sys/vm/overcommit_memory). You'll need to fix the protection again though if you want to reuse the block. Siddhesh ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
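The MADV_DONTNEED idea from this subthread can be demonstrated from Python itself, using the `mmap` wrapper rather than the C-level call CPython's allocator would make (note that `mmap.madvise()` only appeared in Python 3.8, well after this thread, hence the guards). This is only a sketch of the mechanism under discussion, not a proposed patch.

```python
import mmap

PAGE = mmap.PAGESIZE
m = mmap.mmap(-1, 4 * PAGE)   # anonymous private mapping, 4 pages
m[:5] = b"hello"              # touch memory so a page becomes resident

# madvise() was only exposed on mmap objects in Python 3.8, and
# MADV_DONTNEED is not defined on every platform, hence the guards.
if hasattr(m, "madvise") and hasattr(mmap, "MADV_DONTNEED"):
    m.madvise(mmap.MADV_DONTNEED)  # kernel may drop the backing pages

# The mapping itself stays valid: on Linux the dropped pages simply
# refault as zero-filled, which is also why, as Siddhesh explains,
# the kernel's commit charge is unchanged by the advice.
print(len(m))
```

Siddhesh's glibc alternative, dropping PROT_READ/PROT_WRITE via `mprotect()` when overcommit is disabled, trades that away: the pages truly stop counting against the process, but must be re-protected before reuse.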
Re: [Python-Dev] The untuned tunable parameter ARENA_SIZE
It seems very complex and not portable at all to "free" a part of an arena. We already support freeing a whole arena using munmap(). It was a huge enhancement in our memory allocator. Change made in Python 2.5? I don't recall, ask Evan Jones: http://www.evanjones.ca/memoryallocator/ :-) I'm not sure that it's worth it to increase the arena size and try to implement the MADV_DONTNEED / MADV_FREE thing. Victor 2017-06-01 11:21 GMT+02:00 INADA Naoki : > Thanks for detailed info. > > But I don't think it's a big problem. > Arenas are returned to system by chance. So other processes > shouldn't relying to it. > > And I don't propose to stop returning arena to system. > I just mean per pool (part of arena) MADV_DONTNEED can reduce RSS. > > If we use very large arena, or stop returning arena to system, > it can be problem. > > Regards, > > On Thu, Jun 1, 2017 at 6:05 PM, Siddhesh Poyarekar > wrote: >> On Thursday 01 June 2017 01:53 PM, INADA Naoki wrote: >>> * On Linux, madvice(..., MADV_DONTNEED) can be used. >> >> madvise does not reduce the commit charge in the Linux kernel, so in >> high consumption scenarios (and where memory overcommit is disabled or >> throttled) you'll see programs dying with OOM despite the MADV_DONTNEED. >> The way we solved it in glibc was to use mprotect to drop PROT_READ and >> PROT_WRITE in blocks that we don't need when we detect that the system >> is not configured to overcommit (using /proc/sys/vm/overcommit_memory). >> You'll need to fix the protection again though if you want to reuse the >> block. >> >> Siddhesh > ___ > Python-Dev mailing list > Python-Dev@python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > https://mail.python.org/mailman/options/python-dev/victor.stinner%40gmail.com ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] RFC: Backport ssl.MemoryBIO and ssl.SSLObject to Python 2.7
> On 1 Jun 2017, at 15:10, David Wilson wrote: > Finding someone to audit the signature checking capabilities of [0] will > have vastly lower net cost than getting the world into a situation where > pip no longer runs on the >1e6 EC2 instances that will be running Ubuntu > 14.04/16.04 LTS until the turn of the next decade. So for the record I’m assuming most of the previous email was a joke: certainly it’s not going to happen. ;) But this is a real concern that does need to be addressed: Requests can’t meaningfully use this as its only TLS backend until it propagates to the wider 2.7 ecosystem, at least far enough such that pip can drop Python 2.7 releases lower than 2.7.14 (or wherever MemoryBIO ends up, if backported). So a concern emerges: if you grant my other premises about the utility of the backport, is it worth backporting at all? The answer to that is honestly not clear to me. I chatted with the pip developers, and they have 90%+ of their users currently on Python 2, but more than half of those are on 2.7.9 or later. This shows some interest in upgrading to newer Python 2s. The question, I think, is: do we end up in a position where a good number of developers are on 2.7.14 or later and only a very small fraction on 2.7.13 or earlier before the absolute number of Python 2 devs drops low enough to just drop Python 2? I don’t have an answer to that question. I have a gut instinct that says yes, probably, but a lack of certainty. My suspicion is that most of the core dev community believe the answer to that is “no”. But I’d say that from my perspective this is the crux of the problem. We can hedge against this by just choosing to backport and accepting that it may never become useful, but a reasonable person can disagree and say that it’s just not worth the effort. 
Frankly, I think that amidst all the other arguments this is the one that most concretely needs answering, because if we don’t think Requests can ever meaningfully rely on the presence of MemoryBIO on 2.7 (where “rely on” can be approximated to 90%+ of 2.7 users having access to it AND 2.7 still having non-trivial usage numbers) then ultimately this PEP doesn’t grant me much benefit. There are others who believe there are a few other benefits we could get from it (helping out Twisted etc.), but I don’t know that I’m well placed to make those arguments. (I also suspect I’d get accused of moving the goalposts.) Cory ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] RFC: Backport ssl.MemoryBIO and ssl.SSLObject to Python 2.7
On Thu, 1 Jun 2017 15:09:41 +0100 Cory Benfield wrote: > > As to the plans to move Requests to a more event loop-y model, I think that > it does stand in the way of this, but only insomuch as, again, we want our > event loopy model to be as bug-free as possible. But I can concede that > rewording on that point would be valuable. > > *However*, it’s my understanding that even if I did that rewording, > you’d still be against it. Is that correct? Yes. It's just that it would more fairly inform the people reading it. Regards Antoine. ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] RFC: Backport ssl.MemoryBIO and ssl.SSLObject to Python 2.7
On Fri, Jun 2, 2017 at 1:01 AM, Cory Benfield wrote: > The answer to that is honestly not clear to me. I chatted with the pip > developers, and they have 90%+ of their users currently on Python 2, but more > than half of those are on 2.7.9 or later. This shows some interest in > upgrading to newer Python 2s. The question, I think, is: do we end up in a > position where a good number of developers are on 2.7.14 or later and only a > very small fraction on 2.7.13 or earlier before the absolute number of Python > 2 devs drops low enough to just drop Python 2? > > I don’t have an answer to that question. I have a gut instinct that says yes, > probably, but a lack of certainty. My suspicion is that most of the core dev > community believe the answer to that is “no”. > Let's see. Python 2 users include people on Windows who install it themselves, and then have no mechanism for automatic updates. They'll probably stay on whatever 2.7.x they first got, until something forces them to update. But it also includes people on stable Linux distros, where they have automatic updates provided by Red Hat or Debian or whomever, so a change like this WILL propagate - particularly (a) as the window is three entire years, and (b) if the change is considered important by the distro managers, which is a smaller group of people to convince than the users themselves. By 2020, Windows 7 will be out of support. By various estimates, Win 7 represents roughly half of all current Windows users. That means that, by 2020, at least half of today's Windows users will either have upgraded to a new OS (likely with a wipe-and-fresh-install, so they'll get a newer Python), or be on an unsupported OS, on par with people still running XP today. The same is true for probably close to 100% of Linux users, since any supported Linux distro will be shipping updates between now and 2020, and I don't know much about Mac OS updates, but I rather suspect that they'll also be updating. (Can anyone confirm?) 
So I'd be in the "yes" category. Across the next few years, I strongly suspect that 2.7.14 will propagate reasonably well. And I also strongly suspect that, even once 2020 hits and Python 2 stops getting updates, it will still be important to a lot of people. These numbers aren't backed by much, but it's slightly better than mere gut instinct. Do you have figures for how many people use pip on Windows vs Linux vs Mac OS? ChrisA ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] RFC: Backport ssl.MemoryBIO and ssl.SSLObject to Python 2.7
> On 1 Jun 2017, at 17:14, Chris Angelico wrote: > > > Do you have figures for how many people use pip on Windows vs Linux vs Mac OS? I have figures for the download numbers, which are an awkward proxy because most people don’t CI on Windows and macOS, but they’re the best we have. Linux has approximately 20x the download numbers of either Windows or macOS, and both Windows and macOS are pretty close together. These numbers are a bit confounded due to the fact that 1/4 of Linux’s downloads are made up of systems that don’t report their platform, so the actual ratio could be anywhere from about 25:1 to 3:1 in favour of Linux for either Windows or macOS. All of this is based on the downloads made in the last month. Again, an enormous number of these downloads are going to be CI downloads which overwhelmingly favour Linux systems. For some extra perspective, the next highest platform by download count is FreeBSD, with 0.04% of the downloads of Linux. HTH, Cory ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] RFC: Backport ssl.MemoryBIO and ssl.SSLObject to Python 2.7
On Fri, Jun 2, 2017 at 2:35 AM, Cory Benfield wrote: > I have figures for the download numbers, which are an awkward proxy because > most people don’t CI on Windows and macOS, but they’re the best we have. > Linux has approximately 20x the download numbers of either Windows or macOS, > and both Windows and macOS are pretty close together. These numbers are a bit > confounded due to the fact that 1/4 of Linux’s downloads are made up of > systems that don’t report their platform, so the actual ratio could be > anywhere from about 25:1 to 3:1 in favour of Linux for either Windows or > macOS. All of this is based on the downloads made in the last month. > > Again, an enormous number of these downloads are going to be CI downloads > which overwhelmingly favour Linux systems. Hmm. So it's really hard to know. Pity. I suppose it's too much to ask for IP-based stat exclusion for the most commonly-used CI systems (Travis, Circle, etc)? Still, it does look like most pip usage happens on Linux. Also, it seems likely that the people who use Python and pip heavily are going to be the ones who most care about keeping up-to-date with point releases, so I still stand by my belief that yes, 2.7.14+ could take the bulk of 2.7's marketshare before 2.7 itself stops being significant. Thanks for the figures. ChrisA ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] RFC: Backport ssl.MemoryBIO and ssl.SSLObject to Python 2.7
2017-06-01 18:51 GMT+02:00 Chris Angelico : > Hmm. So it's really hard to know. Pity. I suppose it's too much to ask > for IP-based stat exclusion for the most commonly-used CI systems > (Travis, Circle, etc)? Still, it does look like most pip usage happens > on Linux. Also, it seems likely that the people who use Python and pip > heavily are going to be the ones who most care about keeping > up-to-date with point releases, so I still stand by my belief that > yes, 2.7.14+ could take the bulk of 2.7's marketshare before 2.7 > itself stops being significant. It seems like PyPI statistics are public: https://langui.sh/2016/12/09/data-driven-decisions/ Another article on PyPI stats: https://hynek.me/articles/python3-2016/ 2.7: 419 million (89%) 3.3+3.4+3.5+3.6: 51 million (11%) (I ignored 2.6) Victor ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
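A quick sanity check on the split Victor quotes (419M Python 2.7 downloads versus 51M for 3.3 through 3.6 combined, with 2.6 ignored as in his message) confirms the stated percentages:

```python
# Download counts in millions, as quoted from the PyPI statistics.
py2, py3 = 419, 51
total = py2 + py3

print(round(100 * py2 / total))  # 89
print(round(100 * py3 / total))  # 11
```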
Re: [Python-Dev] RFC: Backport ssl.MemoryBIO and ssl.SSLObject to Python 2.7
On Jun 02, 2017, at 02:14 AM, Chris Angelico wrote: >But it also includes people on stable Linux distros, where they have >automatic updates provided by Red Hat or Debian or whomever, so a change like >this WILL propagate - particularly (a) as the window is three entire years, >and (b) if the change is considered important by the distro managers, which >is a smaller group of people to convince than the users themselves. [...] >So I'd be in the "yes" category. Across the next few years, I strongly >suspect that 2.7.14 will propagate reasonably well. I'm not so sure about that, given long term support releases. For Ubuntu, LTS releases live for 5 years: https://www.ubuntu.com/info/release-end-of-life By 2020, only Ubuntu 16.04 and 18.04 will still be maintained, so while 18.04 will likely contain whatever the latest 2.7 is available at that time, 16.04 won't track upstream point releases, but instead will get select cherry picks. For good reason, there's a lot of overhead to backporting fixes into stable releases, and something as big as being suggested here would, in my best guess, have a very low chance of showing up in stable releases. -Barry ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] RFC: Backport ssl.MemoryBIO and ssl.SSLObject to Python 2.7
On Jun 1, 2017 9:20 AM, "Chris Angelico" wrote: On Fri, Jun 2, 2017 at 1:01 AM, Cory Benfield wrote: > The answer to that is honestly not clear to me. I chatted with the pip developers, and they have 90%+ of their users currently on Python 2, but more than half of those are on 2.7.9 or later. This shows some interest in upgrading to newer Python 2s. The question, I think, is: do we end up in a position where a good number of developers are on 2.7.14 or later and only a very small fraction on 2.7.13 or earlier before the absolute number of Python 2 devs drops low enough to just drop Python 2? > > I don’t have an answer to that question. I have a gut instinct that says yes, probably, but a lack of certainty. My suspicion is that most of the core dev community believe the answer to that is “no”. > Let's see. Python 2 users include people on Windows who install it themselves, and then have no mechanism for automatic updates. They'll probably stay on whatever 2.7.x they first got, until something forces them to update. But it also includes people on stable Linux distros, where they have automatic updates provided by Red Hat or Debian or whomever, so a change like this WILL propagate - particularly (a) as the window is three entire years, and (b) if the change is considered important by the distro managers, which is a smaller group of people to convince than the users themselves. I believe that for answering this question about the ssl module, it's really only Linux users that matter, since pip/requests/everyone else pushing for this only want to use ssl.MemoryBIO on Linux. Their plan on Windows/MacOS (IIUC) is to stop using the ssl module entirely in favor of new ctypes bindings for their respective native TLS libraries. 
(And yes, in principle it might be possible to write new ctypes-based bindings for openssl, but (a) this whole project is already teetering on the verge of being impossible given the resources available, so adding any major extra deliverable is likely to sink the whole thing, and (b) compared to the proprietary libraries, openssl is *much* harder and riskier to wrap at the ctypes level because it has different/incompatible ABIs depending on its micro version and the vendor who distributed it. This is why manylinux packages that need openssl have to ship their own, but pip can't and shouldn't ship its own openssl for many hopefully obvious reasons.) -n
Re: [Python-Dev] RFC: Backport ssl.MemoryBIO and ssl.SSLObject to Python 2.7
2017-06-01 19:09 GMT+02:00 Barry Warsaw : > By 2020, only Ubuntu 16.04 and 18.04 will still be maintained, so while 18.04 > will likely contain whatever the latest 2.7 is available at that time, 16.04 > won't track upstream point releases, but instead will get select cherry > picks. For good reason, there's a lot of overhead to backporting fixes into > stable releases, and something as big as being suggested here would, in my > best guess, have a very low chance of showing up in stable releases. I can help Canonical to backport MemoryBIO *if they want* to cherry-pick this feature ;-) Victor
Re: [Python-Dev] RFC: Backport ssl.MemoryBIO and ssl.SSLObject to Python 2.7
2017-06-01 19:10 GMT+02:00 Nathaniel Smith : > (...) since pip/requests/everyone else pushing for > this only want to use ssl.MemoryBIO on Linux. Their plan on Windows/MacOS > (IIUC) is to stop using the ssl module entirely in favor of new ctypes > bindings for their respective native TLS libraries. The long term plan includes one Windows implementation, one macOS implementation and one implementation using the stdlib ssl module. But it seems like right now, Cory is working alone and has limited time to implement his PEP 543 (new TLS API). The short term plan is to implement the strict minimum: the implementation relying on the existing stdlib ssl module. Backporting MemoryBIO makes it possible to get the new TLS API "for free" on Python 2.7. IMHO Python 2.7 support is a requirement to make the PEP popular enough to make it successful. The backport is supposed to fix a chicken-and-egg issue :-) > (And yes, in principle it might be possible to write new ctypes-based > bindings for openssl, but (...)) A C extension can also be considered, but I trust code in the CPython stdlib more, since it would be well tested by our big farm of buildbots and have more eyes looking at the code. -- It seems like the PEP 546 (backport MemoryBIO) should make it more explicit that MemoryBIO support will be "optional": it's ok if Jython or PyPy doesn't implement it. It's ok if old Python 2.7 versions don't implement it. I expect to use a fallback for those anyway. It's just that I would prefer to avoid a fallback (likely a C extension) whenever possible, since it would cause various issues, especially for C code using OpenSSL: the OpenSSL API changed many times :-/ Victor
Re: [Python-Dev] The untuned tunable parameter ARENA_SIZE
On 06/01/2017 02:03 AM, Victor Stinner wrote: 2017-06-01 10:41 GMT+02:00 Larry Hastings : On 06/01/2017 01:19 AM, Antoine Pitrou wrote: If you'd like to go that way anyway, I would suggest 1MB as a starting point in 3.7. I understand the desire for caution. But I was hoping maybe we could experiment with 4mb in trunk for a while? We could change it to 1mb--or even 256k--before beta 1 if we get anxious. While I fail to explain why in depth, I would prefer to *not* touch the default arena size at this point. We need more data, for example measure the memory usage on different workloads using different arena sizes. I can't argue with collecting data at this point in the process. My thesis is simply "the correct value for this tunable parameter in 2001 is probably not the same value in 2017". I don't mind proceeding *slowly* or gathering more data or what have you for now. But I would like to see it change somehow between now and 3.7.0b1, because my sense is that we can get some performance for basically free by updating the value. If ARENA_SIZE tracked Moore's Law, meaning that we doubled it every 18 months like clockwork, it'd currently be 2**10 times bigger: 256MB, and we'd be changing it to 512MB at the end of August. (And yes, as a high school student I was once bitten by a radioactive optimizer, so these days when I'm near possible optimizations my spider-sense--uh, I mean, my optimization-sense--starts tingling.) A simple enhancement would be to add an environment variable to change the arena size at Python startup. Example: PYTHONARENASIZE=1M. Implementing this would slow down address_in_range which currently compiles in arena size. It'd be by a tiny amount, but this inline function gets called very very frequently. It's possible this wouldn't hurt performance, but my guess is it'd offset the gains we got from larger arenas, and the net result would be no faster or slightly slower. 
//arry/
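Victor's `PYTHONARENASIZE=1M` suggestion would need a small startup-time parser for values like "256K" or "4M". A hypothetical sketch of such a parser — the function name, accepted suffixes, and the 4 KiB pool-alignment check are my own assumptions for illustration, not anything in CPython:

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/* Parse a hypothetical PYTHONARENASIZE value such as "4096", "256K"
   or "4M" into a byte count.  Returns 0 on invalid input.  pymalloc
   carves arenas into 4 KiB pools, so reject sizes that are not a
   multiple of the pool size. */
static size_t
parse_arena_size(const char *s)
{
    char *end;
    unsigned long value = strtoul(s, &end, 10);
    if (end == s) {
        return 0;                       /* no digits at all */
    }
    size_t size = (size_t)value;
    switch (toupper((unsigned char)*end)) {
    case 'K':  size *= 1024;        ++end; break;
    case 'M':  size *= 1024 * 1024; ++end; break;
    case '\0': break;                   /* plain byte count */
    default:   return 0;                /* unknown suffix */
    }
    if (*end != '\0') {
        return 0;                       /* trailing junk */
    }
    if (size == 0 || size % 4096 != 0) {
        return 0;                       /* not pool-aligned */
    }
    return size;
}
```

Note that this only addresses reading the value; as Larry points out, the harder cost is that `address_in_range` currently bakes the arena size in as a compile-time constant, and turning it into a runtime variable touches a very hot path.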
Re: [Python-Dev] RFC: Backport ssl.MemoryBIO and ssl.SSLObject to Python 2.7
On Jun 01, 2017, at 07:22 PM, Victor Stinner wrote: >I can help Canonical to backport MemoryBIO *if they want* to >cherry-pick this feature ;-) (Pedantically speaking, this falls under the Ubuntu project's responsibility, not directly Canonical.) Writing the patch is only part of the process: https://wiki.ubuntu.com/StableReleaseUpdates There's also Debian to consider. Cheers, -Barry
Re: [Python-Dev] "Global freepool"
On 06/01/2017 02:20 AM, Victor Stinner wrote: I would like to understand how private free lists are "so much" faster. In fact, I don't recall if someone *measured* the performance speedup of these free lists :-) I have, recently, kind of by accident. When working on the Gilectomy I turned off some freelists as they were adding "needless" complexity and presumably weren't helping performance that much. Then I turned them back on because it turned out they really did help. My intuition is that they help in two major ways: * Since they're a known size, you don't need to go through the general-case code of looking up the right spot in usedpools (etc) to get one / put one back in malloc/free. * The code that recycles these objects assumes that objects from its freelist are already mostly initialized, so it doesn't need to initialize them. The really crazy one is PyFrameObjects. The global freelist for these stores up to 200 (I think) in a stack, implemented as a simple linked list. When CPython wants a new frame object, it takes the top one off the stack and uses it. Where it gets crazy is: PyFrameObjects are dynamically sized, based on the number of arguments + local variables + stack + freevars + cellvars. So the frame you pull off the free list might not be big enough. If it isn't big enough, the code calls *realloc* on it then uses it. This seems like such a weird approach to me. But it's obviously a successful approach, and I've learned not to argue with success. p.s. Speaking of freelists, at one point Serhiy had a patch adding a freelist for single- and I think two-digit ints. Right now the only int creation optimization we have is the array of constant "small ints"; if the int you're constructing isn't one of those, we use the normal slow allocation path with PyObject_Alloc etc. IIRC this patch made things faster. Serhiy, what happened to that patch? Was it actually a bad idea, or did it just get forgotten?
//arry/
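The freelist pattern described above — a bounded stack of recycled objects, with the stack links threaded through the objects themselves — can be sketched in a few lines of C. This is an illustration of the general technique, not CPython's actual code; the type, field names, and the 200-entry cap are illustrative:

```c
#include <stdlib.h>

#define MAXFREE 200                 /* cap on cached objects */

typedef struct node {
    struct node *next;              /* doubles as the freelist link */
    int payload;
} node;

static node *freelist = NULL;       /* stack of recycled objects */
static int numfree = 0;

static node *
node_new(void)
{
    node *n;
    if (freelist != NULL) {         /* fast path: pop the stack,   */
        n = freelist;               /* skipping the allocator      */
        freelist = n->next;
        --numfree;
    }
    else {                          /* slow path: real allocation */
        n = malloc(sizeof(node));
    }
    return n;
}

static void
node_free(node *n)
{
    if (numfree < MAXFREE) {        /* push back onto the stack */
        n->next = freelist;
        freelist = n;
        ++numfree;
    }
    else {
        free(n);                    /* stack full: really free it */
    }
}
```

The two speedups Larry identifies both show up here: the fast path is a couple of pointer operations rather than a trip through the general allocator, and a recycled object can keep whatever initialization survives reuse.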
Re: [Python-Dev] RFC: Backport ssl.MemoryBIO and ssl.SSLObject to Python 2.7
On 1 June 2017 at 17:14, Chris Angelico wrote: > Python 2 users include people on Windows who install it themselves, > and then have no mechanism for automatic updates. They'll probably > stay on whatever 2.7.x they first got, until something forces them to > update. But it also includes people on stable Linux distros, where > they have automatic updates provided by Red Hat or Debian or whomever, > so a change like this WILL propagate - particularly (a) as the window > is three entire years, and (b) if the change is considered important > by the distro managers, which is a smaller group of people to convince > than the users themselves. However, it is trivial for Windows users to upgrade if asked to, as there's no issue around system packages depending on a particular version (or indeed, much of anything depending - 3rd party applications on Windows bundle their own Python, they don't use the globally installed one). So in principle, there should be no problem expecting Windows users to be on the latest version of 2.7.x. In fact, I suspect that the proportion of Windows users on Python 3 is noticeably higher than the proportion of Linux/Mac OS users on Python 3 (for the same reason). So this problem may overall be less pressing for Windows users. I have no evidence that isn't anecdotal to back this last assertion up, though. Linux users often use the OS-supplied Python, and so getting the distributions to upgrade, and to backport upgrades to old versions of their OS (and push those backports as required updates), is the route to get the bulk of the users there. Experience on pip seems to indicate this is unlikely to happen, in practice. Mac OS users who use the system Python are, as I understand it, stuck with a pretty broken version (I don't know if newer versions of the OS change that). But distributions like Macports are more common and more up to date. Apart from the Windows details, these are purely my impressions.
> Do you have figures for how many people use pip on Windows vs Linux vs Mac OS? No. But we do get plenty of bug reports from Windows users, so I don't think there's any reason to assume it's particularly low (given the relative numbers of *python* users - in fact, it may be proportionately higher as Windows users don't have alternative options like yum). Paul
Re: [Python-Dev] RFC: Backport ssl.MemoryBIO and ssl.SSLObject to Python 2.7
On Thu, Jun 01, 2017 at 04:01:54PM +0100, Cory Benfield wrote: > > lower net cost than getting the world into a situation where pip no > > longer runs on the >1e6 EC2 instances that will be running Ubuntu > > 14.04/16.04 LTS until the turn of the next decade. > So for the record I’m assuming most of the previous email was a joke: > certainly it’s not going to happen. ;) > But this is a real concern that does need to be addressed Unfortunately it wasn't, but at least I'm glad to have accidentally made a valid point amidst the cloud of caffeine-fuelled irritability :/ Apologies for the previous post, it was hardly constructive. David
Re: [Python-Dev] RFC: Backport ssl.MemoryBIO and ssl.SSLObject to Python 2.7
> On Jun 1, 2017, at 1:09 PM, Barry Warsaw wrote: > > On Jun 02, 2017, at 02:14 AM, Chris Angelico wrote: > >> But it also includes people on stable Linux distros, where they have >> automatic updates provided by Red Hat or Debian or whomever, so a change like >> this WILL propagate - particularly (a) as the window is three entire years, >> and (b) if the change is considered important by the distro managers, which >> is a smaller group of people to convince than the users themselves. > [...] >> So I'd be in the "yes" category. Across the next few years, I strongly >> suspect that 2.7.14 will propagate reasonably well. > > I'm not so sure about that, given long term support releases. For Ubuntu, LTS > releases live for 5 years: > > https://www.ubuntu.com/info/release-end-of-life > > By 2020, only Ubuntu 16.04 and 18.04 will still be maintained, so while 18.04 > will likely contain whatever the latest 2.7 is available at that time, 16.04 > won't track upstream point releases, but instead will get select cherry > picks. For good reason, there's a lot of overhead to backporting fixes into > stable releases, and something as big as being suggested here would, in my > best guess, have a very low chance of showing up in stable releases. > Using 2.7.9 as a sort of benchmark here, currently 26% of downloads from PyPI are using a version of Python older than 2.7.9, 2 months ago that number was 31%. (That’s total across all Python versions). Python >= 2.7.9, <3 is at 43% (previously 53%). So in ~2.5 years 2.7.9+ has become > 50% of all downloads from PyPI while older versions of Python 2.7 are down to only ~25% of the total number of downloads made by pip. I was also curious about how this had changed over the past year instead of just the past two months, a year ago >=2.7,<2.7.9 accounted for almost 50% of all downloads from PyPI (compared to the 25% today). 
It *looks* like on average we’re dropping somewhere between 1.5% and 2% each month, so a conservative estimate, if these numbers hold, is that we’d be looking at single digit numbers for >=2.7,<2.7.9 in roughly 11 months, or 3.5 years after the release of 2.7.9. If we assume that the hypothetical 2.7.14 w/ MemoryBio support would follow a similar adoption curve, we would expect to be able to mandate it for pip/etc, in a worst case scenario, 3-4 years after release. In addition to that, pip 9 comes with a new feature that makes it easier to sunset support for versions of Python without breaking the world [1]. The likely scenario is that while pip 9+ is increasing in share, Python <2.7.14 will be decreasing, and that would mean that we could *likely* start mandating it earlier, maybe at the 2 year mark or so. [1] An astute reader might ask, why could you not use this same mechanism to simply move on to only supporting Python 3? It’s true we could do that, however as a rule we generally try to keep support for Pythons until the usage drops below some threshold, where that threshold varies based on how hard it is to continue supporting that version of Python and what the “win” is in terms of dropping it. Since we’re still at 90% of downloads from PyPI being done using Python 2, that suggests the threshold for Python 3.x is very far away and will extend beyond 2020 (I mean, we’re just *now* finally able to drop support for Python 2.6). In case it’s not obvious, I am very much in support of this PEP. — Donald Stufft
Re: [Python-Dev] RFC: Backport ssl.MemoryBIO and ssl.SSLObject to Python 2.7
> On Jun 1, 2017, at 3:51 PM, Paul Moore wrote: > > Linux users often use the OS-supplied Python, and so getting the > distributions to upgrade, and to backport upgrades to old versions of > their OS and (push those backports as required updates) is the route > to get the bulk of the users there. Experience on pip seems to > indicate this is unlikely to happen, in practice. Mac OS users who use > the system Python are, as I understand it, stuck with a pretty broken > version (I don't know if newer versions of the OS change that). But > distributions like Macports are more common and more up to date. > Note that on macOS, within the next year macOS users using the system Python are going to be unable to talk to PyPI anyways (unless Apple does something here, which I think they will), but in either case, Apple was pretty good about upgrading to 2.7.9 (I think they had the first OS released that supported 2.7.9?). — Donald Stufft
Re: [Python-Dev] RFC: Backport ssl.MemoryBIO and ssl.SSLObject to Python 2.7
> On Jun 1, 2017, at 3:57 PM, Donald Stufft wrote: > > Note that on macOS, within the next year macOS users using the system Python > are going to be unable to talk to PyPI anyways (unless Apple does something > here, which I think they will), but in either case, Apple was pretty good > about upgrading to 2.7.9 (I think they had the first OS released that > supported 2.7.9?). Forgot to mention that pip 10.0 will work around this, thus forcing macOS users to upgrade or be cut off. — Donald Stufft
Re: [Python-Dev] "Global freepool"
On 01.06.17 21:44, Larry Hastings wrote: p.s. Speaking of freelists, at one point Serhiy had a patch adding a freelist for single- and I think two-digit ints. Right now the only int creation optimization we have is the array of constant "small ints"; if the int you're constructing isn't one of those, we use the normal slow allocation path with PyObject_Alloc etc. IIRC this patch made things faster. Serhiy, what happened to that patch? Was it actually a bad idea, or did it just get forgotten? The issue [1] is still open. The patches were neither applied nor rejected. They show a speedup in microbenchmarks, but it is not large: up to 40% for iterating over enumerate() and 5-7% for hard integer computations like base85 encoding or the spectral_norm benchmark. [1] https://bugs.python.org/issue25324
Re: [Python-Dev] RFC: Backport ssl.MemoryBIO and ssl.SSLObject to Python 2.7
On 01Jun2017 1010, Nathaniel Smith wrote: I believe that for answering this question about the ssl module, it's really only Linux users that matter, since pip/requests/everyone else pushing for this only want to use ssl.MemoryBIO on Linux. Their plan on Windows/MacOS (IIUC) is to stop using the ssl module entirely in favor of new ctypes bindings for their respective native TLS libraries. (And yes, in principle it might be possible to write new ctypes-based bindings for openssl, but (a) this whole project is already teetering on the verge of being impossible given the resources available, so adding any major extra deliverable is likely to sink the whole thing, and (b) compared to the proprietary libraries, openssl is *much* harder and riskier to wrap at the ctypes level because it has different/incompatible ABIs depending on its micro version and the vendor who distributed it. This is why manylinux packages that need openssl have to ship their own, but pip can't and shouldn't ship its own openssl for many hopefully obvious reasons.) How much of a stop-gap would it be (for Windows at least) to override OpenSSL's certificate validation with a call into the OS? This leaves most of the work with OpenSSL, but lets the OS say yes/no to the certificates based on its own configuration. For Windows, this is under 100 lines of C code in (probably) _ssl, and while I think an SChannel based approach is the better way to go long-term,[1] offering platform-specific certificate validation as the default in 2.7 is far more palatable than backporting new public API. I can't speak to whether there is an equivalent function for Mac (validate a certificate chain given the cert blob). Cheers, Steve [1]: though I've now spent hours looking at it and still have no idea how it's supposed to actually work... 
Re: [Python-Dev] The untuned tunable parameter ARENA_SIZE
On Thursday 01 June 2017 01:27 PM, Victor Stinner wrote: > The GNU libc malloc uses a variable threshold to choose between sbrk() > (heap memory) or mmap(). It starts at 128 kB or 256 kB, and then is > adapted depending on the workload (I don't know how exactly). The threshold starts at 128K and increases whenever an mmap'd block is freed. For example, if the program allocates 2M (which is returned using mmap) and then frees that block, glibc malloc assumes that 2M blocks will be needed again and optimizes that allocation by increasing the threshold to 2M. This works well in practice for common programs but it has been known to cause issues in some cases, which is why there's MALLOC_MMAP_THRESHOLD_ to fix the threshold. > I already read that CPU support "large pages" between 2 MB and 1 GB, > instead of just 4 kB. Using large pages can have a significant impact > on performance. I don't know if we can do something to help the Linux > kernel to use large pages for our memory? I don't know neither how we > could do that :-) Maybe using mmap() closer to large pages will help > Linux to join them to build a big page? (Linux has something magic to > make applications use big pages transparently.) There's MAP_HUGETLB and friends for mmap flags, but it's generally better to just let the kernel do this for you transparently (using Transparent Huge Pages) by making sure that your arena allocations are either contiguous or big enough. Siddhesh
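Besides the MALLOC_MMAP_THRESHOLD_ environment variable Siddhesh mentions, glibc also lets a program pin the threshold itself via mallopt(), which disables the dynamic adjustment. A minimal, glibc-specific sketch (mallopt() and M_MMAP_THRESHOLD are not portable beyond glibc):

```c
#include <malloc.h>   /* glibc-specific: mallopt, M_MMAP_THRESHOLD */
#include <stddef.h>

/* Fix glibc's mmap threshold so that allocations of `bytes` or more
   always come from mmap().  Setting it via mallopt() also turns off
   the sliding-threshold heuristic described above.  mallopt returns
   1 on success and 0 on failure. */
int
pin_mmap_threshold(size_t bytes)
{
    return mallopt(M_MMAP_THRESHOLD, (int)bytes);
}
```

After `pin_mmap_threshold(128 * 1024)`, a 2 MB allocation is served by mmap() and freeing it no longer bumps the threshold to 2 MB.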
Re: [Python-Dev] "Global freepool"
2017-06-01 22:16 GMT+02:00 Serhiy Storchaka : > The issue [1] still is open. Patches neither applied nor rejected. They > exposes the speed up in microbenchmarks, but it is not large. Up to 40% for > iterating over enumerate() and 5-7% for hard integer computations like > base85 encoding or spectral_norm benchmark. > > [1] https://bugs.python.org/issue25324 Hum, I think that the right issue is: http://bugs.python.org/issue24165 Victor
Re: [Python-Dev] PEP 7 and braces { .... } on if
If you create an issue at github.com/python/peps and assign it to me I will get to it someday. :) On Thu, 1 Jun 2017 at 00:19 Victor Stinner wrote: > 2017-05-31 19:27 GMT+02:00 Guido van Rossum : > > I interpret the PEP (...) > > Right, the phrasing requires to "interpret" it :-) > > > (...) as saying that you should use braces everywhere but not > > to add them in code that you're not modifying otherwise. (I.e. don't go > on a > > brace-adding rampage.) If author and reviewer of a PR disagree I would go > > with "add braces" since that's clearly the PEP's preference. This is C > code. > > We should play it safe. > > Would someone be nice enough to try to rephrase the PEP 7 to explain > that? Just to avoid further boring discussion on the C coding style... > > Victor
Re: [Python-Dev] PEP 7 and braces { .... } on if
https://github.com/python/peps/pull/280/files On Jun 01, 2017, at 09:08 PM, Brett Cannon wrote: >If you create an issue at github.com/python/peps and assign it to me I will >get to it someday. :)
[Python-Dev] 2017 Python Language Summit coverage -- Round 2
Hola python-dev, Thanks for all the positive feedback on the coverage (and the corrections/clarifications in the comments too)! There is, it seems, always more to do, but I do have three additional articles from the summit up now and should complete the coverage over the next week. The starting point is the overview article, here: https://lwn.net/Articles/723251/ which should now be free for anyone to see (and the first four articles too). LWN subscribers can see the content right away, but one week after they are published in the weekly edition, they become freely available for everyone. Until then, though, feel free to share the SubscriberLinks I am posting here. I have been asked about our policy on appropriate places to share SubscriberLinks; blogs, tweets, social media, mailing lists, etc. are all perfectly fine with us. The new articles are: Keeping Python competitive: https://lwn.net/Articles/723949/ or https://lwn.net/SubscriberLink/723949/56a392defaae995c/ Trio and the future of asynchronous execution in Python: https://lwn.net/Articles/724082/ or https://lwn.net/SubscriberLink/724082/43c399adca8006f0/ Python ssl module update: https://lwn.net/Articles/724209/ or https://lwn.net/SubscriberLink/724209/8460ca8b51c00634/ stay tuned sometime next week for the thrilling conclusion :) jake -- Jake Edge - LWN - j...@lwn.net - http://lwn.net
Re: [Python-Dev] The untuned tunable parameter ARENA_SIZE
On 06/01/2017 02:50 AM, Antoine Pitrou wrote: Another possible strategy is: allocate several arenas at once (using a larger mmap() call), and use MADV_DONTNEED to relinquish individual arenas. Thus adding a *fourth* layer of abstraction over memory we get from the OS? block -> pool -> arena -> "multi-arena" -> OS Y'know, this might actually make things faster. These multi-arenas could be the dynamically growing thing Victor wants to try. We allocate 16mb, then carve it up into arenas (however big those are), then next time allocate 32mb or what have you. Since the arenas remain a fixed size, we don't make the frequently-used code path (address_in_range) any slower. The code to deal with the multi-arenas would add a little complexity--to an admittedly already complex allocator implementation, but then what allocator isn't complex internally?--but it'd be an infrequent code path and I bet it'd be an improvement over simply calling malloc / mmap / VirtualAlloc. What do you think, Victor? And to think I started this reply ironically, //arry/
Re: [Python-Dev] Aligning the packaging.python.org theme with the rest of the docs
On 30 May 2017 at 22:08, Antoine Pitrou wrote: > On Tue, 30 May 2017 21:49:19 +1000 > Nick Coghlan wrote: >> >> Here's an alternate wording for the README that would focus on those >> considerations without explicitly asking folks not to use the theme: >> >> "Note that when adopting this theme, you're also borrowing an element >> of the trust and credibility established by the CPython core >> developers over the years, as well as the legal credibility arising >> from their close association with the Python Software Foundation. > > The statement about "legal credibility" sounds wishy-washy and could > lure users into thinking that they're doing something illegal by > borrowing the theme. > > Also I'm not sure what is that "legal credibility" you're talking > about. If it's about the PSF license and the Python CLA then > better to voice that explicitly, IMO. It's probably better to just drop that clause and call the repository "cpython-docs-theme" rather than "psf-docs-theme". Explicitly affiliating the theme with the PSF made sense if we were reserving the right to seek trade dress protections in the future, but it sounds like folks are pretty solidly against that idea, so we can instead leave the PSF out of it entirely. >> That's fine, and you're welcome to do so for other Python community >> projects if you so choose, but please keep in mind that in doing so >> you're also choosing to become a co-steward of that collective trust >> :)" > > "Becoming a co-steward of that collective trust" sounds serious enough > (even though I don't understand what it means concretely), so why > the smiley? Mainly to convey that the situation isn't necessarily as profound as that wording might suggest. Rephrasing that part, and incorporating the amendment from above: "Note that when adopting this theme, you're also borrowing an element of the trust and credibility established by the CPython core developers over the years. 
That's fine, and you're welcome to do so for other Python community projects if you so choose, but please keep in mind that in doing so you're also choosing to accept some of the responsibility for maintaining that collective trust." Cheers, Nick. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia