[issue12236] Tkinter __version__ uses subversion substitution
New submission from Julian Taylor:

./Lib/lib-tk/Tkinter.py:33 contains this svn keyword substitution:

    __version__ = "$Revision$"

Since the switch to hg this field is no longer substituted, which makes __version__ quite pointless. This affects Python 2.7.2rc1.

--
components: Tkinter
messages: 137455
nosy: jtaylor
priority: normal
severity: normal
status: open
title: Tkinter __version__ uses subversion substitution
type: behavior
versions: Python 2.7
[issue12236] Tkinter __version__ uses subversion substitution
Julian Taylor added the comment:

matplotlib fails to build due to this with 2.7.2rc1 in Ubuntu oneiric (but it seems simple to fix):
https://launchpad.net/ubuntu/+source/matplotlib/1.0.1-2ubuntu1/+build/2535369
[issue12752] locale.normalize does not take unicode strings
New submission from Julian Taylor:

Using unicode strings with locale.normalize gives the following traceback on Python 2.7:

    ~$ python2.7 -c 'import locale; locale.normalize(u"en_US")'
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/usr/lib/python2.7/locale.py", line 358, in normalize
        fullname = localename.translate(_ascii_lower_map)
    TypeError: character mapping must return integer, None or unicode

With Python 2.6 it works, and it also works with non-unicode strings on 2.7.

--
components: Unicode
messages: 142118
nosy: jtaylor
priority: normal
severity: normal
status: open
title: locale.normalize does not take unicode strings
versions: Python 2.7
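A minimal workaround sketch for the Python 2.7 regression described above; normalize_locale is a hypothetical helper, not part of the stdlib. The idea is simply to coerce unicode locale names to byte strings before calling locale.normalize().

    import locale

    def normalize_locale(name):
        # locale.normalize() on 2.7 chokes on unicode input, so coerce to a byte string first
        if isinstance(name, unicode):
            name = name.encode("ascii")
        return locale.normalize(name)

    print(normalize_locale(u"en_US"))  # e.g. 'en_US.ISO8859-1'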
[issue12752] locale.normalize does not take unicode strings
Julian Taylor added the comment:

This is a regression introduced by the fix for http://bugs.python.org/issue1813. It breaks some user code, e.g. wx.Locale.GetCanonicalName returns unicode.

Example bugs:
https://bugs.launchpad.net/ubuntu/+source/update-manager/+bug/824734
https://bugs.launchpad.net/ubuntu/+source/playonlinux/+bug/825421
[issue26530] tracemalloc: add C API to manually track/untrack memory allocations
Julian Taylor added the comment:

The api looks good to me. Works fine in numpy.

--
nosy: +jtaylor
[issue26530] tracemalloc: add C API to manually track/untrack memory allocations
Julian Taylor added the comment:

I don't see any reason why not to.
[issue30054] Expose tracemalloc C API to track/untrack memory blocks
Julian Taylor added the comment:

I am not sure if _PyTraceMalloc_GetTraceback really needs to be a public function. Exposing the tracing information should probably just go through Python interfaces.
[issue30054] Expose tracemalloc C API to track/untrack memory blocks
Julian Taylor added the comment:

With this changeset it would: https://github.com/numpy/numpy/pull/8885
[issue30073] binary compressed file reading corrupts newlines (lzma, gzip, bz2)
New submission from Julian Taylor:

Probably a case of 'don't do that', but reading lines from a compressed file in binary mode produces bytes with invalid newlines for encodings where '\n' is encoded as something other than a single 0x0a byte:

    with lzma.open("test.xz", "wt", encoding="UTF-32-LE") as f:
        f.write('0 1 2\n3 4 5')

    lzma.open("test.xz", "rb").readlines()[0].decode('UTF-32-LE')

This fails with:

    UnicodeDecodeError: 'utf-32-le' codec can't decode byte 0x0a in position 20: truncated data

because readlines() produces:

    b'0\x00\x00\x00 \x00\x00\x001\x00\x00\x00 \x00\x00\x002\x00\x00\x00\n'

The trailing newline should be '\n'.encode('UTF-32-LE') == b'\n\x00\x00\x00'.

--
components: Library (Lib)
messages: 291661
nosy: jtaylor
priority: normal
severity: normal
status: open
title: binary compressed file reading corrupts newlines (lzma, gzip, bz2)
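For comparison, a small sketch of reading the same file in text mode, where line splitting happens after decoding, so the encoded newlines come back intact (reuses the test.xz written above):

    import lzma

    with lzma.open("test.xz", "wt", encoding="UTF-32-LE") as f:
        f.write('0 1 2\n3 4 5')

    with lzma.open("test.xz", "rt", encoding="UTF-32-LE") as f:
        print(f.readlines())  # ['0 1 2\n', '3 4 5'] -- split on decoded newlines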
[issue30073] binary compressed file reading corrupts newlines (lzma, gzip, bz2)
Julian Taylor added the comment:

On second thought, this is not really worth an issue, as it is a general problem of readline on binary streams. Sorry for the noise.

--
stage:  -> resolved
status: open -> closed
[issue30073] binary compressed file reading corrupts newlines (lzma, gzip, bz2)
Julian Taylor added the comment:

see also http://bugs.python.org/issue17083
[issue30150] raw debug allocators to not return malloc alignment
New submission from Julian Taylor:

The debug raw allocators do not return the same alignment as malloc. See _PyMem_DebugRawAlloc:
https://github.com/python/cpython/blob/master/Objects/obmalloc.c#L1873

The line "return p + 2*SST" adds 2 * sizeof(size_t) to the pointer returned by malloc. On x32, for example, malloc returns 16-byte aligned memory but size_t is 4 bytes. This makes all memory returned by the debug allocators not aligned to what the system assumes on such platforms.

--
components: Interpreter Core
messages: 292187
nosy: jtaylor
priority: normal
severity: normal
status: open
title: raw debug allocators to not return malloc alignment
versions: Python 2.7, Python 3.6, Python 3.7
[issue30150] raw debug allocators to not return malloc alignment
Julian Taylor added the comment:

No, in numpy it is just a case of using the wrong allocator in a certain spot, an issue that can be fixed in numpy. But it is also a minor bug/documentation issue in Python itself.

Alignment isn't very important for SIMD any more, but there are architectures where alignment is still mandatory, so numpy is sprinkled with asserts checking alignment, which triggered on x32. It is a very minor issue, as to my knowledge none of the platforms with alignment requirements has the properties of x32, and x32 doesn't actually care about alignment either.
[issue30150] raw debug allocators to not return malloc alignment
Julian Taylor added the comment:

The largest type is usually the long double. Its alignment ranges from 4 bytes (i386) to 16 bytes (sparc). So Py_MAX(sizeof(size_t), 8) should indeed do it.
[issue21148] avoid memset in small tuple creation
New submission from Julian Taylor:

Attached is a prototype patch that avoids the memset of ob_item in PyTuple_New, which is not necessary for the BUILD_TUPLE bytecode and PyTuple_Pack as these overwrite every entry in ob_item anyway. This improves small tuple creation by about 5%.

It does this by adding a new internal function that skips the memset loop and wrapping it in a PyTuple_New that performs it. _Pack and ceval call the internal function.

The patch still needs cleanup: I don't know where the signature for ceval.c would best go. Does the internal function need to be hidden from the DSO?

Microbenchmark, compiled with gcc-4.8.2 on Ubuntu 14.04 amd64, default configure options:

    import timeit
    print(min(timeit.repeat("(a,)", setup="a = 1; b = 1", repeat=5, number=10**7)))
    print(min(timeit.repeat("(a, b)", setup="a = 1; b = 1", repeat=5, number=10**7)))

before:
    0.45767
    0.52926
after:
    0.42652
    0.50122

Larger tuples do not profit much as the loading is more expensive in comparison.

--
components: Interpreter Core
files: avoid-memset.patch
keywords: patch
messages: 215461
nosy: jtaylor
priority: normal
severity: normal
status: open
title: avoid memset in small tuple creation
type: performance
versions: Python 3.5
Added file: http://bugs.python.org/file34715/avoid-memset.patch
[issue21233] Add *Calloc functions to CPython memory allocation API
Julian Taylor added the comment:

Won't replacing _PyObject_GC_Malloc with a calloc cause var objects (PyObject_NewVar) to be completely zeroed, which I think they weren't before? Some numeric programs stuff a lot of data into var objects and could care about Python suddenly zeroing them when they don't need it. An example would be tinyarray.

--
nosy: +jtaylor
[issue21233] Add *Calloc functions to CPython memory allocation API
Julian Taylor added the comment:

I just tested it; PyObject_NewVar seems to use RawMalloc, not the GC malloc, so it's probably fine.
[issue21592] Make statistics.median run in linear time
Julian Taylor added the comment:

In the case of the median you can achieve similar performance to a multiselect by simply calling min() on the partition above the selected midpoint (roughly min(data[len(data) // 2 + 1:])) for the second order statistic, which you need for averaging an even number of elements.

Maybe an interesting datapoint would be to compare with numpy's selection algorithm, which is an intro-multiselect (implemented in C for native datatypes). It uses a standard median-of-3 quickselect with a cutoff in recursion depth to median of medians of groups of 5. The multiselect is implemented using a sorted list of k-th order statistics and reducing the search space for each k-th by maintaining a stack of all visited pivots. E.g. if you search for 30 and 100, and during the search for 30 one has visited pivots 70 and 110, the search for 100 only needs to select in l[70:110].

The not particularly readable implementation is in ./numpy/core/src/npysort/selection.c.src. Unfortunately for object types it currently falls back to quicksort, so you can't directly compare its performance with the pure Python variants.

--
nosy: +jtaylor
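A rough sketch of the "minimum trick" described above, assuming a hypothetical quickselect-style helper select(data, k) that places the k-th smallest element at index k and leaves only larger-or-equal elements in data[k+1:] (select is not a stdlib function):

    def median_via_select(data):
        data = list(data)
        n = len(data)
        if n % 2 == 1:
            return select(data, n // 2)
        lower = select(data, n // 2 - 1)   # lower of the two middle elements
        upper = min(data[n // 2:])         # next order statistic via a simple min()
        return (lower + upper) / 2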
[issue21592] Make statistics.median run in linear time
Julian Taylor added the comment:

For the median alone a multiselect is probably overkill (that's why I mentioned the minimum trick), but a selection algorithm is useful on its own for all of Python, and then a multiselect should be considered. Of course that means it would need to be implemented in C like sorted(), so you actually get a significant performance gain that makes adding a new Python function worthwhile.

Also, just to save numpy's honor: in your script's numpy comparison you are benchmarking Python list -> numpy array conversion and not the actual selection. The conversion is significantly slower than the selection itself. Also select2b is in-place while np.partition is out of place. Repeated in-place selection typically gets faster as the number of required swaps goes down and can even be constant in time if the requested value does not change. With that fixed, numpy outperforms PyPy by about a factor of 2 (but PyPy's performance is indeed quite impressive, as it is far more generic).
[issue19308] Tools/gdb/libpython.py does not support GDB linked against Python 3
Julian Taylor added the comment:

I tested the latest patch (python27-gdb_py3.patch) with the Ubuntu 13.10 gdb compiled against Python 3.3. While it fixes the syntax errors, it does not fix the functionality. E.g. one gets this error on breakpoints:

    Python Exception There is no member named length.:
    Breakpoint 3, PyTuple_Size (op=) at ../Objects/tupleobject.c:127

and the objects are not printed in their string representation as they should be with the plugin.

--
nosy: +jtaylor
[issue19308] Tools/gdb/libpython.py does not support GDB linked against Python 3
Julian Taylor added the comment:

On further investigation, I seem to have screwed up patching the files. Patched properly, they do work. Sorry for the noise.
[issue20389] clarify meaning of xbar and mu in pvariance/variance of statistics module
New submission from Julian Taylor:

The pvariance and variance functions take the arguments mu and xbar to pass in the population and sample mean to avoid some recomputation. I assume the keyword arguments are named differently because the two means accepted are different, but the docstring does not indicate this directly; it just says mu or xbar is the mean of the data. The module documentation is a little clearer, but only in the grey box right at the end.

I would propose changing the docstring and module documentation to explicitly state that mu is the population mean and xbar is the population mean. E.g.

    - The optional argument mu, if given, should be the mean of the data.
    + The optional argument mu, if given, should be the population mean of the data.

etc.

--
messages: 209192
nosy: jtaylor
priority: normal
severity: normal
status: open
title: clarify meaning of xbar and mu in pvariance/variance of statistics module
versions: Python 3.4
[issue20389] clarify meaning of xbar and mu in pvariance/variance of statistics module
Julian Taylor added the comment:

xbar is the *sample* mean, of course. Maybe with proper docstrings the two functions could also use the same keyword argument?
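For illustration, this is how the two keyword arguments are meant to be used (the data values here are made up):

    import statistics

    population = [2.75, 1.75, 1.25, 0.25, 0.5, 1.25, 3.5]
    mu = statistics.mean(population)                  # population mean
    print(statistics.pvariance(population, mu=mu))

    sample = population[:5]
    xbar = statistics.mean(sample)                    # sample mean
    print(statistics.variance(sample, xbar=xbar))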
[issue3158] Doctest fails to find doctests in extension modules
Julian Taylor added the comment:

The patch seems to work for me in IPython.

--
nosy: +jtaylor
[issue20389] clarify meaning of xbar and mu in pvariance/variance of statistics module
Changes by Julian Taylor:

--
components: +Library (Lib)
[issue16754] Incorrect shared library extension on linux
Julian Taylor added the comment:

Is SHLIB_SUFFIX=".so" really correct on Mac? Shared libraries have the .dylib extension, loadable modules have .so (which would be EXT_SUFFIX?); e.g. libpython itself uses .dylib.

--
nosy: +jtaylor108
[issue16754] Incorrect shared library extension on linux
Julian Taylor added the comment:

I'm going by what configure says:

    # SHLIB_SUFFIX is the extension of shared libraries

The extension of shared libraries on macOS is .dylib in most cases (e.g. libtool-based libraries and, as mentioned, Python itself). Maybe it's just a documentation/naming issue.
[issue16754] Incorrect shared library extension on linux
Julian Taylor added the comment:

Just to clarify: it's not an issue in Python itself; Python is working fine with .so. The issue is just that these variables tend to be used by other applications to figure out information about the system (like the shared library extension, see numpy.distutils). You could certainly argue that these applications are broken by even needing this information, but proper naming of the variables could help reduce confusion and wrong code.
[issue17895] TemporaryFile name returns an integer in python3
New submission from Julian Taylor:

    sys.version_info(major=3, minor=3, micro=1, releaselevel='final', serial=0)

    In [3]: type(tempfile.TemporaryFile().name)
    Out[3]: builtins.int

In Python 2 it returned a string. This is a somewhat pointless API change which breaks some third-party code, e.g. numpy (https://github.com/numpy/numpy/issues/3302).

--
components: Library (Lib)
messages: 188305
nosy: jtaylor
priority: normal
severity: normal
status: open
title: TemporaryFile name returns an integer in python3
type: behavior
versions: Python 3.3
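A possible workaround sketch for code that needs a string path: NamedTemporaryFile always has a real filesystem name, whereas TemporaryFile on POSIX may be an anonymous file whose .name is just the file descriptor number.

    import tempfile

    with tempfile.NamedTemporaryFile() as f:
        print(type(f.name))  # <class 'str'> -- a usable path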
[issue5309] distutils doesn't parallelize extension module compilation
Julian Taylor added the comment:

Very nice, thanks for adding this. Coincidentally, numpy added the same to numpy.distutils independently just a week later, though numpy also accepts an environment variable to set the number of jobs. This is useful e.g. for pip installations where one does not control the command line. An environment variable also allows parallel jobs in environments where it is not guaranteed that the feature is available: you could just put it into your .bashrc, and building with 3.5 will just work while 2.7 will not fail.

Is the naming --parallel/-j already fixed? I'll change the numpy options to the same name then.

Please also add it to the release notes so the feature can be discovered more easily.

--
nosy: +jtaylor
[issue23601] use small object allocator for dict key storage
Julian Taylor added the comment:

Large objects are just:

    if size > 512: return malloc(size)

There is no reason it should be slower. Also, for large objects allocation speed does not matter as much.
[issue21148] avoid needless pointers initialization in small tuple creation
Julian Taylor added the comment:

Right, at best it's probably too insignificant to really be worthwhile. Closing.

--
status: open -> closed
[issue23530] os and multiprocessing.cpu_count do not respect cpuset/affinity
Julian Taylor added the comment:

Any comments on the doc changes?
[issue23601] use small object allocator for dict key storage
Julian Taylor added the comment:

Your benchmarks are not affected by this change; see the other issue. They are also not representative of every workload out there.

I can at least see the argument why you didn't want to put in the other variant of this change, as it made the code a tiny bit more complicated, but I do not understand the reluctance about this variant. It doesn't change the complexity of the code one bit. If you doubt the performance of Python's own small object allocator, should Python maybe stop using it altogether?
[issue23601] use small object allocator for dict key storage
Julian Taylor added the comment:

OK, I ran it again, but note the machine was in use the whole time, so the results likely have no meaning.

    python perf.py -r -b default /tmp/normal/bin/python3 /tmp/opt/bin/python3

    Min: 0.399279 -> 0.376527: 1.06x faster
    Avg: 0.410819 -> 0.383315: 1.07x faster
    Significant (t=49.29)
    Stddev: 0.00450 -> 0.00330: 1.3631x smaller

    ### etree_iterparse ###
    Min: 0.639638 -> 0.630989: 1.01x faster
    Avg: 0.658744 -> 0.641842: 1.03x faster
    Significant (t=14.82)
    Stddev: 0.00959 -> 0.00617: 1.5557x smaller

    ### etree_parse ###
    Min: 0.433050 -> 0.377830: 1.15x faster
    Avg: 0.444014 -> 0.389695: 1.14x faster
    Significant (t=43.28)
    Stddev: 0.01010 -> 0.00745: 1.3570x smaller

    ### tornado_http ###
    Min: 0.335834 -> 0.326492: 1.03x faster
    Avg: 0.346100 -> 0.334186: 1.04x faster
    Significant (t=13.66)
    Stddev: 0.01024 -> 0.00689: 1.4864x smaller

    The following not significant results are hidden, use -v to show them:
    2to3, django_v2, etree_process, fastpickle, fastunpickle, json_dump_v2, json_load, nbody, regex_v8.

    /tmp$ /tmp/normal/bin/python3 -c 'import timeit; print(timeit.repeat("dict(a=5, b=2)"))'
    [0.5112445619997743, 0.514110946735, 0.5185121280010208]
    /tmp$ /tmp/opt/bin/python3 -c 'import timeit; print(timeit.repeat("dict(a=5, b=2)"))'
    [0.4426167189994885, 0.4465744609988178, 0.4467797579982289]
[issue21592] Make statistics.median run in linear time
Julian Taylor added the comment:

The median of medians of 5 is quite significantly slower than a quickselect. numpy implements an introselect, which uses quickselect but falls back to median of medians of 5 if not enough progress is made. In the numpy implementation, for a 10 element median (a multiselect with 2 selections, one median and one min), quickselect is around 3 times faster than median of medians of 5.
[issue23601] use small object allocator for dict key storage
Julian Taylor added the comment:

Ping. This has been sitting for 4 years and two Python releases. It's about time this stupidly simple thing gets merged.
[issue26601] Use new madvise()'s MADV_FREE on the private heap
Julian Taylor added the comment:

The simplest way to fix this would be to use malloc instead of mmap in the allocator; then you also get MADV_FREE for free when malloc uses it. The rationale for using mmap is kind of weak; the source just says "heap fragmentation". The usual argument for using mmap is not that but the instant return of memory to the system, quite the opposite of what the Python memory pool does.

--
nosy: +jtaylor
[issue26601] Use new madvise()'s MADV_FREE on the private heap
Julian Taylor added the comment:

ARENA_SIZE is 256 KB; the threshold in glibc is up to 32 MB.
[issue26601] Use new madvise()'s MADV_FREE on the private heap
Julian Taylor added the comment:

It defaulted to 128 KB ten years ago; it has been a dynamic threshold for ages.
[issue26601] Use new madvise()'s MADV_FREE on the private heap
Julian Taylor added the comment:

glibc's malloc is not obstack; it's not a simple linear heap where one object on top means everything below is not freeable. It also uses MADV_DONTNEED to give sbrk'd memory back to the system. This is the place where MADV_FREE can now be used, as the latter does not guarantee a page fault.

That said, of course you can construct workloads which lead to increased memory usage with malloc too, and maybe Python triggers them more often than other applications. Is there an existing issue showing the problem? It would be a good form of documentation in the source.
[issue26601] Use new madvise()'s MADV_FREE on the private heap
Julian Taylor added the comment:

I know one can change the allocator, but the default is mmap, which I don't think is a very good choice for the current arena size. All the arguments about fragmentation and memory space also apply to Python's arena allocator itself, and I am not convinced that fragmentation of the libc allocator is a real problem for Python, as Python's allocation pattern is very well behaved _due_ to its own arena allocator.

I don't doubt it, but I think it would be very valuable to document the actual real-world use case that triggered this change, just to avoid people stumbling over this again and again. But then I also don't think that anything necessarily needs to be changed; I have not seen the mmaps being a problem in any profiles of applications I work with.
[issue26601] Use new madvise()'s MADV_FREE on the private heap
Julian Taylor added the comment:

Which is exactly what malloc is already doing; thus my point is that by using malloc we would fulfill your request. But do you have an actual real-world application where this would help? It is pretty easy to figure out: just run the application under perf and see if a relevant amount of time is spent in page_fault/clear_pages. And as mentioned, you can already change the allocator for arenas at runtime, so you could also try changing it to malloc and see if your application gets any faster.
[issue23530] os and multiprocessing.cpu_count do not respect cpuset/affinity
New submission from Julian Taylor:

multiprocessing.cpu_count and os.cpu_count, which are often used to determine how many processes one can run in parallel, do not respect the cpuset, which may limit the process to only a subset of the online CPUs, leading to heavy oversubscription in e.g. containerized environments:

    $ taskset -c 0 python3.4 -c 'import multiprocessing; print(multiprocessing.cpu_count())'
    32
    $ taskset -c 0 python3.4 -c 'import os; print(os.cpu_count())'
    32

The correct result here should be 1. This forces programs to use less portable methods like forking GNU nproc or reading the /proc filesystem. Having a keyword argument to switch between online and available CPUs would be fine too.

--
components: Library (Lib)
messages: 236671
nosy: jtaylor
priority: normal
severity: normal
status: open
versions: Python 3.4
[issue23530] os and multiprocessing.cpu_count do not respect cpuset/affinity
Julian Taylor added the comment:

I do agree that it's probably safer not to change the default return value. But adding an option (or a new function) would still be good; the number of available CPUs is more often the number you actually want in practice.

At the very least the documentation should be improved to clearly state that this number does not guarantee that this many CPUs are actually available to run on, and that you should use psutil instead. Code for getting this information on the major operating systems (Linux, BSD and Windows) is available in GNU coreutils. I can possibly work on a patch if it would get accepted, but I can only actually test it on Linux.
[issue23530] os and multiprocessing.cpu_count do not respect cpuset/affinity
Julian Taylor added the comment:

Certainly for anything that needs good control over affinity psutil is the best choice, but I'm not arguing for implementing full process control in Python. I only want Python to provide the number of cores one can work on, to make best use of the available resources.

If you code-search Python files on GitHub for cpu_count you find 18000 uses; randomly sampling a few, every single one was to determine the number of CPUs for starting worker jobs to get the best performance. Every one of these will oversubscribe a host that restricts the CPUs a process can use. This is an issue especially for the increasingly popular use of containers instead of full virtual machines.

As a documentation update I would like to have a note saying that this number is the number of (online) CPUs in the system and may not be the number of CPUs the process can actually use. Maybe with a link to len(psutil.Process.get_affinity()) as a reference on how to obtain that number.

There would be no dependence on coreutils; I just mentioned it because you can look up there which OS API you need to use to get the number (e.g. sched_getaffinity). It is trivial API use and should not be a licensing issue; one could also look at the code from psutil, which most likely looks very similar.
[issue23530] os and multiprocessing.cpu_count do not respect cpuset/affinity
Julian Taylor added the comment:

Oh, that's great, so Python already has what I want. Then just a small documentation update would be good; I'll have a go at a patch later.
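For reference, the existing functionality referred to here is os.sched_getaffinity (available on Linux since Python 3.3); it reports the CPUs the current process is actually allowed to run on, honouring cpusets and affinity masks:

    import os

    usable = len(os.sched_getaffinity(0))  # CPUs this process may run on
    online = os.cpu_count()                 # all online CPUs in the system
    print(usable, online)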
[issue23530] os and multiprocessing.cpu_count do not respect cpuset/affinity
Julian Taylor added the comment:

Attached a documentation update patch.

--
keywords: +patch
Added file: http://bugs.python.org/file38369/0001-Issue-23530-Update-documentation-clarify-relation-of.patch
[issue23601] use small object allocator for dict key storage
New submission from Julian Taylor:

Dictionary creation spends a not insignificant amount of time in malloc allocating keys objects. Python has a nice small object allocator that avoids a lot of this overhead and falls back to malloc for larger allocations. Is there a reason the dictionary does not use that allocator for its keys objects? Doing so, e.g. via the attached (incomplete) patch, improves small dict creation performance by 15%.

    import timeit
    print(timeit.repeat("dict(a=5, b=2)"))

with change:
    [0.4282559923725, 0.427258015296, 0.436232985377]
without:
    [0.516061002634, 0.518172000496, 0.51842199191]

Or is there something I am overlooking and the use of PyMem_Malloc instead of PyObject_Malloc is an intentional design decision?

--
components: Interpreter Core
files: 0001-use-small-object-allocator-for-keys-object.patch
keywords: patch
messages: 237439
nosy: jtaylor
priority: normal
severity: normal
status: open
title: use small object allocator for dict key storage
versions: Python 3.5
Added file: http://bugs.python.org/file38371/0001-use-small-object-allocator-for-keys-object.patch
[issue23601] use small object allocator for dict key storage
Julian Taylor added the comment:

PyObject_Malloc just calls malloc above the threshold so there is no problem for larger dicts. For larger dicts the performance of malloc is also irrelevant as the time will be spent elsewhere.