[issue12236] Tkinter __version__ uses subversion substitution

2011-06-01 Thread Julian Taylor

New submission from Julian Taylor :

./Lib/lib-tk/Tkinter.py:33 has this svn keyword substitution:
 __version__ = "$Revision$"

Due to the switch to hg this field is no longer substituted, which makes __version__ 
quite pointless.
This affects Python 2.7.2rc1.
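
A quick way to see the effect (a minimal sketch for Python 2.7, where the module is
still named Tkinter; under svn the keyword used to expand to something like
"$Revision: 12345 $"):

import Tkinter
# With hg doing no keyword expansion, this now prints the literal placeholder:
print(Tkinter.__version__)   # "$Revision$"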

--
components: Tkinter
messages: 137455
nosy: jtaylor
priority: normal
severity: normal
status: open
title: Tkinter __version__ uses subversion substitution
type: behavior
versions: Python 2.7

Python tracker <http://bugs.python.org/issue12236>



[issue12236] Tkinter __version__ uses subversion substitution

2011-06-02 Thread Julian Taylor

Julian Taylor  added the comment:

matplotlib fails to build against 2.7.2rc1 in Ubuntu Oneiric because of this (but it 
seems simple to fix):
https://launchpad.net/ubuntu/+source/matplotlib/1.0.1-2ubuntu1/+build/2535369

--

Python tracker <http://bugs.python.org/issue12236>



[issue12752] locale.normalize does not take unicode strings

2011-08-15 Thread Julian Taylor

New submission from Julian Taylor :

Using unicode strings with locale.normalize gives the following traceback with 
Python 2.7:

~$ python2.7 -c 'import locale; locale.normalize(u"en_US")'
Traceback (most recent call last):
  File "", line 1, in 
  File "/usr/lib/python2.7/locale.py", line 358, in normalize
fullname = localename.translate(_ascii_lower_map)
TypeError: character mapping must return integer, None or unicode

With Python 2.6 it works, and it also works with non-unicode strings in 2.7.
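
A possible workaround until this is fixed, as a sketch (it assumes the locale name
is plain ASCII): encode the unicode string to a byte string before the call.

import locale
# Python 2.7 workaround sketch: pass a byte string instead of unicode.
print(locale.normalize(u"en_US".encode("ascii")))  # e.g. 'en_US.ISO8859-1'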

--
components: Unicode
messages: 142118
nosy: jtaylor
priority: normal
severity: normal
status: open
title: locale.normalize does not take unicode strings
versions: Python 2.7

Python tracker <http://bugs.python.org/issue12752>



[issue12752] locale.normalize does not take unicode strings

2011-08-15 Thread Julian Taylor

Julian Taylor  added the comment:

This is a regression introduced by the fix for http://bugs.python.org/issue1813.

This breaks some user code, e.g. wx.Locale.GetCanonicalName returns unicode.
Example bugs:
https://bugs.launchpad.net/ubuntu/+source/update-manager/+bug/824734
https://bugs.launchpad.net/ubuntu/+source/playonlinux/+bug/825421

--

Python tracker <http://bugs.python.org/issue12752>



[issue26530] tracemalloc: add C API to manually track/untrack memory allocations

2017-04-12 Thread Julian Taylor

Julian Taylor added the comment:

The API looks good to me. It works fine in numpy.

--
nosy: +jtaylor

Python tracker <http://bugs.python.org/issue26530>



[issue26530] tracemalloc: add C API to manually track/untrack memory allocations

2017-04-12 Thread Julian Taylor

Julian Taylor added the comment:

I don't see any reason why not to.

--

Python tracker <http://bugs.python.org/issue26530>



[issue30054] Expose tracemalloc C API to track/untrack memory blocks

2017-04-12 Thread Julian Taylor

Julian Taylor added the comment:

I am not sure that _PyTraceMalloc_GetTraceback really needs to be a public 
function.
Exposing the tracing information should probably just go through the Python interfaces.

--

Python tracker <http://bugs.python.org/issue30054>



[issue30054] Expose tracemalloc C API to track/untrack memory blocks

2017-04-12 Thread Julian Taylor

Julian Taylor added the comment:

With this changeset it would:
https://github.com/numpy/numpy/pull/8885

--

Python tracker <http://bugs.python.org/issue30054>



[issue30073] binary compressed file reading corrupts newlines (lzma, gzip, bz2)

2017-04-14 Thread Julian Taylor

New submission from Julian Taylor:

Probably a case of 'don't do that', but reading lines from a compressed file in 
binary mode produces bytes with invalid newlines for encodings where '\n' 
is encoded as something else:

import lzma

with lzma.open("test.xz", "wt", encoding="UTF-32-LE") as f:
    f.write('0 1 2\n3 4 5')

lzma.open("test.xz", "rb").readlines()[0].decode('UTF-32-LE')

Fails with:
UnicodeDecodeError: 'utf-32-le' codec can't decode byte 0x0a in position 20: 
truncated data

as readlines() produces:
b'0\x00\x00\x00 \x00\x00\x001\x00\x00\x00 \x00\x00\x002\x00\x00\x00\n'
The last newline should be '\n'.encode('UTF-32-LE') == b'\n\x00\x00\x00'
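
For comparison, a sketch of reading the same file back in text mode, where line
splitting happens after decoding and the newline is therefore handled correctly:

import lzma

with lzma.open("test.xz", "rt", encoding="UTF-32-LE") as f:
    print(f.readlines())   # ['0 1 2\n', '3 4 5']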

--
components: Library (Lib)
messages: 291661
nosy: jtaylor
priority: normal
severity: normal
status: open
title: binary compressed file reading corrupts newlines (lzma, gzip, bz2)

Python tracker <http://bugs.python.org/issue30073>



[issue30073] binary compressed file reading corrupts newlines (lzma, gzip, bz2)

2017-04-14 Thread Julian Taylor

Julian Taylor added the comment:

On second thought this is not really worth an issue, as it is a general problem of 
readline on binary streams. Sorry for the noise.

--
stage:  -> resolved
status: open -> closed

Python tracker <http://bugs.python.org/issue30073>



[issue30073] binary compressed file reading corrupts newlines (lzma, gzip, bz2)

2017-04-14 Thread Julian Taylor

Julian Taylor added the comment:

see also http://bugs.python.org/issue17083

--

Python tracker <http://bugs.python.org/issue30073>



[issue30150] raw debug allocators do not return malloc alignment

2017-04-23 Thread Julian Taylor

New submission from Julian Taylor:

The debug raw allocators do not return the same alignment as malloc. See 
_PyMem_DebugRawAlloc:
https://github.com/python/cpython/blob/master/Objects/obmalloc.c#L1873

The line
return p + 2*SST

adds 2 * sizeof(size_t) to the pointer returned by malloc.
On x32, for example, malloc returns 16-byte-aligned memory but size_t is 4 bytes.
This means the memory returned by the debug allocators is not aligned to what the 
system assumes on such platforms.

--
components: Interpreter Core
messages: 292187
nosy: jtaylor
priority: normal
severity: normal
status: open
title: raw debug allocators do not return malloc alignment
versions: Python 2.7, Python 3.6, Python 3.7

Python tracker <http://bugs.python.org/issue30150>



[issue30150] raw debug allocators do not return malloc alignment

2017-05-23 Thread Julian Taylor

Julian Taylor added the comment:

No, in numpy it is just a case of using the wrong allocator in a certain spot, 
an issue that can be fixed in numpy.
But it is also a minor bug/documentation issue in Python itself.

Alignment isn't very important for SIMD any more, but there are architectures 
where alignment is still mandatory, so numpy is sprinkled with asserts checking 
alignment, and those triggered on x32.
It is a very minor issue: to my knowledge none of the platforms with 
alignment requirements have the properties of x32, and x32 doesn't actually care 
about alignment either.

--

Python tracker <http://bugs.python.org/issue30150>



[issue30150] raw debug allocators do not return malloc alignment

2017-05-24 Thread Julian Taylor

Julian Taylor added the comment:

The largest type is usually the long double. Its alignment ranges from 4 bytes 
(i386) to 16 bytes (sparc).
So Py_MAX (sizeof (size_t), 8) should indeed do it.

--

Python tracker <http://bugs.python.org/issue30150>



[issue21148] avoid memset in small tuple creation

2014-04-03 Thread Julian Taylor

New submission from Julian Taylor:

Attached is a prototype patch that avoids the memset of ob_item in PyTuple_New. 
The memset is not necessary for the BUILD_TUPLE bytecode and PyTuple_Pack, as these 
overwrite every entry in ob_item anyway.
This improves small tuple creation by about 5%.

It does this by adding a new internal function that does not do the memset 
loop, plus a PyTuple_New wrapper that still does it; _Pack and ceval call the 
internal function directly.
The patch still needs cleanup: I don't know where the signature for ceval.c 
would best go. Does the internal function need to be hidden from the DSO?

microbenchmark, compiled with gcc-4.8.2 on ubuntu 14.04 amd64, default 
configure options:

import timeit
print(min(timeit.repeat("(a,)", setup="a = 1; b = 1", repeat=5, number=10**7)))
print(min(timeit.repeat("(a, b)", setup="a = 1; b = 1", repeat=5, number=10**7)))

before:
0.45767
0.52926

after:
0.42652
0.50122

Larger tuples do not profit much, as the loading is more expensive in comparison.

--
components: Interpreter Core
files: avoid-memset.patch
keywords: patch
messages: 215461
nosy: jtaylor
priority: normal
severity: normal
status: open
title: avoid memset in small tuple creation
type: performance
versions: Python 3.5
Added file: http://bugs.python.org/file34715/avoid-memset.patch

Python tracker <http://bugs.python.org/issue21148>



[issue21233] Add *Calloc functions to CPython memory allocation API

2014-04-17 Thread Julian Taylor

Julian Taylor added the comment:

Won't replacing _PyObject_GC_Malloc with a calloc cause Var objects 
(PyObject_NewVar) to be completely zeroed, which I think they weren't before?
Some numeric programs stuff a lot of data into var objects and could care about 
Python suddenly zeroing them when they don't need it.
An example would be tinyarray.

--
nosy: +jtaylor

Python tracker <http://bugs.python.org/issue21233>



[issue21233] Add *Calloc functions to CPython memory allocation API

2014-04-17 Thread Julian Taylor

Julian Taylor added the comment:

I just tested it: PyObject_NewVar seems to use RawMalloc, not the GC malloc, so 
it's probably fine.

--

Python tracker <http://bugs.python.org/issue21233>



[issue21592] Make statistics.median run in linear time

2014-06-01 Thread Julian Taylor

Julian Taylor added the comment:

In the case of the median you can achieve performance similar to a multiselect 
by simply calling min() on the partition above the first selected element for the second order statistic, 
which you need for averaging an even number of elements.
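
As a rough illustration of getting the median from a partial selection rather than a
full sort (a sketch only, not a proposed implementation: heapq.nsmallest stands in
for a real quickselect, so it is O(n log k) rather than linear):

import heapq

def median_by_selection(data):
    # Median via partial selection instead of sorting everything.
    n = len(data)
    if n == 0:
        raise ValueError("no median for empty data")
    k = n // 2
    smallest = heapq.nsmallest(k + 1, data)      # the k+1 smallest values, ascending
    if n % 2:
        return smallest[-1]                      # odd length: the middle element
    return (smallest[-2] + smallest[-1]) / 2     # even length: average the two middle ones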

Maybe an interesting datapoint would be to compare with numpy's selection 
algorithm, which is an intromultiselect (implemented in C for native datatypes).
It uses a standard median-of-3 quickselect with a cutoff in recursion depth to 
median of medians of groups of 5.
The multiselect is implemented using a sorted list of kth order statistics, 
reducing the search space for each kth by maintaining a stack of all visited 
pivots.
E.g. if you search for the 30th and 100th order statistics, and during the search for the 30th one has 
visited pivots 70 and 110, the search for the 100th only needs to select in l[70:110].
The not particularly readable implementation is in 
./numpy/core/src/npysort/selection.c.src.
Unfortunately for object types it currently falls back to quicksort, so you 
can't directly compare performance with the pure Python variants.

--
nosy: +jtaylor

Python tracker <http://bugs.python.org/issue21592>



[issue21592] Make statistics.median run in linear time

2014-06-07 Thread Julian Taylor

Julian Taylor added the comment:

For the median alone a multiselect is probably overkill (that's why I mentioned the 
minimum trick),

but a selection algorithm is useful on its own for all of Python, and then a 
multiselect should be considered.
Of course that means it would need to be implemented in C like sorted(), so you 
actually get a significant performance gain that makes adding a new Python 
function worthwhile.

Also, just to save numpy's honor: in the numpy comparison your script is benchmarking the 
Python list -> numpy array conversion and not the actual selection. 
The conversion is significantly slower than the selection itself. 
Also, select2b is in place while np.partition is out of place. Repeated in-place 
selection typically gets faster as the number of required swaps goes down, and 
can even be constant in time if the requested value does not change.
With that fixed, numpy outperforms PyPy by about a factor of 2 (but PyPy's 
performance is indeed quite impressive, as it is far more generic).
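
For anyone re-running the comparison, a small sketch of separating the conversion
cost from the selection itself (assumes numpy is installed; data and sizes are
arbitrary):

import timeit

setup = """
import numpy as np
import random
data = [random.random() for _ in range(10**6)]
arr = np.asarray(data)   # pay the list -> array conversion once, up front
k = len(data) // 2
"""
# Selection only (np.partition is out of place, so arr is not modified):
print(min(timeit.repeat("np.partition(arr, k)", setup=setup, repeat=5, number=10)))
# Conversion plus selection, which is what the original script effectively timed:
print(min(timeit.repeat("np.partition(np.asarray(data), k)", setup=setup, repeat=5, number=10)))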

--

Python tracker <http://bugs.python.org/issue21592>



[issue19308] Tools/gdb/libpython.py does not support GDB linked against Python 3

2013-11-04 Thread Julian Taylor

Julian Taylor added the comment:

I tested the latest patch (python27-gdb_py3.patch) with the Ubuntu 13.10 gdb 
compiled against Python 3.3; while it fixes the syntax errors, it does not fix 
the functionality.
E.g. one gets this error on breakpoints:

Python Exception  There is no member named length.: 
Breakpoint 3, PyTuple_Size (op=) at ../Objects/tupleobject.c:127

and the objects are not printed in their string representation as they should 
be with the plugin.

--
nosy: +jtaylor

Python tracker <http://bugs.python.org/issue19308>



[issue19308] Tools/gdb/libpython.py does not support GDB linked against Python 3

2013-11-04 Thread Julian Taylor

Julian Taylor added the comment:

On further investigation, I seem to have screwed up patching the files. Patched 
properly, they do work. Sorry for the noise.

--

Python tracker <http://bugs.python.org/issue19308>



[issue20389] clarify meaning of xbar and mu in pvariance/variance of statistics module

2014-01-25 Thread Julian Taylor

New submission from Julian Taylor:

The pvariance and variance functions take the arguments mu and xbar to pass in the 
population and sample mean, respectively, to avoid some recomputation.

I assume the keyword arguments are different because the two means accepted are 
different, but the docstring does not indicate this directly.
It just says mu or xbar is the mean of the data. The module documentation is a 
little clearer but only in the grey box right at the end.

I would propose to change the docstring and module documentation to explicitly 
state that mu is the population mean and xbar is the population mean.
E.g.

- The optional argument mu, if given, should be the mean of
the data.
+ The optional argument mu, if given, should be the population mean of
the data.

etc.
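
For context, a small usage sketch of the two keyword arguments this report is about
(the data values are arbitrary):

import statistics

data = [2.75, 1.75, 1.25, 0.25, 0.5, 1.25, 3.5]
m = statistics.mean(data)

# Passing a precomputed mean avoids recomputing it inside the functions:
print(statistics.pvariance(data, mu=m))    # population variance
print(statistics.variance(data, xbar=m))   # sample variance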

--
messages: 209192
nosy: jtaylor
priority: normal
severity: normal
status: open
title: clarify meaning of xbar and mu in pvariance/variance of statistics module
versions: Python 3.4

Python tracker <http://bugs.python.org/issue20389>



[issue20389] clarify meaning of xbar and mu in pvariance/variance of statistics module

2014-01-25 Thread Julian Taylor

Julian Taylor added the comment:

xbar is the *sample* mean, of course.

Maybe with proper docstrings the two functions could also use the same keyword 
argument?

--

Python tracker <http://bugs.python.org/issue20389>



[issue3158] Doctest fails to find doctests in extension modules

2014-01-28 Thread Julian Taylor

Julian Taylor added the comment:

The patch seems to work for me in IPython.

--
nosy: +jtaylor

Python tracker <http://bugs.python.org/issue3158>



[issue20389] clarify meaning of xbar and mu in pvariance/variance of statistics module

2014-01-28 Thread Julian Taylor

Changes by Julian Taylor :


--
components: +Library (Lib)

Python tracker <http://bugs.python.org/issue20389>



[issue16754] Incorrect shared library extension on linux

2013-04-04 Thread Julian Taylor

Julian Taylor added the comment:

Is SHLIB_SUFFIX=".so" really correct on Mac?
Shared libraries have the .dylib extension, loadable modules have .so (which would 
be EXT_SUFFIX?).
E.g. libpython itself uses .dylib.

--
nosy: +jtaylor108

Python tracker <http://bugs.python.org/issue16754>



[issue16754] Incorrect shared library extension on linux

2013-04-04 Thread Julian Taylor

Julian Taylor added the comment:

I'm going by what it says in configure:
# SHLIB_SUFFIX is the extension of shared libraries

The extension of shared libraries on macOS is .dylib in most cases (e.g. libtool-based 
libraries and, as mentioned, Python itself).

Maybe it's just a documentation/naming issue.

--

Python tracker <http://bugs.python.org/issue16754>



[issue16754] Incorrect shared library extension on linux

2013-04-04 Thread Julian Taylor

Julian Taylor added the comment:

Just to clarify, it's not an issue in Python itself; Python is working fine with .so.
The issue is just that these variables tend to be used by other applications 
to figure out information about the system (like the shared library extension, see 
numpy.distutils).
You could certainly argue that these applications are broken by even needing 
this information, but proper naming of the variables could help reduce 
confusion and wrong code.
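
A small sketch of how such applications typically query these values (results differ
per platform and Python version; EXT_SUFFIX only exists on Python 3, hence the
fallback to the older "SO" name):

import sysconfig

print(sysconfig.get_config_var("SHLIB_SUFFIX"))
print(sysconfig.get_config_var("EXT_SUFFIX") or sysconfig.get_config_var("SO"))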

--

Python tracker <http://bugs.python.org/issue16754>



[issue17895] TemporaryFile name returns an integer in python3

2013-05-03 Thread Julian Taylor

New submission from Julian Taylor:

sys.version_info(major=3, minor=3, micro=1, releaselevel='final', serial=0)
In [3]: type(tempfile.TemporaryFile().name)
Out[3]: builtins.int

In Python 2 it returned a string; this is a somewhat pointless API change which 
breaks some third-party code, e.g. numpy 
(https://github.com/numpy/numpy/issues/3302).
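
A quick check of the difference (sketch; on POSIX the anonymous TemporaryFile is
backed by an unlinked file, so its .name ends up being the file descriptor):

import tempfile

print(type(tempfile.TemporaryFile().name))        # Python 3: may be <class 'int'>
print(type(tempfile.NamedTemporaryFile().name))   # always a real path, i.e. str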

--
components: Library (Lib)
messages: 188305
nosy: jtaylor
priority: normal
severity: normal
status: open
title: TemporaryFile name returns an integer in python3
type: behavior
versions: Python 3.3

Python tracker <http://bugs.python.org/issue17895>



[issue5309] distutils doesn't parallelize extension module compilation

2015-01-16 Thread Julian Taylor

Julian Taylor added the comment:

Very nice, thanks for adding this.

Coincidentally, numpy added the same to numpy.distutils independently just a 
week later, though numpy also accepts an environment variable to set the number 
of jobs.
This is useful e.g. for pip installations where one does not control the 
command line. An environment variable also allows parallel jobs in environments 
where it is not guaranteed that the feature is available; e.g. you could just 
put it into your .bashrc, and when building with 3.5 it will just work while 2.7 
will not fail.

Is the naming --parallel/-j already fixed? I'll change the numpy options to the 
same name then.

Please also add it to the release notes so the feature can be discovered more easily.

--
nosy: +jtaylor

Python tracker <http://bugs.python.org/issue5309>



[issue23601] use small object allocator for dict key storage

2015-07-09 Thread Julian Taylor

Julian Taylor added the comment:

Large objects just go through "if size > 512: return malloc(size)", so there is no reason it 
should be slower.
Also, for large objects allocation speed does not matter as much.

--

Python tracker <http://bugs.python.org/issue23601>



[issue21148] avoid needless pointers initialization in small tuple creation

2015-07-09 Thread Julian Taylor

Julian Taylor added the comment:

Right, at best it's probably too insignificant to really be worthwhile; closing.

--
status: open -> closed

Python tracker <http://bugs.python.org/issue21148>



[issue23530] os and multiprocessing.cpu_count do not respect cpuset/affinity

2015-07-09 Thread Julian Taylor

Julian Taylor added the comment:

Any comments on the doc changes?

--

Python tracker <http://bugs.python.org/issue23530>



[issue23601] use small object allocator for dict key storage

2015-07-10 Thread Julian Taylor

Julian Taylor added the comment:

Your benchmarks are not affected by this change, see the other issue. They are 
also not representative of every workload out there.

I can at least see the argument for why you didn't want to put the other variant of 
this change in, as it made the code a tiny bit more complicated, but I do not 
understand the reluctance for this variant. It doesn't change the complexity of 
the code one bit.
If you doubt the performance of Python's own small object allocator, should Python 
maybe stop using it altogether?

--

Python tracker <http://bugs.python.org/issue23601>



[issue23601] use small object allocator for dict key storage

2015-07-11 Thread Julian Taylor

Julian Taylor added the comment:

OK, I ran it again, but note the machine was under load the whole time, so the 
results likely have little meaning.

python perf.py -r -b default /tmp/normal/bin/python3 /tmp/opt/bin/python3

Min: 0.399279 -> 0.376527: 1.06x faster
Avg: 0.410819 -> 0.383315: 1.07x faster
Significant (t=49.29)
Stddev: 0.00450 -> 0.00330: 1.3631x smaller

### etree_iterparse ###
Min: 0.639638 -> 0.630989: 1.01x faster
Avg: 0.658744 -> 0.641842: 1.03x faster
Significant (t=14.82)
Stddev: 0.00959 -> 0.00617: 1.5557x smaller

### etree_parse ###
Min: 0.433050 -> 0.377830: 1.15x faster
Avg: 0.444014 -> 0.389695: 1.14x faster
Significant (t=43.28)
Stddev: 0.01010 -> 0.00745: 1.3570x smaller

### tornado_http ###
Min: 0.335834 -> 0.326492: 1.03x faster
Avg: 0.346100 -> 0.334186: 1.04x faster
Significant (t=13.66)
Stddev: 0.01024 -> 0.00689: 1.4864x smaller

The following not significant results are hidden, use -v to show them:
2to3, django_v2, etree_process, fastpickle, fastunpickle, json_dump_v2, 
json_load, nbody, regex_v8.

/tmp$ /tmp/normal/bin/python3 -c 'import timeit; print(timeit.repeat("dict(a=5, b=2)"))'
[0.5112445619997743, 0.514110946735, 0.5185121280010208]
/tmp$ /tmp/opt/bin/python3 -c 'import timeit; print(timeit.repeat("dict(a=5, b=2)"))'
[0.4426167189994885, 0.4465744609988178, 0.4467797579982289]

--

Python tracker <http://bugs.python.org/issue23601>



[issue21592] Make statistics.median run in linear time

2016-01-03 Thread Julian Taylor

Julian Taylor added the comment:

The median of medians of 5 is quite significantly slower than a quickselect.
numpy implements an introselect which uses quickselect but falls back to median 
of medians of 5 if not enough progress is made.
In the numpy implementation, for a 10-element median (multiselect with 2 
selections, one median, one min), quickselect is around 3 times faster than mom5.

--

Python tracker <http://bugs.python.org/issue21592>



[issue23601] use small object allocator for dict key storage

2016-01-29 Thread Julian Taylor

Julian Taylor added the comment:

Ping. This has been sitting for 4 years and two Python releases. It's about time 
this stupidly simple thing gets merged.

--

Python tracker <http://bugs.python.org/issue23601>



[issue26601] Use new madvise()'s MADV_FREE on the private heap

2016-04-21 Thread Julian Taylor

Julian Taylor added the comment:

The simplest way to fix this would be to use malloc instead of mmap in the 
allocator; then you also get MADV_FREE for free when malloc uses it.
The rationale for using mmap is kind of weak, the source just says "heap 
fragmentation". The usual argument for using mmap is not that but the instant 
return of memory to the system, quite the opposite of what the Python memory 
pool does.

--
nosy: +jtaylor

Python tracker <http://bugs.python.org/issue26601>



[issue26601] Use new madvise()'s MADV_FREE on the private heap

2016-04-21 Thread Julian Taylor

Julian Taylor added the comment:

ARENA_SIZE is 256 KiB; the threshold in glibc is up to 32 MB.

--

Python tracker <http://bugs.python.org/issue26601>



[issue26601] Use new madvise()'s MADV_FREE on the private heap

2016-04-21 Thread Julian Taylor

Julian Taylor added the comment:

It defaulted to 128 KiB ten years ago; it has been a dynamic threshold for ages.

--

Python tracker <http://bugs.python.org/issue26601>



[issue26601] Use new madvise()'s MADV_FREE on the private heap

2016-04-22 Thread Julian Taylor

Julian Taylor added the comment:

glibc's malloc is not obstack; it's not a simple linear heap where one object on 
top means everything below is not freeable. It also uses MADV_DONTNEED to give 
sbrk'd memory back to the system. This is the place where MADV_FREE can now be 
used, as the latter does not guarantee a page fault.
That said, of course you can construct workloads which lead to increased 
memory usage with malloc too, and maybe Python triggers them more often than 
other applications. Is there an existing issue showing the problem? It would 
be a good form of documentation in the source.

--

Python tracker <http://bugs.python.org/issue26601>



[issue26601] Use new madvise()'s MADV_FREE on the private heap

2016-04-22 Thread Julian Taylor

Julian Taylor added the comment:

I know one can change the allocator, but the default is mmap, which I don't 
think is a very good choice for the current arena size.
All the arguments about fragmentation and memory space also apply to Python's 
arena allocator itself, and I am not convinced that fragmentation of the libc 
allocator is a real problem for Python, as Python's allocation pattern is very 
well behaved _due_ to its own arena allocator. I don't doubt it, but I think it 
would be very valuable to document the actual real-world use case that 
triggered this change, just to avoid people stumbling over this again and again.

But then I also don't think that anything necessarily needs to be changed 
either; I have not seen the mmaps being a problem in any profiles of 
applications I work with.

--

Python tracker <http://bugs.python.org/issue26601>



[issue26601] Use new madvise()'s MADV_FREE on the private heap

2016-04-22 Thread Julian Taylor

Julian Taylor added the comment:

Which is exactly what malloc is already doing; thus my point is that by using 
malloc we would fulfill your request.

But do you have an actual real-world application where this would help?
It is pretty easy to figure out: just run the application under perf and see if 
there is a relevant amount of time spent in page_fault/clear_pages.

And as mentioned, you can already change the allocator for arenas at runtime, so 
you could also try changing it to malloc and see if your application gets any 
faster.

--

Python tracker <http://bugs.python.org/issue26601>



[issue23530] os and multiprocessing.cpu_count do not respect cpuset/affinity

2015-02-26 Thread Julian Taylor

New submission from Julian Taylor:

multiprocessing.cpu_count and os.cpu_count, which are often used to determine 
how many processes one can run in parallel, do not respect the cpuset, which may 
limit the process to only a subset of online cpus, leading to heavy 
oversubscription in e.g. containerized environments:

$ taskset -c 0 python3.4 -c 'import multiprocessing; print(multiprocessing.cpu_count())'
32
$ taskset -c 0 python3.4 -c 'import os; print(os.cpu_count())'
32

The correct result here should be 1.

This forces programs to use less portable methods like shelling out to GNU 
nproc or reading the /proc filesystem.

Having a keyword argument to switch between online and available cpus would be 
fine too.
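
A minimal sketch of the kind of helper every program currently has to write itself
(os.sched_getaffinity exists since Python 3.3 but is Linux-only, hence the fallback):

import os

def usable_cpu_count():
    # CPUs the current process may actually run on, not just CPUs that are online.
    try:
        return len(os.sched_getaffinity(0))   # respects cpusets/affinity masks
    except AttributeError:
        return os.cpu_count() or 1            # may overcount in containers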

--
components: Library (Lib)
messages: 236671
nosy: jtaylor
priority: normal
severity: normal
status: open
versions: Python 3.4

Python tracker <http://bugs.python.org/issue23530>



[issue23530] os and multiprocessing.cpu_count do not respect cpuset/affinity

2015-02-26 Thread Julian Taylor

Julian Taylor added the comment:

I do agree that it's probably safer not to change the default return value.
But adding an option (or a new function) would still be good; the number of 
available cpus is more often the number you actually want in practice.
At the very least the documentation should be improved to clearly state that 
this number does not guarantee that this many cpus are actually available 
to run on, and that you should use psutil instead.

Code for getting this information on the major operating systems (Linux, BSD 
and Windows) is available in GNU coreutils.
I can possibly work on a patch if it would get accepted, but I can only actually 
test it on Linux.

--

Python tracker <http://bugs.python.org/issue23530>



[issue23530] os and multiprocessing.cpu_count do not respect cpuset/affinity

2015-02-27 Thread Julian Taylor

Julian Taylor added the comment:

Certainly, for anything that needs good control over affinity psutil is the 
best choice, but I'm not arguing to implement full process control in Python. I 
only want Python to provide the number of cores one can work on, to make best 
use of the available resources.

If you search Python files on GitHub for cpu_count you find about 18000 uses; 
randomly sampling a few, every single one was to determine the number of cpus to 
start worker jobs with to get the best performance. Every one of these will 
oversubscribe a host that restricts the cpus a process can use. This is an 
issue especially for the increasingly popular use of containers instead of full 
virtual machines.

As a documentation update I would like to have a note saying that this number 
is the number of (online) cpus in the system and may not be the number of cpus 
the process can actually use. Maybe with a link to 
len(psutil.Process.get_affinity()) as a reference on how to obtain that number.

There would be no dependence on coreutils; I just mentioned it because you can look 
up the OS API you need to use to get the number there (e.g. sched_getaffinity). 
It is trivial API use and should not be a licensing issue; one could also look 
at the code from psutil, which most likely looks very similar.

--

Python tracker <http://bugs.python.org/issue23530>



[issue23530] os and multiprocessing.cpu_count do not respect cpuset/affinity

2015-02-27 Thread Julian Taylor

Julian Taylor added the comment:

Oh, that's great, so Python already has what I want. Then just a small 
documentation update would be good; I'll have a go at a patch later.

--

Python tracker <http://bugs.python.org/issue23530>



[issue23530] os and multiprocessing.cpu_count do not respect cpuset/affinity

2015-03-07 Thread Julian Taylor

Julian Taylor added the comment:

Attached a documentation update patch.

--
keywords: +patch
Added file: 
http://bugs.python.org/file38369/0001-Issue-23530-Update-documentation-clarify-relation-of.patch

Python tracker <http://bugs.python.org/issue23530>



[issue23601] use small object allocator for dict key storage

2015-03-07 Thread Julian Taylor

New submission from Julian Taylor:

Dictionary creation spends a not insignificant amount of time in malloc 
allocating keys objects. Python has a nice small object allocator that avoids a 
lot of this overhead and falls back to malloc for larger allocations.
Is there a reason the dictionary does not use that allocator for its keys 
objects?
Doing so, e.g. via the attached (incomplete) patch, improves small dict creation 
performance by 15%.

import timeit
print(timeit.repeat("dict(a=5, b=2)"))

with change:
[0.4282559923725, 0.427258015296, 0.436232985377]
without:
[0.516061002634, 0.518172000496, 0.51842199191]

Or is there something I am overlooking, and the use of PyMem_Malloc instead of 
PyObject_Malloc is an intentional design decision?

--
components: Interpreter Core
files: 0001-use-small-object-allocator-for-keys-object.patch
keywords: patch
messages: 237439
nosy: jtaylor
priority: normal
severity: normal
status: open
title: use small object allocator for dict key storage
versions: Python 3.5
Added file: 
http://bugs.python.org/file38371/0001-use-small-object-allocator-for-keys-object.patch

Python tracker <http://bugs.python.org/issue23601>



[issue23601] use small object allocator for dict key storage

2015-03-07 Thread Julian Taylor

Julian Taylor added the comment:

PyObject_Malloc just calls malloc above the threshold so there is no problem 
for larger dicts.
For larger dicts the performance of malloc is also irrelevant as the time will 
be spent elsewhere.

--

Python tracker <http://bugs.python.org/issue23601>