Re: [Python-Dev] PEP 393 review

2011-08-30 Thread Antoine Pitrou
By the way, I don't know if you're working on it, but StringIO seems a bit broken right now. test_memoryio crashes here: test_newline_cr (test.test_memoryio.CStringIOTest) ... Fatal Python error: Segmentation fault Current thread 0x7f3f6353b700: File "/home/antoine/cpython/pep-393/Lib/tes

Re: [Python-Dev] PEP 393 review

2011-08-30 Thread Martin v. Löwis
> This looks very nice. Is 3.3 a wide build? (how about a narrow build?) It's a wide build. For reference, I also attach 64-bit narrow build results, and 32-bit results (wide, narrow, and PEP 393). Savings are much smaller in narrow builds (larger on 32-bit systems than on 64-bit systems). > (is

Re: [Python-Dev] PEP 393 review

2011-08-29 Thread Martin v. Löwis
> I don't compare ASCII and ISO-8859-1 decoders. I was asking if decoding > b'abc' > from ISO-8859-1 is faster than decoding b'ab\xff' from ISO-8859-1, and if > yes: > why? No, that makes no difference. > > Your patch replaces PyUnicode_New(size, 255) ... memcpy(), by > PyUnicode_FromUCS1(

Re: [Python-Dev] PEP 393 review

2011-08-29 Thread Victor Stinner
On Monday 29 August 2011 21:34:48, you wrote: > >> Those haven't been ported to the new API, yet. Consider, for example, > >> d9821affc9ee. Before that, I got 253 MB/s on the 4096 units read test; > >> with that change, I get 610 MB/s. The trunk gives me 488 MB/s, so this > >> is a 25% speedup

Re: [Python-Dev] PEP 393 review

2011-08-29 Thread Antoine Pitrou
On Mon, 29 Aug 2011 22:32:01 +0200 "Martin v. Löwis" wrote: > I have now written a Django application to measure the effect of PEP > 393, using the debug mode (to find all strings), and sys.getsizeof: > > https://bitbucket.org/t0rsten/pep-393/src/ad02e1b4cad9/pep393utils/djmemprof/count/views.py

Re: [Python-Dev] PEP 393 review

2011-08-29 Thread M.-A. Lemburg
"Martin v. Löwis" wrote: > tl;dr: PEP-393 reduces the memory usage for strings of a very small > Django app from 7.4MB to 4.4MB, all other objects taking about 1.9MB. > > Am 26.08.2011 16:55, schrieb Guido van Rossum: >> It would be nice if someone wrote a test to roughly verify these >> numbers,

Re: [Python-Dev] PEP 393 review

2011-08-29 Thread Martin v. Löwis
tl;dr: PEP-393 reduces the memory usage for strings of a very small Django app from 7.4MB to 4.4MB, all other objects taking about 1.9MB. On 26.08.2011 16:55, Guido van Rossum wrote: > It would be nice if someone wrote a test to roughly verify these > numbers, e.g. by allocating lots of strings

Re: [Python-Dev] PEP 393 review

2011-08-29 Thread Martin v. Löwis
>> Those haven't been ported to the new API, yet. Consider, for example, >> d9821affc9ee. Before that, I got 253 MB/s on the 4096 units read test; >> with that change, I get 610 MB/s. The trunk gives me 488 MB/s, so this >> is a 25% speedup for PEP 393. > > If I understand correctly, the performan

Re: [Python-Dev] PEP 393 review

2011-08-29 Thread Martin v. Löwis
On 29.08.2011 11:03, Dirkjan Ochtman wrote: > On Sun, Aug 28, 2011 at 21:47, "Martin v. Löwis" wrote: >> result strings. In PEP 393, a buffer must be scanned for the >> highest code point, which means that each byte must be inspected >> twice (a second time when the copying occurs). > > This
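The two passes described in the quoted text can be sketched in Python. This is only an illustration of the idea, not the C code from the pep-393 branch: the helper names choose_kind and build_compact are hypothetical, and the 1/2/4-byte widths are the per-character storage sizes PEP 393 defines.

```python
import sys

def choose_kind(code_points):
    """Pass 1: scan for the highest code point and pick bytes per character."""
    highest = max(code_points, default=0)
    if highest < 0x100:
        return 1      # fits in one byte per character
    if highest < 0x10000:
        return 2      # needs two bytes per character
    return 4          # needs four bytes per character

def build_compact(code_points):
    """Pass 2: copy every code point into a buffer of the chosen width."""
    kind = choose_kind(code_points)  # first inspection of the input
    # Second inspection: each code point is visited again while copying.
    data = b"".join(cp.to_bytes(kind, sys.byteorder) for cp in code_points)
    return kind, data
```

Calling build_compact on a list of code points returns the chosen width and the packed buffer; the input is walked once in choose_kind and once more during the copy, which is the double inspection the message refers to.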

Re: [Python-Dev] PEP 393 review

2011-08-29 Thread Victor Stinner
On 28/08/2011 23:06, "Martin v. Löwis" wrote: On 28.08.2011 22:01, Antoine Pitrou wrote: - the iobench results are between 2% acceleration (seek operations), 16% slowdown for small-sized reads (4.31MB/s vs. 5.22 MB/s) and 37% for large sized reads (154 MB/s vs. 235 MB/s). The speed

Re: [Python-Dev] PEP 393 review

2011-08-29 Thread Victor Stinner
On 29/08/2011 11:03, Dirkjan Ochtman wrote: On Sun, Aug 28, 2011 at 21:47, "Martin v. Löwis" wrote: result strings. In PEP 393, a buffer must be scanned for the highest code point, which means that each byte must be inspected twice (a second time when the copying occurs). This may be

Re: [Python-Dev] PEP 393 review

2011-08-29 Thread Dirkjan Ochtman
On Sun, Aug 28, 2011 at 21:47, "Martin v. Löwis" wrote: >  result strings. In PEP 393, a buffer must be scanned for the >  highest code point, which means that each byte must be inspected >  twice (a second time when the copying occurs). This may be a silly question: are there things in place to

Re: [Python-Dev] PEP 393 review

2011-08-28 Thread Martin v. Löwis
On 28.08.2011 22:01, Antoine Pitrou wrote: > >> - the iobench results are between 2% acceleration (seek operations), >> 16% slowdown for small-sized reads (4.31MB/s vs. 5.22 MB/s) and >> 37% for large sized reads (154 MB/s vs. 235 MB/s). The speed >> difference is probably in the UTF-8 dec

Re: [Python-Dev] PEP 393 review

2011-08-28 Thread Antoine Pitrou
On Sunday 28 August 2011 at 22:23 +0200, "Martin v. Löwis" wrote: > On 28.08.2011 22:01, Antoine Pitrou wrote: > > > >> - the iobench results are between 2% acceleration (seek operations), > >> 16% slowdown for small-sized reads (4.31MB/s vs. 5.22 MB/s) and > >> 37% for large sized reads (

Re: [Python-Dev] PEP 393 review

2011-08-28 Thread Martin v. Löwis
On 28.08.2011 22:01, Antoine Pitrou wrote: > >> - the iobench results are between 2% acceleration (seek operations), >> 16% slowdown for small-sized reads (4.31MB/s vs. 5.22 MB/s) and >> 37% for large sized reads (154 MB/s vs. 235 MB/s). The speed >> difference is probably in the UTF-8 dec

Re: [Python-Dev] PEP 393 review

2011-08-28 Thread Antoine Pitrou
> - the iobench results are between 2% acceleration (seek operations), > 16% slowdown for small-sized reads (4.31MB/s vs. 5.22 MB/s) and > 37% for large sized reads (154 MB/s vs. 235 MB/s). The speed > difference is probably in the UTF-8 decoder; I have already > restored the "runs of ASCI

Re: [Python-Dev] PEP 393 review

2011-08-28 Thread Martin v. Löwis
> I would say no more than a 15% slowdown on each of the following > benchmarks: > > - stringbench.py -u > (http://svn.python.org/view/sandbox/trunk/stringbench/) > - iobench.py -t > (in Tools/iobench/) > - the json_dump, json_load and regex_v8 tests from > http://hg.python.org/benchmarks/

Re: [Python-Dev] PEP 393 review

2011-08-28 Thread Martin v. Löwis
On 26.08.2011 16:56, Guido van Rossum wrote: > Also, please add the table (and the reasoning that led to it) to the PEP. Done! Martin

Re: [Python-Dev] PEP 393 review

2011-08-26 Thread Stefan Behnel
Stefan Behnel, 26.08.2011 20:28: "Martin v. Löwis", 26.08.2011 18:56: I agree with your observation that something should be done about error handling, and will update the PEP shortly. I propose that PyUnicode_Ready should be explicitly called on input where raising an exception is feasible. In c

Re: [Python-Dev] PEP 393 review

2011-08-26 Thread Stefan Behnel
"Martin v. Löwis", 26.08.2011 18:56: I agree with your observation that somebody should be done about error handling, and will update the PEP shortly. I propose that PyUnicode_Ready should be explicitly called on input where raising an exception is feasible. In contexts where it is not feasible (

Re: [Python-Dev] PEP 393 review

2011-08-26 Thread Martin v. Löwis
On 26.08.2011 17:55, Stefan Behnel wrote: > Stefan Behnel, 25.08.2011 23:30: >> Sadly, a quick look at a couple of recent commits in the pep-393 branch >> suggested that it is not even always obvious to you as the authors which >> macros can be called safely and which cannot. I immediately spotte

Re: [Python-Dev] PEP 393 review

2011-08-26 Thread Stefan Behnel
Stefan Behnel, 25.08.2011 23:30: Sadly, a quick look at a couple of recent commits in the pep-393 branch suggested that it is not even always obvious to you as the authors which macros can be called safely and which cannot. I immediately spotted a bug in one of the updated core functions (unicode

Re: [Python-Dev] PEP 393 review

2011-08-26 Thread Guido van Rossum
Also, please add the table (and the reasoning that led to it) to the PEP. On Fri, Aug 26, 2011 at 7:55 AM, Guido van Rossum wrote: > It would be nice if someone wrote a test to roughly verify these > numbers, e.g. by allocating lots of strings of a certain size and > measuring the process size be

Re: [Python-Dev] PEP 393 review

2011-08-26 Thread Guido van Rossum
It would be nice if someone wrote a test to roughly verify these numbers, e.g. by allocating lots of strings of a certain size and measuring the process size before and after (being careful to adjust for the list or other data structure required to keep those objects alive). --Guido On Fri, Aug 2
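A rough sketch of such a test, in Python. It makes several assumptions that are not from this thread: it uses the tracemalloc module (available in later Python versions) to measure traced allocations rather than the process size, the function name bytes_per_string is hypothetical, and subtracting sys.getsizeof() of the list is only an approximate adjustment for the container that keeps the strings alive.

```python
import sys
import tracemalloc

def bytes_per_string(n_strings, length, char="a"):
    """Estimate traced bytes per string, excluding the list that holds them."""
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    # char * length is evaluated at run time, so each element is a new
    # string object (for lengths of 2 or more).
    strings = [char * length for _ in range(n_strings)]
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    # Adjust for the list that keeps the strings alive.
    return (after - before - sys.getsizeof(strings)) / n_strings
```

Comparing, say, bytes_per_string(1000, 16, "a") with bytes_per_string(1000, 16, "\u20ac") gives a per-string figure for ASCII and non-ASCII content of the same length; the absolute numbers depend on the interpreter version and platform.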

Re: [Python-Dev] PEP 393 review

2011-08-26 Thread Martin v. Löwis
> But strings are allocated via PyObject_Malloc(), i.e. the custom > arena-based allocator -- isn't its overhead (for small objects) less > than 2 pointers per block? Ah, right, I missed that. Indeed, those have no header, and the only overhead is the padding to a multiple of 8. That shifts the p

Re: [Python-Dev] PEP 393 review

2011-08-25 Thread Stefan Behnel
Stefan Behnel, 25.08.2011 23:30: Stefan Behnel, 25.08.2011 20:47: "Martin v. Löwis", 24.08.2011 20:15: - issues to be considered (unclarities, bugs, limitations, ...) A problem of the current implementation is the need for calling PyUnicode_(FAST_)READY(), and the fact that it can fail (e.g.

Re: [Python-Dev] PEP 393 review

2011-08-25 Thread Guido van Rossum
On Thu, Aug 25, 2011 at 1:24 AM, "Martin v. Löwis" wrote: >> With this PEP, the unicode object overhead grows to 10 pointer-sized >> words (including PyObject_HEAD), that's 80 bytes on a 64-bit machine. >> Does it have any adverse effects? > > If I count correctly, it's only three *additional* wor

Re: [Python-Dev] PEP 393 review

2011-08-25 Thread Stefan Behnel
Stefan Behnel, 25.08.2011 20:47: "Martin v. Löwis", 24.08.2011 20:15: - issues to be considered (unclarities, bugs, limitations, ...) A problem of the current implementation is the need for calling PyUnicode_(FAST_)READY(), and the fact that it can fail (e.g. due to insufficient memory). Basic

Re: [Python-Dev] PEP 393 review

2011-08-25 Thread Stefan Behnel
"Martin v. Löwis", 24.08.2011 20:15: - issues to be considered (unclarities, bugs, limitations, ...) A problem of the current implementation is the need for calling PyUnicode_(FAST_)READY(), and the fact that it can fail (e.g. due to insufficient memory). Basically, this means that even somet

Re: [Python-Dev] PEP 393 review

2011-08-25 Thread Antoine Pitrou
Hello, On Thu, 25 Aug 2011 10:24:39 +0200 "Martin v. Löwis" wrote: > > On a 32-bit machine with a 32-bit wchar_t, pure-ASCII strings of length > 1 (+NUL) will take the same memory either way: 8 bytes for the > characters in 3.2, 2 bytes in 3.3 + extra pointer + padding. Strings > of 2 or more c
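The arithmetic in the quoted paragraph can be reproduced with a small Python model. This is a simplification under stated assumptions: the function names are hypothetical, "padding" is read as rounding up to a multiple of 8 bytes, the extra pointer is taken to be 4 bytes on a 32-bit machine, and only the character storage is counted (not the rest of the object header).

```python
def pad8(n):
    """Round n up to the next multiple of 8 bytes."""
    return (n + 7) // 8 * 8

def chars_3_2(length, wchar_size=4):
    """Character storage in 3.2: (length + NUL) code units of wchar_size bytes."""
    return (length + 1) * wchar_size

def chars_3_3_ascii(length, pointer_size=4):
    """Character storage in 3.3 for pure ASCII: one byte per character
    (plus NUL), plus the extra pointer, padded to a multiple of 8."""
    return pad8((length + 1) + pointer_size)
```

For length 1 both functions give 8 bytes, matching the "same memory either way" case in the message. For longer strings the one-byte-per-character layout grows more slowly in this model.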

Re: [Python-Dev] PEP 393 review

2011-08-25 Thread Victor Stinner
On 25/08/2011 06:46, Stefan Behnel wrote: Conversion to wchar_t* is common, especially on Windows. That's an issue. However, I cannot say how common this really is in practice. Surely depends on the specific code, right? How common is it in core CPython? Almost all functions taking text as

Re: [Python-Dev] PEP 393 review

2011-08-25 Thread Martin v. Löwis
> With this PEP, the unicode object overhead grows to 10 pointer-sized > words (including PyObject_HEAD), that's 80 bytes on a 64-bit machine. > Does it have any adverse effects? If I count correctly, it's only three *additional* words (compared to 3.2): four new ones, minus one that is removed. I

Re: [Python-Dev] PEP 393 review

2011-08-24 Thread Stefan Behnel
"Martin v. Löwis", 24.08.2011 20:15: Guido has agreed to eventually pronounce on PEP 393. Before that can happen, I'd like to collect feedback on it. There have been a number of voice supporting the PEP in principle Absolutely. - conditions you would like to pose on the implementation before

Re: [Python-Dev] PEP 393 review

2011-08-24 Thread Stefan Behnel
Victor Stinner, 25.08.2011 00:29: With this PEP, the unicode object overhead grows to 10 pointer-sized words (including PyObject_HEAD), that's 80 bytes on a 64-bit machine. Does it have any adverse effects? For pure ASCII, it might be possible to use a shorter struct: typedef struct { PyO

Re: [Python-Dev] PEP 393 review

2011-08-24 Thread Victor Stinner
> With this PEP, the unicode object overhead grows to 10 pointer-sized > words (including PyObject_HEAD), that's 80 bytes on a 64-bit machine. > Does it have any adverse effects? For pure ASCII, it might be possible to use a shorter struct: typedef struct { PyObject_HEAD Py_ssize_t length

Re: [Python-Dev] PEP 393 review

2011-08-24 Thread Antoine Pitrou
On Wed, 24 Aug 2011 20:15:24 +0200 "Martin v. Löwis" wrote: > - issues to be considered (unclarities, bugs, limitations, ...) With this PEP, the unicode object overhead grows to 10 pointer-sized words (including PyObject_HEAD), that's 80 bytes on a 64-bit machine. Does it have any adverse effects