Re: [Python-Dev] I would like an svn account

2009-01-03 Thread Victor Stinner
o be part of the upstream code base. A distributed VCS is useful to test huge changes. Performance improvements on integers (patches to optimize the multiplication, use base 2^30 instead of 2^15, etc.) would benefit from such tools, because cooperative work is easier. -- Victor Stinner aka haypo

Re: [Python-Dev] I would like an svn account

2009-01-03 Thread Victor Stinner
s. I spoke about my issues because I know them better than the other ones ;-) > Victor wants to commit small changes without review. That's true. Opening an issue for trivial changes takes too much time. -- I hope that the discussion of my svn account will benefit the whole Python process

Re: [Python-Dev] I would like an svn account

2009-01-03 Thread Victor Stinner
a different way"? If Martin doesn't understand the patch, who will understand it? :-) -- Victor Stinner aka haypo http://www.haypocalc.com/blog/ ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-d

[Python-Dev] [Py3k] curses module and libncursesw library

2009-01-07 Thread Victor Stinner
It looks like libncursesw is available on Linux, *BSD, Mac OS X. About (Open)Solaris, a libncurses package was created in September 2008, but there is no unicode version yet. On Windows, there is a Cygwin port of libncurses, but I don't know if it contains the unicode version. -- Victor St

Re: [Python-Dev] Fixing incorrect indentations in C files (Decoder functions accept str in py3k)

2009-01-08 Thread Victor Stinner
"/usr/bin/diff" -x "-ub"< to ignore any space change, which breaks some patches :-/ So if you choose to change the indentation, it would be nice to also run >sed -i "s/[ \t]\+$//"< ;-) -- Victor Stinner aka haypo http://www.haypocalc.com/blog/
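The whitespace cleanup suggested above can also be sketched in Python. This is only an illustration of what the sed expression does (remove spaces and tabs at the end of each line); the function name `strip_trailing_whitespace` is made up for this example and is not part of any tool mentioned in the thread.

```python
import re


def strip_trailing_whitespace(text):
    """Remove spaces and tabs at the end of every line.

    Hypothetical helper: mirrors the regex [ \\t]+$ applied line by line.
    """
    return re.sub(r"[ \t]+$", "", text, flags=re.MULTILINE)


sample = "int x;  \n\tint y;\t\n"
cleaned = strip_trailing_whitespace(sample)
print(repr(cleaned))
```

Leading indentation is left untouched, so a cleanup like this does not change the indentation itself, only the trailing noise that makes diffs harder to read.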

Re: [Python-Dev] Add Py_off_t and related APIs?

2009-01-13 Thread Victor Stinner
start, size_t length, int prot, int flags, int fd, off_t offset); mmapmodule.c uses "Py_ssize_t" type and _GetMapSize() private function to convert the long integer to the Py_ssize_t type. -- Victor Stinner aka haypo http://www.haypocalc.com/blog/

Re: [Python-Dev] Add Py_off_t and related APIs?

2009-01-13 Thread Victor Stinner
On Tuesday 13 January 2009 22:47:52, Victor Stinner wrote: > On Tuesday 13 January 2009 21:33:28, Martin v. Löwis wrote: > > I would do this through a converter function (O&), but yes, > > making it private to the io library sounds about right. Who > >

Re: [Python-Dev] socket.create_connection slow

2009-01-14 Thread Victor Stinner
has only an IPv4 address. The address "::1" is "ip6-localhost" or "ip6-loopback". You should check why the connect() to IPv6 is so long to raise an error. About the test: since SocketServer address family is constant (IPv4), you can forc
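As a rough sketch of the suggestion to restrict the client to the server's address family: `socket.getaddrinfo()` accepts a family argument, so a test can ask for IPv4 results only and never try `::1`. The helper name `ipv4_addresses` and the port number are illustrative assumptions, not code from the thread.

```python
import socket


def ipv4_addresses(host, port):
    """Return only the IPv4 (address, port) pairs for host.

    Hypothetical helper: passing socket.AF_INET filters out IPv6 results,
    so a later connect() never attempts an IPv6 address.
    """
    infos = socket.getaddrinfo(host, port, socket.AF_INET, socket.SOCK_STREAM)
    return [info[4] for info in infos]


addrs = ipv4_addresses("127.0.0.1", 8000)
print(addrs)
```

A client that connects only to addresses from this list skips the slow IPv6 attempt described above, at the cost of not working against an IPv6-only server.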

Re: [Python-Dev] Problems with unicode_literals

2009-01-17 Thread Victor Stinner
various problems with unicode and gettext - http://bugs.python.org/issue4319: optparse and non-ascii help strings -- Victor Stinner aka haypo http://www.haypocalc.com/blog/

Re: [Python-Dev] Python 3.0.1 (io-in-c)

2009-01-27 Thread Victor Stinner
Benjamin Peterson wrote: There are also several IO bugs that should be fixed before it becomes official like #5006. I looked at this one, but I discovered another bug with f.tell(): it's now issue #5008. That issue is now closed, so I will look at #5006 again. See also #5016 (f.seekab

Re: [Python-Dev] Python 3.0.1 (io-in-c)

2009-01-28 Thread Victor Stinner
On Wednesday 28 January 2009 11:55:16, Antoine Pitrou wrote: > 2.x has no encoding costs, which explains why it's so much faster. Why not test io.open() or codecs.open(), which create unicode strings? -- Victor Stinner aka haypo http://www.haypocalc.
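To illustrate the difference being discussed, here is a small sketch (written for Python 3, which is an assumption; the thread compares 2.x and 3.0) that reads the same file once as raw bytes and once through `io.open()` with an encoding. Only the second read pays a decoding cost and returns a unicode string. The file name and contents are invented for the example.

```python
import io
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")  # hypothetical file name

    # Write known UTF-8 bytes so both reads see the same data.
    with io.open(path, "wb") as f:
        f.write("caf\u00e9\n".encode("utf-8"))

    # Binary read: no decoding, the result is a bytes object.
    with io.open(path, "rb") as f:
        raw = f.read()

    # Text read: the bytes are decoded, the result is a unicode string.
    with io.open(path, "r", encoding="utf-8") as f:
        text = f.read()

print(type(raw).__name__, len(raw))
print(type(text).__name__, len(text))
```

A fair benchmark between versions would compare reads that both decode, or both return bytes.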

Re: [Python-Dev] Python 3.0.1 (io-in-c)

2009-01-28 Thread Victor Stinner
and that's great news ;-) -- Victor Stinner aka haypo http://www.haypocalc.com/blog/

Re: [Python-Dev] urllib bug in Python 3.2.1?

2011-08-08 Thread Victor Stinner
With Python 3.1 and Python 3.2 it works OK, but with Python 3.2.1 the read returns an empty string (I checked it myself). http://bugs.python.org/issue12576 The bug is now fixed. Can you release a Python 3.2.2, maybe only with this fix? Victor

Re: [Python-Dev] Status of the PEP 400? (deprecate codecs.StreamReader/StreamWriter)

2011-08-11 Thread Victor Stinner
On 29/07/2011 19:01, Guido van Rossum wrote: I will add your alternative to the PEP (unless you would like to do that yourself?). If I understood correctly, you propose to: * rename codecs.open() to codecs.open_stream() * change codecs.open() to reuse open() (and so io.TextIOWrapper)

Re: [Python-Dev] PEP 393 Summer of Code Project

2011-08-23 Thread Victor Stinner
On 23/08/2011 15:06, "Martin v. Löwis" wrote: Well, things have to be done in order: 1. the PEP needs to be approved 2. the performance bottlenecks need to be identified 3. optimizations should be applied. I would not vote for the PEP if it slows down Python, especially if it's much slower.

Re: [Python-Dev] PEP 393 Summer of Code Project

2011-08-23 Thread Victor Stinner
On Tuesday 23 August 2011 00:14:40, Antoine Pitrou wrote: > Hello, > > On Mon, 22 Aug 2011 14:58:51 -0400 > > Torsten Becker wrote: > > I have implemented an initial version of PEP 393 -- "Flexible String > > Representation" as part of my Google Summer of Code project. My patch > > is hosted as

Re: [Python-Dev] PEP 393 Summer of Code Project

2011-08-23 Thread Victor Stinner
On Tuesday 23 August 2011 00:14:40, Antoine Pitrou wrote: > - You could try to run stringbench, which can be found at > http://svn.python.org/projects/sandbox/trunk/stringbench (*) > and there's iobench (the text mode benchmarks) in the Tools/iobench > directory. Some raw numbers. stringbenc

Re: [Python-Dev] PEP 393 Summer of Code Project

2011-08-23 Thread Victor Stinner
On Monday 22 August 2011 20:58:51, Torsten Becker wrote: > [1]: http://www.python.org/dev/peps/pep-0393 state: lowest 2 bits (mask 0x03) - interned-state (SSTATE_*) as in 3.2 next 2 bits (mask 0x0C) - form of str: 00 => reserved 01 => 1 byte (Latin-1) 10 => 2 byte (UCS-2) 11 => 4 byte (UCS-4); nex
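The bit layout quoted above can be decoded with two masks and a shift. The sketch below follows only the layout shown in this snippet (a draft of the PEP at that time, so the final implementation may differ); the constant and function names are invented for illustration.

```python
# Masks taken from the quoted draft layout; the names are hypothetical.
INTERNED_MASK = 0x03   # lowest 2 bits: interned state
KIND_MASK = 0x0C       # next 2 bits: form of the string
KIND_SHIFT = 2

KIND_NAMES = {
    0b00: "reserved",
    0b01: "1 byte (Latin-1)",
    0b10: "2 byte (UCS-2)",
    0b11: "4 byte (UCS-4)",
}


def decode_state(state):
    """Split a state value into (interned_state, kind_description)."""
    interned = state & INTERNED_MASK
    kind = (state & KIND_MASK) >> KIND_SHIFT
    return interned, KIND_NAMES[kind]


print(decode_state(0b1001))
```

For example, the value `0b1001` has interned state 1 in the low two bits and `10` in the next two bits, which the quoted table maps to the 2-byte form.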

Re: [Python-Dev] PEP 393 Summer of Code Project

2011-08-23 Thread Victor Stinner
On Wednesday 24 August 2011 00:46:16, Victor Stinner wrote: > On Monday 22 August 2011 20:58:51, Torsten Becker wrote: > > [1]: http://www.python.org/dev/peps/pep-0393 > > state: > lowest 2 bits (mask 0x03) - interned-state (SSTATE_*) as in 3.2 > next 2 bits (mask 0x0C

Re: [Python-Dev] PEP 393 Summer of Code Project

2011-08-24 Thread Victor Stinner
On 24/08/2011 04:41, Torsten Becker wrote: On Tue, Aug 23, 2011 at 10:08, Antoine Pitrou wrote: Macros are useful to shield the abstraction from the implementation. If you access the members directly, and the unicode object is represented differently in some future version of Python (say e.g

Re: [Python-Dev] PEP 393 Summer of Code Project

2011-08-24 Thread Victor Stinner
On 24/08/2011 06:59, Scott Dial wrote: On 8/23/2011 6:38 PM, Victor Stinner wrote: On Tuesday 23 August 2011 00:14:40, Antoine Pitrou wrote: - You could try to run stringbench, which can be found at http://svn.python.org/projects/sandbox/trunk/stringbench (*) and there's iobench

Re: [Python-Dev] PEP 393 Summer of Code Project

2011-08-24 Thread Victor Stinner
On 24/08/2011 04:41, Torsten Becker wrote: On Tue, Aug 23, 2011 at 18:27, Victor Stinner wrote: I posted a patch to re-add it: http://bugs.python.org/issue12819#msg142867 Thank you for the patch! Note that this patch adds the fast path only to the helper function which determines the

Re: [Python-Dev] PEP 393 Summer of Code Project

2011-08-24 Thread Victor Stinner
On 24/08/2011 04:56, Torsten Becker wrote: On Tue, Aug 23, 2011 at 18:56, Victor Stinner wrote: kind=0 is used and public, it's PyUnicode_WCHAR_KIND. Is it still necessary? It looks to be only used in PyUnicode_DecodeUnicodeEscape(). If it can be removed, it would be nice to have ki

Re: [Python-Dev] PEP 393 Summer of Code Project

2011-08-24 Thread Victor Stinner
On 24/08/2011 02:46, Terry Reedy wrote: On 8/23/2011 9:21 AM, Victor Stinner wrote: On 23/08/2011 15:06, "Martin v. Löwis" wrote: Well, things have to be done in order: 1. the PEP needs to be approved 2. the performance bottlenecks need to be identified 3. optimizations should

Re: [Python-Dev] PEP 393 Summer of Code Project

2011-08-24 Thread Victor Stinner
On 24/08/2011 11:22, Glenn Linderman wrote: c) mostly ASCII (utf8) with clever indexing/caching to be efficient d) UTF-8 with clever indexing/caching to be efficient I see neither a need nor a means to consider these. The discussion about "mostly ASCII" strings seems convincing that there c

Re: [Python-Dev] PEP 393 Summer of Code Project

2011-08-24 Thread Victor Stinner
On Wednesday 24 August 2011 20:52:51, Glenn Linderman wrote: > Given the required variability of character size in all presently > Unicode defined encodings, I tend to agree with Tom that UTF-8, together > with some technique of translating character index to code unit offset, > may provide the bes

Re: [Python-Dev] PEP 393 review

2011-08-24 Thread Victor Stinner
> With this PEP, the unicode object overhead grows to 10 pointer-sized > words (including PyObject_HEAD), that's 80 bytes on a 64-bit machine. > Does it have any adverse effects? For pure ASCII, it might be possible to use a shorter struct: typedef struct { PyObject_HEAD Py_ssize_t length

Re: [Python-Dev] PEP 393 Summer of Code Project

2011-08-25 Thread Victor Stinner
On 25/08/2011 06:12, Stephen J. Turnbull wrote: > Let's take small steps. Do the evolutionary thing. Let's get things > right so users won't have to worry about code points vs. code units > any more. A conforming library for all things at the character level > can be developed late

Re: [Python-Dev] PEP 393 review

2011-08-25 Thread Victor Stinner
On 25/08/2011 06:46, Stefan Behnel wrote: Conversion to wchar_t* is common, especially on Windows. That's an issue. However, I cannot say how common this really is in practice. Surely depends on the specific code, right? How common is it in core CPython? Nearly all functions taking text as

Re: [Python-Dev] PEP 393 Summer of Code Project

2011-08-26 Thread Victor Stinner
On Friday 26 August 2011 02:01:42, Dino Viehland wrote: > The biggest difficulty for IronPython here would be dealing w/ .NET > interop. We can certainly introduce either an IronPython specific string > class which is similar to CPython's PyUnicodeObject or we could have > multiple distinct .NET

Re: [Python-Dev] Add from __experimental__ import bla [was: Should we move to replace re with regex?]

2011-08-27 Thread Victor Stinner
On Saturday 27 August 2011 21:57:26, Dj Gilcrease wrote: > The idea of a __experimental__ area is good for any PEPs or > stdlib additions that are somewhat controversial (API isn't agreed on, > code may take a while to integrate properly, developer wants some time > to hash out any edge case bugs or

Re: [Python-Dev] PEP 393 review

2011-08-29 Thread Victor Stinner
On 29/08/2011 11:03, Dirkjan Ochtman wrote: On Sun, Aug 28, 2011 at 21:47, "Martin v. Löwis" wrote: result strings. In PEP 393, a buffer must be scanned for the highest code point, which means that each byte must be inspected twice (a second time when the copying occurs). This may be
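The two-pass behaviour described above (scan for the highest code point, then copy) can be sketched as follows. This is a simplified model, not CPython's C implementation; `storage_width` is a made-up name, and the thresholds are the Latin-1 / UCS-2 / UCS-4 limits mentioned elsewhere in this thread.

```python
def storage_width(text):
    """Return the bytes per character needed to store text (1, 2 or 4).

    First pass only: find the highest code point. A real implementation
    would then allocate a buffer of that width and copy in a second pass.
    """
    max_char = max(map(ord, text), default=0)
    if max_char < 0x100:      # fits in Latin-1
        return 1
    if max_char < 0x10000:    # fits in UCS-2
        return 2
    return 4                  # needs UCS-4


for sample in ("abc", "caf\u00e9", "\u20ac", "\U0001F600"):
    print(repr(sample), storage_width(sample))
```

The scan is the extra inspection the quoted message refers to: every character is read once here and once more when the data is copied into the final buffer.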

Re: [Python-Dev] PEP 393 review

2011-08-29 Thread Victor Stinner
On 28/08/2011 23:06, "Martin v. Löwis" wrote: On 28.08.2011 22:01, Antoine Pitrou wrote: - the iobench results are between 2% acceleration (seek operations), 16% slowdown for small-sized reads (4.31MB/s vs. 5.22 MB/s) and 37% for large sized reads (154 MB/s vs. 235 MB/s). The speed

Re: [Python-Dev] Python 3 optimizations continued...

2011-08-29 Thread Victor Stinner
On Monday 29 August 2011 19:35:14, stefan brunthaler wrote: > pretty much a year ago I wrote about the optimizations I did for my > PhD thesis that target the Python 3 series interpreters Does it speed up Python? :-) Could you provide numbers (benchmarks)? Victor

Re: [Python-Dev] PEP 393 review

2011-08-29 Thread Victor Stinner
On Monday 29 August 2011 21:34:48, you wrote: > >> Those haven't been ported to the new API, yet. Consider, for example, > >> d9821affc9ee. Before that, I got 253 MB/s on the 4096 units read test; > >> with that change, I get 610 MB/s. The trunk gives me 488 MB/s, so this > >> is a 25% speedup

Re: [Python-Dev] cpython: Issue #12567: Add curses.unget_wch() function

2011-09-06 Thread Victor Stinner
On 06/09/2011 07:50, Antoine Pitrou wrote: On Tue, 06 Sep 2011 01:53:32 +0200 victor.stinner wrote: http://hg.python.org/cpython/rev/b1e03d10391e changeset: 72297:b1e03d10391e user:Victor Stinner date:Tue Sep 06 01:53:03 2011 +0200 summary: Issue #12567: Add

Re: [Python-Dev] [Python-checkins] cpython (3.2): Fix PyUnicode_AsWideCharString() doc: size doesn't contain the null character

2011-09-06 Thread Victor Stinner
On 06/09/2011 02:25, Nick Coghlan wrote: On Tue, Sep 6, 2011 at 10:01 AM, victor.stinner wrote: Fix also spelling of the null character. While these cases are legitimately changed to 'null' (since they're lowercase descriptions of the character), I figure it's worth mentioning again that

Re: [Python-Dev] [Python-checkins] cpython: Issue #9561: packaging now writes egg-info files using UTF-8

2011-09-06 Thread Victor Stinner
On 06/09/2011 17:17, Éric Araujo wrote: On 06/09/2011 00:11, victor.stinner wrote: http://hg.python.org/cpython/rev/56ab3257ca13 changeset: 72296:56ab3257ca13 user:Victor Stinner date:Tue Sep 06 00:11:13 2011 +0200 summary: Issue #9561: packaging now writes egg-info

Re: [Python-Dev] PEP 393: Special-casing ASCII-only strings

2011-09-15 Thread Victor Stinner
On Thursday 15 September 2011 17:50:41, Martin v. Löwis wrote: > In reviewing memory usage, I found potential for saving more memory for > ASCII-only strings. (...) > > typedef struct { > PyObject_HEAD > Py_ssize_t length; > union { > void *any; > Py_UCS1 *latin1;

Re: [Python-Dev] PEP 393 close to pronouncement

2011-09-26 Thread Victor Stinner
Hi, On Monday 26 September 2011 23:00:06, Guido van Rossum wrote: > So, if you have the time, please review PEP 393 and/or play with the > code (the repo is linked from the PEP's References section now). I played with the code. The full test suite passes on Linux, FreeBSD and Windows. On Windows

Re: [Python-Dev] PEP 393 close to pronouncement

2011-09-27 Thread Victor Stinner
On Tuesday 27 September 2011 00:19:02, Victor Stinner wrote: > On Windows, there is just one failure in test_configparser, I > didn't investigate it yet Oh, it was a real bug in io.IncrementalNewlineDecoder. It is now fixed. Victor

Re: [Python-Dev] PEP 393 close to pronouncement

2011-09-28 Thread Victor Stinner
> Resizing > > > Codecs use resizing a lot. Given that PyCompactUnicodeObject > does not support resizing, most decoders will have to use > PyUnicodeObject and thus not benefit from the memory footprint > advantages of e.g. PyASCIIObject. Wrong. Even if you create a string using the lega

Re: [Python-Dev] [Python-checkins] cpython: Enhance Py_ARRAY_LENGTH(): fail at build time if the argument is not an array

2011-09-28 Thread Victor Stinner
On Thursday 29 September 2011 02:07:02, Benjamin Peterson wrote: > 2011/9/28 victor.stinner : > > http://hg.python.org/cpython/rev/36fc514de7f0 > > changeset: 72512:36fc514de7f0 > > user: Victor Stinner > > date:Thu Sep 29 01:12:24 2011 +020

Re: [Python-Dev] Hg tips

2011-09-29 Thread Victor Stinner
nges to 5 files (+1 heads) not updating, since new heads added (run 'hg heads' to see heads, 'hg merge' to merge) # and use "hg heads ." to see the two heads (yours and the one you pulled) in the current branch $ hg heads . changeset: 72521:e6a2b54c1d16 tag: tip

Re: [Python-Dev] Hg tips

2011-09-29 Thread Victor Stinner
On 29/09/2011 12:34, Xavier Morel wrote: Generally none. By default, mercurial (and most similar tools) sets up LOCAL, BASE and OTHER. BASE is the... Sorry, but I'm unable to remember the meaning of LOCAL, BASE and OTHER. In meld, I have to scroll to the end of the filename to see the file

[Python-Dev] RFC: Add a new builtin strarray type to Python?

2011-10-01 Thread Victor Stinner
Hi, Since the integration of PEP 393, str += str is no longer super-fast (but just fast). For example, adding a single character to a string has to copy all characters to a new string. I suppose that the performance of a lot of applications manipulating text may be affected by this issue, espec

Re: [Python-Dev] cpython: Add _PyUnicode_UTF8() and _PyUnicode_UTF8_LENGTH() macros

2011-10-01 Thread Victor Stinner
On Saturday 1 October 2011 17:18:42, Antoine Pitrou wrote: > On Sat, 01 Oct 2011 16:53:44 +0200 > > victor.stinner wrote: > > http://hg.python.org/cpython/rev/4afab01f5374 > > changeset: 72565:4afab01f5374 > > user:Victor Stinner > > date:

Re: [Python-Dev] [Python-checkins] cpython: Enhance Py_ARRAY_LENGTH(): fail at build time if the argument is not an array

2011-10-01 Thread Victor Stinner
On Saturday 1 October 2011 14:52:03, you wrote: > >> Do we really need a new file? Why not pyport.h where other compiler > >> stuff goes? > > > > I'm not sure that pyport.h is the right place to add Py_MIN, Py_MAX, > > Py_ARRAY_LENGTH. pyport.h looks to be related to all things specific to >

Re: [Python-Dev] RFC: Add a new builtin strarray type to Python?

2011-10-01 Thread Victor Stinner
> Since the integration of PEP 393, str += str is no longer super-fast > (but just fast). Oh oh. str+=str is now *1450x* slower than the ''.join() pattern. Here is a benchmark (see attached script, bench_build_str.py): Python 3.3 str += str: 14548 ms ''.join() : 10 ms StringIO.write: 12
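The attached bench_build_str.py is not part of this archive. As a rough stand-in, the sketch below times the three building strategies named in the message with `timeit`; the function names, string size and repeat count are assumptions, and the numbers it prints depend on the Python version and machine, so they will not match the figures quoted above.

```python
import io
import timeit


def build_concat(n):
    """Build a string with repeated str += str."""
    s = ""
    for _ in range(n):
        s += "x"
    return s


def build_join(n):
    """Collect the pieces in a list and join them once."""
    parts = []
    for _ in range(n):
        parts.append("x")
    return "".join(parts)


def build_stringio(n):
    """Write the pieces to an in-memory text buffer."""
    buf = io.StringIO()
    for _ in range(n):
        buf.write("x")
    return buf.getvalue()


def compare(n=10000, repeat=5):
    """Return the elapsed seconds for each strategy (illustrative only)."""
    return {
        func.__name__: timeit.timeit(lambda: func(n), number=repeat)
        for func in (build_concat, build_join, build_stringio)
    }


for name, seconds in compare().items():
    print("%-15s %.4f s" % (name, seconds))
```

All three functions return the same string, so only the running time differs between them.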

Re: [Python-Dev] RFC: Add a new builtin strarray type to Python?

2011-10-02 Thread Victor Stinner
On Saturday 1 October 2011 22:21:01, Antoine Pitrou wrote: > So, since people are confused at the number of possible options, you > propose to add a new option and therefore increase the confusion? The idea is to provide an API very close to the str type. So if your program becomes slow in some

Re: [Python-Dev] RFC: Add a new builtin strarray type to Python?

2011-10-02 Thread Victor Stinner
On Sunday 2 October 2011 15:25:21, Antoine Pitrou wrote: > I don't know why you're saying that. The concatenation optimization > worked in 2.x where the "str" type also used only one memory block. You > just have to check that the refcount is about to drop to zero. > Of course, resizing only w

Re: [Python-Dev] RFC: Add a new builtin strarray type to Python?

2011-10-03 Thread Victor Stinner
On 03/10/2011 04:19, Victor Stinner wrote: I restored this hack in Python 3.3 using PyUnicode_Append() in ceval.c and by optimizing PyUnicode_Append() (try to append in-place). str+=str is closer again to ''.join: str += str: 696 ms ''.join(): 547 ms I disabled tempor

Re: [Python-Dev] RFC: Add a new builtin strarray type to Python?

2011-10-03 Thread Victor Stinner
On Monday 3 October 2011 18:04:57, you wrote: > Why are you checking, in unicode_resizable, whether the string is from > unicode_latin1? If it is, then it should have a refcount of at least 2, > so the very first test in the function should already exclude it. There is also a test on unicode

Re: [Python-Dev] cpython: PyUnicode_FromKindAndData() raises a ValueError if the kind is unknown

2011-10-03 Thread Victor Stinner
> > -assert(0); > > +PyErr_SetString(PyExc_ValueError, "invalid kind"); > > > > return NULL; > > > > } > > Is that really a ValueError? It should only be a ValueError if the user > could trigger that error. Otherwise it should be a SystemError. You are right, ValueError is not be

Re: [Python-Dev] [Python-checkins] cpython: fix formatting

2011-10-04 Thread Victor Stinner
On 04/10/2011 01:35, benjamin.peterson wrote: http://hg.python.org/cpython/rev/64495ad8aa54 changeset: 72634:64495ad8aa54 user:Benjamin Peterson date:Mon Oct 03 19:35:07 2011 -0400 summary fix formatting +++ b/Objects/unicodeobject.c @@ -1362,8 +1362,8 @@ re

Re: [Python-Dev] [Python-checkins] cpython: fix compiler warnings

2011-10-04 Thread Victor Stinner
On 04/10/2011 01:34, benjamin.peterson wrote: http://hg.python.org/cpython/rev/afb60b190f1c changeset: 72633:afb60b190f1c user:Benjamin Peterson date:Mon Oct 03 19:34:12 2011 -0400 summary: fix compiler warnings +++ b/Objects/unicodeobject.c @@ -369,6 +369,12 @@ }

Re: [Python-Dev] [Python-checkins] cpython: pyexpat uses the new Unicode API

2011-10-04 Thread Victor Stinner
On 03/10/2011 11:10, Amaury Forgeot d'Arc wrote: changeset: 72548:a1be34457ccf user: Victor Stinner date:Sat Oct 01 01:05:40 2011 +0200 summary: pyexat uses the new Unicode API files: Modules/pyexpat.c | 12 +++- 1 files changed, 7 insertions(+), 5 dele

Re: [Python-Dev] cpython: PyUnicode_Join() checks output length in debug mode

2011-10-04 Thread Victor Stinner
On 04/10/2011 23:41, Georg Brandl wrote: I don't understand this change. Why would you not always add "copied" once you already have it? It seems to be the more correct version anyway. If you use copied instead of seplen/itemlen, you suppose that the string has been overallocated in some ca

Re: [Python-Dev] [Python-checkins] cpython: fix compiler warnings

2011-10-04 Thread Victor Stinner
On 05/10/2011 00:30, Vlad Riscutia wrote: Why does the function even return a value? As Benjamin said, it is just a bunch of asserts with return 1 at the end. It's just to be able to write assert(_PyUnicode_CheckConsistency(...)). assert() is just used to remove the instruction in release m

Re: [Python-Dev] [Python-checkins] cpython: Migrate str.expandtabs to the new API

2011-10-04 Thread Victor Stinner
On 04/10/2011 18:45, "Martin v. Löwis" wrote: Migrate str.expandtabs to the new API This needs if (PyUnicode_READY(self) == -1) return NULL; right after the ParseTuple call. In most cases, the check will be a noop. But if it's not, omitting it will make expandtabs have no effect, since t

Re: [Python-Dev] [Python-checkins] cpython: Optimize string slicing to use the new API

2011-10-04 Thread Victor Stinner
On 04/10/2011 20:09, "Martin v. Löwis" wrote: On 04.10.11 19:50, Antoine Pitrou wrote: On Tue, 04 Oct 2011 19:49:09 +0200 "Martin v. Löwis" wrote: + result = PyUnicode_New(slicelength, PyUnicode_MAX_CHAR_VALUE(self)); This is incorrect: the maxchar of the slice might be smaller than the

Re: [Python-Dev] [Python-checkins] cpython: Document requierements of Unicode kinds

2011-10-05 Thread Victor Stinner
On Wednesday 5 October 2011 21:25:22, Terry Reedy wrote: > > + - PyUnicode_1BYTE_KIND (1): > > + > > + * character type = Py_UCS1 (8 bits, unsigned) > > + * if ascii is 1, at least one character must be in range > > + U+80-U+FF, otherwise all charac

[Python-Dev] New stringbench benchmark results

2011-10-05 Thread Victor Stinner
Hi, I optimized unicodeobject.c a little bit more where I saw major performance regressions from Python 3.2 to 3.3 using stringbench. Here are new results: see attachments. Example of tests where Python 3.3 is much slower: "A".join(["Bob"]*100)): 2.11 => 0.92 ("C"+"AB"*300).rfind("CA"): 0.57 =>

Re: [Python-Dev] New stringbench benchmark results

2011-10-06 Thread Victor Stinner
Hum, copy-paste failure, I wrote numbers in the wrong order, it's: (test: Python 3.2 => Python 3.3) "A".join(["Bob"]*100)): 0.92 => 2.11 ("C"+"AB"*300).rfind("CA"): 0.57 => 1.03 ("A" + ("Z"*128*1024)).replace("A", "BB", 1): 0.25 => 0.50 I improved str.replace(): it's now 5 times faster instead of

Re: [Python-Dev] Rename PyUnicode_KIND_SIZE ?

2011-10-06 Thread Victor Stinner
On 06/10/2011 15:52, Antoine Pitrou wrote: The PyUnicode_KIND_SIZE macro is defined as follows. Its name looks rather mysterious or misleading to me. Could we rename it to something else? What do you propose? also, is it useful? index << (kind - 1) and index * PyUnicode_CHARACTER_SIZE(st
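The two expressions mentioned above compute the same byte offset when the kinds are numbered 1, 2 and 3 for character sizes of 1, 2 and 4 bytes. That numbering is an assumption inferred from the `index << (kind - 1)` expression; the sketch below only checks the arithmetic and does not reproduce the real macros.

```python
# Assumed mapping from kind number to character size in bytes.
CHARACTER_SIZE = {1: 1, 2: 2, 3: 4}


def offset_by_shift(index, kind):
    """Byte offset computed as index << (kind - 1)."""
    return index << (kind - 1)


def offset_by_multiply(index, kind):
    """Byte offset computed as index * character_size."""
    return index * CHARACTER_SIZE[kind]


for kind in (1, 2, 3):
    print(kind, offset_by_shift(10, kind), offset_by_multiply(10, kind))
```

Both forms agree because each character size is a power of two, so multiplying by it is the same as shifting left by its exponent.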

Re: [Python-Dev] check for PyUnicode_READY look backwards

2011-10-07 Thread Victor Stinner
On 07/10/2011 00:20, "Martin v. Löwis" wrote: On 06.10.11 14:57, Amaury Forgeot d'Arc wrote: Hi, with the new Unicode API, there are many checks like: + if (PyUnicode_READY(*filename)) + goto handle_error; I think you are misinterpreting what you are seeing. There are not *many* such che

Re: [Python-Dev] check for PyUnicode_READY look backwards

2011-10-07 Thread Victor Stinner
On 07/10/2011 10:07, Stefan Krah wrote: Victor Stinner wrote: Yes, I wrote if (PyUnicode_READY(foo)), but I agree that it is confusing when you read the code, especially because we have also a PyUnicode_IS_READY(foo) macro! if (!PyUnicode_READY(foo)) is not better, also because of

Re: [Python-Dev] New stringbench benchmark results

2011-10-07 Thread Victor Stinner
On 07/10/2011 03:19, Steven D'Aprano wrote: Given that strings are immutable, would it not be an obvious optimization for replace to return the source string unchanged if the old and new substrings are equal, and avoid making a potentially expensive copy? I just implemented this optimization
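The idea can be sketched at the Python level as a wrapper around `str.replace()`. The actual optimization was made in C inside CPython; `replace_with_shortcut` is a hypothetical name used only to show the shortcut.

```python
def replace_with_shortcut(source, old, new, count=-1):
    """Like str.replace(), but skip the copy when nothing can change.

    Because strings are immutable, returning the same object is safe when
    old == new or when no replacement is requested.
    """
    if old == new or count == 0:
        return source
    return source.replace(old, new, count)


text = "abc" * 10
same = replace_with_shortcut(text, "a", "a")
changed = replace_with_shortcut(text, "a", "b")
print(same is text, changed[:6])
```

Whether the built-in `str.replace()` itself returns the identical object in this case is an implementation detail and should not be relied on.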

Re: [Python-Dev] New stringbench benchmark results

2011-10-07 Thread Victor Stinner
On 06/10/2011 12:42, Victor Stinner wrote: "A".join(["Bob"]*100)): 0.92 => 2.11 I just optimized PyUnicode_Join() for such dummy benchmark. It's now 1.2x slower instead of 2.3x slower on this dummy benchmark. With longer *ASCII* strings, Python 3.3 is now

Re: [Python-Dev] PyUnicode_KIND changed

2011-10-07 Thread Victor Stinner
On Friday 7 October 2011 21:02:00, Martin v. Löwis wrote: > After discussion with several people, I changed > PyUnicode_KIND to have values of 1,2,4, respectively, > thus reflecting the element size of the string numerically. You may rename it to "character size" (char_size) ;-) Victor

Re: [Python-Dev] New stringbench benchmark results

2011-10-07 Thread Victor Stinner
On Thursday 6 October 2011 02:06:30, Victor Stinner wrote: > The rfind case is really strange: the code between Python 3.2 and 3.3 is > exactly the same. Even in Python 3.2: rfind looks twice as fast as find: > > ("AB"*300+"C").find("BC") (*1000)

Re: [Python-Dev] [Python-ideas] PEP 3101 (Advanced string formatting) base 36 integer presentation type

2011-10-08 Thread Victor Stinner
On 08/10/2011 15:03, Antoine Pitrou wrote: On Fri, 07 Oct 2011 21:14:44 -0600 Jeffrey wrote: I would like to suggest adding an integer presentation type for base 36 to PEP 3101. I can't imagine that it would be a whole lot more difficult than the existing types. Python's built-in long inte
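Parsing base 36 is already covered by the built-in `int(text, 36)`; what the request is about is the opposite direction. A small sketch of formatting an integer in base 36 follows; `to_base36` is an illustrative helper, not an existing or proposed API.

```python
import string

# Digits 0-9 followed by a-z give the 36 symbols needed.
DIGITS = string.digits + string.ascii_lowercase


def to_base36(number):
    """Return the base 36 representation of an integer (hypothetical helper)."""
    if number == 0:
        return "0"
    sign = "-" if number < 0 else ""
    number = abs(number)
    out = []
    while number:
        number, remainder = divmod(number, 36)
        out.append(DIGITS[remainder])
    return sign + "".join(reversed(out))


print(to_base36(123456789), int(to_base36(123456789), 36))
```

A round trip through `int(..., 36)` returns the original value, which makes the helper easy to check.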

Re: [Python-Dev] [Python-ideas] PEP 3101 (Advanced string formatting) base 36 integer presentation type

2011-10-09 Thread Victor Stinner
On 08/10/2011 17:14, Victor Stinner wrote: On 08/10/2011 15:03, Antoine Pitrou wrote: On Fri, 07 Oct 2011 21:14:44 -0600 Jeffrey wrote: I would like to suggest adding an integer presentation type for base 36 to PEP 3101. I can't imagine that it would be a whole lot more difficult

Re: [Python-Dev] [Python-checkins] cpython (2.7): Issue #13156: revert changeset f6feed6ec3f9, which was only relevant for native

2011-10-12 Thread Victor Stinner
On Wednesday 12 October 2011 21:07:33, charles-francois.natali wrote: > changeset: 72897:ee4fe16d9b48 > branch: 2.7 > parent: 69635:f6feed6ec3f9 > user:Charles-François Natali > date:Wed Oct 12 21:07:54 2011 +0200 > summary: > Issue #13156: revert changeset f6feed6e

Re: [Python-Dev] Identifier API

2011-10-12 Thread Victor Stinner
On Saturday 8 October 2011 16:54:06, Martin v. Löwis wrote: > In benchmarking PEP 393, I noticed that many UTF-8 decode > calls originate from C code with static strings, in particular > PyObject_CallMethod. Many of such calls already have been optimized > to cache a string object, however, PyObje

Re: [Python-Dev] cpython: Optimize findchar() for PyUnicode_1BYTE_KIND: use memchr and memrchr

2011-10-12 Thread Victor Stinner
On Thursday 13 October 2011 01:27:32, Antoine Pitrou wrote: > On Thu, 13 Oct 2011 01:17:29 +0200 > > victor.stinner wrote: > > http://hg.python.org/cpython/rev/e5bd48b43a58 > > changeset: 72903:e5bd48b43a58 > > user:Victor Stinner > > date:

Re: [Python-Dev] Identifier API

2011-10-12 Thread Victor Stinner
On Thursday 13 October 2011 00:44:33, Victor Stinner wrote: > On Saturday 8 October 2011 16:54:06, Martin v. Löwis wrote: > > In benchmarking PEP 393, I noticed that many UTF-8 decode > > calls originate from C code with static strings, in particular > > PyObject_CallMetho

Re: [Python-Dev] Identifier API

2011-10-13 Thread Victor Stinner
On Thursday 13 October 2011 03:34:00, Victor Stinner wrote: > > We would need a new format for Py_BuildValue, e.g. 'a' for ASCII string. > > Later we can add new functions like _PyDict_GetASCII(). > > The main difference between my new "const ASCII" string i

Re: [Python-Dev] Identifier API

2011-10-14 Thread Victor Stinner
On 14/10/2011 07:44, Georg Brandl wrote: On 14.10.2011 00:30, Victor Stinner wrote: On Thursday 13 October 2011 03:34:00, Victor Stinner wrote: We would need a new format for Py_BuildValue, e.g. 'a' for ASCII string. Later we can add new functions like _PyDict_GetASCII().

[Python-Dev] Modules of plat-* directories

2011-10-16 Thread Victor Stinner
Hi, I don't understand why we kept modules of the plat-* directories (e.g. Lib/plat-linux/CDROM.py). It looks like these modules are not used, except maybe some DL constants used by PyKDE4. Can't we move used constants to classic Python modules (e.g. the os module) and drop unused modules? The

Re: [Python-Dev] Modules of plat-* directories

2011-10-16 Thread Victor Stinner
On Monday 17 October 2011 01:16:36, Victor Stinner wrote: > For example, IN.INT_MAX is 2147483647, whereas it should > be 9223372036854775807 on my 64-bit Linux. Oops, wrong example: INT_MAX is also 2147483647 on 64 bits. I mean IN.LONG_MAX. IN.LONG_MAX is always 9223372036854775807 on
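The underlying point is that LONG_MAX depends on the platform's C `long`, so a constant stored in a pre-generated module can be wrong on another machine. A hedged sketch of computing the limits at run time with `ctypes` instead (the helper name `signed_max` is invented for illustration):

```python
import ctypes


def signed_max(ctype):
    """Largest value of a signed C integer type, derived from its size."""
    bits = 8 * ctypes.sizeof(ctype)
    return 2 ** (bits - 1) - 1


INT_MAX = signed_max(ctypes.c_int)
LONG_MAX = signed_max(ctypes.c_long)
print("INT_MAX  =", INT_MAX)
print("LONG_MAX =", LONG_MAX)
```

On common platforms `INT_MAX` is 2147483647 everywhere, while `LONG_MAX` is 2147483647 or 9223372036854775807 depending on the size of `long`, which is the distinction the message corrects itself on.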

Re: [Python-Dev] Modules of plat-* directories

2011-10-17 Thread Victor Stinner
On Monday 17 October 2011 23:27:09, Antoine Pitrou wrote: > On Mon, 17 Oct 2011 02:04:38 +0200 > > Victor Stinner wrote: > > On Monday 17 October 2011 01:16:36, Victor Stinner wrote: > > > For example, IN.INT_MAX is 2147483647, whereas it should > > > be

Re: [Python-Dev] Status of the PEP 400? (deprecate codecs.StreamReader/StreamWriter)

2011-10-21 Thread Victor Stinner
On Friday 29 July 2011 19:01:06, Guido van Rossum wrote: > On Fri, Jul 29, 2011 at 8:37 AM, Nick Coghlan wrote: > > On Sat, Jul 30, 2011 at 1:17 AM, Antoine Pitrou wrote: > >> On Thu, 28 Jul 2011 11:28:43 +0200 > >> > >> Victor Stinner wrote: >

Re: [Python-Dev] [PATCH] unicode subtypes broken in latest py3k debug builds

2011-10-22 Thread Victor Stinner
the py3k debug build has been broken in Cython's integration tests for a couple of weeks now due to a use-after-decref bug. Here's the fix, please apply. Oops, I introduced this bug when I added "check_content" option to _PyUnicode_CheckUnicode(). BTW, is there a reason unicode_subtype_new()

Re: [Python-Dev] Modules of plat-* directories

2011-10-24 Thread Victor Stinner
There are open issues related to plat-XXX. On Monday 24 October 2011 00:03:42, Martin v. Löwis wrote: > no, we make no changes to them unless a user actually requests a change Matthias Klose asked for socket SIO* constants in September 2006 (5 years ago). http://bugs.python.org/issue1565071 I

[Python-Dev] Use our strict mbcs codec instead of the Windows ANSI API

2011-10-24 Thread Victor Stinner
Hi, I propose to raise Unicode errors if a filename cannot be decoded on Windows, instead of creating bogus filenames with question marks. Because this change is incompatible with Python 3.2, even if such filenames are unusable and I consider the problem as a (Python?) bug, I would like your
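The difference between the two behaviours can be shown with Python's codec error handlers. The mbcs codec exists only on Windows, so this sketch uses the ASCII codec as a portable stand-in (an assumption made for illustration); the file name is invented.

```python
name = "caf\u00e9.txt"  # hypothetical file name with a non-ASCII character

# Lossy conversion: the unencodable character silently becomes "?",
# producing a name that no longer refers to the original file.
lossy = name.encode("ascii", errors="replace")

# Strict conversion: the same input raises an error instead.
try:
    name.encode("ascii", errors="strict")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

print(lossy, strict_failed)
```

With the strict behaviour the caller learns immediately that the name cannot be represented, instead of later failing to open a file whose name contains question marks.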

Re: [Python-Dev] Use our strict mbcs codec instead of the Windows ANSI API

2011-10-25 Thread Victor Stinner
On Tuesday 25 October 2011 13:20:12, you wrote: > Victor Stinner writes: > > I propose to raise Unicode errors if a filename cannot be decoded > > on Windows, instead of creating bogus filenames with question > > marks. > > By "bogus" you mean

Re: [Python-Dev] Use our strict mbcs codec instead of the Windows ANSI API

2011-10-25 Thread Victor Stinner
On Tuesday 25 October 2011 09:09:56, you wrote: > > I propose to raise Unicode errors if a filename cannot be decoded on > > Windows, instead of creating bogus filenames with question marks. > > Can you please elaborate what APIs you are talking about exactly? Basically, all functions proc

Re: [Python-Dev] Use our strict mbcs codec instead of the Windows ANSI API

2011-10-25 Thread Victor Stinner
On Tuesday 25 October 2011 09:09:56, you wrote: > If it's the byte APIs (i.e. using bytes as file names), then I'm -1 on > this proposal. People that explicitly use bytes for file names deserve > to get whatever exact platform semantics the platform has to offer. This > is true on Unix, and it

Re: [Python-Dev] memcmp performance

2011-10-25 Thread Victor Stinner
On Tuesday 25 October 2011 10:44:16, Stefan Behnel wrote: > Richard Saunders, 25.10.2011 01:17: > > -On [20111024 09:22], Stefan Behnel wrote: > > >>I agree. Given that the analysis shows that the libc memcmp() is > > >>particularly fast on many Linux systems, it should be up to the > > >>Pyt

Re: [Python-Dev] Use our strict mbcs codec instead of the Windows ANSI API

2011-10-25 Thread Victor Stinner
On Tuesday 25 October 2011 00:57:42, Victor Stinner wrote: > I propose to raise Unicode errors if a filename cannot be decoded on > Windows, instead of creating bogus filenames with question marks. > Because this change is incompatible with Python 3.2, even if such > filenames are

Re: [Python-Dev] [Python-checkins] cpython: Issue #13226: Add RTLD_xxx constants to the os module. These constants can by

2011-10-25 Thread Victor Stinner
On Tuesday 25 October 2011 14:50:44, Petri Lehtinen wrote: > Hi, > > victor.stinner wrote: > > http://hg.python.org/cpython/rev/c75427c0da06 > > changeset: 73127:c75427c0da06 > > user: Victor Stinner > > date: Tue Oct 25 13:34:04 2011 +0200 >

Re: [Python-Dev] Use our strict mbcs codec instead of the Windows ANSI API

2011-10-26 Thread Victor Stinner
On Tuesday 25 October 2011 10:31:56, Victor Stinner wrote: >> Basically, all functions processing filenames, so most functions of >> posixmodule.c. Some examples: >> >> - os.listdir(): FindFirstFileA, FindNextFileA, FindCloseA >> - os.lstat(): CreateF

[Python-Dev] Emit a BytesWarning on bytes filenames on Windows

2011-10-28 Thread Victor Stinner
Hi, I am no longer convinced that raising a UnicodeEncodeError on unencodable characters is the right fix for issue #13247. The problem with this solution is that you have to wait until a user gets a UnicodeEncodeError. I have yet another proposal: emit a warning when a bytes filename is u
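
The idea of warning on bytes filenames can be sketched in pure Python. This is not the patch discussed in the thread; `check_filename` is a hypothetical helper, and the warning text is invented for the example.

```python
import warnings

def check_filename(path):
    """Hypothetical helper: warn when a filename is given as bytes."""
    if isinstance(path, (bytes, bytearray)):
        warnings.warn("bytes filenames may be lossy on Windows; use str",
                      BytesWarning, stacklevel=2)
    return path

# Record the warnings instead of printing them, to inspect what was emitted.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    check_filename(b"data.txt")   # triggers the warning
    check_filename("data.txt")    # str filename: no warning

print([w.category.__name__ for w in caught])
```

Unlike an exception raised only when an unencodable character shows up, a warning of this kind is visible on every call that uses bytes, so the problem can be noticed before a user hits a failing filename.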

Re: [Python-Dev] Emit a BytesWarning on bytes filenames on Windows

2011-10-30 Thread Victor Stinner
On 30/10/2011 09:00, "Martin v. Löwis" wrote: As quoted above, deprecation of the bytes version of the API sounds fine to me, but isn't this going to run into the usual objections from the "we need bytes for efficiency" crowd? It's OK with me to say "in this restricted area you must convert

Re: [Python-Dev] Emit a BytesWarning on bytes filenames on Windows

2011-10-30 Thread Victor Stinner
On 29/10/2011 07:47, Mark Hammond wrote: When previously discussing this issue, I was under the impression that the problem was unencodable bytes passed from the Python code to Windows - but the reverse is true - only the data coming back from Windows isn't encodable. The undecodable filenam

Re: [Python-Dev] ints not overflowing into longs?

2011-11-03 Thread Victor Stinner
On Wednesday 2 November 2011 19:32:38, Derek Shockey wrote: > I just found an unexpected behavior and I'm wondering if it is a bug. > In my 2.7.2 interpreter on OS X, built and installed via MacPorts, it > appears that integers are not correctly overflowing into longs and > instead are yielding bi

Re: [Python-Dev] Unicode exception indexing

2011-11-03 Thread Victor Stinner
On Thursday 3 November 2011 18:14:42, mar...@v.loewis.de wrote: > There is a backwards compatibility issue with PEP 393 and Unicode > exceptions: the start and end indices: are they Py_UNICODE indices, or > code point indices? Oh oh. That's exactly why I didn't want to start to work on this issue.
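
The two kinds of index only differ for characters outside the Basic Multilingual Plane. A small sketch, assuming CPython 3.3 or later (where PEP 393 is in effect); the sample string is invented for the example:

```python
text = "a\U0001F600b"  # one non-BMP character between two ASCII letters

try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    # Position of the offending character as reported by the exception.
    start, end = exc.start, exc.end

code_points = len(text)                            # length in code points
utf16_units = len(text.encode("utf-16-le")) // 2   # length in UTF-16 code units
print(start, end, code_points, utf16_units)
```

With UTF-16 code units the non-BMP character would occupy two positions, so any index after it would be shifted by one compared with code point indexing.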

Re: [Python-Dev] [Python-checkins] cpython: Port code page codec to Unicode API.

2011-11-04 Thread Victor Stinner
On Friday 4 November 2011 18:23:26, martin.v.loewis wrote: > http://hg.python.org/cpython/rev/9191f804d376 > changeset: 73353:9191f804d376 > parent: 73351:2bec7c452b39 > user: Martin v. Löwis > date: Fri Nov 04 18:23:06 2011 +0100 > summary: > Port code page codec to Un

[Python-Dev] PyDict_Get/SetItem and dict subclasses

2011-11-05 Thread Victor Stinner
Hi, PyDict_GetItem() and PyDict_SetItem() don't call __getitem__ and __setitem__ for dict subclasses. Is there a reason for that? I found this surprising behaviour when I replaced a dict with a custom dict that checks the key type on assignment. But my __setitem__ was not called because the function using
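
A similar effect can be reproduced from pure Python. The sketch below is only an analogy for what C code calling PyDict_SetItem() does; the class `StrKeyDict` is hypothetical and not taken from the original message.

```python
class StrKeyDict(dict):
    """Hypothetical dict subclass that only accepts str keys."""
    def __setitem__(self, key, value):
        if not isinstance(key, str):
            raise TypeError("keys must be str")
        super().__setitem__(key, value)

d = StrKeyDict()
try:
    d[1] = "rejected"          # goes through the override
    checked = False
except TypeError:
    checked = True

# Calling the base-class method directly skips the override, much as C code
# calling PyDict_SetItem() would. In CPython, dict.update() skips it too.
dict.__setitem__(d, 2, "stored without the check")
d.update({3: "also unchecked"})
print(checked, sorted(d))
```

Code that needs the override to run on every write typically has to use the generic item-assignment path rather than the concrete dict methods.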
