[Python-Dev] PyPy 1.7 - widening the sweet spot

2011-11-21 Thread Maciej Fijalkowski
==================================
PyPy 1.7 - widening the sweet spot
==================================

We're pleased to announce the 1.7 release of PyPy. As has become a habit, this
release brings a lot of bugfixes and performance improvements over the 1.6
release. However, unlike the previous releases, the focus has been on widening
the "sweet spot" of PyPy. That is, the classes of Python code that PyPy can
greatly speed up should be vastly extended with this release. You can download
the 1.7 release here:

   http://pypy.org/download.html

What is PyPy?
=============

PyPy is a very compliant Python interpreter, almost a drop-in replacement for
CPython 2.7. It's fast (`pypy 1.7 and cpython 2.7.1`_ performance comparison)
due to its integrated tracing JIT compiler.

This release supports x86 machines running Linux 32/64, Mac OS X 32/64 or
Windows 32. Work on Windows 64 is ongoing, but it is not yet natively supported.

The main topic of this release is widening the range of code which PyPy
can greatly speed up. On average on
our benchmark suite, PyPy 1.7 is around **30%** faster than PyPy 1.6 and up
to **20 times** faster on some benchmarks.

.. _`pypy 1.7 and cpython 2.7.1`: http://speed.pypy.org


Highlights
==========

* Numerous performance improvements. Too many Python constructs now run
  faster to list them all.

* Bugfixes and compatibility fixes with CPython.

* Windows fixes.

* PyPy now comes with stackless features enabled by default. However,
  any loop using stackless features will interrupt the JIT for now, so there
  is no real performance improvement for stackless-based programs yet.
  Contact pypy-dev for information on how to help remove this restriction.

* The NumPy effort in PyPy was renamed numpypy. To try it, simply write::

    import numpypy as numpy

  at the beginning of your program. There has been huge progress on NumPy in
  PyPy since 1.6, the main feature being the implementation of dtypes.

* The JSON encoder (but not the decoder) has been replaced with a new one.
  This one is written in pure Python, but is known to outperform CPython's
  C extension by up to **2 times** in some cases. It's about **20 times**
  faster than the one that we had in 1.6.

* The memory footprint of some of our RPython modules has been drastically
  reduced. This should benefit any application that uses, for example,
  cryptography, such as tornado.

* There was some progress in exposing even more of the CPython C API via
  cpyext.
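
The JSON encoder figures in the list above can be checked on any interpreter with a small timing harness. The following is only a sketch: the payload shape and the `time_dumps` helper are assumptions made for this illustration and are not part of PyPy's benchmark suite.

```python
import json
import timeit

# Hypothetical payload; real workloads will behave differently.
payload = {
    "users": [
        {"id": i, "name": "user%d" % i, "tags": ["a", "b"]}
        for i in range(1000)
    ]
}


def time_dumps(repeat=5, number=20):
    """Return the best wall-clock time (seconds) for `number` json.dumps calls."""
    timer = timeit.Timer(lambda: json.dumps(payload))
    return min(timer.repeat(repeat=repeat, number=number))


if __name__ == "__main__":
    print("best run: %.4f s" % time_dumps())
```

Running the same script under each interpreter gives a rough comparison; a single payload is of course not representative of every workload.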

Things that didn't make it, expect in 1.8 soon
==============================================

There is ongoing work which, while it didn't make it into the release, is
probably worth mentioning here. This is what you should probably expect in
1.8 some time soon:

* Specialized list implementation. There is a branch that implements lists of
  integers/floats/strings as compactly as array.array. This should drastically
  improve the performance and memory footprint of some applications.

* The NumPy effort is progressing, with multi-dimensional arrays coming
  soon.

* There are two brand new JIT assembler backends, namely for the PowerPC and
  ARM processors.
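
To give a feel for what "as compactly as array.array" means in the list above, the sketch below compares the storage of a plain list of floats with an `array.array` holding the same values. It is an illustration only: the size estimate relies on `sys.getsizeof`, which is implementation-specific, so the comparison is an assumption that holds on CPython and says nothing about how the PyPy branch is implemented.

```python
import array
import sys

values = [float(i) for i in range(10000)]
packed = array.array("d", values)

# Bytes used by the raw array buffer: one C double per item.
packed_bytes = packed.itemsize * len(packed)

# Rough CPython-only estimate for the list: the pointer table plus one
# boxed float object per element.
boxed_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

print("array buffer:", packed_bytes, "bytes")
print("list of boxed floats (estimate):", boxed_bytes, "bytes")
```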

Fundraising
===========

It's perhaps worth mentioning that we're running fundraising campaigns for
the NumPy effort in PyPy and for Python 3 in PyPy. If you want to see either
of those happen faster, we urge you to donate to the `numpy proposal`_ or the
`py3k proposal`_. If you want PyPy to progress, but you trust us with
the general direction, you can always donate to the `general pot`_.

.. _`numpy proposal`: http://pypy.org/numpydonate.html
.. _`py3k proposal`: http://pypy.org/py3donate.html
.. _`general pot`: http://pypy.org
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] Chose a name for a "get unicode as wide character, borrowed reference" function

2011-11-21 Thread Victor Stinner
Hi,

With PEP 393, Py_UNICODE is now deprecated and scheduled for removal 
in Python 4. The PyUnicode_AsUnicode() and PyUnicode_AsUnicodeAndSize() 
functions are still commonly used on Windows to get the string as wchar_t* 
without having to take care of freeing the memory: the result is a borrowed 
reference (pointer).

I would like to add a new PyUnicode_AsWideChar() function which would return 
the borrowed reference, exactly as PyUnicode_AsUnicode() does. The problem is 
that "PyUnicode_AsWideChar" already exists in Python 3.2, as 
PyUnicode_AsWideCharString.

Do you have a suggestion for a name for such a function?

PyUnicode_AsWideCharBorrowed?
PyUnicode_AsFastWideChar?
PyUnicode_ToWideChar?
PyUnicode_AsWchar_t?

Victor


Re: [Python-Dev] Chose a name for a "get unicode as wide character, borrowed reference" function

2011-11-21 Thread Antoine Pitrou
On Mon, 21 Nov 2011 12:53:17 +0100
Victor Stinner  wrote:
> 
> I would like to add a new PyUnicode_AsWideChar() function which would return 
> the borrowed reference, exactly as PyUnicode_AsUnicode(). The problem is that 
> "PyUnicode_AsWideChar" already exists in Python 3.2, as 
> PyUnicode_AsWideCharString.

This is not very clear. You are proposing to add a function which
already exists, except that you have to free the pointer yourself?
I don't think that's a good idea, the API is already large enough.

Regards

Antoine.




Re: [Python-Dev] Chose a name for a "get unicode as wide character, borrowed reference" function

2011-11-21 Thread Victor Stinner
On Monday 21 November 2011 16:04:06, Antoine Pitrou wrote:
> On Mon, 21 Nov 2011 12:53:17 +0100
> 
> Victor Stinner  wrote:
> > I would like to add a new PyUnicode_AsWideChar() function which would
> > return the borrowed reference, exactly as PyUnicode_AsUnicode(). The
> > problem is that "PyUnicode_AsWideChar" already exists in Python 3.2, as
> > PyUnicode_AsWideCharString.
> 
> This is not very clear. You are proposing to add a function which
> already exists, except that you have to free the pointer yourself?
> I don't think that's a good idea, the API is already large enough.

I want to rename PyUnicode_AsUnicode() and change its result type (Py_UNICODE* 
=> wchar_t*). The result will be a "borrowed reference", i.e. you don't have to 
free the memory: it will be freed when the Unicode string is destroyed (by 
Py_DECREF).

The problem is that Py_UNICODE type is now deprecated.

Victor


Re: [Python-Dev] Chose a name for a "get unicode as wide character, borrowed reference" function

2011-11-21 Thread Antoine Pitrou
On Mon, 21 Nov 2011 16:53:10 +0100
Victor Stinner  wrote:
> On Monday 21 November 2011 16:04:06, Antoine Pitrou wrote:
> > On Mon, 21 Nov 2011 12:53:17 +0100
> > 
> > Victor Stinner  wrote:
> > > I would like to add a new PyUnicode_AsWideChar() function which would
> > > return the borrowed reference, exactly as PyUnicode_AsUnicode(). The
> > > problem is that "PyUnicode_AsWideChar" already exists in Python 3.2, as
> > > PyUnicode_AsWideCharString.
> > 
> > This is not very clear. You are proposing to add a function which
> > already exists, except that you have to free the pointer yourself?
> > I don't think that's a good idea, the API is already large enough.
> 
> I want to rename PyUnicode_AsUnicode() and change its result type
> (Py_UNICODE* => wchar_t*). The result will be a "borrowed reference", ie. you
> don't have to free the memory, it will be done when the Unicode string will
> be destroyed (by Py_DECREF).

But this is almost the same as PyUnicode_AsWideCharString, right?





Re: [Python-Dev] patch metadata - to use or not to use?

2011-11-21 Thread Éric Araujo
Hi,

> I recently got some patches accepted for inclusion in 3.3, and each time, 
> the patch metadata (such as my name and my commit comment) were stripped by 
> applying the patch manually, instead of hg importing it. This makes it 
> clear in the history who eventually reviewed and applied the patch, but 
> less visible who wrote it (except for the entry in Misc/NEWS).

We had a similar discussion on python-committers a while back, and the
gist of the replies was that there is no such thing as a patch ready for
commit, i.e. the core dev always edits something.  As Antoine said,
we’ve switched to Mercurial to ease contributions, but we still work
with patches, not directly with changesets.  That said, I remember that
once I got a patch that was complete, and I just used hg import and hg
push since it was so easy.  I share the opinion that putting
contributors’ names in the spotlight is a good way to encourage them.

Cheers


Re: [Python-Dev] Chose a name for a "get unicode as wide character, borrowed reference" function

2011-11-21 Thread Victor Stinner
On Monday 21 November 2011 16:55:05, Antoine Pitrou wrote:
> > I want to rename PyUnicode_AsUnicode() and change its result type
> > (Py_UNICODE* => wchar_t*). The result will be a "borrowed reference",
> > ie. you don't have to free the memory, it will be done when the Unicode
> > string will be destroyed (by Py_DECREF).
> 
> But this is almost the same as PyUnicode_AsWideCharString, right?

You have to free the memory for PyUnicode_AsWideCharString().

With PyUnicode_AsWideCharXXX(), as with PyUnicode_AsUnicode(), you don't have 
to. The memory is managed by the Unicode object.

Victor
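
The ownership rule for PyUnicode_AsWideCharString() discussed above (the caller gets a fresh copy and must free it) can be observed from Python through ctypes. This is a minimal sketch that assumes a CPython interpreter with ctypes available; the helper name `copy_as_wide` is hypothetical. It does not call the borrowed-reference function, since PyUnicode_AsUnicode() is deprecated.

```python
import ctypes

# CPython's own C API, called with the GIL held.
api = ctypes.pythonapi

api.PyUnicode_AsWideCharString.argtypes = [
    ctypes.py_object,
    ctypes.POINTER(ctypes.c_ssize_t),
]
api.PyUnicode_AsWideCharString.restype = ctypes.c_void_p
api.PyMem_Free.argtypes = [ctypes.c_void_p]
api.PyMem_Free.restype = None


def copy_as_wide(text):
    """Round-trip `text` through a caller-owned wchar_t* buffer."""
    size = ctypes.c_ssize_t()
    buf = api.PyUnicode_AsWideCharString(text, ctypes.byref(size))
    if not buf:
        raise MemoryError("PyUnicode_AsWideCharString returned NULL")
    try:
        # Read `size` wide characters back out of the freshly allocated copy.
        return ctypes.wstring_at(buf, size.value)
    finally:
        # The buffer belongs to the caller, so it must be released explicitly.
        api.PyMem_Free(buf)


print(copy_as_wide("hello"))
```

The `finally` clause is exactly the bookkeeping that a borrowed pointer would make unnecessary, which is the convenience the proposal is after.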


Re: [Python-Dev] Chose a name for a "get unicode as wide character, borrowed reference" function

2011-11-21 Thread Antoine Pitrou
On Mon, 21 Nov 2011 18:02:36 +0100
Victor Stinner  wrote:
> On Monday 21 November 2011 16:55:05, Antoine Pitrou wrote:
> > > I want to rename PyUnicode_AsUnicode() and change its result type
> > > (Py_UNICODE* => wchar_t*). The result will be a "borrowed reference",
> > > ie. you don't have to free the memory, it will be done when the Unicode
> > > string will be destroyed (by Py_DECREF).
> > 
> > But this is almost the same as PyUnicode_AsWideCharString, right?
> 
> You have to free the memory for PyUnicode_AsWideCharString().

That's why I said "almost".

I don't think it's a good idea to add this function, for two reasons:

- the unicode API is already big enough, we don't need redundant
  functions with differing refcount behaviours

- the internal wchar_t representation is certainly meant to disappear
  in the long term; adding an API which *relies* on that representation
  is silly, especially after we deliberately deprecated the Py_UNICODE
  APIs

Regards

Antoine.




Re: [Python-Dev] Python 3, new-style classes and __class__

2011-11-21 Thread Michael Foord

On 20/11/2011 21:41, Guido van Rossum wrote:
> On Sun, Nov 20, 2011 at 10:44 AM, Michael Foord wrote:
>> On 20 Nov 2011, at 16:35, Guido van Rossum wrote:
>>
>>> Um, what?! __class__ *already* has a special meaning. Those examples
>>> violate that meaning. No wonder they get garbage results.
>>>
>>> The correct way to override isinstance is explained here:
>>> http://www.python.org/dev/peps/pep-3119/#overloading-isinstance-and-issubclass
>>> .
>>
>> Proxy classes have been using __class__ as a descriptor for this purpose
>> for years before ABCs were introduced. This worked fine up until Python 3
>> where the compiler magic broke it when super is used. That is now fixed
>> anyway.
>
> Hm, okay. Though it's disheartening that it took three releases of 3.x
> to figure this out. And there was a PEP even!
>
>> If I understand correctly, ABCs are great for allowing classes of objects
>> to pass isinstance checks (etc) - what proxy, lazy and mock objects need is
>> to be able to allow individual instances to pass different isinstance
>> checks.
>
> Ah, oops. Yes, __instancecheck__ is for the class to override
> isinstance(inst, cls); for the *instance* to override apparently
> you'll need to mess with __class__.
>
> I guess my request at this point would be to replace '@__class__' with
> some other *legal* __identifier__ that doesn't clash with existing use
> -- I don't like the arbitrary use of @ here.


The problem with using a valid identifier name is that it leaves open 
the possibility of the same "broken" behaviour (removing from the class 
namespace) for whatever name we pick.


That means we should document the name used - and it's then more likely 
that users will start to rely on this odd (but documented) internal 
implementation detail. This in turn puts a burden on other 
implementations to use the same mechanism, even if this is less than 
ideal for them.


This is why a deliberately invalid identifier was picked.

All the best,

Michael Foord


--Guido


All the best,

Michael Foord


--Guido

On Sat, Nov 19, 2011 at 6:13 PM, Michael Foord
  wrote:


On 19 November 2011 23:11, Vinay Sajip  wrote:

Michael Foord  voidspace.org.uk>  writes:


That works fine in Python 3 (mock.Mock does it):

>>> class Foo(object):
...     @property
...     def __class__(self):
...         return int
...
>>> a = Foo()
>>> isinstance(a, int)
True
>>> a.__class__
<class 'int'>

There must be something else going on here.


Michael, thanks for the quick response. Okay, I'll dig in a bit further: the
definition in SimpleLazyObject is

__class__ = property(new_method_proxy(operator.attrgetter("__class__")))

so perhaps the problem is something related to the specifics of the
definition. Here's what I found in initial exploration:

--
Python 2.7.2+ (default, Oct 4 2011, 20:06:09)
[GCC 4.6.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.

from django.utils.functional import SimpleLazyObject
fake_bool = SimpleLazyObject(lambda: True)
fake_bool.__class__



fake_bool.__dict__

{'_setupfunc': <function <lambda> at 0xca9ed8>, '_wrapped': True}

SimpleLazyObject.__dict__

dict_proxy({
'__module__': 'django.utils.functional',
'__nonzero__':,
'__deepcopy__':,
'__str__':,
'_setup':,
'__class__':,
'__hash__':,
'__unicode__':,
'__bool__':,
'__eq__':,
'__doc__': '\n A lazy object initialised from any function.\n\n
Designed for compound objects of unknown type. For builtins or
objects of\n known type, use django.utils.functional.lazy.\n ',
'__init__':
})
--
Python 3.2.2 (default, Sep 5 2011, 21:17:14)
[GCC 4.6.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.

from django.utils.functional import SimpleLazyObject
fake_bool = SimpleLazyObject(lambda : True)
fake_bool.__class__



fake_bool.__dict__

{
'_setupfunc': <function <lambda> at 0x1c36ea8>,
'_wrapped':
}

SimpleLazyObject.__dict__

dict_proxy({
'__module__': 'django.utils.functional',
'__nonzero__':,
'__deepcopy__':,
'__str__':,
'_setup':,
'__hash__':,
'__unicode__':,
'__bool__':,
'__eq__':,
'__doc__': '\n A lazy object initialised from any function.\n\n
Designed for compound objects of unknown type. For builtins or
objects of\n known type, use django.utils.functional.lazy.\n ',
'__init__':
})
--

In Python 3, there's no __class__ property as there is in Python 2,
the fake_bool's type isn't bool, and the callable to set up the wrapped
object never gets called (which is why _wrapped is not set to True, but to
an anonymous object - this is set in SimpleLazyObject.__init__).


The Python compiler can do strange things with assignment to __class__ in
the presence of super. This issue has now been fixed, but it may be what is
biting you:

 http://bugs.python.org/issue12370

If th

Re: [Python-Dev] Python 3, new-style classes and __class__

2011-11-21 Thread Guido van Rossum
On Mon, Nov 21, 2011 at 9:22 AM, Michael Foord
 wrote:
> On 20/11/2011 21:41, Guido van Rossum wrote:
>>
>> On Sun, Nov 20, 2011 at 10:44 AM, Michael Foord
>>   wrote:
>>>
>>> On 20 Nov 2011, at 16:35, Guido van Rossum wrote:
>>>
>>>> Um, what?! __class__ *already* has a special meaning. Those examples
>>>> violate that meaning. No wonder they get garbage results.
>>>>
>>>> The correct way to override isinstance is explained here:
>>>>
>>>> http://www.python.org/dev/peps/pep-3119/#overloading-isinstance-and-issubclass
>>>> .
>>>>
>>>
>>> Proxy classes have been using __class__ as a descriptor for this purpose
>>> for years before ABCs were introduced. This worked fine up until Python 3
>>> where the compiler magic broke it when super is used. That is now fixed
>>> anyway.
>>
>> Hm, okay. Though it's disheartening that it took three releases of 3.x
>> to figure this out. And there was a PEP even!
>>
>>> If I understand correctly, ABCs are great for allowing classes of objects
>>> to pass isinstance checks (etc) - what proxy, lazy and mock objects need is
>>> to be able to allow individual instances to pass different isinstance
>>> checks.
>>
>> Ah, oops. Yes, __instancecheck__ is for the class to override
>> isinstance(inst, cls); for the *instance* to override apparently
>> you'll need to mess with __class__.
>>
>> I guess my request at this point would be to replace '@__class__' with
>> some other *legal* __identifier__ that doesn't clash with existing use
>> -- I don't like the arbitrary use of @ here.
>>
>
> The problem with using a valid identifier name is that it leaves open the
> possibility of the same "broken" behaviour (removing from the class
> namespace) for whatever name we pick.
>
> That means we should document the name used - and it's then more likely that
> users will start to rely on this odd (but documented) internal
> implementation detail. This in turn puts a burden on other implementations
> to use the same mechanism, even if this is less than ideal for them.
>
> This is why a deliberately invalid identifier was picked.

Hm. There are many, many places in Python where a __special__
identifier is used in such a way that a user who stomps on it can
cause themselves pain. This is why the language reference is quite
serious about reserving *all* __special__ names and states that only
documented uses of them are allowed (and at least implying that
undocumented uses are not necessarily flagged as errors).

While I see that PEP 3119 made a mistake in giving __class__ two
different, incompatible special uses, I don't agree that this case is
so special that we should use an "invalid" identifier.  I don't see
that the name used should actually be documented -- users should not
make *any* use of undocumented __names__. Let's please continue the
tradition of allowing experts to mess around creatively with
internals.

-- 
--Guido van Rossum (python.org/~guido)
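
The two isinstance hooks contrasted in this thread can be put side by side. The sketch below is illustrative only and all class names (`DuckMeta`, `Duck`, `Robot`, `IntProxy`) are hypothetical: `__instancecheck__` lives on the metaclass and answers for every instance, while a `__class__` property lets each individual proxy object report the class of whatever it wraps.

```python
class DuckMeta(type):
    # Class-level hook (PEP 3119): decides isinstance(obj, Duck) for any obj.
    def __instancecheck__(cls, instance):
        return hasattr(instance, "quack")


class Duck(metaclass=DuckMeta):
    pass


class Robot:
    def quack(self):
        return "beep"


class IntProxy:
    """Proxy whose instances claim the class of the object they wrap."""

    def __init__(self, wrapped):
        self._wrapped = wrapped

    # Instance-level hook: isinstance() falls back to obj.__class__
    # when type(obj) itself does not match.
    @property
    def __class__(self):
        return type(self._wrapped)


proxy = IntProxy(5)
print(isinstance(Robot(), Duck))   # decided by DuckMeta.__instancecheck__
print(isinstance(proxy, int))      # decided by the __class__ property
print(type(proxy).__name__)        # type() is not fooled by the property
```

Note that `type(proxy)` still reports the real class, which is why code that needs the true type should use `type()` rather than `__class__`.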


Re: [Python-Dev] Committing PEP 3155

2011-11-21 Thread Guido van Rossum
I've approved the latest version of this PEP. Congrats, Antoine!

--Guido

On Fri, Nov 18, 2011 at 12:14 PM, Antoine Pitrou  wrote:
>
> Hello,
>
> I haven't seen any strong objections, so I would like to go ahead and
> commit PEP 3155 (*) soon. Is anyone against it?
>
> (*) "Qualified name for classes and functions"
>    http://www.python.org/dev/peps/pep-3155/
>
> Thank you
>
> Antoine.

-- 
--Guido van Rossum (python.org/~guido)
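
For readers unfamiliar with PEP 3155: it adds a `__qualname__` attribute to classes and functions, giving the dotted path from the module level down to the object. A short sketch, assuming Python 3.3 or later; the names `Outer`, `Inner`, `method`, `factory` and `local` are made up for the example.

```python
class Outer:
    class Inner:
        def method(self):
            pass


def factory():
    def local():
        pass
    return local


# __name__ only gives the last component; __qualname__ gives the full path.
print(Outer.Inner.__name__, "vs", Outer.Inner.__qualname__)
print(Outer.Inner.method.__qualname__)
# Functions defined inside other functions get a "<locals>" marker.
print(factory().__qualname__)
```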


[Python-Dev] PyUnicode_EncodeDecimal

2011-11-21 Thread Victor Stinner
Hi,

I'm trying to rewrite PyUnicode_EncodeDecimal() to upgrade it to the new 
Unicode API. The problem is that the function is neither accessible from 
Python nor tested. Should we document and test it, leave it unchanged and 
deprecate it, or simply remove it?

--

Python has a PyUnicode_EncodeDecimal() function. It was used in Python 2 by 
the int, long and complex constructors. In Python 3, the function is no 
longer used: it has been replaced by PyUnicode_TransformDecimalToASCII() in 
Python <= 3.2 and _PyUnicode_TransformDecimalAndSpaceToASCII() in Python 3.3.

PyUnicode_EncodeDecimal() goes into an infinite loop if there is more than 
one unencodable character. It's a known bug and there is a patch:
http://bugs.python.org/issue13093

PyUnicode_EncodeDecimal() is undocumented and not tested:
http://bugs.python.org/issue8646

Stefan Krah uses PyUnicode_EncodeDecimal() in his cdecimal project.

See also the "Malformed error message from float()" issue:
http://bugs.python.org/issue10557

Python 3.3 now has 3 decimal encoders:
 - PyUnicode_EncodeDecimal()
 - PyUnicode_TransformDecimalToASCII()
 - _PyUnicode_TransformDecimalAndSpaceToASCII() (new in 3.3)

_PyUnicode_TransformDecimalAndSpaceToASCII() also replaces Unicode spaces with 
ASCII spaces. PyUnicode_EncodeDecimal() and 
PyUnicode_TransformDecimalToASCII() take Py_UNICODE* strings.

PyUnicode_EncodeDecimal() requires an output buffer but has no argument for 
the size of the output buffer. It is unsafe: it leads to a buffer overflow if 
the buffer is too small.

Victor
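
To make the purpose of these encoders concrete, here is a rough pure-Python analogue of the "decimal and space to ASCII" transformation described above. It is a sketch written for illustration, not the C implementation: the function name is hypothetical, and it relies on `unicodedata.decimal()` and `str.isspace()` as stand-ins for the C-level checks.

```python
import unicodedata


def transform_decimal_and_space_to_ascii(text):
    """Map Unicode decimal digits to '0'-'9' and Unicode spaces to ' '."""
    out = []
    for ch in text:
        if ch.isspace():
            out.append(" ")
            continue
        # Returns the digit value for decimal characters, None otherwise.
        digit = unicodedata.decimal(ch, None)
        out.append(str(digit) if digit is not None else ch)
    return "".join(out)


# ARABIC-INDIC DIGITS ONE, TWO, THREE
arabic = "\u0661\u0662\u0663"
print(transform_decimal_and_space_to_ascii(arabic))
```

Because the result is built as a new string, the sketch sidesteps the fixed-size output buffer that makes PyUnicode_EncodeDecimal() unsafe.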


Re: [Python-Dev] PyPy 1.7 - widening the sweet spot

2011-11-21 Thread Terry Reedy

On 11/21/2011 5:36 AM, Maciej Fijalkowski wrote:

> ==================================
> PyPy 1.7 - widening the sweet spot
> ==================================
>
> We're pleased to announce the 1.7 release of PyPy. As became a habit, this
> release brings a lot of bugfixes and performance improvements over the 1.6
> release. However, unlike the previous releases, the focus has been on widening
> the "sweet spot" of PyPy. That is, classes of Python code that PyPy can greatly
> speed up should be vastly improved with this release. You can download the 1.7
> release here:
>
>    http://pypy.org/download.html

...

> The main topic of this release is widening the range of code which PyPy
> can greatly speed up. On average on
> our benchmark suite, PyPy 1.7 is around **30%** faster than PyPy 1.6 and up
> to **20 times** faster on some benchmarks.
>
> .. _`pypy 1.7 and cpython 2.7.1`: http://speed.pypy.org


If I understand right, pypy is generally slower than cpython without jit 
and faster with jit. (There is obviously a spurious datapoint in the 
pypy-c timeline for raytracing-simple.) This site is a nice piece of work.

...

> .. _`py3k proposal`: http://pypy.org/py3donate.html


I strongly recommend that, where it makes a difference, the pypy python3 
project target 3.3. In particular, don't reproduce the buggy 
narrow-build behavior of 3.2 and before (perhaps pypy avoids this 
already). Do include the new Unicode C API in cpyext. I anticipate that 
3.3 will see more production use than 3.2.


--
Terry Jan Reedy



Re: [Python-Dev] PyPy 1.7 - widening the sweet spot

2011-11-21 Thread Amaury Forgeot d'Arc
2011/11/21 Terry Reedy 

> I strongly recommend that where it makes a difference, the pypy python3
> project target 3.3. In particular, don't reproduce the buggy narrow-build
> behavior of 3.2 and before (perhaps pypy avoids this already). Do include
> the new unicode capi in cpyext. I anticipate that 3.3 will see more
> production use than 3.2
>

In the current 2.7-compatible version, PyPy already uses wchar_t for its
Unicode strings, i.e. it is always a wide build with gcc and a narrow build
on Windows.

But this will certainly change for the 3.x port. PyPy already supports
different internal representations for the same visible user type, and it
makes sense to have 1-byte, 2-byte and 4-byte unicode types and try to
choose the most efficient representation.

As for the C API... getting a pointer out of a PyPy string already requires
allocating and filling a new non-movable buffer (since all memory allocated
by PyPy is movable).  So cpyext could support the new API for sure, but
it's unlikely to give any performance benefit to an extension module.

-- 
Amaury Forgeot d'Arc
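
The idea of choosing between 1-byte, 2-byte and 4-byte representations can be sketched as a function of the highest code point in the string. This mirrors the selection rule of PEP 393 and is only an illustration; the helper name `narrowest_width` is hypothetical and this is not PyPy's implementation.

```python
def narrowest_width(text):
    """Return the smallest per-character width (in bytes) that fits `text`."""
    highest = max(map(ord, text), default=0)
    if highest < 0x100:
        return 1      # Latin-1 range fits in one byte
    if highest < 0x10000:
        return 2      # Basic Multilingual Plane fits in two bytes
    return 4          # anything beyond the BMP needs four bytes


for sample in ("abc", "\u20ac", "\U0001F600"):
    print(repr(sample), narrowest_width(sample))
```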


Re: [Python-Dev] PyUnicode_EncodeDecimal

2011-11-21 Thread Victor Stinner
On Monday 21 November 2011 21:39:53, Victor Stinner wrote:
> I'm trying to rewrite PyUnicode_EncodeDecimal() to upgrade it to the new
> Unicode API. The problem is that the function is not accessible in Python
> nor tested.

I added tests for this function in Python 2.7, 3.2 and 3.3.

> PyUnicode_EncodeDecimal() goes into an unlimited loop if there is more than
> one unencodable character. It's a known bug and there is a patch:
> http://bugs.python.org/issue13093

I fixed this issue. I was wrong: it was not possible to DoS Python, and the 
bug was not an infinite loop (but there was a bug in error handling).

> PyUnicode_EncodeDecimal() requires an output buffer and it has no argument
> for the size of the output buffer. It is unsafe: it leads to buffer
> overflow if the buffer is too small.

This function is broken by design if an error handler is specified: the caller 
cannot know the required size of the output buffer, yet the caller has to 
allocate this buffer.

I propose to raise an error if an error handler (other than "strict") is 
specified, and to make this change in Python 2.7, 3.2 and 3.3.

In the Python 2.7 code base, PyUnicode_EncodeDecimal() is always called with 
errors=NULL. In Python 3.x, the function is no longer called.

> Should we document and test it, leave it unchanged and
> deprecate it, or simply remove it?

If we change PyUnicode_EncodeDecimal() to reject error handlers other than 
strict, we can keep this function for a few releases and deprecate it. The 
function is already deprecated because it uses the deprecated Py_UNICODE type.

Victor


[Python-Dev] PyUnicode_Resize

2011-11-21 Thread Victor Stinner
Hi,

In Python 3.2, PyUnicode_Resize() expects a number of Py_UNICODE units, 
whereas Python 3.3 expects a number of characters.

It is tricky to convert a number of Py_UNICODE units to a number of 
characters, so it is difficult to provide a backward-compatible 
PyUnicode_Resize() function taking a number of Py_UNICODE units in Python 3.3.

Should we rename PyUnicode_Resize() in Python 3.3 to avoid surprising bugs?

The issue only concerns Windows with non-BMP characters, so a very rare use 
case.

The easiest solution is to do nothing in Python 3.3: the API changed, but it 
doesn't really matter. Developers just have to be careful on this particular 
issue (which is not well documented today).

Victor
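
The gap between "Py_UNICODE units" and "characters" shows up only for non-BMP characters on builds where Py_UNICODE is 16 bits wide, such as Windows. The sketch below counts both for the same string; it assumes Python 3.3 or later (where `len()` counts code points), and the helper name `utf16_code_units` is hypothetical.

```python
def utf16_code_units(text):
    """Number of 16-bit units needed for `text`, as on a narrow build."""
    return len(text.encode("utf-16-le")) // 2


text = "a\U0001F600"   # one BMP character plus one non-BMP character

print("characters:", len(text))
print("16-bit units:", utf16_code_units(text))
```

A caller that passes the second number where Python 3.3 expects the first would over-allocate by one for each non-BMP character, which is the kind of surprising bug the message warns about.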