[issue7516] Flag "-3" is silently ignored when running "regrtest.py -j2"

2009-12-15 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Instead of just warning the user, wouldn't it be better to add proper support for inheriting the env vars and other Python command line flags ? Whether or not to use -E can be determined via the test module name. The -3 flag and others can be querie

[issue7551] SystemError/MemoryError/OverflowErrors on encode() a unicode string

2009-12-21 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: All string length calculations in Python 2.4 are done using ints which are 32-bit, even on 64-bit platforms. Since UTF-8 can use up to 4 bytes per Unicode code point, the encoder overallocates the needed chunk of memory to len*4 bytes. This will go
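
The overflow described above can be checked with plain arithmetic. This is a minimal sketch, not the encoder's actual code: the names INT_MAX and utf8_worst_case_fits are hypothetical, and the only assumptions are the 32-bit signed int and the len*4 worst case stated in the comment.

```python
# Largest value a 32-bit signed C int can hold (assumption: two's complement).
INT_MAX = 2**31 - 1


def utf8_worst_case_fits(length, bytes_per_char=4):
    """Return True if length * bytes_per_char still fits in a 32-bit signed int."""
    return length * bytes_per_char <= INT_MAX


# Longest string whose len*4 worst-case buffer size does not overflow.
largest_safe_length = INT_MAX // 4

print(largest_safe_length)
print(utf8_worst_case_fits(largest_safe_length))
print(utf8_worst_case_fits(largest_safe_length + 1))
```

Under these assumptions, any string longer than about 512M code points makes the size computation wrap around before memory is even requested.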

[issue7562] Custom order for the subcommands of build

2009-12-22 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: The distutils way of implementing a different fixed order would be to create a build command sub-class, override the .sub_commands list and then register this new subclass as 'build' command with distutils via the cmdclass setup() keyword argumen

[issue3745] _sha256 et al. encode to UTF-8 by default

2009-12-29 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Gregory P. Smith wrote: > > Gregory P. Smith added the comment: > > lemburg - see which issue #? Sorry, the message got truncated for some reason. I was referring to http://bugs.python.org/issue3745 This was discussed on pytho

[issue7615] unicode_escape codec does not escape quotes

2010-01-02 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Richard Hansen wrote: > > New submission from Richard Hansen : > > The description of the unicode_escape codec says that it produces "a > string that is suitable as Unicode literal in Python source code." [1] > Unfortunately,
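
The behaviour reported in this issue can be reproduced with a short sketch. It assumes a current Python 3 interpreter (the original report was against Python 2), and the sample string is made up for illustration.

```python
import ast

# Hypothetical sample text containing both quote characters and a newline.
text = 'say "hi", it\'s fine\n'

# unicode_escape escapes the newline but leaves both kinds of quotes alone,
# so the result cannot be pasted between quotes without further work.
escaped = text.encode("unicode_escape")
print(escaped)

# repr() is one way to get a complete, correctly quoted literal.
literal = repr(text)
print(literal)
print(ast.literal_eval(literal) == text)
```

The design point is that the codec only deals with the string body, while choosing and escaping the delimiter is left to the caller.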

[issue3745] _sha256 et al. encode to UTF-8 by default

2010-01-04 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Gregory P. Smith wrote: > > Gregory P. Smith added the comment: > > trunk r77252 switches python 2.7 to use 's*' for argument parsing. unicodes > can be hashed (encoded to the system default encoding by s*) again. > > Th
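
For comparison, here is how the same question looks in a current Python 3 interpreter, where hashlib accepts only bytes-like input and the caller picks the encoding. This is a sketch under that assumption; sha256_hex is a hypothetical helper, not part of hashlib.

```python
import hashlib


def sha256_hex(text, encoding="utf-8"):
    """Hash a str by encoding it explicitly first (hypothetical helper)."""
    return hashlib.sha256(text.encode(encoding)).hexdigest()


# Passing a str directly is rejected instead of being encoded implicitly.
try:
    hashlib.sha256("abc")
    str_rejected = False
except TypeError:
    str_rejected = True

print(str_rejected)
print(sha256_hex("abc") == hashlib.sha256(b"abc").hexdigest())
```

Making the encoding an explicit argument avoids the hash silently depending on a default encoding.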

[issue3745] _sha256 et al. encode to UTF-8 by default

2010-01-04 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Gregory P. Smith wrote: > > Gregory P. Smith added the comment: > > In order to get a -3 PyErr_WarnPy3k warning for unicode being passed to > hashlib objects (a nice idea) I suggest creating an additional 's*' like thing

[issue7622] [patch] improve unicode methods: split() rsplit() and replace()

2010-01-04 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: A few comments on coding style: * please keep the existing argument formats as they are, e.g. count = countstring(self_s, self_len, from_s, from_len, 0, self_len, FORWARD, maxcount); or /* helper

[issue7622] [patch] improve unicode methods: split() rsplit() and replace()

2010-01-04 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Florent Xicluna wrote: > > > > Florent Xicluna added the comment: > > >> >> * function declarations should not put parameters on new lines: >> >> >> >> +stringlib_splitlines( >> >> +

[issue7622] [patch] improve unicode methods: split() rsplit() and replace()

2010-01-04 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Eric Smith wrote: > > > > Eric Smith added the comment: > > > > I think we should use whatever style is currently being used in the code. > > If we want to go back through this code (or any other code) and PEP7-ify >

[issue7643] What is an ASCII linebreak?

2010-01-06 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Florent Xicluna wrote: > > New submission from Florent Xicluna : > > Bytes objects and Unicode objects do not agree on ASCII linebreaks. > > ## Python 2 > > for s in '\x0a\x0d\x1c\x1d\x1e': > print u'a{}b
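
The disagreement is easy to observe. The following sketch assumes a current Python 3 interpreter and compares str.splitlines() with bytes.splitlines() for the five characters from the report; the variable names are illustrative only.

```python
# Characters from the report: LF, CR, FS, GS, RS.
candidates = "\x0a\x0d\x1c\x1d\x1e"

results = {}
for ch in candidates:
    sample = "a" + ch + "b"
    text_parts = sample.splitlines()                   # str rules
    byte_parts = sample.encode("ascii").splitlines()   # bytes rules
    results[ch] = (len(text_parts), len(byte_parts))
    print(repr(ch), text_parts, byte_parts)
```

A count of 2 means the character was treated as a line break and 1 means it was not. The str method splits on all five characters, while the bytes method splits only on LF and CR.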

[issue7615] unicode_escape codec does not escape quotes

2010-01-07 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Richard Hansen wrote: > > Richard Hansen added the comment: > >> Does the last patch obsolete the first two? If so please delete the >> obsolete ones. > > Yes and no -- it depends on what the core Python developers want

[issue7643] What is an ASCII linebreak?

2010-01-08 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Florent Xicluna wrote: > > Florent Xicluna added the comment: > > Some technical background. > > == Unicode == > > According to the Unicode Standard Annex #9, a character with > bidirectional class B is a "Para

[issue7643] What is an ASCII linebreak?

2010-01-08 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Florent Xicluna wrote: > > Florent Xicluna added the comment: > > It's confusing. > > There's a specific annex UAX #14 which defines "Line Breaking Properties". > Some properties are defined as "Mandatory

[issue5127] Use Py_UCS4 instead of Py_UNICODE in unicodectype.c

2010-01-08 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: I don't see the point in changing the various conversion APIs in the unicode database to return Py_UCS4 when there are no conversions that map code points between BMP and non-BMP. In order to solve the problem in question (unicode_repr() failing

[issue7663] UCS4 build incorrectly translates cases for non-BMP code points

2010-01-10 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: This is a duplicate of http://bugs.python.org/issue5127 -- nosy: +lemburg resolution: -> duplicate status: open -> closed title: UTF-16 build incorrectly translates cases for non-BMP code points -> UCS4 build incorrectly translates case
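
The kind of case translation named in the title can be illustrated as follows. This is a sketch assuming a current Python 3 interpreter, where str holds full code points; the Deseret letter pair is chosen only as a convenient non-BMP example.

```python
# U+10400 DESERET CAPITAL LETTER LONG I and its lowercase form U+10428.
upper = "\U00010400"
lower = upper.lower()

print(hex(ord(upper)), "->", hex(ord(lower)))

# Both code points lie outside the BMP (above U+FFFF), so a case mapping
# routine that works on 16-bit values cannot represent either of them.
print(ord(upper) > 0xFFFF and ord(lower) > 0xFFFF)
```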

[issue7643] What is a Unicode line break character?

2010-01-10 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Florent Xicluna wrote: > > Florent Xicluna added the comment: > > I don't know what to do about this: > >> - FS, GS, RS are combined marks (CM): “Prohibit a line break between >>the character and the preceding cha

[issue5127] Use Py_UCS4 instead of Py_UNICODE in unicodectype.c

2010-01-10 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Amaury Forgeot d'Arc wrote: > > Amaury Forgeot d'Arc added the comment: > >> I don't see the point in changing the various conversion APIs in the >> unicode database to return Py_UCS4 when there are no conversions tha

[issue1943] improved allocation of PyUnicode objects

2010-01-10 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Adam Olsen wrote: > > Adam Olsen added the comment: > > Points against the subclassing argument: > > * We have a null-termination invariant. For byte strings this was part of > the public API, and I'm not sure that

[issue2375] PYTHON3PATH environment variable to supersede PYTHONPATH for multi-Python environments

2010-01-13 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Setting up specific environments for each Python version is outside the scope of Python. This is something the user needs to handle using a virtualenv setup, an env-setup shell script or similar approach. -- nosy: +lemburg resolution

[issue2375] PYTHON3PATH environment variable to supersede PYTHONPATH for multi-Python environments

2010-01-13 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: R. David Murray wrote: > > R. David Murray added the comment: > > I disagree with the closing of this bug on the following grounds: currently, > and for the foreseeable future, there will be two python commands on many > systems,

[issue2375] PYTHON3PATH environment variable to supersede PYTHONPATH for multi-Python environments

2010-01-13 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: R. David Murray wrote: > > R. David Murray added the comment: > > Yes, it does: > > rdmur...@maestro:~/python/py3k>ls -l ../ptest/p3/bin total 7328 > -rwxr-xr-x 1 rdmurray rdmurray 131 Dec 20 12:22 2to3 > -rwxr-xr-x 1 rd

[issue6058] Add cp65001 to encodings/aliases.py

2010-01-13 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: I created two scripts for exporting the IronPython findings and checking them in CPython. These are the results: Checking code Page 28591 against encoding 'iso-8859-1' using file 'iso-8859-1.map' 0 errors Checking code Page 28592 a

[issue6058] Add cp65001 to encodings/aliases.py

2010-01-13 Thread Marc-Andre Lemburg
Changes by Marc-Andre Lemburg : Added file: http://bugs.python.org/file15858/export-encodings.py ___ Python tracker <http://bugs.python.org/issue6058> ___ ___ Python-bug

[issue6058] Add cp65001 to encodings/aliases.py

2010-01-13 Thread Marc-Andre Lemburg
Changes by Marc-Andre Lemburg : Added file: http://bugs.python.org/file15859/check-encodings.py

[issue6058] Add cp65001 to encodings/aliases.py

2010-01-13 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: What we could do is add new codecs based on the .NET tables for cp65000 et al. However, before doing this, I'd like to know where these code page settings can occur and what exact names are used for them. If they only appear in .NET and IronPyth

[issue2375] PYTHON3PATH environment variable to supersede PYTHONPATH for multi-Python environments

2010-01-14 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Guido pronounced on this on python-dev, so closing the request again. -- status: open -> closed

[issue5905] strptime fails in non-UTF locale

2010-01-15 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: The reason for this is that the strftime() C lib API is used to build localized month names. With your setting, you'll get French Latin-1 month names and those cannot be coerced to UTF-8 due to the accented characters in them. This works in Pytho
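
The coercion failure can be shown without touching the locale at all. The sketch below is an assumption-laden stand-in: it hard-codes the Latin-1 bytes a French locale might return for a month name instead of calling strftime(), and it runs on Python 3.

```python
# Bytes a Latin-1 French locale could produce for "février" (assumed sample).
month_latin1 = "f\xe9vrier".encode("latin-1")

# Interpreting those bytes as UTF-8 fails on the accented character.
try:
    month_latin1.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Decoding with the encoding that actually produced the bytes works.
month_text = month_latin1.decode("latin-1")

print(utf8_ok)
print(month_text)
```

Month names without accented characters are pure ASCII and decode either way, which is why the problem only shows up for some months.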

[issue5284] platform.linux_distribution() improperly documented

2009-02-17 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: On 2009-02-16 22:42, Armin Ronacher wrote: > New submission from Armin Ronacher : > > platform.linux_distribution() was added in 2.6 as an alias for > platform.dist(). However the documentation lists platform.dist() as an

[issue5284] platform.linux_distribution() improperly documented

2009-02-17 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Fixed in r69717. -- versions: -Python 2.4, Python 2.5, Python 2.6, Python 3.0, Python 3.1

[issue5284] platform.linux_distribution() improperly documented

2009-02-17 Thread Marc-Andre Lemburg
Changes by Marc-Andre Lemburg : -- status: open -> closed

[issue4431] Distutils MSVC doesn't create manifest file (with fix)

2009-02-17 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: On 2009-02-17 20:22, Pavel Repin wrote: > Pavel Repin added the comment: > > I'd like to point out that on some configurations (at least mine), you > really need to specify /MANIFEST option to the linker, even though MSDN > document

[issue4474] PyUnicode_FromWideChar incorrect for characters outside the BMP (unix only)

2009-02-24 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: On 2009-02-24 20:39, Mark Dickinson wrote: > Mark Dickinson added the comment: > > Updated Victor's patch: > > - applies cleanly against newly whitespace-normalized unicodeobject.c > - renamed USE_WCHAR_SURROGATE to CONV

[issue4474] PyUnicode_FromWideChar incorrect for characters outside the BMP (unix only)

2009-02-24 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: On 2009-02-24 21:50, Mark Dickinson wrote: > Mark Dickinson added the comment: > > New patch, with two separate versions of PyUnicode_FromWideChar. Thanks, much better :-)

[issue5389] Uninitialized variable may be used in PyUnicode_DecodeUTF7Stateful()

2009-03-02 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: The UTF-7 codec implementation has a few problems (one of them is that it is hardly being used, so bugs only get detected very slowly). issue4426 has a patch with a cleaned-up and more standards-compliant implementation. Perhaps that also fixes the problem

[issue5445] codecs.StreamWriter.writelines problem when passed generator

2009-03-10 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: For the common case where list is in fact a sequence of strings, the used implementation is a lot faster and more efficient than the one you propose. Note that the method doesn't pretend to support generators for the list argument, so adding suppor

[issue5445] codecs.StreamWriter.writelines problem when passed generator

2009-03-10 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: On 2009-03-10 16:36, Daniel Lescohier wrote: > Daniel Lescohier added the comment: > > Let me give an example of why it's important that writelines > iteratively writes. For: > > rows = (line[:-1].split('\t')
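
The iterative behaviour being requested here can be sketched outside the codecs module. The helper write_lines_iteratively is hypothetical and is not the stdlib implementation of StreamWriter.writelines(); it only shows the difference between joining everything first and writing item by item.

```python
import codecs
import io


def write_lines_iteratively(writer, lines):
    """Write each item as it is produced, so a generator is never
    materialised as one large string (hypothetical helper)."""
    for line in lines:
        writer.write(line)


buffer = io.BytesIO()
writer = codecs.getwriter("utf-8")(buffer)

# A generator standing in for rows produced lazily from a large input.
rows = ("row %d\n" % i for i in range(3))
write_lines_iteratively(writer, rows)

print(buffer.getvalue())
```

The trade-off raised in the thread still applies: one write() call per item is slower for ordinary lists of strings than a single join followed by one write.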

[issue2066] Adding new CNS11643, a *huge* charset, support in cjkcodecs

2009-03-17 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: On 2009-03-14 02:32, Antoine Pitrou wrote: > Antoine Pitrou added the comment: > > Based on the feedback above, it seems this should be committed, > shouldn't it? +1 As mentioned several times on the ticket: static C data is not re

[issue2066] Adding new CNS11643, a *huge* charset, support in cjkcodecs

2009-03-17 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: On 2009-03-17 13:30, Hye-Shik Chang wrote: > Hye-Shik Chang added the comment: > > When I asked Taiwanese developers how often they use these character > sets, it appeared that they are almost useless in the usual computing > environment i

[issue1322] platform.dist() has unpredictable result under Linux

2009-03-20 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: On 2009-03-18 18:13, Matthias Klose wrote: > Matthias Klose added the comment: > > MAL, please can we add zooko's patch in some form? The current > implementation assumes an implementation, which doesn't exist on all > platforms

[issue1498930] Generate from Unicode database instead of manualy coding.

2009-03-21 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: You may not know it, but these functions are generated from the Unicode database. However, because these functions need to be fast and are small enough, they were not converted to the unicodetype_db approach and instead left as they were originally

[issue1337876] Inconsistent use of buffer interface in string and unicode

2009-03-21 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: This looks like a useful addition for Python 2.x - not sure about 3.x, since that doesn't have the 2.x buffer interface anymore. Phil, could you update the patch for Python 2.7? -- nosy: +lemburg versions: -Pytho

[issue5561] platform.python_version_tuple returns tuple of ints, should be strings

2009-03-25 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Thanks, that's clearly a bug. Note that the module is still compatible with Python 1.5.2, so using string methods is not possible.
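
The documented contract, a tuple of strings, can be checked directly. This sketch assumes a current Python 3 interpreter; the comparison against sys.version_info is added here for illustration.

```python
import platform
import sys

version_tuple = platform.python_version_tuple()
major, minor, patchlevel = version_tuple

# Every element is a str, so numeric comparisons need an explicit int().
# The patch level is left alone because it can carry a suffix such as "0rc1".
as_ints = (int(major), int(minor))

print(version_tuple)
print(as_ints == tuple(sys.version_info[:2]))
```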

[issue5561] platform.python_version_tuple returns tuple of ints, should be strings

2009-03-25 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Checked in a fix for Python 2.7 and 2.6 (r70594:70596). -- status: open -> closed versions: +Python 2.7

[issue5214] Add KOI8-RU as a known encoding

2009-03-27 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Viktor, I found this reference which has some background information regarding koi8-ru and other cyrillic encodings: http://segfault.kiev.ua/cyrillic-encodings/ "This charset wasn't supported by Ukrainian Internet community due to political reaso

[issue1581182] Definition of a "character" is wrong

2009-03-30 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: See this talk for an explanation of the various Unicode terms and how they map to Python's implementation: http://www.egenix.com/library/presentations/#PythonAndUnicode Also note that the Unicode standard has evolved a lot since Unicode support was

[issue4753] Faster opcode dispatch on gcc

2009-03-31 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: On 2009-03-31 03:19, A.M. Kuchling wrote: > A.M. Kuchling added the comment: > > Is a backport to 2.7 still planned? I hope it is.

[issue3672] Ill-formed surrogates not treated as errors during encoding/decoding

2009-04-29 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: While it's probably ok to fix the codecs, there's an issue which makes this difficult at least for the utf-8 codec: The marshal module uses utf-8 to write Unicode objects and these can and need to be able to store the full range of supported UCS2
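
The tension between a strict UTF-8 codec and marshal can be seen in a current Python 3 interpreter, which is assumed here; this shows how things ended up, not the state of the code at the time of the comment.

```python
import marshal

# A lone (ill-formed) high surrogate.
lone = "\ud800"

# The strict UTF-8 codec refuses to encode it.
try:
    lone.encode("utf-8")
    strict_accepts = True
except UnicodeEncodeError:
    strict_accepts = False

# An explicit error handler lets the code point through unchanged.
passed = lone.encode("utf-8", "surrogatepass")

# marshal still has to store and restore such strings.
round_tripped = marshal.loads(marshal.dumps(lone))

print(strict_accepts, passed, round_tripped == lone)
```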

[issue3672] Ill-formed surrogates not treated as errors during encoding/decoding

2009-04-30 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: On 2009-04-29 22:39, Martin v. Löwis wrote: > Martin v. Löwis added the comment: > > I think we could preserve the marshal format with yet another error > handler - one that emits half surrogates into their in

[issue5902] Stricter codec names

2009-05-04 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: On 2009-05-02 11:20, Georg Brandl wrote: > Georg Brandl added the comment: > > I don't think this is a good idea. Accepting all common forms for > encoding names means that you can usually give Python an encoding name > from, e.g.

[issue5902] Stricter codec names

2009-05-05 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: On 2009-05-04 19:04, Georg Brandl wrote: > Georg Brandl added the comment: > > So, do you also think "utf" and "latin" should stay? For Python 3.x, I think those can be removed. For 2.x it's better to keep them. Note

[issue6078] freeze.py doesn't work

2009-05-22 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: The problem is that the warnings module's init function does not adhere to the standard Python naming scheme for extension modules: it's called _PyWarnings_Init rather than init_warnings. This C helper module was added to Python 2.6. OTOH, war

[issue1943] improved allocation of PyUnicode objects

2009-05-24 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Antoine, I think we have to make a decision here: I'm still -1 on changing PyUnicodeObject to be a PyVarObject, but do like your experiments with the free lists. I also still believe that tuning the existing parameters in the Unicode implementatio

[issue1943] improved allocation of PyUnicode objects

2009-05-24 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Ok, then closing the patch as rejected. -- resolution: -> rejected status: open -> closed

[issue1943] improved allocation of PyUnicode objects

2009-05-25 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Antoine, I have explained the reasons for rejecting the patch. In short, it violates a design principle behind the Unicode implementation. If you want to change such a basic aspect of the Unicode implementation, then write a PEP which demonstrates the

[issue1943] improved allocation of PyUnicode objects

2009-05-25 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Amaury Forgeot d'Arc wrote: > Amaury Forgeot d'Arc added the comment: > > Looking at the comments, it seems that the performance gain comes from > the removal of the double allocation which is needed by the current design.

[issue1943] improved allocation of PyUnicode objects

2009-06-02 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Jim Jewett wrote: > Jim Jewett added the comment: > > There were a number of patches to support sharing of data between > unicode objects. (By Larry Hastings?) They were rejected because (a) > they were complicated, and (b) it

[issue1943] improved allocation of PyUnicode objects

2009-06-02 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Antoine Pitrou wrote: > Antoine Pitrou added the comment: > >> There were a number of patches to support sharing of data between >> unicode objects. (By Larry Hastings?) They were rejected because (a) >> they were comp

[issue1943] improved allocation of PyUnicode objects

2009-06-02 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Antoine Pitrou wrote: > Antoine Pitrou added the comment: > >> The patch breaks C API + binary compatibility for an essential Python >> type - that's not something you can easily undo. > > I don't see how it breaks

[issue1943] improved allocation of PyUnicode objects

2009-06-02 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Here's an example implementation of a Unicode sub-type that allows referencing other Unicode objects: http://downloads.egenix.com/python/unicoderef-0.0.1.tar.gz As you can see, it's pretty straight-forward to write and I want to keep i

[issue1943] improved allocation of PyUnicode objects

2009-06-02 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Antoine Pitrou wrote: > Antoine Pitrou added the comment: > >> You cannot simply recompile your code and have it working. > > Who is "you"? > People doing mundane things with PyUnicodeObjects certainly can, > assuming

[issue1943] improved allocation of PyUnicode objects

2009-06-03 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: > That's unfortunate; it would clearly have been easier to change this in 3.1. > > That said, I'm not sure anyone *should* be subclassing PyUnicode. Maybe > Marc-Andre can explain why he is doing this (or point to the message in

[issue1943] improved allocation of PyUnicode objects

2009-06-03 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Antoine Pitrou wrote: >> Instead of changing PyUnicodeObject from a PyObject to a PyVarObject, >> making sub-typing a lot harder, I'd much rather apply a single change >> for 3.1: raising the KEEPALIVE_SIZE_LIMIT to 32 as explaine

[issue1943] improved allocation of PyUnicode objects

2009-06-04 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Here's a new version of the unicode reference type, extended to run in both Python 2.6 and 3.1: http://downloads.egenix.com/python/unicoderef-0.0.2.tar.gz I've also included a benchmark implemented in C which measures Unicode/Bytes allocation p

[issue1943] improved allocation of PyUnicode objects

2009-06-04 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Guido van Rossum wrote: > Guido van Rossum added the comment: > > On Wed, Jun 3, 2009 at 1:41 PM, Antoine Pitrou wrote: >> Apart from the example Marc-André just posted (and which is a 0.0.1 >> proof of concept he apparently just

[issue1943] improved allocation of PyUnicode objects

2009-06-04 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Antoine Pitrou wrote: > Antoine Pitrou added the comment: > >> Since pymalloc is being used to manage such objects, there's >> a lot of room for improvements, since the allocation scheme >> is under our control. E.g. we

[issue1943] improved allocation of PyUnicode objects

2009-06-05 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Antoine Pitrou wrote: > Antoine Pitrou added the comment: > > Raymond suggested the patch be committed in 3.1, so as to minimize > disruption between 3.1 and 3.2. Benjamin, what do you think? Has Guido pronounced on

[issue1943] improved allocation of PyUnicode objects

2009-06-05 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Guido van Rossum wrote: > I think it's fine to wait for 3.2. Maybe add something to the docs > about not subclassing unicode in C. We should have a wider discussion about this on python-dev. I'll publish the unicoderef extension and

[issue1943] improved allocation of PyUnicode objects

2009-06-08 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Terry J. Reedy wrote: > In the interest of possibly improving the imminent 3.1 release, > I opened #6216 > Raise Unicode KEEPALIVE_SIZE_LIMIT from 9 to 32? Thanks for opening that ticket. > I wonder if it is possible to make it generical

[issue6216] Raise Unicode KEEPALIVE_SIZE_LIMIT from 9 to 32?

2009-06-08 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: I think we should also consider raising the free list limit of currently 1024 objects. The keep-alive optimization currently uses at most 1024 * 9 * 2 = 18432 bytes (+ pymalloc overhead) on a UCS2 build of Python in the worst case. With a limit of 32 you
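
The worst-case figure quoted above follows from a simple product, which the sketch below reproduces. Only the 18432-byte UCS2 figure comes from the comment; the other values are extrapolated with the same formula and, like the names used here, are assumptions for illustration. pymalloc overhead is ignored.

```python
FREE_LIST_SIZE = 1024  # free list limit mentioned in the comment


def keepalive_worst_case(size_limit, bytes_per_char):
    """Worst-case bytes kept alive: objects * characters * bytes per character."""
    return FREE_LIST_SIZE * size_limit * bytes_per_char


ucs2_limit_9 = keepalive_worst_case(9, 2)    # figure given in the comment
ucs2_limit_32 = keepalive_worst_case(32, 2)  # extrapolated
ucs4_limit_9 = keepalive_worst_case(9, 4)    # extrapolated
ucs4_limit_32 = keepalive_worst_case(32, 4)  # extrapolated

print(ucs2_limit_9, ucs2_limit_32, ucs4_limit_9, ucs4_limit_32)
```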

[issue3410] platform.version() don't work as expected in Vista in portuguese

2009-07-04 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Ezio Melotti wrote: > Ezio Melotti added the comment: > > I tried platform.version() on a non-English Vista and XP and I got > '32bit' for Vista and '5.1.2600' for XP. With platform.platform() I got > 'Windows-32bi

[issue8781] 32-bit wchar_t doesn't need to be unsigned to be usable (I think)

2010-05-26 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Antoine Pitrou wrote: > > Antoine Pitrou added the comment: > > The problem with a signed Py_UNICODE is implicit sign extension (rather than > zero extension) in some conversions, for example from "char" or "unsigned

[issue8838] Remove codecs.readbuffer_encode() and codecs.charbuffer_encode()

2010-05-28 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: STINNER Victor wrote: > > New submission from STINNER Victor : > > readbuffer_encode() and charbuffer_encode() are not really encoders, nor are they > related to encodings: they are related to PyBuffer. readbuffer_encode() uses > "s#"

[issue8839] PyArg_ParseTuple(): remove "t# format

2010-05-28 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: STINNER Victor wrote: > > New submission from STINNER Victor : > > "t#" format was introduced by r11803 (11 years ago): "Implement new format > character 't#'. This is like s#, accepting an object that implements

[issue8839] PyArg_ParseTuple(): remove "t# format

2010-05-28 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: STINNER Victor wrote: > > STINNER Victor added the comment: > > Patch to remove "t#": > - Update c-api/arg.rst documentation > - Replace "t#" format by "y#" in codecs.charbuffer_encode() > - Add a not

[issue8839] PyArg_ParseTuple(): remove "t# format

2010-05-28 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: STINNER Victor wrote: > > STINNER Victor added the comment: > >> Given that "y#" is not (yet) in wide-spread use, ... > > t# is only used once (in codecs.charbuffer_encode()), whereas y# is used by > ossaudiodev, so

[issue8838] Remove codecs.readbuffer_encode() and codecs.charbuffer_encode()

2010-05-28 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: STINNER Victor wrote: > > STINNER Victor added the comment: > >> Those two encoder functions were meant to be used by Python codec >> implementations which want to use the readbuffer and charbuffer >> interfaces available in

[issue8838] Remove codecs.readbuffer_encode() and codecs.charbuffer_encode()

2010-05-28 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Antoine Pitrou wrote: > > Antoine Pitrou added the comment: > >> Any Python object can expose a buffer interface and the above >> functions then allow accessing these interfaces from within >> Python. > > What's t

[issue8838] Remove codecs.readbuffer_encode() and codecs.charbuffer_encode()

2010-05-28 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Antoine Pitrou wrote: > > Antoine Pitrou added the comment: > >> class BinaryDataCodec(codecs.Codec): >> >> # Note: Binding these as C functions will result in the class not >> # converting them to methods

[issue8838] Remove codecs.readbuffer_encode() and codecs.charbuffer_encode()

2010-05-28 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: STINNER Victor wrote: > >> Martin said these codecs are coming back in 3.2. I said that and it was discussed on the python-dev mailing list a while back. We'll also add .transform() methods on bytes and str objects to access sa
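
For reference, current Python 3 exposes these bytes-to-bytes codecs through codecs.encode() and codecs.decode(). The sketch below assumes such an interpreter and uses made-up sample data.

```python
import codecs

data = b"hello"

# Binary transforms via the codec registry.
hex_encoded = codecs.encode(data, "hex")
b64_encoded = codecs.encode(data, "base64")
zlib_encoded = codecs.encode(data, "zlib")

print(hex_encoded)
print(b64_encoded)

# Each transform is reversible with codecs.decode().
print(codecs.decode(hex_encoded, "hex") == data)
print(codecs.decode(b64_encoded, "base64") == data)
print(codecs.decode(zlib_encoded, "zlib") == data)
```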

[issue7475] codecs missing: base64 bz2 hex zlib ...

2010-05-28 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: STINNER Victor wrote: > > STINNER Victor added the comment: > > I agree with Martin: codecs chose the wrong direction in Python2, and it's > fixed in Python3. The codecs module is related to charsets (encodings), > should e

[issue8854] msvc9compiler.py: find_vcvarsall() doesn't work with VS2008 on Windows x64

2010-05-29 Thread Marc-Andre Lemburg
New submission from Marc-Andre Lemburg : When installing Visual Studio 2008 SP1 on a Windows Vista x64 system, the installer registers the various registry keys under Software\Wow6432Node\Microsoft\VisualStudio\9.0\ rather than Software\Microsoft\VisualStudio\9.0\ This is due to some

[issue8854] msvc9compiler.py: find_vcvarsall() doesn't work with VS2008 on Windows x64

2010-05-30 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Martin v. Löwis wrote: > > Martin v. Löwis added the comment: > > This shouldn't be necessary. If a 32-bit Python looks into the registry, it > will get automatically redirected to Wow6432Node. If a 64-bit Python looks > into the

[issue7983] The encoding map from Unicode to CP932 is different from that of Windows'

2010-05-30 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Hye-Shik, could you please comment on this? The Windows version appears to replace private use code points with CJK compatibility ideographs, i.e. it uses standard Unicode code points rather than private escape code points (for round-trip safety

[issue4487] Add utf8 alias for email charsets

2010-06-04 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: R. David Murray wrote: > > R. David Murray added the comment: > > For various reasons the email module has a table of character sets. What > might be most effective would be for the email module to look a character set > name up in

[issue4487] Add utf8 alias for email charsets

2010-06-04 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: R. David Murray wrote: > > R. David Murray added the comment: > > Mark, any objection to my putting this patch in now, and then we'll fix the > aliases implementation in 3.2? No. Please open a new issue targeting Python 3.2 for thi

[issue8898] The email package should defer to the codecs module for all aliases

2010-06-04 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Shashwat Anand wrote: > > Shashwat Anand added the comment: > > from email.charset.ALIASES most of them failed to be recognized by the codecs > module. > > >>>> for i in email.charset.ALIASES.keys(): > ...
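
A check along the lines of the quoted session can be sketched as below. It assumes a current Python 3 interpreter; check_aliases is a hypothetical helper, and which aliases resolve depends on the interpreter version, so no particular result is claimed for the email table.

```python
import codecs
import email.charset


def check_aliases(aliases):
    """Map each alias to the codec name it resolves to, or None if the
    codecs module does not recognise it (hypothetical helper)."""
    results = {}
    for alias in aliases:
        try:
            results[alias] = codecs.lookup(alias).name
        except LookupError:
            results[alias] = None
    return results


results = check_aliases(email.charset.ALIASES)
unknown = sorted(alias for alias, name in results.items() if name is None)

print("aliases checked:", len(results))
print("not known to codecs:", unknown)
```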

[issue8898] The email package should defer to the codecs module for all aliases

2010-06-04 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Shashwat Anand wrote: > > Shashwat Anand added the comment: > >> We need to add aliases for those codecs. The current aliases >> list only supports the format "latinN" for N in 1-10. > > latinN means latin1 to latin

[issue7989] Transition time/datetime C modules to Python

2010-06-07 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Alexander Belopolsky wrote: > > Alexander Belopolsky added the comment: > > As far as I remember, the datetime module started as a pure python module and > was reimplemented in C around year 2003 or so. One of the important > addi

[issue8922] Improve encoding shortcuts in PyUnicode_AsEncodedString()

2010-06-07 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: STINNER Victor wrote: > > New submission from STINNER Victor : > > PyUnicode_Decode() and PyUnicode_AsEncodedString() call built-in > decoders/encoders directly for some known encodings (e.g. "utf-8"), instead of usi

[issue8923] Remove unused "errors" argument from _PyUnicode_AsDefaultEncodedString()

2010-06-07 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: STINNER Victor wrote: > > New submission from STINNER Victor : > > _PyUnicode_AsDefaultEncodedString() has two arguments: unicode (input string) > and errors. If errors is not NULL, it calls Py_FatalError()! > > The argument is

[issue8925] Improve c-api/arg.rst: use "bytes" or "str" types instead of "string"

2010-06-07 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: STINNER Victor wrote: > > New submission from STINNER Victor : > > http://docs.python.org/py3k/c-api/arg.html is unclear about what is a > "string". > > Attached patch: > - Use directly bytes, bytearray and str types

[issue8839] PyArg_ParseTuple(): remove "t# format

2010-06-07 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: STINNER Victor wrote: > > STINNER Victor added the comment: > > New version of the patch: > - charbuffer_encode() uses y* instead of y# format to accept modifiable > buffer objects (eg. bytearray) > - Improve the documenta

[issue8922] Improve encoding shortcuts in PyUnicode_AsEncodedString()

2010-06-07 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: STINNER Victor wrote: > > STINNER Victor added the comment: > >> the shortcuts were meant for Python internal use only > > str.encode() calls PyUnicode_AsEncodedString() and bytes.decode() calls > PyUnicode_Decode(), so it is n

[issue8838] Remove codecs.readbuffer_encode() and codecs.charbuffer_encode()

2010-06-08 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: STINNER Victor wrote: > > STINNER Victor added the comment: > > MAL agreed to remove "t#" parsing format (#8839), whereas charbuffer_encode() > main goal was to offer "t#" parsing format to Python object space. >

[issue7989] Transition time/datetime C modules to Python

2010-06-08 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Brett Cannon wrote: > > Brett Cannon added the comment: > > So I see a couple of objections here to the idea that I will try to address. > > First is MAL's thinking that this will undo any C code, which it won't. The > i

[issue7989] Add pure Python implementation of datetime module to CPython

2010-06-08 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Oops, sorry. Looks like the Roundup email interface changed the ticket title back to the old one again (I was replying to Brett's comment under the old title). -- title: Transition time/datetime C modules to Python -> Add pur

[issue8838] Remove codecs.readbuffer_encode() and codecs.charbuffer_encode()

2010-06-08 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: Antoine Pitrou wrote: > > Antoine Pitrou added the comment: > >> Please leave readbuffer_encode() as-is. > > Then please add documentation for it. Will do. -- title: Remove codecs.readbuffer_encode()and cod

[issue8838] Remove codecs.readbuffer_encode() and codecs.charbuffer_encode()

2010-06-09 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: STINNER Victor wrote: > > STINNER Victor added the comment: > > r81854 removes codecs.charbuffer_encode() (and t# parsing format) from Python > 3.2 (blocked in 3.1: r81855). > > -- > > My problem with codecs.readbuffer_enco

[issue8939] Use C type names (PyUnicode etc;) in the C API docs

2010-06-09 Thread Marc-Andre Lemburg
Marc-Andre Lemburg added the comment: STINNER Victor wrote: > > STINNER Victor added the comment: > > Big patch: > - replace Python types by C Python types (eg. str => PyUnicodeObject* and > None => Py_None) I was thinking of e.g. "PyUnicode", not "
