[issue3531] file read preallocs 'size' bytes which can cause memory problems

2008-08-09 Thread Antoine Pitrou

Antoine Pitrou <[EMAIL PROTECTED]> added the comment:

I can't reproduce, your code snippet works fine. What Python version is it?

--
nosy: +pitrou

___
Python tracker <[EMAIL PROTECTED]>

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue3531] file read preallocs 'size' bytes which can cause memory problems

2008-08-09 Thread Andrew Dalke

Andrew Dalke <[EMAIL PROTECTED]> added the comment:

I tested it with Python 2.5 on a Mac, Python 2.5 on FreeBSD, and Python 
2.6b2+ (from SVN as of this morning) on a Mac.

Perhaps the memory allocator on your machine is making a promise it can't 
keep?




[issue3531] file read preallocs 'size' bytes which can cause memory problems

2008-08-09 Thread Antoine Pitrou

Antoine Pitrou <[EMAIL PROTECTED]> added the comment:

Perhaps. I'm under Linux.

However, at the end of the file_read() implementation in fileobject.c,
you can find the following lines:

if (bytesread != buffersize)
_PyString_Resize(&v, bytesread);

Which means that the string *is* resized at the end.




[issue3270] test_multiprocessing: test_listener_client flakiness

2008-08-09 Thread Hirokazu Yamamoto

Hirokazu Yamamoto <[EMAIL PROTECTED]> added the comment:

I confirmed this patch works on my win2000.
And I believe it works on Trent's machine, too.
http://mail.python.org/pipermail/python-dev/2008-June/080525.html




[issue3526] Customized malloc implementation on SunOS and AIX

2008-08-09 Thread Antoine Pitrou

Antoine Pitrou <[EMAIL PROTECTED]> added the comment:

On Friday, 08 August 2008 at 22:46 +, Martin v. Löwis wrote:
> Instead, Python's own memory allocate (obmalloc) should be changed to
> directly use the virtual memory interfaces of the operating system (i.e.
> mmap), bypassing the malloc of the C library.

How would that interact with fork()?




[issue3531] file read preallocs 'size' bytes which can cause memory problems

2008-08-09 Thread Andrew Dalke

Andrew Dalke <[EMAIL PROTECTED]> added the comment:

You're right.  I mistook the string implementation for the list one 
which does keep a preallocated section in case of growth.  Strings of 
course don't grow so there's no need for that.

I tracked the memory allocation all the way down to 
obmalloc.c:PyObject_Realloc .  The call goes to realloc(p, nbytes) which 
is a C lib call.  It appears that the memory space is not reallocated.

That was enough to be able to find the python-dev thread "Darwin's 
realloc(...) implementation never shrinks allocations" from Jan. 2005, 
Bob Ippolito's post "realloc.. doesn’t?" 
(http://bob.pythonmac.org/archives/2005/01/01/realloc-doesnt/ ) and 
Issue1092502 .

Mind you, I also get the problem on FreeBSD 2.6 so it isn't Darwin 
specific.
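
Independent of how realloc() behaves on a given platform, application code can avoid the large up-front allocation by never asking read() for a huge size in one call. A minimal sketch, assuming a binary file object; the helper name read_at_most and the 64 KiB chunk size are hypothetical choices for illustration, not something proposed in this thread:

```python
import io

def read_at_most(fileobj, limit, chunk_size=64 * 1024):
    """Read up to `limit` bytes, requesting at most `chunk_size` per call."""
    chunks = []
    remaining = limit
    while remaining > 0:
        chunk = fileobj.read(min(chunk_size, remaining))
        if not chunk:  # end of file reached before the limit
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)

# A 100-byte "file" read with a one-gigabyte limit only allocates small buffers.
data = read_at_most(io.BytesIO(b"x" * 100), 10**9)
```

The trade-off is an extra join at the end, in exchange for peak memory that tracks the data actually read rather than the requested size.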




[issue3531] file read preallocs 'size' bytes which can cause memory problems

2008-08-09 Thread Antoine Pitrou

Antoine Pitrou <[EMAIL PROTECTED]> added the comment:

On Saturday, 09 August 2008 at 11:26 +, Andrew Dalke wrote:
> Mind you, I also get the problem on FreeBSD 2.6 so it isn't Darwin 
> specific.

Darwin and the BSDs supposedly share a lot of common code.
But FreeBSD 2.6 is a bit old, isn't it?




[issue3501] expm1 missing

2008-08-09 Thread Mark Dickinson

Changes by Mark Dickinson <[EMAIL PROTECTED]>:


--
assignee:  -> marketdickinson
components: +Extension Modules -None
priority:  -> normal
versions: +Python 2.7, Python 3.1 -Python 3.0




[issue3532] bytes.tohex method

2008-08-09 Thread Matt Giuca

New submission from Matt Giuca <[EMAIL PROTECTED]>:

I haven't been able to find a way to encode a bytes object in
hexadecimal, where in Python 2.x I'd go "str.encode('hex')".

I recommend adding a bytes.tohex() method (in the same vein as the
existing bytes.fromhex class method).

I've attached a patch which adds this method to the bytes and bytearray
classes (in the C code). Also included documentation and test cases.

Style note: The bytesobject.c and bytearrayobject.c files are all over
the place in terms of tabs/spaces. I used tabs in bytesobject and spaces
in bytearrayobject, since those seemed to be the predominant styles in
either file.

Commit log:

Added "tohex" method to bytes and bytearray objects. Also added
documentation and test cases.

--
components: Interpreter Core
files: bytes.tohex.patch
keywords: patch
messages: 70932
nosy: mgiuca
severity: normal
status: open
title: bytes.tohex method
type: feature request
versions: Python 3.0
Added file: http://bugs.python.org/file11091/bytes.tohex.patch




[issue3492] Zlib compress/decompress functions returning bytearray

2008-08-09 Thread Antoine Pitrou

Antoine Pitrou <[EMAIL PROTECTED]> added the comment:

> Any updates ? The py3k list is also very silent since the
> week-end...Thanks!

Your two patches look good, I suppose either Alexandre or I will commit
them soon. 
You shouldn't worry when you don't get a reply immediately; people
simply react when they have time to do so. As for the mailing-list
activity, we are at the beginning of August, which I guess means many
people are on holiday.

--
nosy: +pitrou




[issue3531] file read preallocs 'size' bytes which can cause memory problems

2008-08-09 Thread Andrew Dalke

Andrew Dalke <[EMAIL PROTECTED]> added the comment:

FreeBSD is what my hosting provider uses.  Freebsd.org calls 2.6 "legacy" 
but the latest update was earlier this year.

There is shared history with Macs.  I don't know the details though.  I 
just point out that the problem isn't only on Darwin.




[issue3533] mac 10.4 buld of 3.0 --with-pydebug fails no __eprintf

2008-08-09 Thread Barry Alan Scott

New submission from Barry Alan Scott <[EMAIL PROTECTED]>:

I wanted to use Py_DEBUG build to help debug a problem
with ref counts in a C++ extension.

I cannot find eprintf in the sources of python
where does this symbol come from? How do I fix the
build to define it?


$ sw_vers 
ProductName:Mac OS X
ProductVersion: 10.4.11
BuildVersion:   8S165

$ ./configure --enable-framework --enable-debug --with-pydebug
$ make
...
/usr/bin/install -c -d -m 755 Python.framework/Versions/3.0
if test ""; then \
gcc -o Python.framework/Versions/3.0/Python  -dynamiclib \
-isysroot "" \
-all_load libpython3.0.a -Wl,-single_module \
-install_name
/Library/Frameworks/Python.framework/Versions/3.0/Python \
-compatibility_version 3.0 \
-current_version 3.0; \
else \
/usr/bin/libtool -o Python.framework/Versions/3.0/Python
-dynamic  libpython3.0.a \
 -lSystem -lSystemStubs -arch_only ppc -install_name
/Library/Frameworks/Python.framework/Versions/3.0/Python
-compatibility_version 3.0 -current_version 3.0 ;\
fi
ld: Undefined symbols:
___eprintf
/usr/bin/libtool: internal link edit command failed
make: *** [Python.framework/Versions/3.0/Python] Error 1

--
components: Build
messages: 70935
nosy: barry-scott
severity: normal
status: open
title: mac 10.4 buld of 3.0 --with-pydebug fails no __eprintf
versions: Python 3.0




[issue3160] Building a Win32 binary installer crashes

2008-08-09 Thread Antoine Pitrou

Antoine Pitrou <[EMAIL PROTECTED]> added the comment:

Hi Viktor,

It's complicated for me to test under Windows right now, but your
snippet looks buggy:

script_data = open(self.pre_install_script, "r").read()
cfgdata = cfgdata + script_data + b"\n\0"

script_data is a unicode string because the file is opened in text
mode, but you try to concatenate it with bytes objects, which will fail.
Please try to fix this and provide a proper patch :-)

PS : I agree it is important to fix this.
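
The type mismatch above can be avoided by keeping everything in bytes. A minimal sketch, assuming the script may be passed through as raw bytes (the thread does not settle which encoding the script file is expected to use); the function name append_script and the temporary file are hypothetical, for illustration only:

```python
import os
import tempfile

def append_script(cfgdata, script_path):
    """Append a script file's raw contents to a bytes config blob."""
    # Binary mode makes read() return bytes, matching cfgdata's type.
    with open(script_path, "rb") as f:
        script_data = f.read()
    return cfgdata + script_data + b"\n\0"

# Demonstration with a throwaway script file.
with tempfile.NamedTemporaryFile("wb", suffix=".py", delete=False) as tmp:
    tmp.write(b"print('hi')")
result = append_script(b"cfg", tmp.name)
os.unlink(tmp.name)
```

Opening in text mode and calling .encode() with an explicit encoding would work as well, but then that encoding has to be chosen deliberately.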

--
keywords: +patch
nosy: +pitrou
priority:  -> high




[issue3362] locale.getpreferredencoding() gives bus error on Mac OS X 10.4.11 PPC

2008-08-09 Thread Antoine Pitrou

Antoine Pitrou <[EMAIL PROTECTED]> added the comment:

locale.getpreferredencoding() should certainly not crash but the
question remains of what should be the outcome. I can see several
possibilities:
(1) return the empty string
(2) return None
(3) return "ascii" (!!)
(4) raise an exception (which one?)

(2) sounds the most logical to me: there is no preferred encoding in the
environment, so we just return None to indicate that the application has
to choose its own default.

--
nosy: +pitrou
priority:  -> critical




[issue3362] locale.getpreferredencoding() gives bus error on Mac OS X 10.4.11 PPC

2008-08-09 Thread Antoine Pitrou

Changes by Antoine Pitrou <[EMAIL PROTECTED]>:


--
versions: +Python 2.6, Python 3.0




[issue3253] shutil.move bahave unexpected in fat32

2008-08-09 Thread Antoine Pitrou

Antoine Pitrou <[EMAIL PROTECTED]> added the comment:

This is already fixed in the trunk (which will become Python 2.6).

--
nosy: +pitrou
resolution:  -> fixed
status: open -> closed




[issue3205] bz2 iterator fails silently on MemoryError

2008-08-09 Thread Antoine Pitrou

Antoine Pitrou <[EMAIL PROTECTED]> added the comment:

Fixed in r65609. Thanks for the report and for the patch!

--
nosy: +pitrou
resolution:  -> fixed
status: open -> closed




[issue3526] Customized malloc implementation on SunOS and AIX

2008-08-09 Thread Martin v. Löwis

Martin v. Löwis <[EMAIL PROTECTED]> added the comment:

>> Instead, Python's own memory allocate (obmalloc) should be changed to
>> directly use the virtual memory interfaces of the operating system (i.e.
>> mmap), bypassing the malloc of the C library.
> 
> How would that interact with fork()?

Nicely, why do you ask? Any anonymous mapping will be copied
(typically COW) to the child process, in fact, malloc itself
uses anonymous mapping (at least on Linux).
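
The copy-on-write behaviour described above can be observed from Python itself. A rough sketch for POSIX systems only (it relies on os.fork() and the Unix-only flags argument of mmap.mmap); the function name is hypothetical and this says nothing about how obmalloc would actually be changed:

```python
import mmap
import os

def child_write_stays_private():
    """Return the parent's view of a private anonymous page after a child writes to it."""
    # MAP_PRIVATE | MAP_ANONYMOUS: not backed by a file, copy-on-write across fork().
    page = mmap.mmap(-1, mmap.PAGESIZE,
                     flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    page[:6] = b"parent"
    pid = os.fork()
    if pid == 0:
        # Child: this write lands in the child's own copy of the page.
        page[:6] = b"child!"
        os._exit(0)
    os.waitpid(pid, 0)
    return bytes(page[:6])

parent_view = child_write_stays_private()
```

The parent still sees its own bytes after the child exits, which is the same isolation a malloc-based heap gets across fork().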




[issue3362] locale.getpreferredencoding() gives bus error on Mac OS X 10.4.11 PPC

2008-08-09 Thread Martin v. Löwis

Martin v. Löwis <[EMAIL PROTECTED]> added the comment:

No, getpreferredencoding should always produce an encoding name. If the
application had an idea what to use, it wouldn't have to ask. So I favor
(3), or, perhaps given that OSX uses UTF-8 in many places,

(5) return "UTF-8"
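
Until the behaviour is settled, an application can guard itself along the lines of option (5). A minimal sketch; the wrapper name preferred_encoding and its "UTF-8" default are assumptions for illustration, not the fix adopted for this issue:

```python
import codecs
import locale

def preferred_encoding(default="UTF-8"):
    """Return a usable encoding name, falling back to `default`."""
    # Pass False so the call does not change the process locale.
    name = locale.getpreferredencoding(False)
    if not name:
        return default
    try:
        codecs.lookup(name)  # reject names Python has no codec for
    except LookupError:
        return default
    return name

encoding = preferred_encoding()
```

This cannot protect against a crash inside the C library call itself; it only covers the empty or unknown-name outcomes.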




[issue3533] mac 10.4 buld of 3.0 --with-pydebug fails no __eprintf

2008-08-09 Thread Martin v. Löwis

Martin v. Löwis <[EMAIL PROTECTED]> added the comment:

Are you sure you are using the correct compiler (i.e. from the XCode
release relevant for your operating system version)?

--
nosy: +loewis




[issue3187] os.listdir can return byte strings

2008-08-09 Thread Antoine Pitrou

Antoine Pitrou <[EMAIL PROTECTED]> added the comment:

Hmm, I suppose that while the filename is latin1-encoded,
Py_FileSystemDefaultEncoding is "utf-8" and therefore os.listdir fails
decoding the filename and falls back on returning a byte string.
It was acceptable in Python 2.x but is a very annoying problem in py3k
now that unicode and bytes objects can't be mixed together anymore. I'm
bumping this to critical, although there is probably no clean solution.
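
The fallback described above can be imitated in a few lines. This is an illustrative sketch of the behaviour under discussion, not the code in os.listdir; decode_filename is a hypothetical helper:

```python
def decode_filename(raw, encoding="utf-8"):
    """Decode a raw filename, returning the bytes unchanged if decoding fails."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return raw

latin1_name = "café".encode("latin-1")  # b'caf\xe9', not valid UTF-8
utf8_name = "café".encode("utf-8")      # b'caf\xc3\xa9'

mixed = [decode_filename(latin1_name), decode_filename(utf8_name)]
```

The resulting list mixes bytes and str, which is exactly what makes the fallback awkward once the two types can no longer be combined.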

--
nosy: +pitrou
priority:  -> critical
type: crash -> behavior




[issue3532] bytes.tohex method

2008-08-09 Thread Martin v. Löwis

Martin v. Löwis <[EMAIL PROTECTED]> added the comment:

I recommend using binascii.hexlify; this works in all Python versions
(since 2000 or so).

I'm -1 for this patch.
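
For reference, a short sketch of the binascii route in Python 3, paired with the existing bytes.fromhex class method for the reverse direction. The variable names are illustrative:

```python
import binascii

raw = b"\x00\xffhi"

hex_bytes = binascii.hexlify(raw)      # result is a bytes object
hex_text = hex_bytes.decode("ascii")   # explicit step to get a str
round_trip = bytes.fromhex(hex_text)   # back to the original bytes
```

Note the extra decode step needed to obtain text, since hexlify returns bytes.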

--
nosy: +loewis




[issue3526] Customized malloc implementation on SunOS and AIX

2008-08-09 Thread Antoine Pitrou

Antoine Pitrou <[EMAIL PROTECTED]> added the comment:

On Saturday, 09 August 2008 at 17:28 +, Martin v. Löwis wrote:
> Martin v. Löwis <[EMAIL PROTECTED]> added the comment:
> 
> >> Instead, Python's own memory allocate (obmalloc) should be changed to
> >> directly use the virtual memory interfaces of the operating system (i.e.
> >> mmap), bypassing the malloc of the C library.
> > 
> > How would that interact with fork()?
> 
> Nicely, why do you ask?

Because I didn't know :)
But looking at the dlmalloc implementation bundled in the patch, it
seems that using mmap/munmap (or VirtualAlloc/VirtualFree under Windows)
should be ok.

Do you think we should create a separate issue for this improvement? It
could also solve #3531.




[issue3134] shutil references undefined WindowsError symbol

2008-08-09 Thread Antoine Pitrou

Antoine Pitrou <[EMAIL PROTECTED]> added the comment:

Raghuram, your patch looks good to me. I'll try to test it under Windows
soon.

--
nosy: +pitrou




[issue3080] Full unicode import system

2008-08-09 Thread Antoine Pitrou

Changes by Antoine Pitrou <[EMAIL PROTECTED]>:


--
priority:  -> critical
type:  -> behavior
versions: +Python 3.1 -Python 3.0




[issue3534] refactor.py can lose indentation for relative imports

2008-08-09 Thread Roger Upole

New submission from Roger Upole <[EMAIL PROTECTED]>:

Here's an excerpt from the output when run with --verbose.

@@ -138,7 +136,7 @@

     def _MakeColorizer(self):
         ext = os.path.splitext(self.GetDocument().GetPathName())
-        import formatter
+from . import formatter
         return formatter.BuiltinPythonSourceFormatter(self, ext)

--
assignee: collinwinter
components: 2to3 (2.x to 3.0 conversion tool)
messages: 70947
nosy: collinwinter, rupole
severity: normal
status: open
title: refactor.py can lose indentation for relative imports
versions: Python 3.0




[issue3532] bytes.tohex method

2008-08-09 Thread Matt Giuca

Matt Giuca <[EMAIL PROTECTED]> added the comment:

> I recommend to use binascii.hexlify.

Ah, see I did not know about this! Thanks for pointing it out.

* However, it is *very* obscure. I've been using Python for a year and I
didn't know about it.
* And, it requires importing binascii.
* And, it results in a bytes object, not a str. That's weird. (Perhaps
it would be good idea to change the functions in the binascii module to
output strings instead of bytes? Ostensibly it looks like this module
hasn't undergone py3kification).

Would it hurt to have the tohex method of the bytes object to perform
this task as well? It would be much nicer to use since it's a method of
the object rather than having to find out about and import and use some
function.

Also why have a bytes.fromhex method when you could use binascii.unhexlify?

(If it's better from a code standpoint, you could replace the code I
wrote with a call to binascii.unhexlify).




[issue3300] urllib.quote and unquote - Unicode issues

2008-08-09 Thread Matt Giuca

Matt Giuca <[EMAIL PROTECTED]> added the comment:

Bill, I had a look at your patch. I see you've decided to make
quote_as_string the default? In that case, I don't know why you had to
rewrite everything to implement the same basic behaviour as my patch.
(My latest few patches support bytes both ways). Anyway, I have a lot of
issues with your implementation.

* Why did you replace all of the existing machinery? Particularly the
way quote creates Quoter objects and stores them in a cache. I haven't
done any speed tests, but I assume that was all there for performance
reasons.

* The idea of quote_as_bytes is malformed. quote_as_bytes takes a str or
bytes, and outputs a URI as a bytes, while quote_as_string outputs a URI
as a str. This is the first time in the whole discussion we've
represented a URI as bytes, not a str. URIs are not byte sequences, they
are character sequences (see discussion below). I think only
quote_as_string is valid.

* The names unquote_as_* and quote_as_* are confusing. Use unquote_to_*
and quote_from_* to avoid ambiguity.

* Are unquote_as_string and unquote both part of your public interface?
That seems like unnecessary duplication.

* As Antoine pointed out above, it's too limiting for quote to force
UTF-8. Add a 'charset' parameter.

* Add an 'errors' parameter too, to give the caller control over how
strict to be.

* unquote and unquote_plus are missing 'charset' param, which should be
passed along to unquote_as_string.

* I do like the addition of a "plus" argument, as opposed to the
separate unquote_plus and quote_plus functions. I'd swap the arguments
to unquote around so charset is first and then plus, so you can write
unquote(mystring, 'utf-8') without using a keyword argument.

* In unquote: The "raw_unicode_escape" encoding makes no sense. It does
exactly the same thing as Latin-1, except it also looks for b"\\u"
in the string and converts that into a Unicode character. So your code
behaves like this:

>>> urllib.parse.unquote('%5Cu00fc')
'ü'
(Should output "\u00fc")
>>> urllib.parse.unquote('%5Cu')
UnicodeDecodeError: 'rawunicodeescape' codec can't decode bytes in
position 11-12: truncated \u
(Should output "\u")

I suspect the email package (where you got the inspiration to use
'rawunicodeescape') has this same crazy problem, but that isn't my
concern today!

Aside from this weirdness, you're essentially defaulting unquote to
Latin-1. As I've said countless times, unquote needs to be the inverse
of quote, or you get this behaviour:

>>> urllib.parse.unquote(urllib.parse.quote('ü'))
'Ã¼'

Once again, I refer you to my favourite web server example.

import http.server
s = http.server.HTTPServer(('', 8000),
                           http.server.SimpleHTTPRequestHandler)
s.serve_forever()

Run this in a directory with a non-Latin-1 filename (eg. "漢字"), and
you will get a 404 when you click on the file.

* One issue I worked very hard to solve is how to deal with unescaped
non-ASCII characters in unquote. Technically this is an invalid URI, so
I'm not sure how important it is, but it's nice to be able to assume the
unquote function won't mess with them. For example,
unquote_as_string("\u6f22%C3%BC", charset="latin-1") should give
"\u6f22\u00fc" (or at least it would be nice). Yours raises
"UnicodeEncodeError: 'ascii' codec can't encode character". (I assume
this is a wanted property, given that the existing test suite tests that
unquote can handle ALL unescaped ASCII characters (that's what
escape_string in test_unquoting is for) - I merely extend this concept
to be able to handle all unescaped Unicode characters). Note that it's
impossible to allow such lenience if you implement unquote_as_string as
calling unquote_as_bytes and then decoding.

* Question: How does unquote_bytes deal with unescaped characters?
(Since this is a str->bytes transform, you need to encode them somehow).
I don't have a good answer for you here, which is one reason I think
it's wrong to treat a URI as an octet encoding. I treat them as UTF-8.
You treat them as ASCII. Since URIs are supposed to only contain ASCII,
the answers "ASCII", "Latin-1" and "UTF-8" are all as good as each
other, but as I said above, I prefer to be lenient and allow non-ASCII
URIs as input.

* Needs a lot more test cases, and documentation for your changes. I
suggest you plug my new test cases for urllib in and see if you can make
your code pass all the things I test for (and if not, have a good reason).
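
The encoding points in the list above can be checked with a few calls. This sketch uses the urllib.parse functions as they exist in current Python 3 (quote, unquote and unquote_to_bytes with their encoding parameter), which is an assumption relative to this thread: neither patch under review necessarily has this exact interface.

```python
from urllib.parse import quote, unquote, unquote_to_bytes

quoted = quote("ü")                               # percent-encodes the UTF-8 bytes
as_utf8 = unquote(quoted)                         # decoded with the same charset
as_latin1 = unquote(quoted, encoding="latin-1")   # mismatched charset: mojibake
raw = unquote_to_bytes(quoted)                    # no decoding at all

# raw_unicode_escape interprets a literal backslash-u sequence as a character,
# which is why it is a surprising choice for decoding unquoted data.
escaped = b"\\u00fc".decode("raw_unicode_escape")
```

Decoding with a different charset than the one used for quoting is what breaks the round trip; the percent-encoding itself is lossless.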

In addition, a major problem I have is with this dangerous assumption
that RFC 3986 specifies a byte->str encoding. You keep stating
assumptions like this:

> Remember that the RFC for percent-encoding really takes
> bytes in, and produces bytes out.  The string-in and string-out
> versions are to support naive programming (what a nice way of
> putting it!).

You assume that my patch, the string version of quote/unquote, is a
"hack" in order to satisfy the naive souls who only want to deal with
strings, while your method is the "pure 

[issue3532] bytes.tohex method

2008-08-09 Thread Martin v. Löwis

Martin v. Löwis <[EMAIL PROTECTED]> added the comment:

> * However, it is *very* obscure. I've been using Python for a year and I
> didn't know about it.

Hmm. There are probably many modules that you haven't used yet.

> * And, it requires importing binascii.

So what? The desire to convert bytes into hex strings is infrequent
enough to leave it out of the realm of a method. Also, Guido has
pronounced that he prefers functions over methods (and in this case,
I agree).

Using functions is more extensible. If you wanted to produce base-85
(say), then you can extend the functionality of bytes by providing a
function that does that, whereas you can't extend the existing bytes
type.
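
The extensibility argument can be sketched as two sibling functions. The names to_hex and to_base85 are hypothetical; base64.b85encode is used here as one available base-85 variant in current Python 3 and did not exist at the time of this thread:

```python
import base64
import binascii

def to_hex(data):
    """Hex text for a bytes object, via binascii."""
    return binascii.hexlify(data).decode("ascii")

def to_base85(data):
    """Base-85 text for a bytes object; added without touching the bytes type."""
    return base64.b85encode(data).decode("ascii")

hex_form = to_hex(b"\x01\x02")
b85_form = to_base85(b"hello")
```

A new conversion is just another function next to the existing one, whereas a method would have to be added to the built-in type itself.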

> * And, it results in a bytes object, not a str. That's weird. (Perhaps
> it would be good idea to change the functions in the binascii module to
> output strings instead of bytes? Ostensibly it looks like this module
> hasn't undergone py3kification).

There have been endless debates on this (or something similar to this),
revolving around the question: "is base-64 text or binary"?

> Would it hurt to have the tohex method of the bytes object to perform
> this task as well?

IMO, yes, it would. It complicates the code, and draws the focus away
from the proper approach to data conversion (namely, functions - not
methods).

> It would be much nicer to use since it's a method of
> the object rather than having to find out about and import and use some
> function.

That's highly debatable.

> Also why have a bytes.fromhex method when you could use binascii.unhexlify?

Good point.

In any case, this is my opinion; feel free to discuss this on python-dev.

Very clearly it is too late to add this for 3.0 now.




[issue3532] bytes.tohex method

2008-08-09 Thread Martin v. Löwis

Changes by Martin v. Löwis <[EMAIL PROTECTED]>:


--
versions: +Python 3.1 -Python 3.0




[issue3533] mac 10.4 buld of 3.0 --with-pydebug fails no __eprintf

2008-08-09 Thread Barry Alan Scott

Barry Alan Scott <[EMAIL PROTECTED]> added the comment:

As far as I know I'm using the Xcode compiler. Does this match
your expectations?

$ which gcc
/usr/bin/gcc

$ gcc -v
Using built-in specs.
Target: powerpc-apple-darwin8
Configured with: /private/var/tmp/gcc/gcc-5341.obj~1/src/configure
--disable-checking -enable-werror --prefix=/usr --mandir=/share/man
--enable-languages=c,objc,c++,obj-c++
--program-transform-name=/^[cg][^.-]*$/s/$/-4.0/
--with-gxx-include-dir=/include/c++/4.0.0 --with-slibdir=/usr/lib
--build=powerpc-apple-darwin8 --host=powerpc-apple-darwin8
--target=powerpc-apple-darwin8
Thread model: posix
gcc version 4.0.1 (Apple Computer, Inc. build 5341)




[issue3160] Building a Win32 binary installer crashes

2008-08-09 Thread Viktor Ferenczi

Viktor Ferenczi <[EMAIL PROTECTED]> added the comment:

Thanks. Good point. :-)

I did not find that bug, since pre_install_script is not defined for my
project. Sorry, it is my fault. I did not test my patch deep enough.

I need to know one more thing before providing a better patch:

What is the expected encoding of the pre_install_script file?




[issue3187] os.listdir can return byte strings

2008-08-09 Thread Benjamin Peterson

Benjamin Peterson <[EMAIL PROTECTED]> added the comment:

Let's make this a release blocker for RCs.

--
priority: critical -> deferred blocker




[issue3534] refactor.py can lose indentation for relative imports

2008-08-09 Thread Benjamin Peterson

Benjamin Peterson <[EMAIL PROTECTED]> added the comment:

What version of 2to3 are you using? AFAIK, this has been fixed in the trunk.

--
nosy: +benjamin.peterson




[issue3300] urllib.quote and unquote - Unicode issues

2008-08-09 Thread Jim Jewett

Jim Jewett <[EMAIL PROTECTED]> added the comment:

Matt,

Bill's main concern is with a policy decision; I doubt he would object to 
using your code once that is resolved.

The purpose of the quoting functions is to turn a string (representing the 
human-readable version) into bytes (that go over the wire).  If everything 
is ASCII, there isn't any disagreement -- but it also isn't obvious that 
they're bytes instead of characters.  So people started (well, continued, 
since it dates to pre-unicode C) treating them as though they were strings.

The fact that ASCII (and therefore most wire protocols) looks the same as 
bytes or as characters was one of the strongest arguments against splitting 
the bytes and string types.  Now that this has been done, Bill feels we 
should be consistent.  (You feel wire-protocol bytes should be treated as 
strings, if only as bytestrings, because the libraries use them that way -- 
but this is a policy decision.)

To quote the final paragraph of 1.2.1
"""
 In local or regional contexts and with improving technology, users
   might benefit from being able to use a wider range of characters;
   such use is not defined by this specification.  Percent-encoded
   octets (Section 2.1) may be used within a URI to represent characters
   outside the range of the US-ASCII coded character set if this
   representation is allowed by the scheme or by the protocol element in
   which the URI is referenced.  Such a definition should specify the
   character encoding used to map those characters to octets prior to
   being percent-encoded for the URI.
"""

So the mapping to bytes (or "octets") for non-ASCII isn't defined (here), 
and if you want to use it, you need to specify charset.  But in practice, 
people do use it without specifying a charset.  Which charset should be 
assumed?  The old code (and test cases) assumed Latin-1.  You want to 
assume UTF-8 (though you took the document charset when available -- which 
might also make sense).
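
To make the charset question concrete, the same character percent-encodes differently under the two candidate defaults. A small sketch using the encoding parameter of urllib.parse.quote and unquote in current Python 3, which is an assumption relative to the code being reviewed here:

```python
from urllib.parse import quote, unquote

latin1_form = quote("ü", encoding="latin-1")   # one octet
utf8_form = quote("ü", encoding="utf-8")       # two octets

# A receiver that guesses the wrong charset cannot recover the character;
# with the default errors="replace" the bad octet becomes U+FFFD.
misread = unquote(latin1_form, encoding="utf-8")
```

Both forms are valid percent-encodings, so the URI alone does not reveal which charset the sender assumed.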




[issue3534] refactor.py can lose indentation for relative imports

2008-08-09 Thread Roger Upole

Roger Upole <[EMAIL PROTECTED]> added the comment:

I was using 3.0b2.
The output is correct with latest updates,
sorry for the trouble.




[issue3534] refactor.py can lose indentation for relative imports

2008-08-09 Thread Benjamin Peterson

Changes by Benjamin Peterson <[EMAIL PROTECTED]>:


--
resolution:  -> out of date
status: open -> closed




[issue3532] bytes.tohex method

2008-08-09 Thread Matt Giuca

Matt Giuca <[EMAIL PROTECTED]> added the comment:

You did the 3.1 thing again! We can accept a new feature like this
before 3.0b3, can we not?

> Hmm. There are probably many modules that you haven't used yet.

Snap :)

Well, I didn't know about the community's preference for functions over
methods. You make a lot of good points.

I think the biggest problem I have is the existence of fromhex. It's
really strange/inconsistent to have a fromhex without a tohex.

Also I think a lot of people (like me, in my relative inexperience) are
going to be at a loss as to why .encode('hex') went away, and they'll
easily be able to find .tohex (by typing help(bytes), or just guessing),
while binascii.hexlify is sufficiently obscure that I had to ask.
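For comparison, a minimal sketch of the round trip under discussion,
assuming a current Python 3 interpreter: binascii.hexlify for
bytes-to-hex and bytes.fromhex for the reverse. The sample bytes and
variable names are made up for illustration.

```python
import binascii

# Hypothetical sample data, chosen only for illustration.
data = b"\xde\xad\xbe\xef"

# hexlify returns a bytes object of ASCII hex digits; decode it to get a str.
hex_text = binascii.hexlify(data).decode("ascii")

# bytes.fromhex is the existing inverse that the comment above refers to.
round_trip = bytes.fromhex(hex_text)

print(hex_text)
print(round_trip == data)
```

The asymmetry the comment describes is visible here: the decoding half is
a method on bytes, while the encoding half lives in a separate module.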




[issue3300] urllib.quote and unquote - Unicode issues

2008-08-09 Thread Matt Giuca

Matt Giuca <[EMAIL PROTECTED]> added the comment:

> Bill's main concern is with a policy decision; I doubt he would
> object to using your code once that is resolved.

But his patch does the same basic operations as mine, just implemented
differently and with the heap of issues I outlined above. So it doesn't
have anything to do with the policy decision.

> The purpose of the quoting functions is to turn a string
> (representing the human-readable version) into bytes (that go
> over the wire).

Ah hang on, that's a misunderstanding. There is a two-step process involved.

Step 1. Translate the (character or byte) string into an ASCII character
string by percent-encoding the characters or bytes that need it. (If
percent-encoding characters, use an unspecified encoding).
Step 2. Serialize the ASCII character string into an octet sequence to
send it over the wire, using some unspecified encoding.

Step 1 is explained in detail throughout the RFC, particularly in
Section 1.2.1 Transcription ("Percent-encoded octets may be used within
a URI to represent characters outside the range of the US-ASCII coded
character set") and 2.1 Percent Encoding.

Step 2 is not actually part of the spec (because the spec outlines URIs
as character sequences, not how to send them over a network). It is
briefly described in Section 2 ("This specification does not mandate any
particular character encoding for mapping between URI characters and the
octets used to store or transmit those characters.  When a URI appears
in a protocol element, the character encoding is defined by that protocol").
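The two steps can be sketched with urllib.parse.quote as it exists in
current Python 3 releases, which may differ from the patches discussed in
this thread. The sample path is hypothetical, and choosing UTF-8 or
Latin-1 for step 1 is exactly the open policy question, not something the
RFC settles.

```python
from urllib.parse import quote

# Hypothetical input containing one non-ASCII character (u with umlaut).
path = "/br\u00fcckner.doc"

# Step 1: percent-encode. The result is still a character string (str),
# and it contains only ASCII characters.
uri_utf8 = quote(path, encoding="utf-8")
uri_latin1 = quote(path, encoding="latin-1")

# Step 2: serialize the ASCII character string to octets for the wire.
# This step is independent of the encoding chosen in step 1.
wire = uri_utf8.encode("ascii")

print(uri_utf8)
print(uri_latin1)
print(wire)
```

Only step 1 depends on the charset argument; step 2 is the same ASCII
mapping whichever charset was used to produce the percent-escapes.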

Section 1.2.1:

> A URI may be represented in a variety of ways; e.g., ink on
> paper, pixels on a screen, or a sequence of character
> encoding octets.  The interpretation of a URI depends only on
> the characters used and not on how those characters are
> represented in a network protocol.

The RFC then goes on to describe a scenario of writing a URI down on a
napkin, before stating:

> A URI is a sequence of characters that is not always represented
> as a sequence of octets.

Right, so there is no debate that a URI (after percent-encoding) is a
character string, not a byte string. The debate is only whether it's a
character or byte string before percent-encoding.

Therefore, the concept of "quote_as_bytes" is flawed.

> You feel wire-protocol bytes should be treated as
> strings, if only as bytestrings, because the libraries use them
> that way.

No I do not. URIs post-encoding are character strings, in the Unicode
sense of the term "character". This entire topic has nothing to do with
the wire.

Note that the "charset" parameter in Bill's patch and the "encoding"
parameter in mine aren't the mapping from URI strings to octets (that's
trivially ASCII). They name the charset used to encode character
information into octets, which then get percent-encoded.

> The old code (and test cases) assumed Latin-1.

No, the old code and test cases were written for Python 2.x. They
assumed a byte string was being emitted (back when a byte string was a
string, so that was an acceptable output type). So they weren't assuming
an encoding. In fact the *ONLY* test case for Unicode in test_urllib
used a UTF-8-encoded string.

> r = urllib.parse.unquote('br%C3%BCckner_sapporo_20050930.doc')
> self.assertEqual(r, 'br\xc3\xbcckner_sapporo_20050930.doc')

In Python 2.x, this test case says "unquote('%C3%BC') should give me the
byte sequence '\xc3\xbc'", which is a valid case. In Python 3.0, the
code didn't change but the meaning subtly did. Now it says
"unquote('%C3%BC') should give the string 'Ã¼'". The name is clearly
supposed to be "brückner", not "brÃ¼ckner", which means in Python 3.0 we
should EITHER be expecting the BYTE string b'\xc3\xbc' or the character
string 'ü'.

So the old code and test cases didn't assume any encoding; they were
accidentally made to assume Latin-1 when the language changed underneath
them.
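The three readings of the same escape sequence can be compared directly.
This is a sketch using unquote and unquote_to_bytes from urllib.parse in
current Python 3 releases, not the code under review in this issue; the
shortened input string is an assumption made for brevity.

```python
from urllib.parse import unquote, unquote_to_bytes

# Shortened form of the file name from the test case quoted above.
quoted = "br%C3%BCckner"

# Reading 1: the escapes denote raw octets (the Python 2.x meaning).
as_bytes = unquote_to_bytes(quoted)

# Reading 2: the octets are UTF-8, which gives the intended name.
as_utf8 = unquote(quoted, encoding="utf-8")

# Reading 3: the octets are Latin-1, which gives the accidental result,
# two characters (U+00C3, U+00BC) where one was meant.
as_latin1 = unquote(quoted, encoding="latin-1")

print(as_bytes)
print(as_utf8)
print(as_latin1)
```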




[issue3300] urllib.quote and unquote - Unicode issues

2008-08-09 Thread Matt Giuca

Matt Giuca <[EMAIL PROTECTED]> added the comment:

I've been thinking more about the errors="strict" default. I think this
was Guido's suggestion. I've decided I'd rather stick with errors="replace".

I changed errors="replace" to errors="strict" in patch 8, but now I'm
worried that will cause problems, specifically for unquote. Once again,
all the code in the stdlib which calls unquote doesn't provide an errors
option, so the default will be the only choice when using these other
services.

I'm concerned that there'll be lots of unhandled exceptions flying
around for URLs which aren't encoded with UTF-8, and a conscientious
programmer will not be able to protect against user errors.

Take the cgi module as an example. Typical usage is to write:
> fields = cgi.FieldStorage()
> foo = fields.getfirst("foo")

If the QUERY_STRING is "foo=w%FCt" (Latin-1), with errors='strict', you
get a UnicodeDecodeError when you call cgi.FieldStorage(). With
errors='replace', the variable foo will be "w�t". I think in general I'd
rather have '�'s in my program (representing invalid user input) than
exceptions, since this is usually a user input error, not a programming
error.

(One problem is that all I can do to handle this is catch a
UnicodeDecodeError on the call to FieldStorage; then I can't access any
of the data).
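The difference between the two error modes can be sketched with
urllib.parse.unquote alone, as it exists in current Python 3 releases. It
leaves the cgi module out, so it only illustrates the decoding step, not
how FieldStorage behaves.

```python
from urllib.parse import unquote

# "w%FCt" is the Latin-1 percent-encoding from the example above; the
# single octet 0xFC is not valid UTF-8.
latin1_quoted = "w%FCt"

# errors="replace": the bad octet becomes U+FFFD and decoding continues.
replaced = unquote(latin1_quoted, encoding="utf-8", errors="replace")

# errors="strict": the same input raises instead of returning a value.
try:
    unquote(latin1_quoted, encoding="utf-8", errors="strict")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

print(repr(replaced))
print(strict_failed)
```

With "replace" the caller still gets the rest of the value and can decide
what to do with the U+FFFD marker; with "strict" the caller gets nothing
unless the exception is caught at exactly this call.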

Now maybe something we can think about is propagating the "encoding" and
"errors" arguments through to a few other major functions (such as
cgi.parse_qsl, cgi.FieldStorage and urllib.parse.urlencode), but that
should be done separately from this patch.




[issue2827] IDLE 3.0a5 cannot handle UTF-8

2008-08-09 Thread Senthil

Senthil <[EMAIL PROTECTED]> added the comment:

I was NOT able to reproduce it in IDLE 3.0b2 running on Linux. Would you
like to try with 3.0b2 as well?

tjreedy: I did not quite understand your comment. When you open an IDLE
instance, create a new document, paste the code, and run it, the
execution happens in the IDLE instance that is already running. There is
no need for an input() call.

--
nosy: +orsenthil




[issue3532] bytes.tohex method

2008-08-09 Thread Martin v. Löwis

Martin v. Löwis <[EMAIL PROTECTED]> added the comment:

> You did the 3.1 thing again! We can accept a new feature like this
> before 3.0b3, can we not?

Not without explicit approval by the release manager, no (or by BDFL
pronouncement).

The point of the betas is that *only* bugs get fixed, and *no* new
features are added.
