Re: [Python-Dev] \u and \U escapes in raw unicode string literals

2007-05-14 Thread Hrvoje Nikšić
On Fri, 2007-05-11 at 13:06 -0700, Guido van Rossum wrote: > > attribution_pattern = re.compile(ur'(---?(?!-)|\u2014) *(?=[^ \n])') > > But wouldn't it be just as handy to teach the re module about \u and > \U, just as it already knows about \x (and \123 octals)? And \n, \r, etc. Implementin
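
A note for readers on current Python: the re module did later learn these escapes. As of Python 3.3, re interprets \uXXXX and \UXXXXXXXX itself, so the Docutils pattern quoted above can be written as a plain raw string with no help from the string literal. A minimal sketch, assuming Python 3.3 or later; the helper name starts_attribution is made up for illustration and is not part of Docutils:

```python
import re

# Raw string: the backslash-u sequence reaches the re module untouched,
# and re itself turns \u2014 into EM DASH (Python 3.3 and later).
attribution_pattern = re.compile(r'(---?(?!-)|\u2014) *(?=[^ \n])')


def starts_attribution(line):
    """Return True if the line opens with '--', '---' or an em dash."""
    return attribution_pattern.match(line) is not None
```

The literal handed to re.compile still contains a real backslash and a "u"; only the regex engine gives them meaning.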

Re: [Python-Dev] \u and \U escapes in raw unicode string literals

2007-05-13 Thread Greg Ewing
M.-A. Lemburg wrote: > * non-ASCII code points in text are not uncommon, they occur > in most European scripts, all Asian scripts, In an Asian script, almost every character is likely to be non-ascii, which is going to be pretty hard to read as a string of unicode escapes. Maybe what we want is

Re: [Python-Dev] \u and \U escapes in raw unicode string literals

2007-05-13 Thread M.-A. Lemburg
On 2007-05-13 18:04, Martin v. Löwis wrote: >> * without the Unicode escapes, the only way to put non-ASCII >> code points into a raw Unicode string is via a source code encoding >> of say UTF-8 or UTF-16, pretty much defeating the original >> requirement of writing ASCII code only > > That'

Re: [Python-Dev] \u and \U escapes in raw unicode string literals

2007-05-13 Thread Martin v. Löwis
> * without the Unicode escapes, the only way to put non-ASCII > code points into a raw Unicode string is via a source code encoding > of say UTF-8 or UTF-16, pretty much defeating the original > requirement of writing ASCII code only That's no problem, though - just don't put the Unicode ch

Re: [Python-Dev] \u and \U escapes in raw unicode string literals

2007-05-13 Thread M.-A. Lemburg
On 2007-05-12 02:42, Andrew McNabb wrote: > On Sat, May 12, 2007 at 01:30:52AM +0200, M.-A. Lemburg wrote: >> I wonder how we managed to survive all these years with >> the existing consistent and concise definition of the >> raw-unicode-escape codec ;-) >> >> There are two options: >> >> * no one

Re: [Python-Dev] \u and \U escapes in raw unicode string literals

2007-05-11 Thread Andrew McNabb
On Sat, May 12, 2007 at 01:30:52AM +0200, M.-A. Lemburg wrote: > > I wonder how we managed to survive all these years with > the existing consistent and concise definition of the > raw-unicode-escape codec ;-) > > There are two options: > > * no one really uses Unicode raw strings nowadays > >

Re: [Python-Dev] \u and \U escapes in raw unicode string literals

2007-05-11 Thread M.-A. Lemburg
On 2007-05-12 00:48, Martin v. Löwis wrote: >> Using double backslashes won't cause that reaction: >> >> os.stat("c:\\windows\\system32\\user32.dll") > > Please refer to the subject. We are talking about raw strings. If you'd leave the context in place, the reason for my suggestion would become e

Re: [Python-Dev] \u and \U escapes in raw unicode string literals

2007-05-11 Thread Guido van Rossum
I think I'm going to break my own rules and ask Martin to write up a PEP. Given the pragmatics that Windows pathnames *are* a common use case, I'm willing to allow the trailing \ in the string. A regular expression containing a quote could be written using triple quotes, e.g. r"""(["'])[^"']*\1
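
The triple-quote suggestion is easy to try out. The snippet above is cut off, so the pattern below is only an illustration of that style of regex (a run of text opened and closed by the same quote character), not a reconstruction of the original message; quoted_re and is_quoted are hypothetical names:

```python
import re

# Triple quotes let the raw literal contain both ' and " unescaped.
quoted_re = re.compile(r"""(["'])[^"']*\1""")


def is_quoted(text):
    """Return True if text is wrapped in one matching pair of quotes."""
    return quoted_re.fullmatch(text) is not None
```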

Re: [Python-Dev] \u and \U escapes in raw unicode string literals

2007-05-11 Thread Martin v. Löwis
> BTW, there's an easy work-around for this special case: > > os.stat(os.path.join(r"c:\windows\system32", "user32.dll")) No matter what the decision is, there are always work-arounds. The question is what language suits the users most. Being able to specify characters by ordinal IMO has much les
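
The quoted work-around builds the path from pieces so that no single literal contains a backslash directly followed by "u". A small sketch of the same idea, using ntpath (the Windows flavour of os.path, importable on any platform) so that it behaves the same everywhere; the variable names are illustrative:

```python
import ntpath  # Windows path rules, usable on every platform

# Neither literal contains the sequence backslash + "u"; join() supplies
# the separator in front of "user32.dll".
directory = r"c:\windows\system32"
path = ntpath.join(directory, "user32.dll")
```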

Re: [Python-Dev] \u and \U escapes in raw unicode string literals

2007-05-11 Thread Martin v. Löwis
> Using double backslashes won't cause that reaction: > > os.stat("c:\\windows\\system32\\user32.dll") Please refer to the subject. We are talking about raw strings. >> Windows path names are one of the two primary applications of raw >> strings (the other being regexes). > > IMHO the primary u

Re: [Python-Dev] \u and \U escapes in raw unicode string literals

2007-05-11 Thread Michael Foord
Martin v. Löwis wrote: >> This is what prompted my question, actually: in Py3k, in the >> str/unicode unification branch, r"\u1234" changes meaning: before the >> unification, this was an 8-bit string, where the \u was not special, >> but now it is a unicode string, where \u *is* special. >> >

Re: [Python-Dev] \u and \U escapes in raw unicode string literals

2007-05-11 Thread David Goodger
> On 5/11/07, David Goodger wrote: > > Docutils uses it in the docutils.parsers.rst.states module, Body class: > > > > patterns = { > > 'bullet': ur'[-+*\u2022\u2023\u2043]( +|$)', > > ... > > > > attribution_pattern = re.compile(ur'(---?(?!-)|\u2014) *(?=[

Re: [Python-Dev] \u and \U escapes in raw unicode string literals

2007-05-11 Thread David Goodger
> Guido van Rossum writes: > > I'd like to hear from anyone who has access to *real code* that uses > > \u or \U in a raw unicode string. David Goodger writes: > Docutils uses it in the docutils.parsers.rst.states module, Body class: > > patterns = { > 'bul

Re: [Python-Dev] \u and \U escapes in raw unicode string literals

2007-05-11 Thread Guido van Rossum
On 5/11/07, David Goodger wrote: > Guido van Rossum writes: > > I'd like to hear from anyone who has access to *real code* that uses > > \u or \U in a raw unicode string. > > Docutils uses it in the docutils.parsers.rst.states module, Body class: > > patterns =

Re: [Python-Dev] \u and \U escapes in raw unicode string literals

2007-05-11 Thread David Goodger
Guido van Rossum writes: > I'd like to hear from anyone who has access to *real code* that uses > \u or \U in a raw unicode string. Docutils uses it in the docutils.parsers.rst.states module, Body class: patterns = { 'bullet': ur'[-+*\u2022\u2023\u2043]( +|$)', ...

Re: [Python-Dev] \u and \U escapes in raw unicode string literals

2007-05-11 Thread Guido van Rossum
On 5/10/07, "Martin v. Löwis" wrote: > Windows path names are one of the two primary applications of raw > strings (the other being regexes). I disagree with this use case; the r"..." notation was not invented for this purpose. I won't compromise the escaping of quotes to accom

Re: [Python-Dev] \u and \U escapes in raw unicode string literals

2007-05-11 Thread M.-A. Lemburg
On 2007-05-11 13:05, Thomas Heller wrote: > M.-A. Lemburg schrieb: >> On 2007-05-11 07:52, Martin v. Löwis wrote: This is what prompted my question, actually: in Py3k, in the str/unicode unification branch, r"\u1234" changes meaning: before the unification, this was an 8-bit string,

Re: [Python-Dev] \u and \U escapes in raw unicode string literals

2007-05-11 Thread Thomas Heller
M.-A. Lemburg schrieb: > On 2007-05-11 07:52, Martin v. Löwis wrote: >>> This is what prompted my question, actually: in Py3k, in the >>> str/unicode unification branch, r"\u1234" changes meaning: before the >>> unification, this was an 8-bit string, where the \u was not special, >>> but now it is

Re: [Python-Dev] \u and \U escapes in raw unicode string literals

2007-05-11 Thread Georg Brandl
M.-A. Lemburg schrieb: >> Windows path names are one of the two primary applications of raw >> strings (the other being regexes). > > IMHO the primary use case are regexps and for those you'd > definitely want to be able to put Unicode characters into your > expressions. Except if sre_parse woul

Re: [Python-Dev] \u and \U escapes in raw unicode string literals

2007-05-11 Thread M.-A. Lemburg
On 2007-05-11 07:52, Martin v. Löwis wrote: >> This is what prompted my question, actually: in Py3k, in the >> str/unicode unification branch, r"\u1234" changes meaning: before the >> unification, this was an 8-bit string, where the \u was not special, >> but now it is a unicode string, where \u *i

Re: [Python-Dev] \u and \U escapes in raw unicode string literals

2007-05-11 Thread Ron Adam
Martin v. Löwis wrote: >> This is what prompted my question, actually: in Py3k, in the >> str/unicode unification branch, r"\u1234" changes meaning: before the >> unification, this was an 8-bit string, where the \u was not special, >> but now it is a unicode string, where \u *is* special. > > That

Re: [Python-Dev] \u and \U escapes in raw unicode string literals

2007-05-10 Thread Martin v. Löwis
> This is what prompted my question, actually: in Py3k, in the > str/unicode unification branch, r"\u1234" changes meaning: before the > unification, this was an 8-bit string, where the \u was not special, > but now it is a unicode string, where \u *is* special. That is true for non-raw strings al
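
For reference, in released Python 3 versions \u is not special inside a raw string, so the two spellings below are different strings. A short sketch, assuming any Python 3 interpreter; the variable names are illustrative:

```python
raw = r"\u1234"     # six characters: backslash, u, 1, 2, 3, 4
cooked = "\u1234"   # one character: U+1234

# The raw form can still be turned into the character on request.
decoded = raw.encode("ascii").decode("unicode_escape")
```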

Re: [Python-Dev] \u and \U escapes in raw unicode string literals

2007-05-10 Thread Martin v. Löwis
Greg Ewing schrieb: > Martin v. Löwis wrote: >> why should you be able to get a non-ASCII character >> into a raw Unicode string? > > The analogous question would be why can't you get a > non-Unicode character into a raw Unicode string. No, that would not be analogous. The string type in Python i

Re: [Python-Dev] \u and \U escapes in raw unicode string literals

2007-05-10 Thread Guido van Rossum
On 5/10/07, Greg Ewing wrote: > Martin v. Löwis wrote: > > why should you be able to get a non-ASCII character > > into a raw Unicode string? > > The analogous question would be why can't you get a > non-Unicode character into a raw Unicode string. That > wouldn't make sense, si

Re: [Python-Dev] \u and \U escapes in raw unicode string literals

2007-05-10 Thread Greg Ewing
Martin v. Löwis wrote: > why should you be able to get a non-ASCII character > into a raw Unicode string? The analogous question would be why can't you get a non-Unicode character into a raw Unicode string. That wouldn't make sense, since Unicode strings can't even hold non-Unicode characters (or

Re: [Python-Dev] \u and \U escapes in raw unicode string literals

2007-05-10 Thread Guido van Rossum
On 5/10/07, "Martin v. Löwis" wrote: > >> I actually disagree with that. It is fairly easy to include non-ASCII > >> characters in a raw Unicode string - just type them in. > > > > That violates the convention used in many places that source code > > should only contain printabl

Re: [Python-Dev] \u and \U escapes in raw unicode string literals

2007-05-10 Thread Martin v. Löwis
>> I actually disagree with that. It is fairly easy to include non-ASCII >> characters in a raw Unicode string - just type them in. > > That violates the convention used in many places that source code > should only contain printable ASCII, and all non-ASCII or unprintable > characters should be w

Re: [Python-Dev] \u and \U escapes in raw unicode string literals

2007-05-10 Thread Guido van Rossum
On 5/10/07, "Martin v. Löwis" wrote: > > However, I understand the other reason (inclusion of non-ASCII > > characters in raw strings) and I reluctantly agree with it. > > I actually disagree with that. It is fairly easy to include non-ASCII > characters in a raw Unicode string

Re: [Python-Dev] \u and \U escapes in raw unicode string literals

2007-05-10 Thread Martin v. Löwis
> However, I understand the other reason (inclusion of non-ASCII > characters in raw strings) and I reluctantly agree with it. I actually disagree with that. It is fairly easy to include non-ASCII characters in a raw Unicode string - just type them in. Or, if that fails, use string concatenation w
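
The concatenation work-around relies on adjacent string literals being joined at compile time, so a raw piece and an escaped piece can sit side by side. A sketch in Python 3 syntax, reusing the Docutils bullet pattern quoted elsewhere in this thread; the variable name bullet is illustrative:

```python
import re

# Raw pieces carry the regex syntax; the middle, non-raw piece carries the
# three bullet characters as \u escapes, keeping the source ASCII-only.
bullet = re.compile(r"[-+*" "\u2022\u2023\u2043" r"]( +|$)")
```

The compiled pattern contains the actual bullet characters, not backslash sequences, because the string literal resolved the escapes before re ever saw them.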

Re: [Python-Dev] \u and \U escapes in raw unicode string literals

2007-05-10 Thread M.-A. Lemburg
On 2007-05-11 00:11, Guido van Rossum wrote: > On 5/10/07, M.-A. Lemburg wrote: >> On 2007-05-10 20:53, Paul Moore wrote: >>> On 10/05/07, Guido van Rossum wrote: I just discovered that, in all versions of Python as far back as I have access to (2.0

Re: [Python-Dev] \u and \U escapes in raw unicode string literals

2007-05-10 Thread Guido van Rossum
On 5/10/07, M.-A. Lemburg wrote: > On 2007-05-10 20:53, Paul Moore wrote: > > On 10/05/07, Guido van Rossum wrote: > >> I just discovered that, in all versions of Python as far back as I > >> have access to (2.0), \u escapes are interpreted inside raw > >

Re: [Python-Dev] \u and \U escapes in raw unicode string literals

2007-05-10 Thread M.-A. Lemburg
On 2007-05-10 20:53, Paul Moore wrote: > On 10/05/07, Guido van Rossum wrote: >> I just discovered that, in all versions of Python as far back as I >> have access to (2.0), \u escapes are interpreted inside raw >> unicode strings. Thus: > [...] >> Does anyone remember why it

Re: [Python-Dev] \u and \U escapes in raw unicode string literals

2007-05-10 Thread Paul Moore
On 10/05/07, Guido van Rossum wrote: > I just discovered that, in all versions of Python as far back as I > have access to (2.0), \u escapes are interpreted inside raw > unicode strings. Thus: [...] > Does anyone remember why it is done this way? The reference manual > descr