On Fri, 2007-05-11 at 13:06 -0700, Guido van Rossum wrote:
> > attribution_pattern = re.compile(ur'(---?(?!-)|\u2014) *(?=[^ \n])')
>
> But wouldn't it be just as handy to teach the re module about \u and
> \U, just as it already knows about \x (and \123 octals)?
And \n, \r, etc. Implementin
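For reference, Python 3.3 and later did teach re about these escapes: str patterns accept \u and \U alongside \x and three-digit octal escapes, so a raw-string pattern does not need the literal itself to expand them. A minimal check, assuming Python 3.3 or later:

```python
import re

# The raw literal keeps every backslash; re's own parser expands the
# escapes, so all four spellings below mean the letter 'A'.
pattern = re.compile(r'\x41\101\u0041\U00000041')
print(bool(pattern.fullmatch('AAAA')))   # True

# Same idea for the em dash used in the attribution pattern above.
dash = re.compile(r'\u2014')
print(len(dash.pattern))                 # 6: backslash, 'u', four hex digits
print(bool(dash.fullmatch('\u2014')))    # True
```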
M.-A. Lemburg wrote:
> * non-ASCII code points in text are not uncommon, they occur
> in most European scripts, all Asian scripts,
In an Asian script, almost every character is likely to
be non-ascii, which is going to be pretty hard to read
as a string of unicode escapes.
Maybe what we want is
On 2007-05-13 18:04, Martin v. Löwis wrote:
>> * without the Unicode escapes, the only way to put non-ASCII
>> code points into a raw Unicode string is via a source code encoding
>> of say UTF-8 or UTF-16, pretty much defeating the original
>> requirement of writing ASCII code only
>
> That'
> * without the Unicode escapes, the only way to put non-ASCII
> code points into a raw Unicode string is via a source code encoding
> of say UTF-8 or UTF-16, pretty much defeating the original
> requirement of writing ASCII code only
That's no problem, though - just don't put the Unicode ch
On 2007-05-12 02:42, Andrew McNabb wrote:
> On Sat, May 12, 2007 at 01:30:52AM +0200, M.-A. Lemburg wrote:
>> I wonder how we managed to survive all these years with
>> the existing consistent and concise definition of the
>> raw-unicode-escape codec ;-)
>>
>> There are two options:
>>
>> * no one
On Sat, May 12, 2007 at 01:30:52AM +0200, M.-A. Lemburg wrote:
>
> I wonder how we managed to survive all these years with
> the existing consistent and concise definition of the
> raw-unicode-escape codec ;-)
>
> There are two options:
>
> * no one really uses Unicode raw strings nowadays
>
>
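A sketch of what the raw-unicode-escape codec does, as it still behaves in Python 3. The Python 2 compiler used this codec to decode ur'...' literals, which is why \u was live inside them. On decoding, only \uXXXX and \UXXXXXXXX are interpreted; on encoding, code points above U+00FF become \u (or \U) escapes and the rest are written as Latin-1 bytes. The sample data is illustrative:

```python
# Decoding: "\n" stays as two characters, while "\u2014" becomes one.
text = b'c:\\new\\u2014dir'.decode('raw_unicode_escape')
print(ascii(text))   # 'c:\\new\u2014dir'
print(len(text))     # 10

# Encoding: code points above U+00FF are escaped, lower ones are Latin-1 bytes.
print('\u2014'.encode('raw_unicode_escape'))   # b'\\u2014'
print('\u00e9'.encode('raw_unicode_escape'))   # b'\xe9'
```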
On 2007-05-12 00:48, Martin v. Löwis wrote:
>> Using double backslashes won't cause that reaction:
>>
>> os.stat("c:\\windows\\system32\\user32.dll")
>
> Please refer to the subject. We are talking about raw strings.
If you'd leave the context in place, the reason for my suggestion
would become e
I think I'm going to break my own rules and ask Martin to write up a
PEP. Given the pragmatics that Windows pathnames *are* a common use
case, I'm willing to allow the trailing \ in the string. A regular
expression containing a quote could be written using triple quotes,
e.g. r"""(["'])[^"']*\1
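A sketch of the triple-quote approach. The snippet above is cut off, so the closing triple quote and the sample inputs here are assumptions. The pattern matches text enclosed in a matching pair of single or double quotes:

```python
import re

# Triple quotes let both ' and " appear in the raw pattern unescaped;
# \1 requires the closing quote to match the opening one.
quoted = re.compile(r"""(["'])[^"']*\1""")

print(quoted.match('"hello" world').group(0))   # "hello"
print(quoted.match("'it'").group(0))            # 'it'
print(quoted.match('"oops\''))                  # None: quotes do not match
```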
> BTW, there's an easy work-around for this special case:
>
> os.stat(os.path.join(r"c:\windows\system32", "user32.dll"))
No matter what the decision is, there are always work-arounds.
The question is what language suits the users most. Being
able to specify characters by ordinal IMO has much les
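The os.path.join work-around only applies Windows path rules when it runs on Windows. A sketch that calls ntpath directly (the module os.path refers to on Windows), so it gives the same result on any platform:

```python
import ntpath  # Windows path rules, importable on every platform

# Joining supplies the separator, so no raw literal has to end in "\".
path = ntpath.join(r"c:\windows\system32", "user32.dll")
print(path)  # c:\windows\system32\user32.dll
```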
> Using double backslashes won't cause that reaction:
>
> os.stat("c:\\windows\\system32\\user32.dll")
Please refer to the subject. We are talking about raw strings.
>> Windows path names are one of the two primary applications of raw
>> strings (the other being regexes).
>
> IMHO the primary u
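For comparison, a sketch showing that the doubled-backslash literal and the raw literal denote the same string, and that in current Python 3 a raw literal still cannot end in a single backslash, because that backslash keeps the closing quote from terminating the literal:

```python
# Two spellings of the same path.
cooked = "c:\\windows\\system32\\user32.dll"
raw = r"c:\windows\system32\user32.dll"
print(cooked == raw)  # True

# A raw literal ending in a lone backslash never terminates.
try:
    compile(r'p = r"c:\windows\"', "<example>", "exec")
    trailing_ok = True
except SyntaxError:
    trailing_ok = False
print(trailing_ok)  # False
```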
Martin v. Löwis wrote:
>> This is what prompted my question, actually: in Py3k, in the
>> str/unicode unification branch, r"\u1234" changes meaning: before the
>> unification, this was an 8-bit string, where the \u was not special,
>> but now it is a unicode string, where \u *is* special.
>>
>
> On 5/11/07, David Goodger <[EMAIL PROTECTED]> wrote:
> > Docutils uses it in the docutils.parsers.rst.states module, Body class:
> >
> > patterns = {
> > 'bullet': ur'[-+*\u2022\u2023\u2043]( +|$)',
> > ...
> >
> > attribution_pattern = re.compile(ur'(---?(?!-)|\u2014) *(?=[
> Guido van Rossum python.org> writes:
> > I'd like to hear from anyone who has access to *real code* that uses
> > \u or \U in a raw unicode string.
David Goodger python.org> writes:
> Docutils uses it in the docutils.parsers.rst.states module, Body class:
>
> patterns = {
> 'bul
On 5/11/07, David Goodger <[EMAIL PROTECTED]> wrote:
> Guido van Rossum python.org> writes:
> > I'd like to hear from anyone who has access to *real code* that uses
> > \u or \U in a raw unicode string.
>
> Docutils uses it in the docutils.parsers.rst.states module, Body class:
>
> patterns =
Guido van Rossum python.org> writes:
> I'd like to hear from anyone who has access to *real code* that uses
> \u or \U in a raw unicode string.
Docutils uses it in the docutils.parsers.rst.states module, Body class:
patterns = {
'bullet': ur'[-+*\u2022\u2023\u2043]( +|$)',
...
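A sketch of porting the docutils bullet pattern to Python 3, assuming Python 3.3 or later. The ur'...' prefix is a syntax error there, but a plain r'...' pattern works because re expands \u2022 and the other escapes itself, including inside a character class. The sample inputs are illustrative:

```python
import re

# Python 2's ur'...' prefix is rejected by the Python 3 compiler.
try:
    compile("p = ur'x'", "<example>", "exec")
    ur_prefix_ok = True
except SyntaxError:
    ur_prefix_ok = False
print(ur_prefix_ok)  # False

# Ported pattern: a plain raw string; re handles \u2022, \u2023, \u2043.
bullet = re.compile(r'[-+*\u2022\u2023\u2043]( +|$)')
print(bool(bullet.match('\u2022 item')))  # True
print(bool(bullet.match('- item')))       # True
print(bool(bullet.match('x item')))       # False
```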
On 5/10/07, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> Windows path names are one of the two primary applications of raw
> strings (the other being regexes).
I disagree with this use case; the r"..." notation was not invented
for this purpose. I won't compromise the escaping of quotes to
accom
On 2007-05-11 13:05, Thomas Heller wrote:
> M.-A. Lemburg wrote:
>> On 2007-05-11 07:52, Martin v. Löwis wrote:
This is what prompted my question, actually: in Py3k, in the
str/unicode unification branch, r"\u1234" changes meaning: before the
unification, this was an 8-bit string,
M.-A. Lemburg wrote:
> On 2007-05-11 07:52, Martin v. Löwis wrote:
>>> This is what prompted my question, actually: in Py3k, in the
>>> str/unicode unification branch, r"\u1234" changes meaning: before the
>>> unification, this was an 8-bit string, where the \u was not special,
>>> but now it is
M.-A. Lemburg wrote:
>> Windows path names are one of the two primary applications of raw
>> strings (the other being regexes).
>
> IMHO the primary use case are regexps and for those you'd
> definitely want to be able to put Unicode characters into your
> expressions.
Except if sre_parse woul
On 2007-05-11 07:52, Martin v. Löwis wrote:
>> This is what prompted my question, actually: in Py3k, in the
>> str/unicode unification branch, r"\u1234" changes meaning: before the
>> unification, this was an 8-bit string, where the \u was not special,
>> but now it is a unicode string, where \u *i
Martin v. Löwis wrote:
>> This is what prompted my question, actually: in Py3k, in the
>> str/unicode unification branch, r"\u1234" changes meaning: before the
>> unification, this was an 8-bit string, where the \u was not special,
>> but now it is a unicode string, where \u *is* special.
>
> That
> This is what prompted my question, actually: in Py3k, in the
> str/unicode unification branch, r"\u1234" changes meaning: before the
> unification, this was an 8-bit string, where the \u was not special,
> but now it is a unicode string, where \u *is* special.
That is true for non-raw strings al
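In Python 3 as it shipped, this was settled in favour of raw semantics: in an r'...' str literal, \u is not special. A quick check:

```python
raw = r"\u1234"     # six characters: backslash, 'u', four hex digits
cooked = "\u1234"   # one character, U+1234
print(len(raw), len(cooked))   # 6 1
print(raw == "\\u1234")        # True
```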
Greg Ewing wrote:
> Martin v. Löwis wrote:
>> why should you be able to get a non-ASCII character
>> into a raw Unicode string?
>
> The analogous question would be why can't you get a
> non-Unicode character into a raw Unicode string.
No, that would not be analogous. The string type in Python
i
On 5/10/07, Greg Ewing <[EMAIL PROTECTED]> wrote:
> Martin v. Löwis wrote:
> > why should you be able to get a non-ASCII character
> > into a raw Unicode string?
>
> The analogous question would be why can't you get a
> non-Unicode character into a raw Unicode string. That
> wouldn't make sense, si
Martin v. Löwis wrote:
> why should you be able to get a non-ASCII character
> into a raw Unicode string?
The analogous question would be why can't you get a
non-Unicode character into a raw Unicode string. That
wouldn't make sense, since Unicode strings can't even
hold non-Unicode characters (or
On 5/10/07, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> >> I actually disagree with that. It is fairly easy to include non-ASCII
> >> characters in a raw Unicode string - just type them in.
> >
> > That violates the convention used in many places that source code
> > should only contain printabl
>> I actually disagree with that. It is fairly easy to include non-ASCII
>> characters in a raw Unicode string - just type them in.
>
> That violates the convention used in many places that source code
> should only contain printable ASCII, and all non-ASCII or unprintable
> characters should be w
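Where that convention applies, the escaped spelling of a non-ASCII string can be generated rather than typed by hand. A sketch using the built-in ascii() and the unicode_escape codec; the sample string is illustrative:

```python
text = "\u2014 Martin v. L\u00f6wis"

# ascii() gives a printable-ASCII repr: \xNN below U+0100, \uNNNN above.
print(ascii(text))                     # '\u2014 Martin v. L\xf6wis'
print(text.encode('unicode_escape'))   # b'\\u2014 Martin v. L\\xf6wis'
```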
On 5/10/07, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> > However, I understand the other reason (inclusion of non-ASCII
> > characters in raw strings) and I reluctantly agree with it.
>
> I actually disagree with that. It is fairly easy to include non-ASCII
> characters in a raw Unicode string
> However, I understand the other reason (inclusion of non-ASCII
> characters in raw strings) and I reluctantly agree with it.
I actually disagree with that. It is fairly easy to include non-ASCII
characters in a raw Unicode string - just type them in. Or, if that
fails, use string concatenation w
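One reading of the (truncated) concatenation suggestion, applied to the attribution pattern from the start of the thread: keep the backslash-heavy parts raw and supply the non-ASCII character from an ordinary literal. A sketch with illustrative inputs:

```python
import re

# Raw pieces for the regex syntax, a normal literal for the em dash.
attribution = re.compile(r'(---?(?!-)|' + '\u2014' + r') *(?=[^ \n])')

print(bool(attribution.match('-- Guido')))       # True
print(bool(attribution.match('\u2014 Guido')))   # True
print(bool(attribution.match('---- Guido')))     # False: four dashes rejected
```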
On 2007-05-11 00:11, Guido van Rossum wrote:
> On 5/10/07, M.-A. Lemburg <[EMAIL PROTECTED]> wrote:
>> On 2007-05-10 20:53, Paul Moore wrote:
>>> On 10/05/07, Guido van Rossum <[EMAIL PROTECTED]> wrote:
I just discovered that, in all versions of Python as far back as I
have access to (2.0
On 5/10/07, M.-A. Lemburg <[EMAIL PROTECTED]> wrote:
> On 2007-05-10 20:53, Paul Moore wrote:
> > On 10/05/07, Guido van Rossum <[EMAIL PROTECTED]> wrote:
> >> I just discovered that, in all versions of Python as far back as I
> >> have access to (2.0), \u escapes are interpreted inside raw
> >
On 2007-05-10 20:53, Paul Moore wrote:
> On 10/05/07, Guido van Rossum <[EMAIL PROTECTED]> wrote:
>> I just discovered that, in all versions of Python as far back as I
>> have access to (2.0), \u escapes are interpreted inside raw
>> unicode strings. Thus:
> [...]
>> Does anyone remember why it
On 10/05/07, Guido van Rossum <[EMAIL PROTECTED]> wrote:
> I just discovered that, in all versions of Python as far back as I
> have access to (2.0), \u escapes are interpreted inside raw
> unicode strings. Thus:
[...]
> Does anyone remember why it is done this way? The reference manual
> descr