[Python-Dev] Re: What to do about invalid escape sequences

2019-08-11 Thread Serhiy Storchaka

10.08.19 22:10, Glenn Linderman wrote:
> As pointed out elsewhere, raw strings have limitations: paths ending in
> \ cannot be represented, and such do exist in various situations, not
> all of which can be easily avoided... except by the "extra character
> contortion" of   "C:\directory\ "[:-1]  (does someone know a better way?)


Another common idiom is

r"C:\directory" "\\"
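
For readers who want to try the idioms discussed here, a minimal sketch follows. It uses a raw string for the slicing trick (the quoted original was not raw) so that it runs without escape warnings; the variable names are illustrative only.

```python
# A raw string literal cannot end in an odd number of backslashes,
# so a raw literal whose last character is a lone backslash is a syntax error.
# Three workarounds that all produce the same value:

# 1. Adjacent-literal concatenation: raw body plus one escaped backslash.
concatenated = r"C:\directory" "\\"

# 2. The "extra character" trick: add a trailing space, then slice it off.
sliced = r"C:\directory\ "[:-1]

# 3. Double every backslash in an ordinary (non-raw) string.
doubled = "C:\\directory\\"

assert concatenated == sliced == doubled
print(concatenated)  # C:\directory\
```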

> I wonder how many raw strings actually use the \"  escape productively?
> Maybe that should be deprecated too!  I can't think of a good and
> necessary use for it, can anyone?


This is an interesting question. I have performed some experiments. 15
files in the stdlib (not counting the tokenizer) use \' or \" in raw
strings. And one test (test_venv) fails because third-party code uses
them. All cases are in regular expressions. It is possible
to rewrite them, but it is a less trivial task than fixing invalid escape
sequences. So changing this would require a much longer deprecation
period.

___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/GCDD6JQOPYENVDP3A62EFWHODIP2PFQM/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-11 Thread Eric V. Smith

On 8/10/2019 10:30 PM, Rob Cliffe via Python-Dev wrote:
> On 10/08/2019 23:30:18, Greg Ewing wrote:
>> Rob Cliffe via Python-Dev wrote:
>>> Also, the former is simply more *informative* - it tells the reader
>>> that baz is expected to be a directory, not a file.
>>
>> On Windows you can usually tell that from the fact that filenames
>> almost always have an extension, and directory names almost never
>> do.
>
> Usually, but not always.  I have not infrequently used files with a
> blank extension.
> I can't recall using a directory name with an extension (but I can't
> swear that I never have).


I most commonly see this with bare git repositories, whose directory
names end in .git. And I've created directory names with "extensions"
for my own use.


Eric
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/PJ4TPHOY6ZIWI5CQ56J3BYWTEBFYNMJU/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-11 Thread Steven D'Aprano
On Sat, Aug 10, 2019 at 12:10:55PM -0700, Glenn Linderman wrote:

> Or invent "really raw" in some spelling, such as rr"c:\directory\"
> or e for exact, or x for exact, or  here>"c:\directory\"
> 
> And that brings me to the thought that if   \e  wants to become an 
> escape for escape, that maybe there should be an "extended escape" 
> prefix... if you want to use more escapes, define   ee"string where \\ 
> can only be used as an escape or escaped character, \e means the ASCII 
> escape character, and \ followed by a character with no escape 
> definition would be an error."

Please no.

We already have b-strings, r-strings, u-strings, f-strings, br-strings, 
rb-strings, fr-strings, rf-strings, each of which comes in four 
varieties (single quote, double quote, triple single quote and triple 
double quote). Now you're talking about adding rr-strings, v-strings 
(Greg suggested that) and ee-strings, presumably some or all of which 
will need b*- and *b- or f*- and *f- varieties too.

If the plan to deprecate unrecognised escapes and then make them an 
exception goes ahead, and I expect that it will, in a few more releases 
this "extended escape" ee-string will be completely redundant. If \e is 
required, we will be able to add it to regular strings as needed, 
likewise for any future new escapes we might want. (If any.)

And if we end up keeping the existing behaviour, oh well, we can always 
write \x1B instead. New escapes are a Nice To Have, not a Must Have.
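
As a small illustration of that point, the ASCII escape character already has several valid spellings in ordinary string literals. The ANSI sequence below is only an example use and assumes a terminal that understands it.

```python
# "\e" is not a recognised escape in Python string literals, but the ASCII
# escape character (code point 27) can already be written in several ways.
ESC = "\x1b"

assert ESC == "\033" == "\u001b" == chr(27)

# Example use: ANSI sequences for bold text (terminal support assumed).
bold = f"{ESC}[1mimportant{ESC}[0m"
print(repr(bold))
```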

"Really raw" rr'' versus "nearly raw" r'' is a source of confusion just 
waiting to happen, when people use the wrong numbers of r's, or are 
simply unclear which they should use.

It's not like we have no other options:

location = r'C:\directory\subdirectory' '\\'

works fine. So does this:

location = 'directory/subdirectory/'.replace('/', os.sep)
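
A sketch of the separator-based approach, for comparison. The `os.path.join` and `ntpath` variants are additions for illustration, not something proposed in this thread; `ntpath` is used only because it applies Windows rules on any platform.

```python
import ntpath
import os

# Write forward slashes, then swap in the platform separator.
replaced = 'directory/subdirectory/'.replace('/', os.sep)

# os.path.join with an empty final component also yields a trailing separator.
joined = os.path.join('directory', 'subdirectory', '')
assert replaced == joined

# ntpath applies Windows path rules everywhere, so the result can be
# inspected without a Windows machine.
windows_style = ntpath.join('C:\\directory', 'subdirectory', '')
assert windows_style == 'C:\\directory\\subdirectory\\'
```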

Even better, instead of hard-coding our paths in the source code, we can 
read them from a config file or database.

It is unfortunate that Windows is so tricky with backslashes and 
forwards slashes, and that it clashes with the escape character, but I'm 
sure that other languages which use \ for escaping haven't proliferated 
four or more kinds of strings with different escaping rules in 
response.



-- 
Steven
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/2ZPNZTP3B7OEG2LQQXAGGYG6B76LYDB5/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-11 Thread Paul Moore
On Sun, 11 Aug 2019 at 03:37, Rob Cliffe via Python-Dev wrote:
> Usually, but not always.  I have not infrequently used files with a
> blank extension.
> I can't recall using a directory name with an extension (but I can't
> swear that I never have).

I've often seen directory names like "1. Overview" on Windows.
Technically, " Overview" would be the extension here. Of course,
that's a silly example, but the point is that there's a difference
between what's clear to a human and what's clear to a computer...

Paul
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/PDS2EGV77Z5B2IZWCN5LWF7XWQGBLMWQ/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-11 Thread Glenn Linderman

On 8/11/2019 1:26 AM, Serhiy Storchaka wrote:
> 10.08.19 22:10, Glenn Linderman wrote:
>> As pointed out elsewhere, raw strings have limitations: paths ending
>> in \ cannot be represented, and such do exist in various situations,
>> not all of which can be easily avoided... except by the "extra
>> character contortion" of "C:\directory\ "[:-1]  (does someone know a
>> better way?)
>
> Another common idiom is
>
>     r"C:\directory" "\\"


I suppose that concatenation happens at compile time; less sure about 
[:-1], I would guess not. Thanks for this.
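
One way to check that guess is to look at the syntax tree. The sketch below assumes Python 3.8 or later (where literals parse to `ast.Constant`) and inspects only the unoptimised AST; whether the compiler later folds the slice into a constant is an implementation detail it does not settle.

```python
import ast

# Source text of the two idioms, held in raw strings so the backslashes
# reach the parser unchanged.
concat_src = r'''r"C:\directory" "\\"'''
slice_src = r'''r"C:\directory\ "[:-1]'''

concat_node = ast.parse(concat_src, mode="eval").body
slice_node = ast.parse(slice_src, mode="eval").body

# Adjacent literals are merged into a single constant before any code runs.
assert isinstance(concat_node, ast.Constant)
assert concat_node.value == "C:\\directory\\"

# The slice is still a Subscript node at this stage.
assert isinstance(slice_node, ast.Subscript)
```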


>> I wonder how many raw strings actually use the \"  escape
>> productively? Maybe that should be deprecated too!  I can't think
>> of a good and necessary use for it, can anyone?
>
> This is an interesting question. I have performed some experiments. 15
> files in the stdlib (not counting the tokenizer) use \' or \" in raw
> strings. And one test (test_venv) fails because third-party code uses
> them. All cases are in regular expressions. It is possible
> to rewrite them, but it is a less trivial task than fixing invalid
> escape sequences. So changing this would require a much longer
> deprecation period.


Couldn't they be rewritten using the above idiom? Why would that be less 
trivial?
Or by using triple quotes, so the \" could be written as " ? That seems 
trivial.
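
To make the question concrete, here is a hedged sketch of what \" does in a raw-string regular expression today and how a triple-quoted raw string avoids it. The patterns are illustrative and are not taken from the stdlib files mentioned above.

```python
import re

# In a raw string, \" keeps the quote from ending the literal, but the
# backslash stays in the value: this pattern is two characters long.
escaped = r"\""
assert len(escaped) == 2

# The regex engine reads \" as a literal quote, which is why such patterns
# work. A triple-quoted raw string needs no backslash at all.
triple = r'''"'''
assert len(triple) == 1

text = 'say "hi"'
assert re.findall(escaped, text) == re.findall(triple, text) == ['"', '"']

# A pattern containing both quote characters, written without \" or \':
quoted = re.compile(r'''(["'])(.*?)\1''')
assert quoted.search(text).group(2) == "hi"
```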


___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/46TOPB5ZY24OXXBGSLUXOQOJOASGBVTL/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-11 Thread Glenn Linderman

On 8/11/2019 2:50 AM, Steven D'Aprano wrote:
> On Sat, Aug 10, 2019 at 12:10:55PM -0700, Glenn Linderman wrote:
>> Or invent "really raw" in some spelling, such as rr"c:\directory\"
>> or e for exact, or x for exact, or "c:\directory\"
>>
>> And that brings me to the thought that if   \e  wants to become an
>> escape for escape, that maybe there should be an "extended escape"
>> prefix... if you want to use more escapes, define   ee"string where \\
>> can only be used as an escape or escaped character, \e means the ASCII
>> escape character, and \ followed by a character with no escape
>> definition would be an error."
>
> Please no.
>
> We already have b-strings, r-strings, u-strings, f-strings, br-strings,
> rb-strings, fr-strings, rf-strings, each of which comes in four
> varieties (single quote, double quote, triple single quote and triple
> double quote). Now you're talking about adding rr-strings, v-strings
> (Greg suggested that) and ee-strings, presumably some or all of which
> will need b*- and *b- or f*- and *f- varieties too.

Don't forget the upper & lower case varieties :)

> If the plan to deprecate unrecognised escapes and then make them an
> exception goes ahead, and I expect that it will, in a few more releases
> this "extended escape" ee-string will be completely redundant. If \e is
> required, we will be able to add it to regular strings as needed,
> likewise for any future new escapes we might want. (If any.)

So unrecognized escapes were deprecated in 3.6. And didn't get removed 
in 3.7. And from all indications, aren't going to be removed in 3.8. 
What makes you think the same arguments won't happen again for 3.9?
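
For anyone wanting to see the deprecation in action, the following sketch compiles small snippets and collects the resulting warnings. The helper name `invalid_escape_warnings` is hypothetical. It assumes an interpreter where unrecognised escapes still only warn (a DeprecationWarning from 3.6, a SyntaxWarning from 3.12); if a future release turns them into errors, `compile` would raise instead.

```python
import warnings

def invalid_escape_warnings(source):
    """Compile source text and return any escape-sequence warnings raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        compile(source, "<example>", "exec")
    # DeprecationWarning in 3.6-3.11; SyntaxWarning from 3.12 onwards.
    return [w for w in caught
            if issubclass(w.category, (DeprecationWarning, SyntaxWarning))]

# \d is not a recognised escape, so the plain literal is flagged...
assert invalid_escape_warnings(r'path = "C:\directory"')
# ...while the raw and doubled spellings compile silently.
assert not invalid_escape_warnings(r'path = r"C:\directory"')
assert not invalid_escape_warnings(r'path = "C:\\directory"')
```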



> And if we end up keeping the existing behaviour, oh well, we can always
> write \x1B instead. New escapes are a Nice To Have, not a Must Have.
>
> "Really raw" rr'' versus "nearly raw" r'' is a source of confusion just
> waiting to happen, when people use the wrong numbers of r's, or are
> simply unclear which they should use.

I agree that Greg's v is far better than rr, especially if someone tried 
to write rfr or rbr.

> It's not like we have no other options:
>
>     location = r'C:\directory\subdirectory' '\\'
>
> works fine.

But I never thought of that, until Serhiy mentioned it in his reply, so 
there are probably lots of other stupid people that didn't think of it 
either. It's not like it is even suggested in the documentation as a way 
to work around the non-rawness of raw strings. And it still requires 
doubling one of the \, so it is more consistent and understandable to 
just double them all.



> So does this:
>
>     location = 'directory/subdirectory/'.replace('/', os.sep)


This is a far greater run-time cost with the need to scan the string. 
Granted the total cost isn't huge, unless it is done repeatedly.



> Even better, instead of hard-coding our paths in the source code, we can
> read them from a config file or database.

Yep, I do that sometimes. But hard-coded paths make good defaults in 
many circumstances.



> It is unfortunate that Windows is so tricky with backslashes and
> forwards slashes, and that it clashes with the escape character, but I'm
> sure that other languages which use \ for escaping haven't proliferated
> four or more kinds of strings with different escaping rules in
> response.


I agree with this. But Bill didn't consult Guido about the matter.
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/BPW6VYVKANWICN34TIOA6BVJYXX4MK3D/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-11 Thread Eric V. Smith

On 8/11/2019 4:18 PM, Glenn Linderman wrote:
> On 8/11/2019 2:50 AM, Steven D'Aprano wrote:
>> On Sat, Aug 10, 2019 at 12:10:55PM -0700, Glenn Linderman wrote:
>>> Or invent "really raw" in some spelling, such as rr"c:\directory\"
>>> or e for exact, or x for exact, or "c:\directory\"
>>>
>>> And that brings me to the thought that if   \e  wants to become an
>>> escape for escape, that maybe there should be an "extended escape"
>>> prefix... if you want to use more escapes, define   ee"string where \\
>>> can only be used as an escape or escaped character, \e means the ASCII
>>> escape character, and \ followed by a character with no escape
>>> definition would be an error."
>>
>> Please no.
>>
>> We already have b-strings, r-strings, u-strings, f-strings, br-strings,
>> rb-strings, fr-strings, rf-strings, each of which comes in four
>> varieties (single quote, double quote, triple single quote and triple
>> double quote). Now you're talking about adding rr-strings, v-strings
>> (Greg suggested that) and ee-strings, presumably some or all of which
>> will need b*- and *b- or f*- and *f- varieties too.
>
> Don't forget the upper & lower case varieties :)


And all orders!

>>> _all_string_prefixes()
{'', 'b', 'BR', 'bR', 'B', 'rb', 'F', 'RF', 'rB', 'FR', 'Rf', 'Fr', 
'RB', 'f', 'r', 'rf', 'rF', 'R', 'u', 'fR', 'U', 'Br', 'Rb', 'fr', 'br'}

>>> len(_all_string_prefixes())
25

And if you add just 'bv' and 'fv', it's 41:

{'', 'fr', 'Bv', 'BR', 'F', 'rb', 'Fv', 'VB', 'vb', 'vF', 'br', 'FV', 
'vf', 'FR', 'fV', 'bV', 'Br', 'Vb', 'Rb', 'RF', 'bR', 'r', 'R', 'Vf', 
'fv', 'U', 'RB', 'B', 'rB', 'vB', 'Fr', 'rF', 'fR', 'Rf', 'BV', 'VF', 
'bv', 'b', 'u', 'f', 'rf'}


There would be no need for 'uv' (not needed for backward compatibility) 
or 'rv' (can't be both raw and verbatim).


I'm not in any way serious about this. I just want people to realize how 
many wacky combinations there would be. And heaven forbid we ever add 
some combination of 3 characters. If 'rfv' were actually also valid, you 
get to 89:


{'', 'br', 'vb', 'fR', 'F', 'rFV', 'fRv', 'fV', 'rVF', 'Rfv', 'u', 
'vRf', 'fVR', 'rfV', 'Fvr', 'vrf', 'fVr', 'vB', 'Vb', 'Rvf', 'Fv', 'Fr', 
'FVr', 'B', 'rVf', 'FVR', 'vfr', 'VB', 'VrF', 'BR', 'VRf', 'vfR', 'FR', 
'Br', 'RFV', 'Rf', 'fvR', 'f', 'rb', 'VfR', 'VFR', 'fr', 'vFR', 'VRF', 
'frV', 'bR', 'b', 'FrV', 'r', 'R', 'RVF', 'FV', 'rvF', 'FRV', 'Vrf', 
'rvf', 'FRv', 'Frv', 'vF', 'bV', 'VF', 'fv', 'RF', 'RB', 'rB', 'vRF', 
'RFv', 'RVf', 'Rb', 'Vfr', 'vrF', 'rf', 'Bv', 'vf', 'rF', 'U', 'bv', 
'FvR', 'RfV', 'Vf', 'VFr', 'vFr', 'fvr', 'BV', 'rFv', 'rfv', 'fRV', 
'frv', 'RvF'}
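
The counts above can be reproduced with a short stand-alone sketch. It is a reimplementation written for illustration, not the private helper called in the session above; the function name `all_string_prefixes` and its `valid` parameter are assumptions.

```python
import itertools

def all_string_prefixes(valid=("b", "r", "u", "f", "br", "fr")):
    """Return every ordering and casing of the given lower-case prefixes."""
    result = {""}
    for prefix in valid:
        for ordering in itertools.permutations(prefix):
            # Each letter may independently be lower or upper case.
            for cased in itertools.product(*[(c, c.upper()) for c in ordering]):
                result.add("".join(cased))
    return result

current = ("b", "r", "u", "f", "br", "fr")
print(len(all_string_prefixes(current)))                        # 25
print(len(all_string_prefixes(current + ("bv", "fv"))))         # 41
print(len(all_string_prefixes(current + ("bv", "fv", "rfv"))))  # 89
```

A two-letter prefix contributes 2 orderings times 4 casings, and a three-letter prefix 6 orderings times 8 casings, which is where the jump from 41 to 89 comes from.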


If only we could deprecate upper case prefixes!

Eric

___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/B26LJOLLKKVDSQR6ZUVZKSFCU4WNXYC5/


[Python-Dev] Re: What to do about invalid escape sequences

2019-08-11 Thread Glenn Linderman

On 8/11/2019 8:40 PM, Eric V. Smith wrote:
> On 8/11/2019 4:18 PM, Glenn Linderman wrote:
>> On 8/11/2019 2:50 AM, Steven D'Aprano wrote:
>>> On Sat, Aug 10, 2019 at 12:10:55PM -0700, Glenn Linderman wrote:
>>>> Or invent "really raw" in some spelling, such as rr"c:\directory\"
>>>> or e for exact, or x for exact, or "c:\directory\"
>>>>
>>>> And that brings me to the thought that if   \e  wants to become an
>>>> escape for escape, that maybe there should be an "extended escape"
>>>> prefix... if you want to use more escapes, define ee"string where \\
>>>> can only be used as an escape or escaped character, \e means the ASCII
>>>> escape character, and \ followed by a character with no escape
>>>> definition would be an error."
>>>
>>> Please no.
>>>
>>> We already have b-strings, r-strings, u-strings, f-strings, br-strings,
>>> rb-strings, fr-strings, rf-strings, each of which comes in four
>>> varieties (single quote, double quote, triple single quote and triple
>>> double quote). Now you're talking about adding rr-strings, v-strings
>>> (Greg suggested that) and ee-strings, presumably some or all of which
>>> will need b*- and *b- or f*- and *f- varieties too.
>>
>> Don't forget the upper & lower case varieties :)


> And all orders!
>
> >>> _all_string_prefixes()
> {'', 'b', 'BR', 'bR', 'B', 'rb', 'F', 'RF', 'rB', 'FR', 'Rf', 'Fr',
> 'RB', 'f', 'r', 'rf', 'rF', 'R', 'u', 'fR', 'U', 'Br', 'Rb', 'fr', 'br'}
>
> >>> len(_all_string_prefixes())
> 25
>
> And if you add just 'bv' and 'fv', it's 41:
>
> {'', 'fr', 'Bv', 'BR', 'F', 'rb', 'Fv', 'VB', 'vb', 'vF', 'br', 'FV',
> 'vf', 'FR', 'fV', 'bV', 'Br', 'Vb', 'Rb', 'RF', 'bR', 'r', 'R', 'Vf',
> 'fv', 'U', 'RB', 'B', 'rB', 'vB', 'Fr', 'rF', 'fR', 'Rf', 'BV', 'VF',
> 'bv', 'b', 'u', 'f', 'rf'}
>
> There would be no need for 'uv' (not needed for backward
> compatibility) or 'rv' (can't be both raw and verbatim).
>
> I'm not in any way serious about this. I just want people to realize
> how many wacky combinations there would be. And heaven forbid we ever
> add some combination of 3 characters. If 'rfv' were actually also
> valid, you get to 89:
>
> {'', 'br', 'vb', 'fR', 'F', 'rFV', 'fRv', 'fV', 'rVF', 'Rfv', 'u',
> 'vRf', 'fVR', 'rfV', 'Fvr', 'vrf', 'fVr', 'vB', 'Vb', 'Rvf', 'Fv',
> 'Fr', 'FVr', 'B', 'rVf', 'FVR', 'vfr', 'VB', 'VrF', 'BR', 'VRf',
> 'vfR', 'FR', 'Br', 'RFV', 'Rf', 'fvR', 'f', 'rb', 'VfR', 'VFR', 'fr',
> 'vFR', 'VRF', 'frV', 'bR', 'b', 'FrV', 'r', 'R', 'RVF', 'FV', 'rvF',
> 'FRV', 'Vrf', 'rvf', 'FRv', 'Frv', 'vF', 'bV', 'VF', 'fv', 'RF', 'RB',
> 'rB', 'vRF', 'RFv', 'RVf', 'Rb', 'Vfr', 'vrF', 'rf', 'Bv', 'vf', 'rF',
> 'U', 'bv', 'FvR', 'RfV', 'Vf', 'VFr', 'vFr', 'fvr', 'BV', 'rFv',
> 'rfv', 'fRV', 'frv', 'RvF'}
>
> If only we could deprecate upper case prefixes!
>
> Eric


Yes. Happily while there is a combinatorial explosion in spellings and 
casings, there is no cognitive overload: each character has an 
independent effect on the interpretation and use of the string, so once 
you understand the 5 existing types (b r u f and plain) you understand 
them all.


Should we add one or two more, it would be with the realization 
(hopefully realized in the documentation also) that v and e would 
effectively be replacements for r and plain, rather than being combined 
with them.


Were I to design a new language with similar string syntax, I think I 
would use plain quotes for verbatim strings only, and have the following 
prefixes, in only a single case:


(no prefix) - verbatim UTF-8 (at this point, I see no reason not to 
require UTF-8 for the encoding of source files)

b - for verbatim bytes
e - allow (only explicitly documented) escapes
f - format strings

Actually, the above could be done as a preprocessor for python, or a 
future import. In other words, what you see is what you get, until you 
add a prefix to add additional processing.  The only combinations that 
seem useful are  eb  and  ef.  I don't know whether constraining the order 
of the prefixes would be helpful or not; if it is helpful, I have no 
problem with a canonical ordering being prescribed.


As a future import, one could code modules to either the current 
combinatorial explosion with all its gotchas, special cases, and passing 
of undefined escapes; or one could code to the clean limited cases above.


Another thing that seems awkward about the current strings is that {{ 
and }} become "special escapes".  If it were not for the permissive 
usage of \{ and \} in the current plain string processing, \{ and \} 
could have been used to escape the non-format-expression uses of { and 
}, which would be far more consistent with other escapes.  Perhaps the 
future import could regularize that, also.
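
For reference, a minimal sketch of the current brace-doubling rule that this paragraph calls a "special escape". It shows existing behaviour only; the variable names are illustrative.

```python
value = 42

# Doubled braces produce literal braces; single braces mark a replacement field.
wrapped = f"{{{value}}}"
assert wrapped == "{42}"

labelled = f"{{value}} = {value}"
assert labelled == "{value} = 42"

# str.format follows the same doubling rule.
formatted = "{{}} holds {}".format(value)
assert formatted == "{} holds 42"
```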


A future import would have no backward compatibility issues to disrupt a 
simplified, more regular syntax.


Does anyone know of an existing feature that couldn't be expressed in a 
straightforward manner with only the above capabilities?



The only other thing that I have heard about regarding strings is that 
multi-line strings have their first line indented, and other lines not. 
Some have recommended making the fi