[Python-Dev] Re: What to do about invalid escape sequences
10.08.19 22:10, Glenn Linderman wrote:

> As pointed out elsewhere, Raw strings have limitations, paths ending
> in \ cannot be represented, and such do exist in various situations,
> not all of which can be easily avoided... except by the "extra
> character contortion" of "C:\directory\ "[:-1] (does someone know
> a better way?)

Another common idiom is r"C:\directory" "\\"

> I wonder how many raw strings actually use the \" escape
> productively? Maybe that should be deprecated too! I can't think of
> a good and necessary use for it, can anyone?

This is an interesting question. I have performed some experiments. 15 files in the stdlib (not counting the tokenizer) use \' or \" in raw strings. And one test (test_venv) fails because third-party code uses them. All cases are in regular expressions. It is possible to rewrite them, but that is a less trivial task than fixing invalid escape sequences. So changing this will require a much longer deprecation period.

___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/GCDD6JQOPYENVDP3A62EFWHODIP2PFQM/
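The raw-string limitation and the workaround discussed in this message can be tried out directly. The following is a minimal sketch, not part of the original post; the variable names are illustrative only.

```python
# A raw string literal cannot end in a single backslash: the tokenizer
# still reads \" as "backslash followed by a quote", so the literal
# never terminates. compile() is used so the error can be caught.
try:
    compile('r"C:\\directory\\"', "<demo>", "eval")
    trailing_backslash_ok = True
except SyntaxError:
    trailing_backslash_ok = False

# The idiom from the message: a raw literal followed by an ordinary
# literal that holds just the escaped trailing backslash.
path = r"C:\directory" "\\"

# Inside a raw string, \" keeps BOTH characters (backslash and quote),
# which is why it is mostly seen in regular expressions.
pattern = r"\""

print(trailing_backslash_ok)
print(path)
print(len(pattern))
```

The last line shows that `\"` in a raw string is two characters long, which is the behaviour the stdlib regular expressions mentioned above rely on.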
[Python-Dev] Re: What to do about invalid escape sequences
On 8/10/2019 10:30 PM, Rob Cliffe via Python-Dev wrote:
> On 10/08/2019 23:30:18, Greg Ewing wrote:
>> Rob Cliffe via Python-Dev wrote:
>>> Also, the former is simply more *informative* - it tells the reader
>>> that baz is expected to be a directory, not a file.
>>
>> On Windows you can usually tell that from the fact that filenames
>> almost always have an extension, and directory names almost never do.
>
> Usually, but not always. I have not infrequently used files with a
> blank extension. I can't recall using a directory name with an
> extension (but I can't swear that I never have).

I most commonly see this with bare git repositories .git. And I've created directory names with "extensions" for my own use.

Eric

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/PJ4TPHOY6ZIWI5CQ56J3BYWTEBFYNMJU/
[Python-Dev] Re: What to do about invalid escape sequences
On Sat, Aug 10, 2019 at 12:10:55PM -0700, Glenn Linderman wrote:

> Or invent "really raw" in some spelling, such as rr"c:\directory\"
> or e for exact, or x for exact, or here>"c:\directory\"
>
> And that brings me to the thought that if \e wants to become an
> escape for escape, that maybe there should be an "extended escape"
> prefix... if you want to use more escapes, define ee"string where \\
> can only be used as an escape or escaped character, \e means the ASCII
> escape character, and \ followed by a character with no escape
> definition would be an error."

Please no.

We already have b-strings, r-strings, u-strings, f-strings, br-strings, rb-strings, fr-strings, rf-strings, each of which comes in four varieties (single quote, double quote, triple single quote and triple double quote). Now you're talking about adding rr-strings, v-strings (Greg suggested that) and ee-strings, presumably some or all of which will need b*- and *b- or f*- and *f- varieties too.

If the plan to deprecate unrecognised escapes and then make them an exception goes ahead, and I expect that it will, in a few more releases this "extended escape" ee-string will be completely redundant. If \e is required, we will be able to add it to regular strings as needed, likewise for any future new escapes we might want. (If any.)

And if we end up keeping the existing behaviour, oh well, we can always write \x1B instead. New escapes are a Nice To Have, not a Must Have.

"Really raw" rr'' versus "nearly raw" r'' is a source of confusion just waiting to happen, when people use the wrong numbers of r's, or are simply unclear which they should use.

It's not like we have no other options:

    location = r'C:\directory\subdirectory' '\\'

works fine. So does this:

    location = 'directory/subdirectory/'.replace('/', os.sep)

Even better, instead of hard-coding our paths in the source code, we can read them from a config file or database.

It is unfortunate that Windows is so tricky with backslashes and forward slashes, and that it clashes with the escape character, but I'm sure that other languages which use \ for escaping haven't proliferated four or more kinds of strings with different escaping rules in response.

-- 
Steven

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/2ZPNZTP3B7OEG2LQQXAGGYG6B76LYDB5/
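The alternatives listed in this message all produce the same string. A small sketch to confirm that follows; `to_native` is a hypothetical helper introduced here for illustration, and the separator is passed explicitly so the result does not depend on the platform the code runs on.

```python
import os


def to_native(path_with_slashes, sep=os.sep):
    """Replace forward slashes with `sep` (hypothetical helper, default os.sep)."""
    return path_with_slashes.replace("/", sep)


# Option 1: raw literal plus a separately escaped trailing backslash.
location_a = r"C:\directory\subdirectory" "\\"

# Option 2: write forward slashes and convert. On Windows the default
# sep=os.sep would give the same result; here it is forced to a backslash.
location_b = to_native("C:/directory/subdirectory/", sep="\\")

# Option 3: an ordinary literal with every backslash doubled.
location_c = "C:\\directory\\subdirectory\\"

print(location_a == location_b == location_c)
```

Option 2 does its work at run time on every call, whereas options 1 and 3 are plain literals; which trade-off matters is a judgment call for the code at hand.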
[Python-Dev] Re: What to do about invalid escape sequences
On Sun, 11 Aug 2019 at 03:37, Rob Cliffe via Python-Dev wrote:
> Usually, but not always. I have not infrequently used files with a
> blank extension.
> I can't recall using a directory name with an extension (but I can't
> swear that I never have).

I've often seen directory names like "1. Overview" on Windows. Technically, " Overview" would be the extension here.

Of course, that's a silly example, but the point is that there's a difference between what's clear to a human and what's clear to a computer...

Paul

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/PDS2EGV77Z5B2IZWCN5LWF7XWQGBLMWQ/
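The point about what counts as an "extension" to a computer can be checked with the standard library. This is only an illustrative sketch; the sample names are made up, and `ntpath` is used so that Windows path rules apply on any platform.

```python
import ntpath  # Windows flavour of os.path, importable everywhere

# Sample names are illustrative; none of them says whether the entry
# is a file or a directory.
names = ["report.txt", "README", "project.git", "1. Overview"]

for name in names:
    root, ext = ntpath.splitext(name)
    print(f"{name!r:16} root={root!r:12} ext={ext!r}")
```

Note that `splitext` keeps the leading dot in the extension, so "1. Overview" splits into "1" and ". Overview".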
[Python-Dev] Re: What to do about invalid escape sequences
On 8/11/2019 1:26 AM, Serhiy Storchaka wrote:
> 10.08.19 22:10, Glenn Linderman wrote:
>> As pointed out elsewhere, Raw strings have limitations, paths ending
>> in \ cannot be represented, and such do exist in various situations,
>> not all of which can be easily avoided... except by the "extra
>> character contortion" of "C:\directory\ "[:-1] (does someone know
>> a better way?)
>
> Another common idiom is r"C:\directory" "\\"

I suppose that concatenation happens at compile time; less sure about [:-1], I would guess not. Thanks for this.

>> I wonder how many raw strings actually use the \" escape
>> productively? Maybe that should be deprecated too! I can't think of
>> a good and necessary use for it, can anyone?
>
> This is an interesting question. I have performed some experiments. 15
> files in the stdlib (not counting the tokenizer) use \' or \" in raw
> strings. And one test (test_venv) fails because third-party code uses
> them. All cases are in regular expressions. It is possible to rewrite
> them, but that is a less trivial task than fixing invalid escape
> sequences. So changing this will require a much longer deprecation
> period.

Couldn't they be rewritten using the above idiom? Why would that be less trivial?

Or by using triple quotes, so the \" could be written as " ? That seems trivial.

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/46TOPB5ZY24OXXBGSLUXOQOJOASGBVTL/
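The guess about compile-time concatenation can be checked by looking at the syntax tree. The sketch below assumes Python 3.8 or later, where string literals parse to `ast.Constant`; the variable names are illustrative.

```python
import ast

# Adjacent string literals are joined while parsing, so the assignment's
# right-hand side is already a single constant in the syntax tree.
concat_tree = ast.parse(r'''location = r"C:\directory" "\\"''')
concat_node = concat_tree.body[0].value
is_single_constant = isinstance(concat_node, ast.Constant)

# The [:-1] spelling parses as a subscript operation on a constant.
# ast.parse does not run the optimizer, so this says nothing about
# whether a given interpreter version folds the slice later on.
slice_node = ast.parse(r'''r"C:\directory\ "[:-1]''', mode="eval").body
is_subscript = isinstance(slice_node, ast.Subscript)

print(is_single_constant, repr(concat_node.value))
print(is_subscript)
```

Whether the slice is folded into a constant during compilation depends on the interpreter version, so the tree only settles the first half of the question.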
[Python-Dev] Re: What to do about invalid escape sequences
On 8/11/2019 2:50 AM, Steven D'Aprano wrote:
> On Sat, Aug 10, 2019 at 12:10:55PM -0700, Glenn Linderman wrote:
>> Or invent "really raw" in some spelling, such as rr"c:\directory\"
>> or e for exact, or x for exact, or "c:\directory\"
>>
>> And that brings me to the thought that if \e wants to become an
>> escape for escape, that maybe there should be an "extended escape"
>> prefix... if you want to use more escapes, define ee"string where \\
>> can only be used as an escape or escaped character, \e means the ASCII
>> escape character, and \ followed by a character with no escape
>> definition would be an error."
>
> Please no.
>
> We already have b-strings, r-strings, u-strings, f-strings,
> br-strings, rb-strings, fr-strings, rf-strings, each of which comes in
> four varieties (single quote, double quote, triple single quote and
> triple double quote). Now you're talking about adding rr-strings,
> v-strings (Greg suggested that) and ee-strings, presumably some or all
> of which will need b*- and *b- or f*- and *f- varieties too.

Don't forget the upper & lower case varieties :)

> If the plan to deprecate unrecognised escapes and then make them an
> exception goes ahead, and I expect that it will, in a few more
> releases this "extended escape" ee-string will be completely
> redundant. If \e is required, we will be able to add it to regular
> strings as needed, likewise for any future new escapes we might want.
> (If any.)

So unrecognized escapes were deprecated in 3.6. And didn't get removed in 3.7. And from all indications, aren't going to be removed in 3.8. What makes you think the same arguments won't happen again for 3.9?

> And if we end up keeping the existing behaviour, oh well, we can
> always write \x1B instead. New escapes are a Nice To Have, not a Must
> Have.
>
> "Really raw" rr'' versus "nearly raw" r'' is a source of confusion
> just waiting to happen, when people use the wrong numbers of r's, or
> are simply unclear which they should use.

I agree that Greg's v is far better than rr, especially if someone tried to write rfr or rbr.

> It's not like we have no other options:
>
>     location = r'C:\directory\subdirectory' '\\'
>
> works fine.

But I never thought of that, until Serhiy mentioned it in his reply, so there are probably lots of other stupid people that didn't think of it either. It's not like it is even suggested in the documentation as a way to work around the non-rawness of raw strings. And it still requires doubling one of the \, so it is more consistent and understandable to just double them all.

> So does this:
>
>     location = 'directory/subdirectory/'.replace('/', os.sep)

This is a far greater run-time cost with the need to scan the string. Granted the total cost isn't huge, unless it is done repeatedly.

> Even better, instead of hard-coding our paths in the source code, we
> can read them from a config file or database.

Yep, I do that sometimes. But hard-coded paths make good defaults in many circumstances.

> It is unfortunate that Windows is so tricky with backslashes and
> forward slashes, and that it clashes with the escape character, but
> I'm sure that other languages which use \ for escaping haven't
> proliferated four or more kinds of strings with different escaping
> rules in response.

I agree with this. But Bill didn't consult Guido about the matter.

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/BPW6VYVKANWICN34TIOA6BVJYXX4MK3D/
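The deprecation being discussed can be observed by compiling a literal and recording the warnings. This is a minimal sketch with a hypothetical helper, `escape_warnings`; it assumes an interpreter where an unrecognised escape is still a warning (DeprecationWarning from 3.6, SyntaxWarning from 3.12) rather than an error.

```python
import warnings


def escape_warnings(source):
    """Compile `source` and return the categories of any warnings raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # do not let default filters hide it
        compile(source, "<demo>", "eval")
    return [w.category for w in caught]


unrecognised = escape_warnings(r'"\d+"')   # \d is not a string escape
recognised = escape_warnings(r'"\x1B"')    # \x1B is the ESC character
raw = escape_warnings(r'r"\d+"')           # raw string: no escape processing

print([c.__name__ for c in unrecognised])
print(recognised, raw)
```

Under the default filters the DeprecationWarning is usually hidden, which is part of why the deprecation went unnoticed in so much code.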
[Python-Dev] Re: What to do about invalid escape sequences
On 8/11/2019 4:18 PM, Glenn Linderman wrote:
> On 8/11/2019 2:50 AM, Steven D'Aprano wrote:
>> On Sat, Aug 10, 2019 at 12:10:55PM -0700, Glenn Linderman wrote:
>>> Or invent "really raw" in some spelling, such as rr"c:\directory\"
>>> or e for exact, or x for exact, or "c:\directory\"
>>>
>>> And that brings me to the thought that if \e wants to become an
>>> escape for escape, that maybe there should be an "extended escape"
>>> prefix... if you want to use more escapes, define ee"string where \\
>>> can only be used as an escape or escaped character, \e means the ASCII
>>> escape character, and \ followed by a character with no escape
>>> definition would be an error."
>>
>> Please no.
>>
>> We already have b-strings, r-strings, u-strings, f-strings,
>> br-strings, rb-strings, fr-strings, rf-strings, each of which comes in
>> four varieties (single quote, double quote, triple single quote and
>> triple double quote). Now you're talking about adding rr-strings,
>> v-strings (Greg suggested that) and ee-strings, presumably some or all
>> of which will need b*- and *b- or f*- and *f- varieties too.
>
> Don't forget the upper & lower case varieties :)

And all orders!

>>> _all_string_prefixes()
{'', 'b', 'BR', 'bR', 'B', 'rb', 'F', 'RF', 'rB', 'FR', 'Rf', 'Fr', 'RB', 'f', 'r', 'rf', 'rF', 'R', 'u', 'fR', 'U', 'Br', 'Rb', 'fr', 'br'}
>>> len(_all_string_prefixes())
25

And if you add just 'bv' and 'fv', it's 41:

{'', 'fr', 'Bv', 'BR', 'F', 'rb', 'Fv', 'VB', 'vb', 'vF', 'br', 'FV', 'vf', 'FR', 'fV', 'bV', 'Br', 'Vb', 'Rb', 'RF', 'bR', 'r', 'R', 'Vf', 'fv', 'U', 'RB', 'B', 'rB', 'vB', 'Fr', 'rF', 'fR', 'Rf', 'BV', 'VF', 'bv', 'b', 'u', 'f', 'rf'}

There would be no need for 'uv' (not needed for backward compatibility) or 'rv' (can't be both raw and verbatim).

I'm not in any way serious about this. I just want people to realize how many wacky combinations there would be. And heaven forbid we ever add some combination of 3 characters. If 'rfv' were actually also valid, you get to 89:

{'', 'br', 'vb', 'fR', 'F', 'rFV', 'fRv', 'fV', 'rVF', 'Rfv', 'u', 'vRf', 'fVR', 'rfV', 'Fvr', 'vrf', 'fVr', 'vB', 'Vb', 'Rvf', 'Fv', 'Fr', 'FVr', 'B', 'rVf', 'FVR', 'vfr', 'VB', 'VrF', 'BR', 'VRf', 'vfR', 'FR', 'Br', 'RFV', 'Rf', 'fvR', 'f', 'rb', 'VfR', 'VFR', 'fr', 'vFR', 'VRF', 'frV', 'bR', 'b', 'FrV', 'r', 'R', 'RVF', 'FV', 'rvF', 'FRV', 'Vrf', 'rvf', 'FRv', 'Frv', 'vF', 'bV', 'VF', 'fv', 'RF', 'RB', 'rB', 'vRF', 'RFv', 'RVf', 'Rb', 'Vfr', 'vrF', 'rf', 'Bv', 'vf', 'rF', 'U', 'bv', 'FvR', 'RfV', 'Vf', 'VFr', 'vFr', 'fvr', 'BV', 'rFv', 'rfv', 'fRV', 'frv', 'RvF'}

If only we could deprecate upper case prefixes!

Eric

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/B26LJOLLKKVDSQR6ZUVZKSFCU4WNXYC5/
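The counts of 25, 41 and 89 can be reproduced with a few lines of `itertools`. This is a standalone sketch, not the code used in the message (`_all_string_prefixes` there appears to be the private helper in the `tokenize` module); the function name `all_spellings` and the base prefix list, which reflects the prefixes named in this thread, are assumptions made for illustration.

```python
from itertools import permutations, product


def all_spellings(base_prefixes):
    """Every ordering and upper/lower-case spelling of the given prefixes."""
    result = {""}  # the empty prefix: a plain string
    for prefix in base_prefixes:
        for ordering in permutations(prefix):
            # Each character may independently be lower or upper case.
            for cased in product(*[(c, c.upper()) for c in ordering]):
                result.add("".join(cased))
    return result


# Prefixes as discussed in the thread; later Python versions may add more.
current = ["b", "r", "u", "f", "br", "fr"]

print(len(all_spellings(current)))                        # 25
print(len(all_spellings(current + ["bv", "fv"])))         # 41
print(len(all_spellings(current + ["bv", "fv", "rfv"])))  # 89
```

Each two-letter prefix contributes 2 orderings times 4 casings, and a three-letter prefix contributes 6 orderings times 8 casings, which is where the jump from 41 to 89 comes from.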
[Python-Dev] Re: What to do about invalid escape sequences
On 8/11/2019 8:40 PM, Eric V. Smith wrote:
> On 8/11/2019 4:18 PM, Glenn Linderman wrote:
>> On 8/11/2019 2:50 AM, Steven D'Aprano wrote:
>>> On Sat, Aug 10, 2019 at 12:10:55PM -0700, Glenn Linderman wrote:
>>>> Or invent "really raw" in some spelling, such as rr"c:\directory\"
>>>> or e for exact, or x for exact, or "c:\directory\"
>>>>
>>>> And that brings me to the thought that if \e wants to become an
>>>> escape for escape, that maybe there should be an "extended escape"
>>>> prefix... if you want to use more escapes, define ee"string where \\
>>>> can only be used as an escape or escaped character, \e means the ASCII
>>>> escape character, and \ followed by a character with no escape
>>>> definition would be an error."
>>>
>>> Please no.
>>>
>>> We already have b-strings, r-strings, u-strings, f-strings,
>>> br-strings, rb-strings, fr-strings, rf-strings, each of which comes in
>>> four varieties (single quote, double quote, triple single quote and
>>> triple double quote). Now you're talking about adding rr-strings,
>>> v-strings (Greg suggested that) and ee-strings, presumably some or all
>>> of which will need b*- and *b- or f*- and *f- varieties too.
>>
>> Don't forget the upper & lower case varieties :)
>
> And all orders!
>
> >>> _all_string_prefixes()
> {'', 'b', 'BR', 'bR', 'B', 'rb', 'F', 'RF', 'rB', 'FR', 'Rf', 'Fr', 'RB', 'f', 'r', 'rf', 'rF', 'R', 'u', 'fR', 'U', 'Br', 'Rb', 'fr', 'br'}
> >>> len(_all_string_prefixes())
> 25
>
> And if you add just 'bv' and 'fv', it's 41:
>
> {'', 'fr', 'Bv', 'BR', 'F', 'rb', 'Fv', 'VB', 'vb', 'vF', 'br', 'FV', 'vf', 'FR', 'fV', 'bV', 'Br', 'Vb', 'Rb', 'RF', 'bR', 'r', 'R', 'Vf', 'fv', 'U', 'RB', 'B', 'rB', 'vB', 'Fr', 'rF', 'fR', 'Rf', 'BV', 'VF', 'bv', 'b', 'u', 'f', 'rf'}
>
> There would be no need for 'uv' (not needed for backward
> compatibility) or 'rv' (can't be both raw and verbatim).
>
> I'm not in any way serious about this. I just want people to realize
> how many wacky combinations there would be. And heaven forbid we ever
> add some combination of 3 characters. If 'rfv' were actually also
> valid, you get to 89:
>
> {'', 'br', 'vb', 'fR', 'F', 'rFV', 'fRv', 'fV', 'rVF', 'Rfv', 'u', 'vRf', 'fVR', 'rfV', 'Fvr', 'vrf', 'fVr', 'vB', 'Vb', 'Rvf', 'Fv', 'Fr', 'FVr', 'B', 'rVf', 'FVR', 'vfr', 'VB', 'VrF', 'BR', 'VRf', 'vfR', 'FR', 'Br', 'RFV', 'Rf', 'fvR', 'f', 'rb', 'VfR', 'VFR', 'fr', 'vFR', 'VRF', 'frV', 'bR', 'b', 'FrV', 'r', 'R', 'RVF', 'FV', 'rvF', 'FRV', 'Vrf', 'rvf', 'FRv', 'Frv', 'vF', 'bV', 'VF', 'fv', 'RF', 'RB', 'rB', 'vRF', 'RFv', 'RVf', 'Rb', 'Vfr', 'vrF', 'rf', 'Bv', 'vf', 'rF', 'U', 'bv', 'FvR', 'RfV', 'Vf', 'VFr', 'vFr', 'fvr', 'BV', 'rFv', 'rfv', 'fRV', 'frv', 'RvF'}
>
> If only we could deprecate upper case prefixes!
>
> Eric

Yes. Happily while there is a combinatorial explosion in spellings and casings, there is no cognitive overload: each character has an independent effect on the interpretation and use of the string, so once you understand the 5 existing types (b r u f and plain) you understand them all.

Should we add one or two more, it would be with the realization (hopefully realized in the documentation also) that v and e would effectively be replacements for r and plain, rather than being combined with them.

Were I to design a new language with similar string syntax, I think I would use plain quotes for verbatim strings only, and have the following prefixes, in only a single case:

(no prefix) - verbatim UTF-8 (at this point, I see no reason not to require UTF-8 for the encoding of source files)
b - for verbatim bytes
e - allow (only explicitly documented) escapes
f - format strings

Actually, the above could be done as a preprocessor for python, or a future import.

In other words, what you see is what you get, until you add a prefix to add additional processing. The only combinations that seem useful are eb and ef. I don't know whether constraining the order of the prefixes would be helpful or not; if it is helpful, I have no problem with a canonical ordering being prescribed.
As a future import, one could code modules to either the current combinatorial explosion with all its gotchas, special cases, and passing of undefined escapes; or one could code to the clean limited cases above.

Another thing that seems awkward about the current strings is that {{ and }} become "special escapes". If it were not for the permissive usage of \{ and \} in the current plain string processing, \{ and \} could have been used to escape the non-format-expression uses of { and }, which would be far more consistent with other escapes. Perhaps the future import could regularize that, also.

A future import would have no backward compatibility issues to disrupt a simplified, more regular syntax.

Does anyone know of an existing feature that couldn't be expressed in a straightforward manner with only the above capabilities?

The only other thing that I have heard about regarding strings is that multi-line strings have their first line indented, and other lines not. Some have recommended making the fi