[issue32987] tokenize.py parses unicode identifiers incorrectly

2018-03-14 Thread Terry J. Reedy
Terry J. Reedy added the comment: #24194 is about tokenize failing, including on middle dot. There is another tokenize name issue, already closed. I referenced Serhiy's analysis there and on the two \w issues, and closed one of them. -- resolution: -> duplicate stage: needs patch -

[issue32987] tokenize.py parses unicode identifiers incorrectly

2018-03-14 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: This issue and issue12486 don't have anything in common except that both are related to the tokenize module. There are two bugs: a too narrow definition of \w in the re module (see issue12731 and issue1693050) and a too narrow definition of Name in the tokenize
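The mismatch described in this comment can be sketched with the standard library alone. This is a minimal illustration, not the tracker's own reproduction: it compares str.isidentifier(), which follows the language's identifier rules, with the re module's \w class for the middle-dot name from the original report. The variable names are hypothetical. The assumption that tokenize's Name pattern was built on \w at the time is taken from the discussion above, not checked here.

```python
import re

# The identifier from the original report, with U+00B7 MIDDLE DOT in the middle.
NAME = "ab\u00b7cd"

# The language's identifier rules accept the name.
is_identifier = NAME.isidentifier()

# \w does not cover MIDDLE DOT, so a \w+ pattern cannot match the whole name.
word_match = re.fullmatch(r"\w+", NAME)

# A \w+ pattern stops at the middle dot and only consumes the leading part.
prefix = re.match(r"\w+", NAME).group()

print("isidentifier():", is_identifier)
print("fullmatch(\\w+):", word_match)
print("match(\\w+) prefix:", prefix)
```

Any name pattern based on \w+ would therefore split such an identifier at the middle dot, which is consistent with the error token described in the original submission.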

[issue32987] tokenize.py parses unicode identifiers incorrectly

2018-03-13 Thread Terry J. Reedy
Terry J. Reedy added the comment: I think the issues are slightly different. #12486 is about the awkwardness of the API. This is about a false error after jumping through the hoops, which I think Steve B did correctly. Following the link, the Other_ID_Continue chars are 00B7 ; Oth

[issue32987] tokenize.py parses unicode identifiers incorrectly

2018-03-13 Thread Cheryl Sabella
Cheryl Sabella added the comment: I believe this may be a duplicate of issue 12486. -- nosy: +csabella

[issue32987] tokenize.py parses unicode identifiers incorrectly

2018-03-09 Thread Terry J. Reedy
Terry J. Reedy added the comment: I verified on Win10 with 3.5 (which cannot be patched) and 3.7.0b2 that ab·cd is accepted as a name and that tokenize fails as described. -- nosy: +terry.reedy stage: -> needs patch versions: +Python 3.7, Python 3.8

[issue32987] tokenize.py parses unicode identifiers incorrectly

2018-03-02 Thread Steve B
New submission from Steve B: Here is an example involving the Unicode character MIDDLE DOT (·). The line ab·cd = 7 is valid Python 3 code and is happily accepted by the CPython interpreter. However, tokenize.py does not like it. It says that the middle dot is an error token. Here is an exampl
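A hedged sketch of how the report might be reproduced follows; it is not the submitter's own example, which is cut off above. It compiles and runs the line to show that the interpreter accepts it, then lists what tokenize produces for the same source. The helper token_summary and the names SOURCE and namespace are hypothetical. What tokenize reports depends on the Python version, so no expected token list is given: on the versions discussed in this thread the middle dot is said to come back as an error token.

```python
import io
import tokenize

# The line from the report; \u00b7 is MIDDLE DOT.
SOURCE = "ab\u00b7cd = 7\n"


def token_summary(source):
    """Return (token type name, token string) pairs for a source string."""
    readline = io.StringIO(source).readline
    return [
        (tokenize.tok_name[tok.type], tok.string)
        for tok in tokenize.generate_tokens(readline)
    ]


# The interpreter itself accepts the line and binds the name.
namespace = {}
exec(compile(SOURCE, "<example>", "exec"), namespace)
print("interpreter value:", namespace["ab\u00b7cd"])

# Show how tokenize splits the same line on the running Python version.
for type_name, string in token_summary(SOURCE):
    print(type_name, repr(string))
```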