[issue12731] python lib re uses obsolete sense of \w in full violation of UTS#18 RL1.2a

2020-01-31 Thread Henry S. Thompson
Henry S. Thompson added the comment: [One year and 2 days later... :-[ Is this fixed in 3.9? If not, the Versions list above should be updated. The failure of lower() to preserve 'alpha-ness' is a serious bug, it causes significant failures in e.g. Turkish NLP, and it'

[issue12731] python lib re uses obsolete sense of \w in full violation of UTS#18 RL1.2a

2019-01-29 Thread Henry S. Thompson
Henry S. Thompson added the comment: This issue is also implicated in a failure of isalpha and friends. Easy way to see this is to compare >>> isalpha('İ') True >>> isalpha('İ'.lower()) False This results from the use of a combining character to encode lo