[issue12737] str.title() is overzealous by upcasing combining marks inappropriately

2021-07-24 Thread Terry J. Reedy
Terry J. Reedy added the comment: Which comes out 'Tr̥Tīyā'. The mark under the r, '̥', is '0x325', i.e. U+0325 COMBINING RING BELOW.
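
Listing the code points makes it clearer why the `t` gets upcased. The minimal sketch below uses only the standard `unicodedata` module. The helper name `describe` is illustrative, and the sketch assumes that the `ī` and `ā` in the example are the precomposed letters U+012B and U+0101:

```python
import unicodedata

def describe(s):
    """Return (code point, general category, name) for each character in s."""
    return [
        (f"U+{ord(c):04X}", unicodedata.category(c), unicodedata.name(c, "<unnamed>"))
        for c in s
    ]

# 'Tr̥tīyā' written with explicit escapes so the combining mark is visible.
word = "Tr\u0325t\u012by\u0101"
for row in describe(word):
    print(*row)
```

U+0325 has general category Mn (nonspacing mark) and is not a cased character. A loop that titlecases whatever follows an uncased character therefore treats the following `t` as the start of a new word.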

2021-07-24 Thread Vishvas Vasuki
Vishvas Vasuki added the comment: This case still fails with 3.9 - 'Tr̥tīyā'.title() -- nosy: +vishvas.vasuki

2020-10-26 Thread STINNER Victor
Change by STINNER Victor : -- nosy: -vstinner

2020-10-26 Thread Irit Katriel
Irit Katriel added the comment: You're right, I see that too when I don't tamper with the test. -- components: +Library (Lib)

2020-10-25 Thread Guido van Rossum
Guido van Rossum added the comment: Are you sure? Running Ezio's titletest.py, I get this output (note that the UCD major version is in the double digits so the test for that misfires :-).
    titletest.py: Please set your PYTHONIOENCODING envariable to utf8
    WARNING: Your old UCD is out of date, e
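
A version check that breaks once the UCD major version reaches two digits is the typical symptom of comparing version strings lexically. titletest.py is not reproduced here, so the sketch below *assumes* the script compared `unicodedata.unidata_version` as a string. It shows that pitfall and a tuple-based comparison. `ucd_version_tuple` is a hypothetical helper:

```python
import unicodedata

def ucd_version_tuple(version=unicodedata.unidata_version):
    """Hypothetical helper: parse a dotted version such as '13.0.0' into ints."""
    return tuple(int(part) for part in version.split("."))

# Lexical comparison misorders versions once the major number has two digits.
print("13.0.0" < "6.0.0")  # True, which is wrong

# Comparing integer tuples gives the intended ordering.
print(ucd_version_tuple("13.0.0") < ucd_version_tuple("6.0.0"))  # False
print(ucd_version_tuple() >= (6, 0, 0))
```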

2020-10-25 Thread Irit Katriel
Irit Katriel added the comment: Of the examples given two seem ok now, but the Istanbul one is still wrong:
    >>> "déme un café".title()
    'Déme Un Café'
    >>> "ᾲ στο διάολο".title()
    'Ὰͅ Στο Διάολο'
    >>>
    >>> "i̇stanbul".title()
    'İStanbul'
-- nosy: +iritkatriel versions: +Python 3.10, Python
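
The Istanbul example differs from the other two because the dotted i has no precomposed lowercase form. `İ` (U+0130) lowercases to `i` followed by U+0307 COMBINING DOT ABOVE, and NFC leaves that pair decomposed. The following check uses only the standard library:

```python
import unicodedata

capital = "\u0130stanbul"         # 'İstanbul' with precomposed U+0130
lowered = capital.lower()          # the full lowercase mapping of U+0130 adds a combining mark

print(len(capital), len(lowered))  # 8 9
print(unicodedata.normalize("NFC", lowered) == lowered)  # True: no composed form exists
print([unicodedata.name(c) for c in lowered[:2]])
```

U+0307 is not cased. A titlecasing pass that restarts after every uncased character therefore upcases the following `s`.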

2011-10-18 Thread Florent Xicluna
Changes by Florent Xicluna : -- nosy: +flox

2011-10-01 Thread Martin v . Löwis
Martin v. Löwis added the comment: >> As for terminology: I think the documentation should continue to >> speak about "words" and "letters", and then define what is meant >> in this context. It's not that the Unicode consortium invented >> the term "letter", so we should use it more liberally th

2011-10-01 Thread Tom Christiansen
Tom Christiansen added the comment: Martin v. Löwis wrote on Sat, 01 Oct 2011 10:59:48 -: >> * Word characters are Alphabetic + Mn+Mc+Me + Nd + Pc. > Where did you get that definition from? UTS#18 defines > "", which is Alphabetic + U+200C + U+200D > (i.e. not including marks, but in
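
Python's own `re` module does not use the mark-inclusive definition discussed here. For `str` patterns, `\w` matches characters for which `str.isalnum()` is true, plus the underscore, so combining marks such as U+0325 are excluded. The sketch below contrasts that with a hypothetical `is_word_char` built from the list "Alphabetic + Mn + Mc + Me + Nd + Pc". Note that `str.isalpha()` only approximates the Unicode Alphabetic property: it covers the L* categories but not Nl or Other_Alphabetic.

```python
import re
import unicodedata

MARK_DIGIT_CONNECTOR = {"Mn", "Mc", "Me", "Nd", "Pc"}

def is_word_char(c):
    """Hypothetical: approximate 'Alphabetic + Mn + Mc + Me + Nd + Pc' for one char."""
    # str.isalpha() stands in for the Alphabetic property (approximation).
    return c.isalpha() or unicodedata.category(c) in MARK_DIGIT_CONNECTOR

word = "Tr\u0325t\u012by\u0101"  # 'Tr̥tīyā'

# re's \w splits the word at the combining mark.
print(re.findall(r"\w+", word))  # ['Tr', 'tīyā']

# The mark-inclusive definition keeps it as one run.
print(all(is_word_char(c) for c in word))  # True
```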

2011-10-01 Thread Martin v . Löwis
Martin v. Löwis added the comment: > * Word characters are Alphabetic + Mn+Mc+Me + Nd + Pc. Where did you get that definition from? UTS#18 defines "", which is Alphabetic + U+200C + U+200D (i.e. not including marks, but including those > I think you are looking for here are Word characters wi

2011-09-30 Thread Guido van Rossum
Guido van Rossum added the comment: I like how we're actually converging on an implementable and maximally-useful algorithm.

2011-09-30 Thread Tom Christiansen
Tom Christiansen added the comment: > Martin v. Löwis added the comment: > "Split S into words. Change the first letter in a word to upper-case, Except that I think you actually mean that the first "letter" is changed into titlecase not uppercase. One might also say *try* to change for al
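
The distinction between titlecase and uppercase matters for a handful of characters, most visibly the Latin digraphs. The quick check below uses built-in string methods. The word "džungla" is only an illustrative input.

```python
import unicodedata

dz = "\u01c6"  # LATIN SMALL LETTER DZ WITH CARON

# upper() capitalizes both halves of the digraph; title() capitalizes only the first.
for label, mapped in (("upper", dz.upper()), ("title", dz.title())):
    print(label, mapped, unicodedata.name(mapped))

print("\u01c6ungla".title())  # ǅungla
```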

2011-09-30 Thread Martin v . Löwis
Martin v. Löwis added the comment: > Martin, do you think that str.title() should follow the Unicode standard? I don't think that "follow the Unicode standard" has any meaning in this context: the Unicode standard doesn't specify (AFAIK) what a .title() method in a programming language should d

2011-09-29 Thread Ezio Melotti
Ezio Melotti added the comment: After PEP 393 the result is still the same (I attached a slightly improved version of the script):
    titlecase of 'deme un cafe' should be 'Deme Un Cafe' not 'DeMe Un Cafe'
    titlecase of 'istanbul' should be 'Istanbul' not 'IStanbul'
    titlecase of 'α στο

2011-09-18 Thread Martin v . Löwis
Martin v. Löwis added the comment: Tom: it's intentional that .title() doesn't use traditional word break algorithms. In 2.x, "foo3bar".title() is "Foo3Bar", i.e. the 3 counts as a word end. So neither UTS#18 \w nor UAX#29 apply. So in UTS#18 terminology, .title() matches more closely \alpha+,
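
The digit-as-word-end behavior is easy to confirm. For contrast, the `string` module's `capwords()` follows a different rule: it splits only on whitespace and applies `str.capitalize()` to each piece. As a result, neither digits nor combining marks start a new word, although runs of whitespace are collapsed to a single space. A short comparison:

```python
import string

print("foo3bar".title())                  # Foo3Bar: the digit ends a "word"
print(string.capwords("foo3bar"))         # Foo3bar: only whitespace separates words
print(string.capwords("i\u0307stanbul"))  # I + U+0307 + 'stanbul'
print(string.capwords("  two   spaces ")) # Two Spaces
```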

2011-09-17 Thread Ezio Melotti
Ezio Melotti added the comment: I think string methods (and other parts of the stdlib) assume NFC and leave normalization to NFC up to the user. Before fixing str.title() we should take a more general decision about handling strings that use other normalization forms.
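
Normalizing to NFC first resolves some of the reported cases but not all of them. NFC can compose a base letter and a mark only when Unicode defines a precomposed character for that pair. The sketch below uses `unicodedata.normalize`:

```python
import unicodedata

decomposed_cafe = "cafe\u0301"  # 'café' written as e + COMBINING ACUTE ACCENT
ring_r = "r\u0325"              # r + COMBINING RING BELOW, as in 'Tr̥tīyā'

# e + U+0301 has a precomposed form (U+00E9), so NFC removes the combining mark...
print(unicodedata.normalize("NFC", decomposed_cafe) == "caf\u00e9")  # True

# ...but there is no precomposed "r with ring below", so the mark survives NFC.
print(unicodedata.normalize("NFC", ring_r) == ring_r)  # True
```

A caller who follows the "normalize to NFC first" convention can therefore still hit the bug with a word like 'Tr̥tīyā'.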

2011-08-26 Thread Tom Christiansen
Tom Christiansen added the comment: Guido van Rossum wrote on Fri, 26 Aug 2011 21:16:57 -: > Yeah, this should be fixed in 3.3 and probably backported to 3.2 > and 2.7. (There is already no guarantee that len(s) == > len(s.title()), right?) Well, *I* don't know of any such guarantee,

2011-08-26 Thread Guido van Rossum
Guido van Rossum added the comment: Yeah, this should be fixed in 3.3 and probably backported to 3.2 and 2.7. (There is already no guarantee that len(s) == len(s.title()), right?) -- nosy: +gvanrossum
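
In Python 3.3 and later, the case methods use the full Unicode case mappings, which can map one code point to several. Under those mappings, `len(s.title())` can indeed differ from `len(s)`. Two quick examples:

```python
samples = ["\u00df", "\ufb01sh"]  # 'ß', and 'ﬁsh' with U+FB01 LATIN SMALL LIGATURE FI
for s in samples:
    t = s.title()
    print(f"{s!r} -> {t!r}: {len(s)} -> {len(t)} code points")
```

Here 'ß' titlecases to 'Ss', and the ligature in 'ﬁsh' expands to 'Fi'. Each adds one code point.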

2011-08-15 Thread Ezio Melotti
Ezio Melotti added the comment: So the issue here is that when combining chars are used, str.title() fails to titlecase the string properly. The algorithm implemented by str.title() [0] is quite simple: it loops through the code units, and uppercases all the chars that follow a char that is not
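
The loop described above can be modeled in a few lines of Python. The sketch approximates CPython's behavior and is not its actual code. It tracks whether the previous character was cased (Unicode Cased means Uppercase, Lowercase, or a titlecase letter). It also ignores context-sensitive rules such as final sigma. `title_skip_marks` is a hypothetical fix that lets combining marks (general category M*) pass through without ending the word:

```python
import unicodedata

def is_cased(c):
    """Unicode 'Cased' for one character: Uppercase, Lowercase, or titlecase letter."""
    return c.isupper() or c.islower() or c.istitle()

def title_model(s):
    """Approximate model: titlecase c unless the previous char was cased, else lowercase."""
    out = []
    prev_cased = False
    for c in s:
        out.append(c.lower() if prev_cased else c.title())
        prev_cased = is_cased(c)
    return "".join(out)

def title_skip_marks(s):
    """Hypothetical fix: combining marks are copied through and do not reset the state."""
    out = []
    prev_cased = False
    for c in s:
        if unicodedata.category(c).startswith("M"):
            out.append(c)  # keep the mark; the base letter's cased state carries over
            continue
        out.append(c.lower() if prev_cased else c.title())
        prev_cased = is_cased(c)
    return "".join(out)

for s in ("Tr\u0325t\u012by\u0101", "i\u0307stanbul", "de\u0301me un cafe\u0301"):
    print(title_model(s), "|", title_skip_marks(s))
```

Both functions still treat digits as word ends ("foo3bar" becomes "Foo3Bar"). Only the handling of marks differs.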

2011-08-15 Thread STINNER Victor
STINNER Victor added the comment: See also #12746.

2011-08-13 Thread Antoine Pitrou
Changes by Antoine Pitrou : -- nosy: +haypo, loewis stage: -> needs patch versions: +Python 3.3

2011-08-12 Thread Terry J. Reedy
Terry J. Reedy added the comment: I changed the title because 'string' is a module that once contained the functions that are now attached to the str class as methods. So 'string.title' is an obsolete attribute reference. -- nosy: +terry.reedy title: string.title() is overzealous by