> On 11 Jun 2024, at 21:08, Giuseppe D'Angelo via Development 
> <development@qt-project.org> wrote:
> 
> Il 11/06/24 07:12, Thiago Macieira ha scritto:
>> I'm arguing that such code is likely already broken (producing mojibake) for
>> non-US-ASCII content, so having U+FFFD instead of mojibake is not worse. You
>> wouldn't be able to work around the issue by un-doing the improper encoding,
>> which means it would force users to fix their code.
> 
> Is it? I somehow suspect that there's a lot of code out there that does stuff 
> like:
> 
>  string.indexOf('\xfc')   // search for ü
> 
> or similar.
> 
> (Usual disclaimer: not every developer is aware of encodings. Maybe they 
> tried 'ü', and got a mysterious warning from the compiler, and the code 
> didn't work; so they just put '\xfc' instead, and now it works -- ok, let's 
> carry on.)
> 
> I'm not claiming that the situation is ideal, as we're clearly being 
> inconsistent: `char` is being treated as UTF-8 or Latin1 depending on the 
> context.
> 
> Yet, breaking a ~20 year behavior in "low-level code" is ... scary? It should 
> require extraordinary motivation and care; we're probably talking about 
> making 6.8->6.14 warn if someone passes a non-ASCII char to 
> QASV/QChar(char)'s constructor, and change behavior to accept ASCII-only in 
> 6.15?


I do agree that it makes more sense to assume that code that feeds a single 
`char` into a Qt API wants that character to be interpreted as Latin1. For one, 
it has been like that forever, and it is still so in the case of e.g. 
QChar(char). Also, if the character value is outside the US-ASCII range, then 
the only alternative would be to interpret it as an incomplete UTF-8 sequence, 
which can’t be the right answer. QString::arg(char) (or operator+(char), for 
that matter) is not usable as a tool to assemble a valid string from individual 
UTF-8 code units, after all.
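
To make the byte-level argument concrete, here is a rough sketch in plain 
C++ with no Qt involved. The three helper functions are hypothetical names 
made up for illustration, not Qt API, and the encoder only handles code 
points below U+0800. It shows that '\xfc' is a complete character under 
Latin1, while under UTF-8 the same byte cannot stand on its own, since ü 
needs two bytes there.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not a Qt API): Latin1 maps every byte 1:1 to the
// code point with the same numeric value, so any single char is decodable.
char32_t latin1ToCodePoint(char c)
{
    return static_cast<unsigned char>(c);
}

// Hypothetical helper: a lone byte is a complete UTF-8 sequence only if it
// is in the US-ASCII range (0x00..0x7F). Anything else is a lead or
// continuation byte of a longer sequence, or not valid UTF-8 at all.
bool isCompleteUtf8Sequence(char c)
{
    return static_cast<unsigned char>(c) < 0x80;
}

// Hypothetical helper: encode a code point as UTF-8.
// Assumption: cp < 0x800, i.e. at most two bytes are needed.
std::string encodeUtf8(char32_t cp)
{
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else {
        out += static_cast<char>(0xC0 | (cp >> 6));   // lead byte
        out += static_cast<char>(0x80 | (cp & 0x3F)); // continuation byte
    }
    return out;
}
```

With these, latin1ToCodePoint('\xfc') yields U+00FC (ü), 
isCompleteUtf8Sequence('\xfc') is false, and encodeUtf8(0xFC) is the two 
bytes 0xC3 0xBC. Neither of those two bytes means anything when passed 
alone as a single `char`.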

Volker

-- 
Development mailing list
Development@qt-project.org
https://lists.qt-project.org/listinfo/development
