jyknight wrote:

I like the idea of this warning, but I'm afraid the diagnostic wording isn't 
sufficient to lead people to correct fixes. Instead, it seems to result in 
simply adding explicit casts to make the compiler shut up, even from people who 
know what they're doing w.r.t. Unicode.

The first response I got in a discussion about an instance of `implicit 
conversion from char16_t to char32_t may change the meaning of the represented 
code unit`, was (approximately) "What an obnoxious warning, _of course it's 
fine_ to zero-extend a char16_t codepoint to a char32_t codepoint!" This, from 
a subject matter expert and maintainer of a Unicode library.

And, of course, it _is_ fine if you happen to know that the char16_t was 
representing a valid codepoint below U+10000 (i.e., in the Basic Multilingual 
Plane). Which _could_ be the case... it's just not common. And, worse, if it is true 
in a given case, then the API in question is dangerous and invites misuse by 
its callers, because it has decided upon an unusual/unexpected use of types 
(char16_t as a codepoint, instead of the expected use of char16_t as a UTF-16 
code-unit).
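
To make the hazard concrete, here's a tiny illustration (not code from this PR) of why widening a lone code unit usually doesn't give back the character it came from:

```cpp
// Illustration only: a lone char16_t pulled out of a UTF-16 string may be half
// of a surrogate pair, so zero-extending it does not yield a valid code point.
char32_t widen_one_unit() {
  char16_t unit = u"\U0001F600"[0];    // 0xD83D, the high surrogate of U+1F600
  return static_cast<char32_t>(unit);  // 0xD83D: a surrogate, not a valid scalar value
}
```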

So, I think that we need to somehow explain in these diagnostics -- in very few 
words! -- that char16_t should represent UTF-16 code-units, while char32_t 
represents Unicode codepoints, and that you _probably_ need to refactor your 
code to decode a sequence of UTF-16 char16_t into char32_t codepoints, rather 
than simply insert an explicit cast of an individual char16_t.
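
For concreteness, a minimal sketch of the kind of refactor I mean (the helper name and the error-handling policy are just illustrative, not anything this PR proposes):

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Decode UTF-16 code units into code points instead of casting them one by one.
// A real decoder needs a policy for ill-formed input; here unpaired surrogates
// are simply replaced with U+FFFD.
std::vector<char32_t> decode_utf16(std::u16string_view in) {
  std::vector<char32_t> out;
  for (std::size_t i = 0; i < in.size(); ++i) {
    char16_t unit = in[i];
    if (unit >= 0xD800 && unit <= 0xDBFF && i + 1 < in.size() &&
        in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
      // High surrogate followed by low surrogate: combine into one code point.
      char32_t cp = 0x10000 + ((unit - 0xD800) << 10) + (in[i + 1] - 0xDC00);
      out.push_back(cp);
      ++i;
    } else if (unit >= 0xD800 && unit <= 0xDFFF) {
      out.push_back(U'\uFFFD');                   // Unpaired surrogate: substitute U+FFFD.
    } else {
      out.push_back(static_cast<char32_t>(unit)); // BMP unit: the value *is* the code point.
    }
  }
  return out;
}
```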

https://github.com/llvm/llvm-project/pull/138708