shafik added a comment. In D66447#1638783 <https://reviews.llvm.org/D66447#1638783>, @labath wrote:
> In D66447#1638047 <https://reviews.llvm.org/D66447#1638047>, @JDevlieghere wrote:
>
> > In D66447#1637640 <https://reviews.llvm.org/D66447#1637640>, @labath wrote:
> >
> > > This looks good to me, but why are we using a nul character to test utf8
> > > support? Shouldn't we insert some funnier characters too? I mean, one of
> > > the advantages of unicode is that it should not be affected by the system
> > > code pages and such, so hopefully this would not cause problems even on
> > > some more exotic setups. (And I am pretty sure I remember already seeing
> > > some chinese chars in some of our data formatter tests)
> >
> > I only glanced at the proposal, but unless I misunderstand the type only
> > fits UTF-8 characters representable in 1 byte, which are basically just
> > ASCII.
>
> I have now too glanced at the proposal (just the cppreference page, really :) ).
> I think I understand where you got this impression from, but I don't think
> that is fully correct. It is true that a *single* char8_t variable can hold
> only 8 bit UTF8 code units (*not* characters), but that is not surprising
> since UTF8 is a variable length encoding, so you can't have a type that
> matches one character exactly. However, an *array* of char8_t is a completely
> different thing, and I am pretty sure that these are intended to hold utf8
> strings containing any utf8 characters (otherwise, it wouldn't really deserve
> to call itself a utf8 type), and so we should print (and test) it as regular
> utf8.
>
> However, this actually surfaces the question of how should we format single
> char8_t variables. It makes sense to display the character value if the value
> happens to be ASCII, but I guess we shouldn't print something like "invalid
> utf8 character" if it does contain one unit of the multibyte characters.
You may find the C++ Evolution Working Group's entry on [N4197 Adding u8 character literals, [tiny] Why no u8 character literals?](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4540.html#119) and the proposal that added `char8_t` <http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0482r5.html> helpful in understanding the rationale; the `char8_t` proposal also runs through a lot of examples.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D66447/new/

https://reviews.llvm.org/D66447

_______________________________________________
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits