labath added a comment.

In D66447#1638047 <https://reviews.llvm.org/D66447#1638047>, @JDevlieghere wrote:

> In D66447#1637640 <https://reviews.llvm.org/D66447#1637640>, @labath wrote:
>
> > This looks good to me, but why are we using a nul character to test UTF-8
> > support? Shouldn't we insert some funnier characters too? I mean, one of
> > the advantages of Unicode is that it should not be affected by the system
> > code pages and such, so hopefully this would not cause problems even on
> > some more exotic setups. (And I am pretty sure I remember already seeing
> > some Chinese chars in some of our data formatter tests)
>
> I only glanced at the proposal, but unless I misunderstand, the type only fits
> UTF-8 characters representable in 1 byte, which are basically just ASCII.

I have now glanced at the proposal too (just the cppreference page, really :) ). I think I understand where you got this impression from, but I don't think that is fully correct. It is true that a *single* char8_t variable can hold only one 8-bit UTF-8 code unit (*not* a character), but that is not surprising: UTF-8 is a variable-length encoding, so no fixed-size type can match one character exactly.

However, an *array* of char8_t is a completely different thing. I am pretty sure these are intended to hold UTF-8 strings containing any Unicode characters (otherwise, the type wouldn't really deserve to call itself a UTF-8 type), so we should print (and test) them as regular UTF-8.

This does raise the question of how we should format single char8_t variables. It makes sense to display the character if the value happens to be ASCII, but I guess we shouldn't print something like "invalid utf8 character" if the variable holds one code unit of a multibyte character.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D66447/new/

https://reviews.llvm.org/D66447

_______________________________________________
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits
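
To make the code unit versus character distinction from the comment above concrete, here is a minimal sketch. It is not LLDB code: `Utf8Unit`, `UnitKind`, `classifyUnit` and `countCodePoints` are hypothetical names introduced only for illustration, and `unsigned char` stands in for `char8_t` so the snippet also compiles before C++20.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: stand-in for C++20 char8_t, which has the same size and
// value range as unsigned char.
using Utf8Unit = unsigned char;

// What a single UTF-8 code unit can be when looked at in isolation.
enum class UnitKind { Ascii, LeadByte, ContinuationByte, Invalid };

// Classify one code unit by its high bits.
UnitKind classifyUnit(Utf8Unit u) {
  if (u < 0x80)
    return UnitKind::Ascii;            // 0xxxxxxx: a complete character
  if ((u & 0xC0) == 0x80)
    return UnitKind::ContinuationByte; // 10xxxxxx: middle or end of a sequence
  if (u >= 0xC2 && u <= 0xF4)
    return UnitKind::LeadByte;         // first unit of a 2- to 4-unit sequence
  return UnitKind::Invalid;            // 0xC0, 0xC1, 0xF5..0xFF never occur in valid UTF-8
}

// Count the code points in a NUL-terminated array of code units by
// counting every unit that is not a continuation byte. Assumes the
// input is well-formed UTF-8.
std::size_t countCodePoints(const Utf8Unit *s) {
  assert(s != nullptr);
  std::size_t n = 0;
  for (; *s != 0; ++s)
    if ((*s & 0xC0) != 0x80)
      ++n;
  return n;
}
```

For example, the array `{0x61, 0xC3, 0xA9, 0xE4, 0xB8, 0xAD, 0x00}` encodes the three characters "a", "é" and "中" in six code units. A lone `0xC3` in a single variable is a valid lead byte rather than an invalid character, which is the case the formatting question above is about.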