[Lldb-commits] [PATCH] D66447: Add char8_t support (C++20)

Pavel Labath via Phabricator via lldb-commits Tue, 20 Aug 2019 23:50:39 -0700

labath added a comment.

In D66447#1638047 <https://reviews.llvm.org/D66447#1638047>, @JDevlieghere 
wrote:

> In D66447#1637640 <https://reviews.llvm.org/D66447#1637640>, @labath wrote:
>
> > This looks good to me, but why are we using a nul character to test utf8 
> > support? Shouldn't we insert some funnier characters too? I mean, one of 
> > the advantages of unicode is that it should not be affected by the system 
> > code pages and such, so hopefully this would not cause problems even on 
> > some more exotic setups. (And I am pretty sure I remember already seeing 
> > some chinese chars in some of our data formatter tests)
>
>
> I only glanced at the proposal, but unless I misunderstand the type only fits 
> UTF-8 characters representable in 1 byte, which are basically just ASCII.

I have now glanced at the proposal too (just the cppreference page, really :) 
). I think I understand where you got this impression from, but I don't think 
it is fully correct. It is true that a *single* char8_t variable can hold 
only one 8-bit UTF-8 code unit (*not* a character), but that is not surprising: 
UTF-8 is a variable-length encoding, so no fixed-size 8-bit type can match one 
character exactly. However, an *array* of char8_t is a completely different 
thing, and I am pretty sure that these are intended to hold UTF-8 strings 
containing any Unicode characters (otherwise, the type wouldn't really deserve 
to call itself a UTF-8 type), and so we should print (and test) them as 
regular UTF-8.
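To make the code unit vs. character distinction concrete, here is a minimal
sketch (not taken from the patch). The names CodeUnit, kHello and the two
counting helpers are made up for illustration, and the fallback typedef is
only there so the snippet also builds on pre-C++20 compilers. It assumes
well-formed, NUL-terminated input.

```cpp
#include <cstddef>

#if defined(__cpp_char8_t)
using CodeUnit = char8_t;        // the real C++20 type
#else
using CodeUnit = unsigned char;  // stand-in for older compilers (assumption)
#endif

// "你好" spelled out as UTF-8 bytes: two characters, three code units each,
// followed by a NUL terminator.
static const CodeUnit kHello[] = {0xE4, 0xBD, 0xA0, 0xE5, 0xA5, 0xBD, 0x00};

// Number of code units before the terminating NUL.
std::size_t CountCodeUnits(const CodeUnit *s) {
  std::size_t n = 0;
  while (s[n] != 0)
    ++n;
  return n;
}

// Number of code points: every unit that is not a continuation unit
// (bit pattern 10xxxxxx) starts a new character.
std::size_t CountCodePoints(const CodeUnit *s) {
  std::size_t n = 0;
  for (; *s != 0; ++s)
    if ((*s & 0xC0) != 0x80)
      ++n;
  return n;
}
```

Here CountCodeUnits(kHello) is 6 while CountCodePoints(kHello) is 2, so no
single element of the array is a character by itself, but the array as a whole
is a perfectly good non-ASCII UTF-8 string.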

However, this actually raises the question of how we should format single 
char8_t variables. It makes sense to display the character if the value 
happens to be ASCII, but I guess we shouldn't print something like "invalid 
utf8 character" if the variable holds one code unit of a multibyte character.
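One possible policy, sketched below purely as an illustration: show the
character for printable ASCII and fall back to the bare numeric value for
everything else, without flagging an error. This is not what the patch or
LLDB's formatters do; UnitKind, ClassifyCodeUnit, DescribeCodeUnit and the
output format are all hypothetical.

```cpp
#include <cstdio>
#include <string>

enum class UnitKind { Ascii, Lead, Continuation, Invalid };

// Classifies one UTF-8 code unit by its high bits.
UnitKind ClassifyCodeUnit(unsigned char u) {
  if (u < 0x80)
    return UnitKind::Ascii;
  if ((u & 0xC0) == 0x80)
    return UnitKind::Continuation;  // 10xxxxxx
  if (u >= 0xC2 && u <= 0xF4)
    return UnitKind::Lead;          // starts a 2-, 3- or 4-unit sequence
  return UnitKind::Invalid;         // 0xC0, 0xC1, 0xF5..0xFF never occur
}

// Hypothetical summary for a single code unit: printable ASCII gets the
// character appended, anything else is shown as a plain hex value.
std::string DescribeCodeUnit(unsigned char u) {
  char buf[32];
  if (u >= 0x20 && u <= 0x7E)
    std::snprintf(buf, sizeof(buf), "0x%02x u8'%c'",
                  static_cast<unsigned>(u), static_cast<int>(u));
  else
    std::snprintf(buf, sizeof(buf), "0x%02x", static_cast<unsigned>(u));
  return buf;
}
```

With this sketch, DescribeCodeUnit(0x41) gives "0x41 u8'A'" and
DescribeCodeUnit(0xE4) gives just "0xe4", so a lone lead or continuation unit
is displayed as a number rather than reported as an error.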

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D66447/new/

https://reviews.llvm.org/D66447

_______________________________________________
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits
