On Friday, 25 January 2019 13:39:49 PST Konstantin Tokarev wrote:
> > All living languages are supposed to be stored in the BMP, which means no
> > UTF-16 surrogate pairs to encode them.
> 
> AFAIK all emojis are encoded with surrogate pairs

Emojis are not part of a living language. They're drawings. But yes, they're 
outside the BMP.

In any case, they're often represented by more than one codepoint anyway, so 
whether we used N*2 UTF-16 code units to represent them or N UTF-32 code 
units, it makes no difference. Your code needs to know how to deal with them, 
where to properly break, how to combine them, how to calculate the width, etc.

Also note how they'd be represented by N*4 bytes in UTF-8, which means all 
three representations take exactly the same amount of memory.

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel Open Source Technology Center



_______________________________________________
Development mailing list
Development@qt-project.org
https://lists.qt-project.org/listinfo/development

Reply via email to