I think the question was about the performance impact of using UTF-16 as an internal representation of characters.
The original claim was, in effect, that the encoding conversion to UTF-16 is so costly that it offsets any gain from doing code point operations on UTF-16 instead of UTF-8. That is a very strong claim, because experiments so far have shown the opposite. I think the statement against ICU/UTF-16 needs to be backed by experimental data.

Benjamin

On 10/6/13, 12:31 PM, Alp Toker wrote:
> Geoffrey, http://userguide.icu-project.org/conversion/converters says:
>
> "Since ICU uses Unicode (UTF-16) internally, all converters convert
> between UTF-16 (with the endianness according to the current platform)
> and another encoding."
>
> That said, I don't think it's a major concern because ICU works on byte
> streams. It's not like these strings will persist internally somewhere,
> eating lots of memory.
>
> From experience, the old WTF in-place converters found in WebKit
> "mobile" ports of the past were way buggy and probably only ever tested
> with ASCII. I'd say use ICU and don't look back :-)
>
> Alp.
>
> On 06/10/2013 20:08, Geoffrey Garen wrote:
>>> There is an issue with ICU: it uses UTF-16 as its internal
>>> representation, while most of the Web nowadays is UTF-8. Therefore,
>>> page text goes through unnecessary encoding conversion, and takes
>>> more memory than in UTF-8 (for most languages). So it might not be a
>>> good development direction to tie WebKit to ICU.
>> Is there a benchmark or website that can verify these claims?
>>
>> Thanks,
>> Geoff
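
For concreteness, here is a minimal sketch of the conversion under discussion, using ICU's u_strFromUTF8 from <unicode/ustring.h>. The sample strings are hypothetical; it assumes ICU headers are installed and the program is linked against icuuc (e.g. `cc demo.c $(pkg-config --libs icu-uc)`):

    /* Minimal sketch: convert UTF-8 input to ICU's internal UTF-16
     * representation and compare the storage sizes of the two forms. */
    #include <stdio.h>
    #include <string.h>
    #include <unicode/ustring.h>   /* u_strFromUTF8() */

    int main(void) {
        const char *samples[] = {
            "hello, world",                        /* ASCII-heavy text      */
            "\xE6\x97\xA5\xE6\x9C\xAC\xE8\xAA\x9E" /* CJK text, "Japanese"  */
        };
        for (int i = 0; i < 2; i++) {
            UChar buf[64];                 /* UTF-16 destination buffer */
            int32_t utf16Units = 0;
            UErrorCode status = U_ZERO_ERROR;
            u_strFromUTF8(buf, 64, &utf16Units, samples[i], -1, &status);
            if (U_FAILURE(status))
                return 1;
            printf("UTF-8: %zu bytes, UTF-16: %zu bytes\n",
                   strlen(samples[i]), utf16Units * sizeof(UChar));
        }
        return 0;
    }

On the ASCII sample the UTF-16 form is twice the size of the UTF-8 input, while on the CJK sample it is smaller, which is the trade-off behind the "more memory for most languages" claim quoted above.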
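The other half of the claim, per-code-point operations, can be sketched with ICU's U8_NEXT and U16_NEXT decoding macros. Both loops are O(n); any performance difference is in the constants (branchy multi-byte decoding for UTF-8 versus the rarer surrogate-pair check for UTF-16). The sample data is again hypothetical:

    /* Minimal sketch: count code points in equivalent UTF-8 and
     * UTF-16 buffers using ICU's decoding macros. */
    #include <stdio.h>
    #include <unicode/utf8.h>    /* U8_NEXT()  */
    #include <unicode/utf16.h>   /* U16_NEXT() */

    int main(void) {
        static const uint8_t u8[] = { 'c', 'a', 'f', 0xC3, 0xA9 }; /* "cafe" w/ e-acute */
        static const UChar  u16[] = { 0x63, 0x61, 0x66, 0xE9 };    /* same text         */
        UChar32 c;

        int32_t i = 0, n8 = 0;
        while (i < (int32_t)sizeof(u8)) {
            U8_NEXT(u8, i, (int32_t)sizeof(u8), c);  /* decodes 1-4 byte sequences */
            n8++;
        }

        int32_t j = 0, n16 = 0;
        while (j < 4) {
            U16_NEXT(u16, j, 4, c);                  /* decodes surrogate pairs */
            n16++;
        }

        printf("code points: utf8=%d utf16=%d\n", n8, n16); /* prints 4 and 4 */
        return 0;
    }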

