Re: utf8 and Glib::ustring

Daniel Boles Sun, 26 Mar 2017 13:12:41 -0700

On 22 March 2017 at 08:52, John Emmas <j...@creativepost.co.uk> wrote:


> Forgive my ignorance - this'll probably be obvious to some of you...
>
> Suppose I've got a simple character string, like this:-
>
>       const char* my_str = "Hello World";
>
> I can assign it to a Glib::ustring very easily:-
>
>       Glib::ustring ustr = my_str;
>
> BUT... instead of pointing to a 'normal' string (simple ASCII characters),
> let's suppose that 'my_str' was already pointing to a string in utf8
> format.  Will the same assignment still work - or is there some better way
> of assigning a utf8 string to a Glib::ustring?  Thanks,
>
> John
>


UTF-8 is backwards compatible with ASCII. If bit 7 (the high bit) of any
given byte in a string is 0, then that byte is a plain ASCII character. Only
if bit 7 is 1 do UTF-8-aware tools treat the byte as part of a multi-byte
sequence, in which that byte and its neighbours together encode a single
non-ASCII character.
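
To make the bit-7 rule concrete, here is a minimal sketch in plain C++ (no
glibmm needed). The helper names is_ascii_byte and is_pure_ascii are made up
for illustration; they are not part of glibmm or any other library.

```cpp
#include <string>

// Hypothetical helper: true if bit 7 is clear, i.e. the byte is plain ASCII.
inline bool is_ascii_byte(unsigned char byte)
{
    return (byte & 0x80) == 0;
}

// Hypothetical helper: true if every byte is ASCII, in which case the ASCII
// and UTF-8 readings of the string are identical.
inline bool is_pure_ascii(const std::string& bytes)
{
    for (unsigned char byte : bytes) {
        if (!is_ascii_byte(byte))
            return false;
    }
    return true;
}
```

For "Hello World" every byte passes the check, so it makes no difference
whether the string is read as ASCII or as UTF-8. A string such as
"caf\xC3\xA9" ("café" encoded as UTF-8) fails it, because the last two bytes
both have bit 7 set.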

In the same way, to Glib::ustring, any char* is just a block of bytes, which
it interprets as UTF-8; pure ASCII is simply the subset in which every
character occupies one byte. (The difference typically shows up when getting
the string length, indexing, etc.: once multi-byte characters are present,
there is no longer a 1:1 correspondence between size in bytes and length in
characters. Glib::ustring reports the former via bytes() and the latter via
length().)
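
The mismatch between bytes and characters can be illustrated without glibmm.
The sketch below counts characters the naive way, by skipping UTF-8
continuation bytes (bit pattern 10xxxxxx). The name utf8_length is
hypothetical, the input is assumed to be valid UTF-8, and this is not a claim
about how Glib::ustring is implemented internally.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts UTF-8 characters (code points) by counting
// every byte that is NOT a continuation byte (10xxxxxx).
// Assumes the input is valid UTF-8; no validation is performed.
inline std::size_t utf8_length(const std::string& bytes)
{
    std::size_t count = 0;
    for (unsigned char byte : bytes) {
        if ((byte & 0xC0) != 0x80)
            ++count;
    }
    return count;
}
```

For "Hello World" both std::string::size() and utf8_length() give 11. For
"caf\xC3\xA9" the byte size is 5 but the character count is 4, which is the
kind of difference to expect between a byte count and a character count once
non-ASCII text is involved.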

IOW, the answer to the question is yes, the same assignment will/must work,
and no, there is no better way: construct the Glib::ustring from the char*
and let it handle the rest.

_______________________________________________
gtkmm-list mailing list
gtkmm-list@gnome.org
https://mail.gnome.org/mailman/listinfo/gtkmm-list