On 24 June 2017 at 19:12, Chris Vine <vine35792...@gmail.com> wrote:

> On Sat, 24 Jun 2017 19:08:36 +0100
> Chris Vine <vine35792...@gmail.com> wrote:
>
> > It is because UTF-8 is a multibyte encoding, and any one character may
> > require between 1 and 4 bytes to represent it.  If you were allowed to
> > change a byte at will, you could introduce invalid encoding
> > sequences.  As to the absence of documentation, maybe it is because
> > this was thought to be self-evident, dunno.
>
> And I should perhaps also make the point that these operators return a
> 32-bit Unicode character, not a byte, which follows from the same
> point.  If you allowed mutation, the length of the string (in bytes)
> might change.



Right, of course. It does seem very obvious now. It completely slipped my
mind that we're dealing with variable-width characters, not a fixed-width
encoding such as UTF-32. :( Thanks for the comprehensive answer to a stupid question!
_______________________________________________
gtkmm-list mailing list
gtkmm-list@gnome.org
https://mail.gnome.org/mailman/listinfo/gtkmm-list
