Re: [PATCH 0/5] Improve text protocol

Bill Spitzak Tue, 16 Apr 2013 10:59:00 -0700

There seem to be some claims that you cannot random-access a UTF-8 string with errors in it. This is false if you restrict errors to strict patterns that do not contain valid encodings, and it is easiest with my recommendation that errors only be 1 byte long.

To make this sample code simple, the buffer has a 0 byte before the first actual byte and another after the last one; this avoids the need to pass the buffer ends to the functions. Real implementations may need to pass the pointer to one or both ends.


// Returns the length of a UTF-8 code point starting at p,
// or returns 0 if it is not a valid encoding. The rest of this
// code treats 0 as a 1-byte-long "code point"
int utf8_length(const unsigned char* p)
{
  if (*p < 0x80) return 1; // ASCII
  else if (*p < 0xC2) return 0; // continuation and overlong
  else ... // multi-byte codes
}

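// Return the start of the UTF-8 code point that
// p is pointing at one of the bytes of.
const unsigned char* utf8_start(const unsigned char* p)
{
  for (int i = 0; i < 4; i++) {
    if (utf8_length(p-i) > i) return p-i;
    // only continuation bytes can sit inside a code point, so stop
    // here rather than read past the 0 before the buffer
    if ((p[-i] & 0xC0) != 0x80) break;
  }
  return p;
}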

// p is assumed to point at the start of a code point, return the next
// one, or the 0 off the end of the buffer
const unsigned char* utf8_next(const unsigned char* p)
{
  int n = utf8_length(p);
  return p + (n ? n : 1);
}

// p is assumed to point at the start of a code point, return the
// previous one, or the 0 before the start of the buffer
const unsigned char* utf8_prev(const unsigned char* p)
{
  return utf8_start(p-1);
}
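
For anyone who wants to try this, here is one possible filled-in version, as a sketch rather than a definitive implementation. The multi-byte branches of utf8_length are my guess at what the elided part would do, assuming strict UTF-8 as in RFC 3629 (overlong forms, surrogates and anything above U+10FFFF count as errors). utf8_start stops looking back at the first non-continuation byte so that a single 0 before the buffer is enough. The names utf8_sample, utf8_count and utf8_self_check are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Length of the valid UTF-8 sequence starting at p, or 0 if p does not
   start one. Each following byte is only read after the previous one was
   confirmed to be a continuation byte, so the trailing 0 is never passed. */
int utf8_length(const unsigned char *p)
{
    unsigned char c = p[0];
    if (c < 0x80) return 1;                       /* ASCII, including the 0 sentinels */
    if (c < 0xC2) return 0;                       /* continuation or overlong lead */
    if ((p[1] & 0xC0) != 0x80) return 0;          /* every lead needs a continuation */
    if (c < 0xE0) return 2;                       /* 2-byte sequence */
    if (c < 0xF0) {                               /* 3-byte sequence */
        if (c == 0xE0 && p[1] < 0xA0) return 0;   /* overlong */
        if (c == 0xED && p[1] >= 0xA0) return 0;  /* surrogate (assumed to be an error) */
        return (p[2] & 0xC0) == 0x80 ? 3 : 0;
    }
    if (c < 0xF5) {                               /* 4-byte sequence */
        if (c == 0xF0 && p[1] < 0x90) return 0;   /* overlong */
        if (c == 0xF4 && p[1] >= 0x90) return 0;  /* above U+10FFFF */
        if ((p[2] & 0xC0) != 0x80) return 0;
        return (p[3] & 0xC0) == 0x80 ? 4 : 0;
    }
    return 0;                                     /* 0xF5..0xFF are never valid */
}

/* Start of the code point (or 1-byte error) that contains the byte at p. */
const unsigned char *utf8_start(const unsigned char *p)
{
    for (int i = 0; i < 4; i++) {
        if (utf8_length(p - i) > i) return p - i;
        if ((p[-i] & 0xC0) != 0x80) break;        /* cannot be inside a sequence */
    }
    return p;
}

/* p points at the start of a unit; return the start of the next one. */
const unsigned char *utf8_next(const unsigned char *p)
{
    int n = utf8_length(p);
    return p + (n ? n : 1);
}

/* p points at the start of a unit; return the start of the previous one. */
const unsigned char *utf8_prev(const unsigned char *p)
{
    return utf8_start(p - 1);
}

/* Hypothetical sample: 0, 'a', U+00E9, a stray continuation byte (an
   error), U+20AC, 'z', 0. */
const unsigned char utf8_sample[] =
    { 0, 'a', 0xC3, 0xA9, 0x80, 0xE2, 0x82, 0xAC, 'z', 0 };

/* Count units (code points plus 1-byte errors) up to the trailing 0. */
size_t utf8_count(const unsigned char *first)
{
    size_t n = 0;
    for (const unsigned char *p = first; *p; p = utf8_next(p)) n++;
    return n;
}

/* Check that random access and stepping in both directions agree. */
void utf8_self_check(void)
{
    const unsigned char *b = utf8_sample;
    assert(utf8_count(b + 1) == 5);       /* a, U+00E9, error, U+20AC, z */
    assert(utf8_start(b + 3) == b + 2);   /* second byte of U+00E9 */
    assert(utf8_start(b + 4) == b + 4);   /* stray 0x80 stands alone */
    assert(utf8_start(b + 7) == b + 5);   /* last byte of U+20AC */
    assert(utf8_prev(b + 5) == b + 4);    /* step back onto the error byte */
    assert(utf8_prev(b + 1) == b);        /* lands on the leading 0 */
}
```

The point of the sample is the stray 0x80 in the middle: landing on it from any direction gives the same boundary, because an error is always exactly one byte and never overlaps a valid encoding.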