On 18/07/10 16:23, Bruno Haible wrote:
> Hi Pádraig,
>
>> However, the first byte of a multibyte
>> UTF-8 char is the same for a lot of characters
>
> Yes. The last byte is equidistributed across the range 0x80..0xBF, whereas
> the first byte is often the same. I'm applying the commit below to exploit it
> for speed.

Nice one Bruno.
Testing the interesting 2-byte and 3-byte cases shows
improvements of 10% and 15% respectively.

cheers,
Pádraig.
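
For anyone reading this in the archive, here is a minimal sketch of how the
property quoted above could be exploited. It is not the commit Bruno applied,
only an illustration under stated assumptions: the buffer is valid UTF-8, and
the function name find_utf8_char and its parameters are hypothetical. The
search scans for the last byte of the character with memchr, since that byte
is the most selective, and only then compares the leading bytes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (an illustration, not the gnulib code).
   Find the first occurrence of the n-byte UTF-8 sequence UC in the
   LEN-byte buffer S, which is assumed to hold valid UTF-8.
   Returns a pointer to the first byte of the match, or NULL.  */
static const unsigned char *
find_utf8_char (const unsigned char *s, size_t len,
                const unsigned char *uc, size_t n)
{
  if (n == 0 || len < n)
    return NULL;

  /* The last byte is the most selective one to scan for: many
     characters of one script share the same first byte.  */
  unsigned char last = uc[n - 1];
  const unsigned char *end = s + len;
  /* Earliest position the last byte of a match can occupy.  */
  const unsigned char *p = s + (n - 1);

  while (p < end)
    {
      p = memchr (p, last, (size_t) (end - p));
      if (p == NULL)
        return NULL;
      /* Candidate found: verify the n-1 bytes that precede it.  */
      if (memcmp (p - (n - 1), uc, n - 1) == 0)
        return p - (n - 1);
      p++;
    }
  return NULL;
}
```

With text that is mostly in one script, scanning for the first byte would
stop at nearly every character, whereas the last byte produces far fewer
false candidates. Whether that matches the measured 10% and 15% depends on
the input, so treat the sketch as an explanation of the idea rather than a
benchmark.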