On Sat, Nov 11, 2006 at 10:48:44PM +0200, Riku Voipio wrote:
> All have a utf-8 character in WORDCHARS. I didn't
> spot anything obvious in WORDCHARS parsing which could
> break only on arm.

Ah, that is indeed the key, or so it seems. hunspell converts UTF-8 to UCS-2
(it says "UTF-16", but it really seems to be UCS-2), using a function with
this prototype:

    int u8_u16(w_char * dest, int size, const char * src) {

It then goes on doing stuff like

    u2->h = (*u8 & 0x1f) >> 2;
    u2->l = (*u8 << 6) + (*(u8+1) & 0x3f);
    u8++;

Now, consider the fact that char is an unsigned type on arm, and signed on
most other platforms, and I guess we have the source of our bug.

(Now, I have no idea why hunspell has its own definitions and functions
instead of using the existing wchar_t type and functions like mbstowcs, but
I'm not going to change that.)

I'll be sure to fix the locale bug too in the same upload, although I'm not
bothering to file it as a separate bug.

/* Steinar */
-- 
Homepage: http://www.sesse.net/
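
To make the signedness point concrete, here is a minimal sketch in C. It is
not hunspell's code: struct ucs2_unit is a hypothetical stand-in for w_char,
and all three function names are made up for illustration. The first two
functions use explicit signed char and unsigned char so that both behaviours
can be seen on any platform, whatever plain char happens to be there. The
third decodes a two-byte UTF-8 sequence through an unsigned char pointer, one
way to get the same result regardless of the platform's choice.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for hunspell's w_char: one UCS-2 unit as two bytes. */
struct ucs2_unit {
    unsigned char l; /* low byte */
    unsigned char h; /* high byte */
};

/* A byte >= 0x80 held in a signed char is negative, so this is never true. */
static int is_high_byte_signed(signed char c)
{
    return c >= 0x80;
}

/* The same comparison on an unsigned char behaves as intended. */
static int is_high_byte_unsigned(unsigned char c)
{
    return c >= 0x80;
}

/*
 * Decode one two-byte UTF-8 sequence (U+0080..U+07FF) into a UCS-2 unit.
 * Bytes are read through an unsigned char pointer, so the result does not
 * depend on whether plain char is signed.
 * Returns 0 on success, -1 if src does not start with such a sequence.
 */
static int decode_two_byte(const char *src, struct ucs2_unit *dest)
{
    const unsigned char *u8 = (const unsigned char *)src;

    assert(src != NULL && dest != NULL);
    if ((u8[0] & 0xe0) != 0xc0 || (u8[1] & 0xc0) != 0x80)
        return -1;
    dest->h = (unsigned char)((u8[0] & 0x1f) >> 2);
    dest->l = (unsigned char)(((u8[0] << 6) + (u8[1] & 0x3f)) & 0xff);
    return 0;
}
```

Calling decode_two_byte("\xc3\xa4", &w) fills in w.h == 0x00 and w.l == 0xe4,
i.e. U+00E4. Whether a comparison like the one in is_high_byte_signed is what
actually bites hunspell on arm is still a guess on my part.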