I've encountered what looks like a bug in mbrtowc's handling of UTF-8. Here's an example:

    #include <stdio.h>
    #include <locale.h>
    #include <wchar.h>

    int main(void)
    {
        wchar_t wc = 0;
        const char *loc = setlocale(LC_CTYPE, "en_GB.UTF-8");

        /* setlocale() returns NULL on failure; don't pass that to puts(). */
        puts(loc ? loc : "(setlocale failed)");

        /* Feed the sequence one byte at a time. The state pointer is
           NULL, so mbrtowc() keeps the partial character in its own
           internal mbstate_t between calls. */
        printf("%i\n", (int)mbrtowc(&wc, "\xe2", 1, 0));
        printf("%i\n", (int)mbrtowc(&wc, "\x94", 1, 0));
        printf("%i\n", (int)mbrtowc(&wc, "\x84", 1, 0));
        printf("%x\n", (unsigned)wc);
        return 0;
    }

The sequence E2 94 84 should translate to U+2504, so I'd expect the three calls to return -2, -2 and 1. Instead, the second and third calls to mbrtowc report encoding errors (they return -1).

It does work correctly if the three bytes are passed to mbrtowc() in one go:

    printf("%i\n", (int)mbrtowc(&wc, "\xe2\x94\x84", 3, 0));

Andy