Hi Ben, Great work. But on this one I have a couple of comments:
- The functions u#_grapheme_len and u#_grapheme_next are redundant with each other: u8_grapheme_len (s, n) ::= (n == 0 ? 0 : (u8_grapheme_next (s, s + n) ? : end) - s). I would not offer two APIs that are _that_ similar for doing the same thing. (The functions in unistr.h do have similar redundancies, but there the rationale is similarity with the ISO C wchar_t API.) And the functions returning a pointer are more symmetric when you consider forward and backward iteration together. So my slight preference would be to remove the u#_grapheme_len functions and keep u#_grapheme_next instead. - Naming of functions: Now some functions have _grapheme_cluster_ in their name, some have _grapheme_ only. How about using either _grapheme_ consistently or _grapheme_cluster_ consistently? It's not necessary to rename the files and modules, though. - Code duplication: In lib/ (not in tests/) it is often possible to use the same algorithm for the 8-bit unit, 16-bit unit, and 32-bit unit cases. It helps future maintenance if the same algorithm is present in 1 file and not 3 files. Would you disagree if I use the de-duplication technique (with a parametrized .h file) on your new files? In the tests, unfortunately, this is often not possible. - Typo in unigbrk.h: 's/ ben / been /' Bruno