Hi Paolo,

> Still, without safety u8_strmbtouc(puc, s) uses the same code as
> u8_mbtouc(puc, s, SIZE_MAX), which makes pretty much my point. I think
> it is safe and actually very useful to document u8_mbtouc/u16_mbtouc as
> looking only one byte (resp. one short) beyond the first complete character.
I find it better to have clear specifications that the programmer can easily remember. The libunistring manual [1] states:

  "Argument pairs (s, n) denote a string s[0..n-1] with exactly n units."

If we were to document "u8_mbtouc accesses only as many bytes as the first Unicode character makes up", the question immediately comes up: what about invalid and incomplete Unicode characters? For example, { 0xC3 } with n = 1, or { 0xE4, 0x30 } with n = 2. You can see how quickly such a definition becomes ambiguous, and such ambiguities later lead to bugs in programs.

Bruno

[1] http://www.gnu.org/software/libunistring/manual/html_node/Conventions.html