[CCing bug-gnulib to share the understanding about i18n issues]

Pádraig Brady wrote on 13.05.2015:
> MB_LEN_MAX was changed from 6 to 16 with:
> https://sourceware.org/git/?p=glibc.git;a=commit;f=include/limits.h;h=d64b6ad075
> Do you know why the value 16 is used exactly?

This was motivated either by the desire to be completely future-proof for the next 30 years (you don't know what kinds of encodings will be invented), or by the fact that for a couple of months Ulrich Drepper & François Pinard were considering adding locales with stateful encodings such as ISO-2022-JP-2. This later turned out not to be worth the effort (the user experience with filenames and the shell in such locales was found to be terrible).

> BTW I see MB_LEN_MAX is 4 on musl libc.

The value of 4 is sufficient to accommodate all stateless encodings in use, including UTF-8 (which was restricted from max. 6 to 4 bytes by an ISO standard) and GB18030. But it's not necessarily future-proof.

> I was worried that it implied that wctomb() might convert a wide char to
> _multiple_ encoded chars
> for some character/encoding combinations?

No, neither POSIX nor glibc supports locales with encodings where a wide char would correspond to multiple characters or where a character would correspond to multiple wide chars. In particular, this prevented EUC-JISX0213 from being used as a locale encoding in glibc [1], thus accelerating the move to UTF-8.

> For example iso-2022-kr can have up to 7 bytes per encoded char,
> so maybe wctomb() might output two of those for some wide chars,
> and the extra two bytes were added for alignment?

Yes, this was part of the considerations regarding stateful encodings.

> Specifically why I'm wondering about this is to size the
> output buffer for wctomb() appropriately.
> Note the linux man page for wctomb() says to use MB_CUR_MAX,
> while the freebsd man page says to use MB_LEN_MAX

That's simply because MB_CUR_MAX is not a compile-time constant, and therefore for a long time the declaration of a local variable
  char buf[MB_CUR_MAX];
required GCC or C++, and the FreeBSD people are not keen adopters of GCC extensions.
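
To make the buffer sizing concrete, here is a minimal sketch that uses the compile-time constant MB_LEN_MAX, so no variable-length array is needed. The helper name encode_wchar is hypothetical (it is not part of any libc), and the sketch assumes that a single wctomb() call per wide char is all the caller needs:

```c
#include <assert.h>
#include <limits.h>   /* MB_LEN_MAX: compile-time upper bound for all locales */
#include <stdlib.h>   /* wctomb, MB_CUR_MAX */
#include <wchar.h>

/* Hypothetical helper, for illustration only: encode one wide character
   in the current locale's encoding.  The caller provides a buffer of
   MB_LEN_MAX + 1 bytes; the result is NUL-terminated.
   Returns the number of bytes written (excluding the NUL), or -1 if wc
   has no representation in the current locale.  */
static int encode_wchar(wchar_t wc, char out[MB_LEN_MAX + 1])
{
    int n;

    wctomb(NULL, 0);        /* reset the shift state, if the encoding has one */
    n = wctomb(out, wc);
    if (n < 0)
        return -1;

    /* wctomb() stores at most MB_CUR_MAX bytes, and MB_CUR_MAX never
       exceeds MB_LEN_MAX, so the terminator below stays in bounds.  */
    assert(n <= MB_LEN_MAX);
    out[n] = '\0';
    return n;
}
```

A caller would declare `char out[MB_LEN_MAX + 1];` as an ordinary fixed-size local array and pass it in. Sizing by MB_CUR_MAX instead would save a few bytes of stack at most, at the cost of a run-time-sized array.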

> I also asked this at:
> http://stackoverflow.com/q/30222107/4421

Bruno

[1] https://sourceware.org/git/?p=glibc.git;a=blob;f=iconvdata/euc-jisx0213.c
