Paul Eggert wrote:
> > Based on the comments in gnulib/lib/mbrtoc16.c, I think it should better
> > clear the first 24, not 12, bytes of the struct. Otherwise it can be in
> > a state where mbsinit() returns true but the mbrto* functions have
> > undefined behaviour.
>
> For mbcel all that matters is mbrtoc32. Could you give an example of
> the undefined behavior there? I looked at the citrus implementations in
> current FreeBSD, OpenBSD and macOS and thought that 12 bytes is enough
> for mbrtoc32 on all their porting targets. NetBSD is a bit different and
> needs just a pointer width.

There's a difference between the part that mbsinit() looks at and the
part that needs to be zeroed, to avoid undefined behaviour.

For example, if we have a
  typedef struct { unsigned int count; unsigned int wchar; } mbstate_t;
mbsinit() can return true if state->count == 0. But that does not mean
that every state with state->count == 0 is valid. It is perfectly OK for
mbrtowc() or mbrtoc32() (or other functions) to call abort() or to crash
if state->count == 0 && state->wchar != 0.

By reading the source code of FreeBSD, NetBSD, OpenBSD, macOS, Solaris,
and so on, I can easily determine
  - which parts of the mbstate_t mbsinit() tests,
  - which parts of the mbstate_t the various functions use.
But in order to understand what interdependencies there are, between the
various mbstate_t fields, and what are the assumed invariants, I would
need to carefully read each of the mentioned files (one per OS and per
locale type).

And this would not be future-proof: After changes in the bulk of code,
the interdependencies and assumed invariants might not be the same any
more. If we then have cleared too few fields of the mbstate_t, things
might crash.

Bruno
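
To make the distinction between "what mbsinit() tests" and "what must be zeroed" concrete, here is a minimal sketch built on the two-field example above. Everything in it is hypothetical: toy_mbstate_t, toy_mbsinit(), toy_state_is_valid(), toy_convert() and the two clearing helpers are illustration names, and the invariant is an assumed one, not that of any real libc.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical toy state mirroring the example typedef.
   It is not the mbstate_t of any real libc. */
typedef struct { unsigned int count; unsigned int wchar; } toy_mbstate_t;

/* Like mbsinit(): inspects 'count' only. */
static bool toy_mbsinit (const toy_mbstate_t *ps)
{
  return ps->count == 0;
}

/* Assumed invariant of the toy conversion function:
   no pending bytes implies no partially accumulated character. */
static bool toy_state_is_valid (const toy_mbstate_t *ps)
{
  return ps->count != 0 || ps->wchar == 0;
}

/* Stand-in for mbrtoc32(): entitled to abort on an invalid state. */
static void toy_convert (toy_mbstate_t *ps)
{
  assert (toy_state_is_valid (ps));
  /* Actual conversion elided. */
}

/* Clears only the leading bytes that toy_mbsinit() inspects. */
static void toy_clear_partial (toy_mbstate_t *ps)
{
  memset (ps, 0, sizeof ps->count);
}

/* Clears the whole object, so every field-level invariant holds. */
static void toy_clear_full (toy_mbstate_t *ps)
{
  memset (ps, 0, sizeof *ps);
}
```

Starting from a state with count == 3 and a nonzero wchar, toy_clear_partial() yields a state for which toy_mbsinit() returns true while toy_state_is_valid() returns false, so toy_convert() would abort. After toy_clear_full(), both checks pass. Which fields a real implementation ties together this way is exactly what cannot be known without reading each implementation.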