Shifting an 'unsigned char' value left by 24 bits is undefined behaviour if that value is >= 128. (The 'unsigned char' value is first promoted to 'int', and a left shift of a signed integer is undefined when the result is not representable in that type. With a 32-bit 'int', 128 << 24 is 2^31, which exceeds INT_MAX.)
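For illustration, here is a minimal standalone sketch of the safe pattern: widen each byte to an unsigned 32-bit type before shifting, so the shift is never performed on a promoted 'int'. The names ucs4_sketch_t and decode_utf32be_sketch and the return-value convention are hypothetical and not taken from lib/iconv.c. The sketch also assumes the code point type is an unsigned 32-bit integer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for ucs4_t; assumed here to be an unsigned
   32-bit integer type.  */
typedef uint32_t ucs4_sketch_t;

/* Decode one UTF-32BE unit from the first 4 bytes of S (N bytes available).
   Illustrative return convention: 4 on success, -1 if the value is not a
   valid Unicode scalar value, -2 if fewer than 4 bytes are available.  */
int
decode_utf32be_sketch (ucs4_sketch_t *pwc, const unsigned char *s, size_t n)
{
  if (n < 4)
    return -2;

  /* Each byte is cast before the shift, so the shift operates on an
     unsigned 32-bit value and a byte >= 128 in s[0] cannot overflow
     a signed 'int'.  */
  ucs4_sketch_t wc = ((ucs4_sketch_t) s[0] << 24)
                     + ((ucs4_sketch_t) s[1] << 16)
                     + ((ucs4_sketch_t) s[2] << 8)
                     + (ucs4_sketch_t) s[3];

  /* Accept only code points below 0x110000 that are not surrogates.  */
  if (wc < 0x110000 && !(wc >= 0xd800 && wc < 0xe000))
    {
      *pwc = wc;
      return 4;
    }
  return -1;
}
```

With this shape, an input whose first byte is 0x80 yields the well-defined value 0x80000000, which the range check then rejects, instead of triggering undefined behaviour in the shift.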

2024-10-04  Bruno Haible  <br...@clisp.org>

	iconv_open: Fix undefined behaviour.
	Reported by Tim Sweet <tswee...@protonmail.com> at
	<https://savannah.gnu.org/bugs/?66289>.
	* lib/iconv.c (utf32be_mbtowc, utf32le_mbtowc): Cast 'unsigned char'
	values to ucs4_t before shifting them to the left.

diff --git a/lib/iconv.c b/lib/iconv.c
index 310f4043eb..f7a67798fb 100644
--- a/lib/iconv.c
+++ b/lib/iconv.c
@@ -195,7 +195,10 @@ utf32be_mbtowc (ucs4_t *pwc, const unsigned char *s, size_t n)
 {
   if (n >= 4)
     {
-      ucs4_t wc = (s[0] << 24) + (s[1] << 16) + (s[2] << 8) + s[3];
+      ucs4_t wc = ((ucs4_t) s[0] << 24)
+                  + ((ucs4_t) s[1] << 16)
+                  + ((ucs4_t) s[2] << 8)
+                  + (ucs4_t) s[3];
       if (wc < 0x110000 && !(wc >= 0xd800 && wc < 0xe000))
         {
           *pwc = wc;
@@ -237,7 +240,10 @@ utf32le_mbtowc (ucs4_t *pwc, const unsigned char *s, size_t n)
 {
   if (n >= 4)
     {
-      ucs4_t wc = s[0] + (s[1] << 8) + (s[2] << 16) + (s[3] << 24);
+      ucs4_t wc = (ucs4_t) s[0]
+                  + ((ucs4_t) s[1] << 8)
+                  + ((ucs4_t) s[2] << 16)
+                  + ((ucs4_t) s[3] << 24);
       if (wc < 0x110000 && !(wc >= 0xd800 && wc < 0xe000))
         {
           *pwc = wc;