Re: BashPitfall 65, read reading past the delimiter on records ending in truncated characters
On 4/20/25 6:58 PM, Greg Wooledge wrote:

> That one may be fixed, but:
>
> bash-5.3$ printf 'FOO\0\315\0\226\0' | while IFS= read -rd '' f; do printf '<%q>\n' "$f"; done
> <$'\315'>
> <''>
> <''>
>
> The context for all of this was someone in IRC who was reading a chunk
> of data from /dev/urandom and got different results with LC_CTYPE=C
> vs. LC_CTYPE=en_US.utf8 (or other UTF-8 locale). This is a simplified
> reproducer.
>
> In real-life scripts, this kind of thing could arise if someone reads
> a NUL-delimited stream of pathnames from find -print0, or equivalent.

Yes, thanks for the report. The failure cases are somewhat constrained and limited to invalid multibyte characters immediately followed by the delimiter. I'll fix it for the next devel branch push and this will be a part of bash-5.3-rc2.

Chet

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/
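For scripts that have to run on an affected bash, one possible workaround is to read the NUL-delimited stream in the C locale, where each byte is its own character. This is only a sketch: the function name collect_paths and the array name paths are made up for illustration, and it assumes (as the LC_CTYPE=C observation from IRC suggests) that the C locale does not trigger the problem.

```shell
#!/usr/bin/env bash
# Hypothetical helper: collect NUL-delimited pathnames from stdin into the
# global array "paths", reading them as raw bytes.
collect_paths() {
    # Assumption: in the C locale, read does no multibyte decoding, so a
    # pathname ending in a truncated UTF-8 sequence cannot absorb the NUL.
    local LC_ALL=C
    local p
    paths=()
    while IFS= read -r -d '' p; do
        paths+=("$p")
    done
}
```

A typical call would be `collect_paths < <(find . -type f -print0)`. Process substitution keeps the loop in the current shell, so the array is still populated after the function returns, and the locale override is local to the function.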
Re: BashPitfall 65, read reading past the delimiter on records ending in truncated characters
On 4/21/25 2:48 AM, Stephane Chazelas wrote:

> 2025-04-20 17:31:56 -0400, Chet Ramey:
> [...]
>> This has been fixed since last July, and the fix is in bash-5.3.
> [...]
>
> Thanks, though as Greg says, there seem to be a few more related
> issues still affecting 5.3. I repost a message sent privately below
> now that the discussion has been extended to the mailing list.
>
>> The bug concerns Unicode combining characters introducing invalid
>> Unicode character sequences that happen to contain the delimiter,
>> and was reported privately.
> [...]
>
> That sentence doesn't seem to make sense to me.

Say you read a byte that introduces an (incomplete) multibyte character (mbrtowc returns -2). Then you read the delimiter character, which changes the incomplete multibyte character into an invalid one. Instead of adding each byte of the invalid multibyte character to the input string you're building, you need to perform the delimiter check against the final character.

The original bug report happened to reproduce this entirely using combining characters.

--
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/
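The case described here (a byte that starts a multibyte character, followed directly by the delimiter) can be probed from the shell by counting records. The following is a rough check, not a definitive test: count_records and probe_input are hypothetical names, and the probe assumes the ambient locale is a UTF-8 one, where \315 is a lead byte that expects a continuation byte.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print how many NUL-delimited records read sees on stdin.
count_records() {
    local rec n=0
    while IFS= read -r -d '' rec; do
        n=$((n + 1))
    done
    printf '%s\n' "$n"
}

# Hypothetical probe input: three records, the second being a lone UTF-8
# lead byte placed immediately before the NUL delimiter.
probe_input() {
    printf 'A\0\315\0B\0'
}

# Count once with byte-wise decoding (C locale, set inside a subshell so it
# does not leak) and once in whatever locale the script was started with.
bytewise=$(probe_input | ( LC_ALL=C; count_records ))
ambient=$(probe_input | count_records)

if [ "$bytewise" -ne "$ambient" ]; then
    printf 'record counts differ: C locale=%s, ambient locale=%s\n' "$bytewise" "$ambient"
else
    printf 'record counts agree: %s\n' "$bytewise"
fi
```

If the two counts differ, read has consumed a delimiter as part of the invalid character in the ambient locale. Matching counts do not prove the absence of related problems, since this probe covers only one byte sequence.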
Re: Bash shell uses wrong language
On 2025-04-14 at 10:59 -0400, Greg Wooledge wrote:
> Are bilingual but primarily-English-speaking end users expected to put
> "en@quot" as their preferred language in the LANGUAGE variable, in order
> to get messages in English?
>
> Would creating an empty /usr/share/locale/en/LC_MESSAGES/bash.mo file
> make it work as expected? (Tested: no, it does not.)

This works for me:

    sudo ln -s /usr/share/locale/en@quot/LC_MESSAGES/bash.mo /usr/share/locale/en/LC_MESSAGES/bash.mo

I see it uses the folders en_GB.utf8, en_US.utf8 and en_US.UTF-8 but ultimately fails, because there is no bash.mo for them.

I think the issue here is that en generally has no translation file, because the C language is (generally) written in English, and then, with no file, 'en' is skipped.
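To see which of these directories actually contain a bash catalog on a given system, something like the following could be used. It is a sketch only: check_bash_catalogs is a hypothetical helper, the list of names is simply the one mentioned above, and the real gettext lookup may try other names or a different order.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for each locale name given, report whether a bash.mo
# message catalog exists below the locale directory passed as first argument.
check_bash_catalogs() {
    local base=$1 name
    shift
    for name in "$@"; do
        if [ -e "$base/$name/LC_MESSAGES/bash.mo" ]; then
            printf '%s: found\n' "$name"
        else
            printf '%s: missing\n' "$name"
        fi
    done
}

# Assumption: catalogs live under /usr/share/locale, as in the paths above.
check_bash_catalogs /usr/share/locale en_GB.utf8 en_US.utf8 en_US.UTF-8 en en@quot
```

On a system matching the description above, one would expect only en@quot to be reported as found until the symlink is created.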