Re: BashPitfall 65, read reading past the delimiter on records ending in truncated characters

2025-04-21 Chet Ramey

On 4/20/25 6:58 PM, Greg Wooledge wrote:

> That one may be fixed, but:
>
> bash-5.3$ printf 'FOO\0\315\0\226\0' | while IFS= read -rd '' f; do printf '<%q>\n' "$f"; done
> <$'\315'>
> <''>
> <''>
>
> The context for all of this was someone in IRC who was reading a chunk
> of data from /dev/urandom and got different results with LC_CTYPE=C vs.
> LC_CTYPE=en_US.utf8 (or another UTF-8 locale).  This is a simplified
> reproducer.
>
> In real-life scripts, this kind of thing could arise if someone reads
> a NUL-delimited stream of pathnames from find -print0, or equivalent.
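A possible workaround for loops like the find -print0 case, sketched under an assumption suggested by the IRC report but not offered as a fix in this thread: in the C locale, read treats every byte as a character, so no multibyte decoding happens and each record keeps every byte up to its NUL. read_nul_records is a hypothetical helper name. Its body is a subshell, so the LC_ALL change stays local to it.

```bash
#!/usr/bin/env bash
# Workaround sketch (assumption, not a fix from the thread): run the read
# loop in the C locale so read never tries to decode multibyte characters.
# read_nul_records is a hypothetical helper name.
read_nul_records() (
  # The ( ) body is a subshell, so this LC_ALL assignment does not leak.
  LC_ALL=C
  while IFS= read -r -d '' rec; do
    printf '<%q>\n' "$rec"
  done
)

# Same input as the reproducer; real input might come from something like
#   find . -print0 | read_nul_records
printf 'FOO\0\315\0\226\0' | read_nul_records
```

The trade-off is that everything inside the loop also runs in the C locale. For example, ${#rec} counts bytes, and printf %q escapes every non-ASCII byte in octal.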


Yes, thanks for the report. The failure cases are somewhat constrained and
limited to invalid multibyte characters immediately followed by the
delimiter. I'll fix it for the next devel branch push, and the fix will be
part of bash-5.3-rc2.
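One rough way to check whether a given bash is affected by the case described here, an invalid multibyte character immediately followed by the delimiter, is to run the reproducer once in the C locale and once in a UTF-8 locale and compare the results. C.UTF-8 is an assumed locale name. If it is not installed, bash prints a setlocale warning and the comparison may not test what you intend. dump_records and sample are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Sketch: compare read -d '' results between the C locale and a UTF-8
# locale for input where an invalid/truncated multibyte byte sits right
# before each NUL delimiter.
# Assumption: a locale named C.UTF-8 is installed; substitute another.

# dump_records LOCALE: %q-quote each NUL-delimited record from stdin.
# The ( ) body runs in a subshell, so the LC_ALL change does not leak.
dump_records() (
  LC_ALL=$1
  while IFS= read -r -d '' rec; do
    printf '<%q>\n' "$rec"
  done
)

# Greg's reproducer input: 0xCD and 0x96 are each followed by a NUL.
sample() { printf 'FOO\0\315\0\226\0'; }

baseline=$(sample | dump_records C)
utf8=$(sample | dump_records C.UTF-8)

if [[ $baseline == "$utf8" ]]; then
  echo "same records in C and C.UTF-8"
else
  echo "records differ between C and C.UTF-8:"
  diff <(printf '%s\n' "$baseline") <(printf '%s\n' "$utf8")
fi
```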

Chet

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/




Re: BashPitfall 65, read reading past the delimiter on records ending in truncated characters

2025-04-21 Chet Ramey

On 4/21/25 2:48 AM, Stephane Chazelas wrote:

> 2025-04-20 17:31:56 -0400, Chet Ramey:
> [...]
>> This has been fixed since last July, and the fix is in bash-5.3.
> [...]
>
> Thanks, though as Greg says, there seem to be a few more
> related issues still affecting 5.3. I repost a message sent
> privately below now that the discussion has been extended to the
> mailing list.
>
>> The bug concerns Unicode combining characters introducing
>> invalid Unicode character sequences that happen to contain the
>> delimiter, and was reported privately.
> [...]
>
> That sentence doesn't seem to make sense to me.

Say you read a byte that introduces an (incomplete) multibyte character
(mbrtowc returns (size_t)-2). Then you read the delimiter character, which
changes the incomplete multibyte character into an invalid one.

Instead of adding each byte of the invalid multibyte character to the input
string you're building, you need to check the final character read (the one
that made the sequence invalid) against the delimiter.

The original bug report happened to reproduce this entirely using combining
characters.
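The rule above can be illustrated with a toy model in bash that works on integer byte values instead of a real stream. It is hypothetical and is not how bash's read builtin is written (that code uses mbrtowc and the current locale). It only recognizes UTF-8 lead and continuation bytes and ignores overlong forms and other validity rules. The point it shows is that every byte is compared with the delimiter before it is folded into a pending multibyte sequence. As a result, a truncated sequence right before the delimiter is kept as raw bytes, and the record still ends there. emit and split_records are hypothetical names.

```bash
#!/usr/bin/env bash
# Hypothetical toy model, not bash's read builtin: split a list of byte
# values into records, tracking only UTF-8 lead/continuation bytes.

# emit BYTE...: print one record as space-separated two-digit hex.
emit() {
  local out=() b hex
  for b in "$@"; do
    printf -v hex '%02x' "$b"
    out+=("$hex")
  done
  printf '%s\n' "${out[*]}"
}

# split_records DELIM BYTE...: print each DELIM-terminated record.
split_records() {
  local delim=$1
  shift
  local record=() pending=() need=0 b
  for b in "$@"; do
    # Compare every byte with the delimiter first, even in the middle of
    # an incomplete multibyte sequence.
    if (( b == delim )); then
      record+=("${pending[@]}")  # keep the truncated bytes unchanged
      pending=()
      need=0
      emit "${record[@]}"
      record=()
      continue
    fi
    if (( need > 0 )); then
      if (( (b & 0xC0) == 0x80 )); then
        # Continuation byte: extend the pending sequence.
        pending+=("$b")
        need=$(( need - 1 ))
        if (( need == 0 )); then
          record+=("${pending[@]}")
          pending=()
        fi
        continue
      fi
      # The sequence became invalid: keep its bytes raw, then handle b below.
      record+=("${pending[@]}")
      pending=()
      need=0
    fi
    if (( (b & 0xE0) == 0xC0 )); then
      need=1; pending=("$b")
    elif (( (b & 0xF0) == 0xE0 )); then
      need=2; pending=("$b")
    elif (( (b & 0xF8) == 0xF0 )); then
      need=3; pending=("$b")
    else
      record+=("$b")
    fi
  done
  # Anything after the last delimiter forms an unterminated final record.
  record+=("${pending[@]}")
  if (( ${#record[@]} > 0 )); then
    emit "${record[@]}"
  fi
}

# The thread's reproducer as byte values: F O O NUL 0xCD NUL 0x96 NUL
split_records 0 0x46 0x4F 0x4F 0 0xCD 0 0x96 0
```

With the reproducer as input, it prints three records: 46 4f 4f, cd and 96. This is the same split the C locale produces.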



Re: Bash shell uses wrong language

2025-04-21 Ángel
On 2025-04-14 at 10:59 -0400, Greg Wooledge wrote:
> Are bilingual but primarily-English-speaking end users expected to put
> "en@quot" as their preferred language in the LANGUAGE variable, in order
> to get messages in English?
> 
> Would creating an empty /usr/share/locale/en/LC_MESSAGES/bash.mo file
> make it work as expected?  (Tested: no, it does not.)

This works for me:
sudo ln -s /usr/share/locale/en@quot/LC_MESSAGES/bash.mo /usr/share/locale/en/LC_MESSAGES/bash.mo

I see it looks in the directories en_GB.utf8, en_US.utf8 and en_US.UTF-8,
but ultimately fails, because there is no bash.mo in any of them.

I think the issue here is that 'en' generally has no translation file,
because the untranslated (C locale) messages are (generally) already
written in English; with no file present, 'en' is skipped.
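The following is a quick way to see which of the directories mentioned here actually hold a bash.mo, for example before and after adding the symlink above. check_bash_catalogs is a hypothetical helper. The default root /usr/share/locale comes from the paths in this message. The directory list contains only the names mentioned in this thread and is not gettext's actual search order.

```bash
#!/usr/bin/env bash
# Diagnostic sketch: report which locale directories named in this thread
# contain bash.mo. check_bash_catalogs is a hypothetical helper; the list
# below is not gettext's real search order.
check_bash_catalogs() {
  local root=${1:-/usr/share/locale} dir
  for dir in en_GB.utf8 en_US.utf8 en_US.UTF-8 en en@quot; do
    # -e follows symlinks, so a working en -> en@quot link shows "present".
    if [[ -e $root/$dir/LC_MESSAGES/bash.mo ]]; then
      printf '%s: present\n' "$dir"
    else
      printf '%s: missing\n' "$dir"
    fi
  done
}

check_bash_catalogs "$@"
```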