Long variable value get corrupted sometimes

2022-02-16 Thread Daniel Qian
Hi all,

I encountered a problem where a long variable value sometimes gets corrupted.

OS: Alpine linux 3.15.0 (docker container)
Bash version: GNU bash, version 5.1.8(1)-release (x86_64-alpine-linux-musl)

Reproduction steps:

The input is a UTF-8 encoded file containing a lot of Chinese characters, file size ~35K.

A test script reads the content of `foo.txt`, assigns it to a variable,
and then checks the md5sum of that variable.

```bash
#!/bin/bash

FOO=$(cat /tmp/foo.txt)
want_q_md5=$(cat /tmp/foo.txt.md5 | cut -d ' ' -f 1)
got_md5=$(echo "$FOO" | md5sum -b | cut -d ' ' -f 1)
if [[ "$got_md5" != "$want_q_md5" ]]; then
  echo "failed"
  echo "$FOO" > /tmp/foo-corrupt.txt
  fail=true
else
  echo "succeed"
fi
```

**Sometimes**, the md5sum check fails.

When the check fails, the script writes the variable value to
`foo-corrupt.txt`. Comparing that file with `foo.txt` shows that
random bytes have been inserted at multiple positions.

I created a GitHub repo for this issue; all the scripts are there:
https://github.com/chanjarster/bash-5_1_8_long_var_corrupt

-- 
Daniel Qian
Apache Committer(chanjarster)
blog:https://chanjarster.github.io
github:https://github.com/chanjarster
segmentfault: https://segmentfault.com/u/chanjarster



the "-e" command line argument is not recognized

2022-02-16 Thread Viktor Korsun
runme.sh
#!/bin/bash
echo $0
echo $1
echo $2
echo $3
echo $4
echo $5
echo $6

command:
./runme.sh -q -w -e -r -t -y

produced output:
./get_env.sh
-q
-w

-r
-t
-y

expected output:
./get_env.sh
-q
-w
-e
-r
-t
-y


Regards,
Viktor Korsun, bite...@gmail.com


Re: the "-e" command line argument is not recognized

2022-02-16 Thread Andreas Schwab
On Feb 16 2022, Viktor Korsun wrote:

> runme.sh
> #!/bin/bash
> echo $0
> echo $1
> echo $2
> echo $3
> echo $4
> echo $5
> echo $6

Don't use echo to print unknown text.  Use printf instead, which can
handle this correctly.  Also, don't forget to quote properly.

printf "%s\n" "$4"

>
> command:
> ./runme.sh -q -w -e -r -t -y
>
> produced output:
> ./get_env.sh
> -q
> -w
>

$ help echo
echo: echo [-neE] [arg ...]
Write arguments to the standard output.

Display the ARGs, separated by a single space character and followed by a
newline, on the standard output.

Options:
  -n    do not append a newline
  -e    enable interpretation of the following backslash escapes
  -E    explicitly suppress interpretation of backslash escapes
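
A minimal sketch of the printf approach suggested above. The function name `print_args` is made up for illustration and is not part of the original script. Because every argument is passed as data for the `%s` format, strings such as `-e` or `-n` are printed literally instead of being consumed as options:

```bash
#!/bin/bash
# print_args is a hypothetical helper name used only for this illustration.
# Each argument is data for the %s format, so it is never parsed as an option.
print_args() {
  printf '%s\n' "$@"
}

# Same arguments as in the report; all six are printed, including -e.
print_args -q -w -e -r -t -y
```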

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."



Re: the "-e" command line argument is not recognized

2022-02-16 Thread Alex fxmbsw7 Ratchev
On Wed, Feb 16, 2022 at 9:59 AM Andreas Schwab wrote:

> On Feb 16 2022, Viktor Korsun wrote:
>
> > runme.sh
> > #!/bin/bash
> > echo $0
> > echo $1
> > echo $2
> > echo $3
> > echo $4
> > echo $5
> > echo $6
>
> Don't use echo to print unknown text.  Use printf instead, which can
> handle this correctly.  Also, don't forget to quote properly.
>
> printf "%s\n" "$4"
>

printf '%s args\n' $#
printf %s\\n "$@"




Re: Bash-5.2-alpha available

2022-02-16 Thread Andreas Schwab
On Feb 10 2022, Chet Ramey wrote:

> On 2/10/22 9:53 AM, Andreas Schwab wrote:
>> On Jan 21 2022, Chet Ramey wrote:
>> 
>>> i. The non-incremental history searches now leave the current history offset
>>> at the position of the last matching history entry, like incremental 
>>> search.
>> That makes history-search-backward significantly less useful, because
>> you can no longer use yank-last-arg to copy arguments from the preceding
>> line.
>
> It makes previous-history, next-history, and operate-and-get-next work as
> they do with incremental searches, which is more in line with user
> expectations.

But it clobbers the matched history line, replacing it with the
uncompleted input.

$ HOME=$PWD bash --norc
bash-5.2$ history
1  history
bash-5.2$ echo 1
1
bash-5.2$ history
1  history
2  echo 1
3  history
bash-5.2$ echo 1    <-- type e 
1
bash-5.2$ history
1  history
2  e
3  history
4  echo 1
5  history
bash-5.2$ 

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."



Re: Long variable value get corrupted sometimes

2022-02-16 Thread Robert Elz
Date: Wed, 16 Feb 2022 16:10:40 +0800
From: Daniel Qian


  | I encountered a problem that long variable value get corrupted
  | sometimes.

That looks like the bug that is fixed by patch 14 to bash 5.1.
Your bash is only at patch 8.  Get all the released patches
incorporated and try again.

kre



Re: Long variable value get corrupted sometimes

2022-02-16 Thread David
On Wed, 16 Feb 2022 at 19:38, Daniel Qian  wrote:

> I encountered a problem that long variable value get corrupted sometimes.

> A UTF-8 encoded file containing a lot of Chinese characters, file size ~35K.

> FOO=$(cat /tmp/foo.txt)

Hi, this looks like something that was recently fixed, perhaps
you can try this patch:
  https://savannah.gnu.org/patch/?10035



Re: Long variable value get corrupted sometimes

2022-02-16 Thread Alex fxmbsw7 Ratchev
Does the data contain \0 (NUL) bytes?



Re: the "-e" command line argument is not recognized

2022-02-16 Thread David
On Wed, 16 Feb 2022 at 19:51, Viktor Korsun  wrote:

> produced output:
> ./get_env.sh
> -q
> -w
>
> -r
> -t
> -y
>
> expected output:
> ./get_env.sh
> -q
> -w
> -e
> -r
> -t
> -y

Hi, this behaviour is well known and has been widely discussed.
You can search the web for "printf vs echo bash" and you
will find plenty of information.



Re: the "-e" command line argument is not recognized

2022-02-16 Thread Viktor Korsun
Thank you guys!

Regards,
Viktor Korsun, bite...@gmail.com




Re: Long variable value get corrupted sometimes

2022-02-16 Thread Greg Wooledge
On Wed, Feb 16, 2022 at 04:10:40PM +0800, Daniel Qian wrote:
> FOO=$(cat /tmp/foo.txt)
> got_md5=$(echo "$FOO" | md5sum -b | cut -d ' ' -f 1)

In addition to what other people have said... echo is not reliable.  It
may alter your text, if you feed it backslashes, or arguments like "-e"
or "-n".

text=$(cat /tmp/foo.txt)                  # this strips all trailing newlines
md5=$(printf '%s\n' "$text" | md5sum -b)  # this adds one newline
md5=${md5%% *}                            # this removes " *-"

If you need to preserve the correct number of trailing newlines, then
you'll also have to change the command substitution.  The common
workaround is:

text=$(cat /tmp/foo.txt; printf x)
text=${text%x}

If you do this, remember that the final newline(s) are still inside the
variable, so you don't need to add one:

md5=$(printf %s "$text" | md5sum -b)
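
Put together, the steps above might look like the following sketch. The function name `md5_of_file_via_var` is hypothetical, and the sketch assumes the file contains no NUL bytes, since a shell variable cannot hold them:

```bash
#!/bin/bash
# md5_of_file_via_var is a hypothetical helper name for this illustration.
# It copies a file into a variable without losing trailing newlines and
# prints the MD5 digest of the variable's contents.
md5_of_file_via_var() {
  local file=$1 text md5
  text=$(cat "$file"; printf x)      # the sentinel x protects trailing newlines
  text=${text%x}                     # remove the sentinel again
  md5=$(printf %s "$text" | md5sum)  # %s adds nothing to the data
  printf '%s\n' "${md5%% *}"         # keep only the hex digest
}

# Example call (path taken from the original report):
#   md5_of_file_via_var /tmp/foo.txt
```

If the variable holds the file's bytes unchanged, the digest should match the one obtained by running md5sum on the file directly.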

Finally, you should get in the habit of NOT using all-caps variable
names for regular shell variables.  The all-caps namespace is "reserved"
for environment variables (like HOME and PATH) and special shell variables
(like BASH_VERSION and SECONDS).

Ordinary variables that you use in a script should contain lowercase
letters.  Mixed caps/lowercase is fine, if you swing that way.



Re: Long variable value get corrupted sometimes

2022-02-16 Thread Léa Gris

Le 16/02/2022 à 13:43, Greg Wooledge écrivait :

text=$(cat /tmp/foo.txt; printf x)
text=${text%x}


or read -r -d '' text 
which saves a sub-shell

Re: Long variable value get corrupted sometimes

2022-02-16 Thread Greg Wooledge
On Wed, Feb 16, 2022 at 02:53:43PM +0100, Léa Gris wrote:
> Le 16/02/2022 à 13:43, Greg Wooledge écrivait :
> > text=$(cat /tmp/foo.txt; printf x)
> > text=${text%x}
> 
> or read -r -d '' text  
> which saves a sub-shell

You forgot IFS= there.  Without that, it'll strip leading/trailing IFS
whitespace.

You also get a non-zero exit status from read when you use your approach,
which will cause the script to die immediately if the author is using
set -e.  While some of us may consider that a benefit (breaking more
set -e scripts raises awareness of how utterly horrible set -e is),
there are still some misguided souls out there who might not see it as
helpful just yet.
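
A rough sketch of the read-based variant with both points addressed. The helper name `slurp_file` and the use of a global variable named `text` are assumptions made for this example only:

```bash
#!/bin/bash
# slurp_file is a hypothetical helper name. It leaves the content of the
# file named by $1 in the global variable "text".
slurp_file() {
  # IFS= keeps leading and trailing whitespace; -r keeps backslashes literal;
  # -d '' makes read stop at the first NUL byte or at end of file.
  # read returns non-zero when it hits end of file before the delimiter,
  # so "|| :" keeps that expected status from aborting a set -e script.
  IFS= read -r -d '' text < "$1" || :
}
```

Note that `|| :` also hides genuine failures, such as a missing file, so a real script may want a separate existence check first.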



Re: Long variable value get corrupted sometimes

2022-02-16 Thread Chet Ramey
On 2/16/22 3:10 AM, Daniel Qian wrote:
> Hi all,
> 
> I encountered a problem that long variable value get corrupted sometimes.
> 
> OS: Alpine linux 3.15.0 (docker container)
> Bash version: GNU bash, version 5.1.8(1)-release (x86_64-alpine-linux-musl)
> 
> Reproduction steps:
> 
> A UTF-8 encoded file containing a lot of Chinese characters, file size ~35K.
> 
> A test script read content from `foo.txt` and assign the content to a 
> variable,
> and then check md5sum for that variable.

https://lists.gnu.org/archive/html/bug-bash/2022-01/msg9.html


-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/



Re: Long variable value get corrupted sometimes

2022-02-16 Thread Daniel Qian
I'm not familiar with Bash's version/release policy. I only found 5.1.8,
5.1.12, and 5.1.16 on the
download page https://ftp.gnu.org/gnu/bash/

Is this fix included in version 5.1.16?




Re: Long variable value get corrupted sometimes

2022-02-16 Thread Lawrence Velázquez
On Wed, Feb 16, 2022, at 8:27 PM, Daniel Qian wrote:
> I'm not familiar with Bash version/release policy, I only found 5.1.8,
> 5.1.12, 5.1.16 at
> download page https://ftp.gnu.org/gnu/bash/
>
> Is this fix included in 5.1.16 version?

Yes, bash 5.1.16 is bash 5.1 with patch 16 and all previous official
patches.
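
One way to check the patch level from inside a script is bash's `BASH_VERSINFO` array, whose elements 0, 1, and 2 hold the major version, minor version, and patch level. The following is only a sketch: `version_at_least` is a hypothetical helper, and 5.1.14 is used as the threshold only because patch 14 was suggested earlier in this thread as the likely fix.

```bash
#!/bin/bash
# version_at_least is a hypothetical helper name for this illustration.
# Arguments: have-major have-minor have-patch want-major want-minor want-patch
# Returns 0 when the "have" version is at least the "want" version.
version_at_least() {
  local have=$(( $1 * 1000000 + $2 * 1000 + $3 ))
  local want=$(( $4 * 1000000 + $5 * 1000 + $6 ))
  [ "$have" -ge "$want" ]
}

# Compare the running shell against bash 5.1 patch 14.
if version_at_least "${BASH_VERSINFO[0]}" "${BASH_VERSINFO[1]}" \
                    "${BASH_VERSINFO[2]}" 5 1 14; then
  printf '%s\n' "this bash is at 5.1.14 or later"
else
  printf '%s\n' "this bash is older than 5.1.14"
fi
```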

-- 
vq



Re: Long variable value get corrupted sometimes

2022-02-16 Thread Daniel Qian
Thanks for your tips; I learned a lot.
