Long variable value get corrupted sometimes
Hi all,

I encountered a problem where the value of a long variable sometimes gets corrupted.

OS: Alpine Linux 3.15.0 (Docker container)
Bash version: GNU bash, version 5.1.8(1)-release (x86_64-alpine-linux-musl)

Reproduction steps:

A UTF-8 encoded file, `foo.txt`, containing a lot of Chinese characters; file size ~35K.

A test script reads the content of `foo.txt`, assigns it to a variable, and then checks the md5sum of that variable.

```bash
#!/bin/bash
FOO=$(cat /tmp/foo.txt)
want_q_md5=$(cat /tmp/foo.txt.md5 | cut -d ' ' -f 1)
got_md5=$(echo "$FOO" | md5sum -b | cut -d ' ' -f 1)
if [[ "$got_md5" != "$want_q_md5" ]]; then
  echo "failed"
  echo "$FOO" > /tmp/foo-corrupt.txt
  fail=true
else
  echo "succeed"
fi
```

**Sometimes** the md5sum check fails. When it fails, the script writes the variable's value to `foo-corrupt.txt`. Comparing that file with `foo.txt` shows that random bytes have been inserted at multiple positions.

I created a GitHub repo for this issue; all the scripts are there:
https://github.com/chanjarster/bash-5_1_8_long_var_corrupt

--
Daniel Qian
Apache Committer(chanjarster)
blog: https://chanjarster.github.io
github: https://github.com/chanjarster
segmentfault: https://segmentfault.com/u/chanjarster
the "-e" command line argument is not recognized
runme.sh

    #!/bin/bash
    echo $0
    echo $1
    echo $2
    echo $3
    echo $4
    echo $5
    echo $6

command:

    ./runme.sh -q -w -e -r -t -y

produced output:

    ./get_env.sh
    -q
    -w

    -r
    -t
    -y

expected output:

    ./get_env.sh
    -q
    -w
    -e
    -r
    -t
    -y

Regards,
Viktor Korsun, bite...@gmail.com
Re: the "-e" command line argument is not recognized
On Feb 16 2022, Viktor Korsun wrote:

> runme.sh
> #!/bin/bash
> echo $0
> echo $1
> echo $2
> echo $3
> echo $4
> echo $5
> echo $6

Don't use echo to print unknown text. Use printf instead, which can
handle this correctly. Also, don't forget to quote properly.

    printf "%s\n" "$4"

> command:
> ./runme.sh -q -w -e -r -t -y
>
> produced output:
> ./get_env.sh
> -q
> -w
>

    $ help echo
    echo: echo [-neE] [arg ...]
        Write arguments to the standard output.

        Display the ARGs, separated by a single space character and followed by a
        newline, on the standard output.

        Options:
          -n    do not append a newline
          -e    enable interpretation of the following backslash escapes
          -E    explicitly suppress interpretation of backslash escapes

--
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1
"And now for something completely different."
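The advice above can be sketched as a small rewrite of the reporter's script. This is only an illustration: `print_args` is a hypothetical helper name, not something from the original `runme.sh`. The point is that `printf` treats its operands as data, so an argument such as `-e` is printed rather than consumed as an option.

```bash
#!/bin/bash
# print_args is a hypothetical helper used for illustration only.
# printf '%s\n' prints each operand on its own line and never
# interprets operands such as -e, -n or -E as options.
print_args() {
    printf '%s\n' "$@"
}

# Same arguments as in the report; every one of them is printed.
print_args -q -w -e -r -t -y
```

Quoting `"$@"` also keeps arguments that contain spaces intact, which unquoted `$1`, `$2`, ... would not.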
Re: the "-e" command line argument is not recognized
On Wed, Feb 16, 2022 at 9:59 AM Andreas Schwab wrote:

> Don't use echo to print unknown text. Use printf instead, which can
> handle this correctly. Also, don't forget to quote properly.
>
> printf "%s\n" "$4"

    printf '%s args\n' $#
    printf %s\\n "$@"
Re: Bash-5.2-alpha available
On Feb 10 2022, Chet Ramey wrote:

> On 2/10/22 9:53 AM, Andreas Schwab wrote:
>> On Jan 21 2022, Chet Ramey wrote:
>>
>>> i. The non-incremental history searches now leave the current history offset
>>>    at the position of the last matching history entry, like incremental
>>>    search.
>> That makes history-search-backward significantly less useful, because
>> you can no longer use yank-last-arg to copy arguments from the preceding
>> line.
>
> It makes previous-history, next-history, and operate-and-get-next work as
> they do with incremental searches, which is more in line with user
> expectations.

But it clobbers the matched history line, replacing it with the
uncompleted input.

    $ HOME=$PWD bash --norc
    bash-5.2$ history
        1  history
    bash-5.2$ echo 1
    1
    bash-5.2$ history
        1  history
        2  echo 1
        3  history
    bash-5.2$ echo 1        <-- type e
    1
    bash-5.2$ history
        1  history
        2  e
        3  history
        4  echo 1
        5  history
    bash-5.2$

--
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1
"And now for something completely different."
Re: Long variable value get corrupted sometimes
    Date: Wed, 16 Feb 2022 16:10:40 +0800
    From: Daniel Qian

  | I encountered a problem that long variable value get corrupted
  | sometimes.

That looks like the bug that is fixed by patch 14 to bash 5.1.
Your bash is only at patch 8. Get all the released patches
incorporated and try again.

kre
Re: Long variable value get corrupted sometimes
On Wed, 16 Feb 2022 at 19:38, Daniel Qian wrote:

> I encountered a problem that long variable value get corrupted sometimes.

> A UTF-8 encoded file containing a lot of Chinese characters, file size ~35K.

> FOO=$(cat /tmp/foo.txt)

Hi, this looks like something that was recently fixed. Perhaps
you can try this patch:
https://savannah.gnu.org/patch/?10035
Re: Long variable value get corrupted sometimes
Does the data contain \0 (NUL) bytes?

On Wed, Feb 16, 2022 at 11:20 AM David wrote:

> On Wed, 16 Feb 2022 at 19:38, Daniel Qian wrote:
>
> > I encountered a problem that long variable value get corrupted sometimes.
>
> > A UTF-8 encoded file containing a lot of Chinese characters, file size ~35K.
>
> > FOO=$(cat /tmp/foo.txt)
>
> Hi, this looks like something that was recently fixed, perhaps
> you can try this patch:
> https://savannah.gnu.org/patch/?10035
Re: the "-e" command line argument is not recognized
On Wed, 16 Feb 2022 at 19:51, Viktor Korsun wrote:

> produced output:
> ./get_env.sh
> -q
> -w
>
> -r
> -t
> -y
>
> expected output:
> ./get_env.sh
> -q
> -w
> -e
> -r
> -t
> -y

Hi, this behaviour is well known and has been widely discussed.
You can search the web for "printf vs echo bash" and
you will find plenty of information.
Re: the "-e" command line argument is not recognized
Thank you guys!

Regards,
Viktor Korsun, bite...@gmail.com
Re: Long variable value get corrupted sometimes
On Wed, Feb 16, 2022 at 04:10:40PM +0800, Daniel Qian wrote:

> FOO=$(cat /tmp/foo.txt)
> got_md5=$(echo "$FOO" | md5sum -b | cut -d ' ' -f 1)

In addition to what other people have said... echo is not reliable. It
may alter your text, if you feed it backslashes, or arguments like "-e"
or "-n".

    text=$(cat /tmp/foo.txt)                    # this strips all trailing newlines
    md5=$(printf '%s\n' "$text" | md5sum -b)    # this adds one newline
    md5=${md5%% *}                              # this removes " *-"

If you need to preserve the correct number of trailing newlines, then
you'll also have to change the command substitution. The common
workaround is:

    text=$(cat /tmp/foo.txt; printf x)
    text=${text%x}

If you do this, remember that the final newline(s) are still inside the
variable, so you don't need to add one:

    md5=$(printf %s "$text" | md5sum -b)

Finally, you should get in the habit of NOT using all-caps variable
names for regular shell variables. The all-caps namespace is "reserved"
for environment variables (like HOME and PATH) and special shell variables
(like BASH_VERSION and SECONDS).

Ordinary variables that you use in a script should contain lowercase
letters. Mixed caps/lowercase is fine, if you swing that way.
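A minimal sketch of the sentinel workaround described above, put together as one runnable script. The temporary file and its contents are assumptions standing in for `/tmp/foo.txt`; the variable names `var_md5` and `file_md5` are likewise made up for the example. It hashes the variable and the file separately so the two digests can be compared.

```bash
#!/bin/bash
# Stand-in for /tmp/foo.txt: a small file that ends in several newlines.
tmpfile=$(mktemp)
printf 'line one\nline two\n\n\n' > "$tmpfile"

# The trailing "x" is a sentinel: command substitution strips trailing
# newlines, but here the last character is "x", so the newlines survive.
text=$(cat "$tmpfile"; printf x)
text=${text%x}    # drop the sentinel again

# The newlines are already inside $text, so none is added here.
var_md5=$(printf '%s' "$text" | md5sum -b)
var_md5=${var_md5%% *}    # keep only the digest, drop " *-"

file_md5=$(md5sum -b < "$tmpfile")
file_md5=${file_md5%% *}

if [[ $var_md5 == "$file_md5" ]]; then
    echo "match"
else
    echo "mismatch"
fi
rm -f "$tmpfile"
```

Note that a shell variable cannot hold NUL bytes, so this only round-trips files that contain none.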
Re: Long variable value get corrupted sometimes
On 16/02/2022 at 13:43, Greg Wooledge wrote:

> text=$(cat /tmp/foo.txt; printf x)
> text=${text%x}

or read -r -d '' text
Re: Long variable value get corrupted sometimes
On Wed, Feb 16, 2022 at 02:53:43PM +0100, Léa Gris wrote:

> On 16/02/2022 at 13:43, Greg Wooledge wrote:
> > text=$(cat /tmp/foo.txt; printf x)
> > text=${text%x}
>
> or read -r -d '' text
> which saves a sub-shell

You forgot IFS= there. Without that, it'll strip leading/trailing IFS
whitespace.

You also get a non-zero exit status from read when you use your approach,
which will cause the script to die immediately if the author is using
set -e. While some of us may consider that a benefit (breaking more
set -e scripts raises awareness of how utterly horrible set -e is), there
are still some misguided souls out there who might not see it as helpful
just yet.
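A sketch of the `read` variant with both corrections applied. The temporary file is an assumption standing in for `/tmp/foo.txt`, and `|| true` is just one possible way to absorb the non-zero status that `read` returns when it reaches end of file without finding the NUL delimiter.

```bash
#!/bin/bash
# Stand-in input with leading spaces and trailing newlines.
tmpfile=$(mktemp)
printf '  indented first line\nlast line\n\n' > "$tmpfile"

# IFS=   : do not strip leading/trailing whitespace.
# -r     : do not treat backslashes specially.
# -d ''  : read up to the first NUL byte, or to end of file if there is none.
# read returns non-zero at end of file; "|| true" keeps a script that
# runs under set -e from exiting at this point.
IFS= read -r -d '' text < "$tmpfile" || true

# Compare sizes to confirm nothing was stripped.
file_bytes=$(wc -c < "$tmpfile")
var_bytes=$(printf '%s' "$text" | wc -c)
echo "file: $((file_bytes)) bytes, variable: $((var_bytes)) bytes"
rm -f "$tmpfile"
```

Unlike the command-substitution form, this needs no sentinel character, because `read` does not strip trailing newlines once `IFS` is empty.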
Re: Long variable value get corrupted sometimes
On 2/16/22 3:10 AM, Daniel Qian wrote:

> Hi all,
>
> I encountered a problem that long variable value get corrupted sometimes.
>
> OS: Alpine linux 3.15.0 (docker container)
> Bash version: GNU bash, version 5.1.8(1)-release (x86_64-alpine-linux-musl)
>
> Reproduction steps:
>
> A UTF-8 encoded file containing a lot of Chinese characters, file size ~35K.
>
> A test script read content from `foo.txt` and assign the content to a
> variable, and then check md5sum for that variable.

https://lists.gnu.org/archive/html/bug-bash/2022-01/msg9.html

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/
Re: Long variable value get corrupted sometimes
I'm not familiar with the Bash version/release policy. I only found 5.1.8,
5.1.12, and 5.1.16 on the download page https://ftp.gnu.org/gnu/bash/

Is this fix included in version 5.1.16?

On Wed, Feb 16, 2022 at 21:59, Chet Ramey wrote:

> On 2/16/22 3:10 AM, Daniel Qian wrote:
> > I encountered a problem that long variable value get corrupted sometimes.
>
> https://lists.gnu.org/archive/html/bug-bash/2022-01/msg9.html
Re: Long variable value get corrupted sometimes
On Wed, Feb 16, 2022, at 8:27 PM, Daniel Qian wrote:

> I'm not familiar with Bash version/release policy, I only found 5.1.8,
> 5.1.12, 5.1.16 at
> download page https://ftp.gnu.org/gnu/bash/
>
> Is this fix included in 5.1.16 version?

Yes, bash 5.1.16 is bash 5.1 with patch 16 and all previous official
patches.

--
vq
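Since the third component of the version number is the patch level, a script can check it at run time. The sketch below uses the standard `BASH_VERSINFO` array; the threshold of 14 is taken from the suggestion earlier in this thread that patch 14 to bash 5.1 looks like the relevant fix, so treat it as an assumption rather than a confirmed diagnosis.

```bash
#!/bin/bash
# BASH_VERSINFO[0], [1] and [2] hold the major version, minor version
# and patch level of the running bash (for example 5, 1 and 8).
major=${BASH_VERSINFO[0]}
minor=${BASH_VERSINFO[1]}
patch=${BASH_VERSINFO[2]}
echo "bash $major.$minor, patch level $patch"

# Assumption from this thread: patch 14 to bash 5.1 carries the fix.
if (( major == 5 && minor == 1 && patch < 14 )); then
    echo "bash 5.1 below patch 14: consider upgrading"
fi
```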
Re: Long variable value get corrupted sometimes
Thanks for your tips; I learned a lot.

On Wed, Feb 16, 2022 at 20:47, Greg Wooledge wrote:

> In addition to what other people have said... echo is not reliable. It
> may alter your text, if you feed it backslashes, or arguments like "-e"
> or "-n".