Backslash mysteriously disappears in command expansion when unescaping would reference an existing file

2019-05-21 Thread Charles-Henri Gros
Configuration Information [Automatically generated, do not change]:
Machine: x86_64
OS: linux-gnu
Compiler: gcc
Compilation CFLAGS: -g -O2
-fdebug-prefix-map=/build/bash-Dl674z/bash-5.0=.
-fstack-protector-strong -Wformat -Werror=format-security -Wall
-Wno-parentheses -Wno-format-security
uname output: Linux d-us6a-ubuntu-03 5.0.0-13-generic #14-Ubuntu SMP Mon
Apr 15 14:59:14 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Machine Type: x86_64-pc-linux-gnu

Bash Version: 5.0
Patch Level: 3
Release Status: release

Description:
Backslash mysteriously disappears in command expansion when
unescaping would reference an existing file

Repeat-By:
> touch a\$.class
> for i in $(echo "a\\\$.class"); do echo "$i"; done
a$.class
> rm a\$.class
> for i in $(echo "a\\\$.class"); do echo "$i"; done
a\$.class

The existence or not of the file should not have any effect.
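A self-contained version of the reproduction above, runnable in a scratch directory created with mktemp (the directory handling is an addition for illustration). The unquoted result depends on the shell: the thread reports that bash 5.0 prints `a$.class` when the file exists, while bash 2.05b, bash 4.4, ksh and dash keep the backslash. The quoted form is never word split or pathname expanded, so it always keeps the backslash.

```shell
#!/usr/bin/env bash
# Reproduce the report in a throwaway directory (scratch setup is illustrative).
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1

touch 'a$.class'                      # the file the de-backslashed word could match

# Unquoted: the result is word split and then pathname expanded.
# Its value differs between shells/versions, so it is only printed.
for i in $(echo "a\\\$.class"); do
  unquoted=$i
done
printf 'unquoted: %s\n' "$unquoted"

# Quoted: no word splitting and no pathname expansion.
quoted="$(echo "a\\\$.class")"
printf 'quoted:   %s\n' "$quoted"

cd / && rm -rf "$dir"
```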



Re: Backslash mysteriously disappears in command expansion when unescaping would reference an existing file

2019-05-22 Thread Charles-Henri Gros
On 5/22/19 5:43 AM, Greg Wooledge wrote:
> On Wed, May 22, 2019 at 05:25:43PM +0700, Robert Elz wrote:
>> Date:Tue, 21 May 2019 22:11:20 +
>>     From:    Charles-Henri Gros 
>> Message-ID:  
>> 
>>
>>   | The existence or not of the file should not have any effect.
>>
>> But it does, and is intended to.   If the pattern matches a file
>> (when pathname expanded as a result of the unquoted command substitution)
>> you get the file name produced.   If it does not match a file,
>> the pattern is left untouched.   That is the way that things are
>> supposed to work.
> With glob metacharacters, sure.  But none of the characters in his
> variable are glob metacharacters.
>
> There is definitely something weird happening here.
>
> wooledg:/tmp/x$ echo "$BASH_VERSION"
> 5.0.3(1)-release
> wooledg:/tmp/x$ touch 'a$.class'
> wooledg:/tmp/x$ i='a\$.class'; echo {$i} "{$i}"
> {a\$.class} {a\$.class}
> wooledg:/tmp/x$ i='a\$.class'; echo $i "{$i}"
> a$.class {a\$.class}
>
> Other versions of bash, plus ksh and dash, don't behave this way.
>
> wooledg:/tmp/x$ bash-2.05b
> wooledg:/tmp/x$ i='a\$.class'; echo $i "{$i}"
> a\$.class {a\$.class}
>
> wooledg:/tmp/x$ bash-4.4
> wooledg:/tmp/x$ i='a\$.class'; echo $i "{$i}"
> a\$.class {a\$.class}
>
> wooledg:/tmp/x$ ksh
> $ i='a\$.class'; echo $i "{$i}"
> a\$.class {a\$.class}
>
> wooledg:/tmp/x$ dash
> $ i='a\$.class'; echo $i "{$i}"
> a\$.class {a\$.class}
>
> It seems to be unique to bash 5.  If it's a bug fix, then I'm not
> understanding the rationale.  Backslashes shouldn't be consumed during
> glob expansion.
>
> This is also not limited to $ alone.  It happens with letters too.
>
> wooledg:/tmp/x$ touch i
> wooledg:/tmp/x$ i='\i' j='\j'
> wooledg:/tmp/x$ echo $i $j
> i \j
>
> Standard disclaimers apply.  Stop using unquoted variables and these
> bugs will stop affecting you.  Nevertheless, Chet may want to take a
> peek.

What unquoted variables? Are you talking about the "$()" expansion?

The problem I'm trying to solve is to iterate over regex-escaped file
names obtained from a "find" command. I don't know how to make this
work. It works with other versions of bash and with other shells.

The original is closer to something like this:

for file in $(find ... | sed 's/\$/\\$/g'); do grep -e "$file" someinput; done

It used to work. Now it doesn't. I do not know how to make it work again.


-- 
Charles-Henri Gros




Re: Backslash mysteriously disappears in command expansion when unescaping would reference an existing file

2019-05-22 Thread Charles-Henri Gros
On 5/22/19 10:47 AM, Greg Wooledge wrote:
> On Wed, May 22, 2019 at 05:34:22PM +0000, Charles-Henri Gros wrote:
>> On 5/22/19 5:43 AM, Greg Wooledge wrote:
>>> Standard disclaimers apply.  Stop using unquoted variables and these
>>> bugs will stop affecting you.  Nevertheless, Chet may want to take a
>>> peek.
>> What unquoted variables? Are you talking about the "$()" expansion?
> Yes.  I used a variable instead of a command substitution to make it
> easier to reproduce the problem.  Both have the same behavior in this
> case.

That's what I find a bit surprising (but shells are complicated, so
maybe this is right; all I know is that the code used to work). I didn't
think glob expansion applied to the results of command substitutions.

All I want here is word splitting (which is why I can't use quotes).
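One way to get word splitting without pathname expansion is to turn globbing off with `set -f` (noglob) around the loop; the backslash then survives whether or not a matching file exists. A minimal sketch, assuming the names contain no whitespace (splitting on the default IFS would still break them apart); `names` is a hypothetical stand-in for the `$(find ... | sed ...)` output.

```shell
#!/usr/bin/env bash
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
touch 'a$.class'                          # a file the de-backslashed name would match

names='a\$.class b\$.class'               # hypothetical escaped names, space separated
results=()

set -f                                    # noglob: split on IFS, no pathname expansion
for i in $names; do
  results+=("$i")
done
set +f                                    # restore pathname expansion

printf '%s\n' "${results[@]}"
cd / && rm -rf "$dir"
```

If the names arrive one per line, `mapfile -t arr < <(...)` (bash 4 and later) reads them into an array with no word splitting or globbing at all.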

>
>> The problem I'm trying to solve is to iterate over regex-escaped file
>> names obtained from a "find" command. I don't know how to make this
>> work. It works with other versions of bash and with other shells.
> First step: do not "regex-escape" them, whatever that means.  Just use
> the actual filenames as printed by find -print0.
>
>> The original is closer to something like this:
>>
>> for file in $(find ... | sed 's/\$/\\$/g'); do grep -e "$file" someinput; done
> Yeah, that's just the wrong approach.  It's also the first thing on
> the BashPitfalls page[1] (for a good reason).
>
> You have two choices here:
>
> 1) Use find -exec.
>
>     find ... -exec grep -e someinput /dev/null {} +
>
> 2) Use find -print0 and a bash while read loop.  (NOT a for loop.)
>
>     find ... -print0 |
>     while IFS= read -rd '' file; do
>        something "$file"
>     done
>
> (A variant of this uses < <() instead of a pipeline, so that the while
> loop runs in the main shell and variable assignments can persist.)
>
> Since you only show a simple grep as your action, find -exec is a better
> choice for this problem.  (Assuming you didn't fatally misrepresent the
> problem.)  Calling grep once for every file would be inefficient.
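The `< <()` variant Greg mentions can be sketched as follows; the scratch directory and the `*.class` filter are assumptions for illustration. Because the loop is fed by process substitution rather than a pipe, `count` is still set after the loop; with `find ... | while ...` the increments would happen in a subshell (bash's lastpipe option aside) and be lost.

```shell
#!/usr/bin/env bash
dir=$(mktemp -d) || exit 1
# Names with "$" and a space, to show they pass through untouched.
touch "$dir/a\$.class" "$dir/b.class" "$dir/with space.class"

count=0
while IFS= read -rd '' file; do          # NUL-delimited: no splitting, no globbing
  printf 'found: %s\n' "$file"
  count=$((count + 1))                   # persists: the loop runs in the current shell
done < <(find "$dir" -name '*.class' -print0)

printf '%d files\n' "$count"
rm -rf "$dir"
```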

I don't think I fatally misrepresented the problem; however, I do think
that you fatally misunderstood it. (FWIW, I know about -print0 and xargs -0.)

The file name is the regex (argument to "-e"), not the file "grep"
reads. I want to check that some text file contains a reference to a file.

But it looks like this would work:

for file in $(find ...); do grep -e "$(echo -n "$file" | sed 's/\$/\\$/g')" someinput; done
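Escaping each name after it has been read, as above, can be combined with the NUL-delimited loop so that the unescaped name is never subject to word splitting or globbing. A sketch under stated assumptions: the `classes` directory, the `someinput` contents, and reducing each path to its basename are all hypothetical choices for illustration, and `grep` stands in for the regex-taking tool.

```shell
#!/usr/bin/env bash
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
mkdir classes
touch 'classes/a$.class' 'classes/b.class'
printf '%s\n' 'uses a$.class' 'nothing here' > someinput   # hypothetical text to search

referenced=() missing=()
while IFS= read -rd '' file; do
  name=${file##*/}                                 # assumed: match on the basename only
  regex=$(printf '%s' "$name" | sed 's/\$/\\$/g')  # a$.class -> a\$.class
  if grep -q -e "$regex" someinput; then
    referenced+=("$name")
  else
    missing+=("$name")
  fi
done < <(find classes -name '*.class' -print0)

printf 'referenced: %s\n' "${referenced[@]}"
printf 'missing:    %s\n' "${missing[@]}"
cd / && rm -rf "$dir"
```

Only `$` is escaped, as in the thread; other BRE metacharacters such as `.`, `*`, `[` and `\` are left alone, so the `.` in `a\$.class` still matches any character.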


-- 
Charles-Henri Gros




Re: Backslash mysteriously disappears in command expansion when unescaping would reference an existing file

2019-05-22 Thread Charles-Henri Gros
On 5/22/19 3:13 PM, Robert Elz wrote:
> Date:Wed, 22 May 2019 17:34:22 +
> From:    Charles-Henri Gros 
> Message-ID:  
> 
>
>   | The problem I'm trying to solve is to iterate over regex-escaped file
>   | names obtained from a "find" command. I don't know how to make this
>   | work. It works with other versions of bash and with other shells.
>
> You were relying upon a common bug, which has been fixed in bash, but
> your technique is all wrong, you don't need any kind of loop at all, not
> a for loop, and not the while read loop that Greg suggested.
>
> find -print produces a list of names, one per line.   Those are simple
> strings, which fgrep (or grep -F as Andreas suggested) can handle finding.
>
> What I'd do is
>
>   fgrep "$(find  -print)" wherever

Interesting, I didn't realize you could pass newline-separated patterns
to "grep" on the command line. Good to know for the future.
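A sketch of Robert's single-pass approach in a scratch directory: `wherever` is the placeholder name from his message, while the `*.class` filter and the sed step that strips find's leading `./` are assumptions. `grep -F` (equivalent to `fgrep`) treats each line of the pattern argument as a separate fixed string, so `$` needs no escaping.

```shell
#!/usr/bin/env bash
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
touch 'a$.class' 'b.class'
printf '%s\n' 'uses a$.class' 'nothing here' > wherever    # hypothetical target file

# One name per line, leading "./" removed (assumed post-processing).
patterns=$(find . -name '*.class' -print | sed 's|^\./||')

matches=
if [ -n "$patterns" ]; then       # an empty pattern would match every line
  matches=$(grep -F -e "$patterns" wherever)
fi
printf '%s\n' "$matches"
cd / && rm -rf "$dir"
```

The `-n` check matters: if find prints nothing, the command substitution yields an empty string, and an empty pattern matches every line of the target.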

But unfortunately, grep was just illustrative, I'm using another tool
that takes a regex but has no "-F" option (though admittedly with some
effort I could add one, I wrote the tool in question).


>
> (You can use grep -F if you have an aversion to using its traditional name,
> but fgrep was once a different program to grep / egrep).
>
> This version will have a problem with filenames with embedded newlines,
> but so did your original, so I am simply assuming that you have none of
> those (using any variant of grep to search for strings containing newlines
> tends to be "difficult" as grep is a line at a time tool).
Yes, I'm not expecting any special characters except "$".
>
> If your version of grep cannot handle the pattern list not having a
> terminating \n (the $() removes it) then you can add it back
>
>   fgrep "$(find ... -print)"$'\n' wherever.
>
> You're probably still going to need a | into sed inside the command
> substitution, as I doubt that you actually want to look for filenames
> in the format that find prints them (you have never shown your actual
> command) and I suspect that you want to delete the pathname component
> (a leading "./" or whatever) and it isn't clear what you want to
> happen with filenames in subdirectories.  But none of those manipulations
> will affect anything.
>
> The other difference between this method and the one that you were
> using, is that this one will mix up the output for all of the different
> file names (it reads the target files just once, looking for all of the
> filenames simultaneously) whereas your original scheme looked for each
> file name in the target sequentially (re-reading the target file(s) over
> and over again for each new file name).   That would group output lines
> for each file name together, whereas the technique above does not.

-- 
Charles-Henri Gros