    Date:        Sat, 9 Aug 2025 01:35:17 +0200
    From:        Denys Vlasenko <dvlas...@redhat.com>
    Message-ID:  <f7ec2956-dfeb-af4e-d314-937f36424...@redhat.com>

  | A year ago, dash had this fix:

I would call that a change, rather than a fix.

  |      expand: Fix naked backslah leakage

What should happen in such a case all depends upon what
you think might have been intended.   An unquoted \ in
a glob pattern (at least not inside [] - there things would
be even trickier) is an escape char, it is intended to
escape what comes next.   What it means when followed
by something which doesn't need to be escaped, because it
already is quoted, is, I believe, unspecified, and you have
entered the realm of GIGO.
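Here is a minimal sketch of the part of that rule which *is* well defined (POSIX sh assumed; the variable names are made up for the illustration). An escaped \* or a quoted "*" matches only a literal asterisk, while a bare * matches anything. The ${a##...} lines are written as plain assignments, outside double quotes, so that only the pattern rules are involved.

```shell
# In an unquoted pattern a backslash escapes the next character, so
# \* matches only a literal '*', while a bare * matches any string.
s='*x'
case $s in
    \*x) r1='literal' ;;
    *)   r1='no match' ;;
esac
case yx in
    \*x) r2='literal' ;;
    *)   r2='no match' ;;
esac
# Quoting does the same job as the backslash: "*" is a literal '*'.
case $s in
    "*"x) r3='literal' ;;
    *)    r3='no match' ;;
esac
# The same rules apply in ${var##pattern}:
a='\*bc'
r4=${a##\\\*}   # literal \, then literal *: leaves bc
r5=${a##\\*}    # literal \, then any string: leaves nothing
printf '%s %s %s <%s> <%s>\n' "$r1" "$r2" "$r3" "$r4" "$r5"
```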

  |      Test case:
  |
  |              a="\\*bc"; b="\\"; c="*"; echo "<${a##$b"$c"}>"
  |
  |      Old result:
  |
  |              <>
  |
  |      New result:
  |
  |              <bc>

FWIW, the NetBSD sh, another ash descendant, like dash is, produces
the same result as (current) bash does here:

        <\*bc>

That is, the pattern after the ## is invalid, as the unquoted, unescaped
\ that comes from $b has nothing to escape, so matches nothing.

If you want to interpret such a \ as meaning the same as \\, that's OK
(what happens is unspecified, after all), but if you really meant to say
\\, that's what you should be writing in the code.  That is, as an author
of shell code, you shouldn't be creating invalid patterns.  And if you do,
don't demand that other shells act the way you expect.
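A sketch of writing what is meant, rather than relying on a lone \ arriving from $b, using the same a, b and c as the test (r1..r3 are illustrative names). Each of these patterns is fully specified, so any POSIX shell should produce the same results.

```shell
a='\*bc'; b='\'; c='*'
# Quoting both expansions makes every character literal, so this
# is well defined everywhere: remove a literal \ then a literal *.
r1=${a##"$b""$c"}
# If a literal \ was meant, write the \\ in the code itself instead
# of relying on a lone (unescaped) \ coming from $b.
r2=${a##\\"$c"}
# To strip just the backslash, quote "$b" so it is not an escape.
r3=${a##"$b"}
printf '<%s> <%s> <%s>\n' "$r1" "$r2" "$r3"   # prints <bc> <bc> <*bc>
```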

The test would, incidentally, be slightly less confusing if written
as

        a='\*bc'; b='\'; c='*'; echo "<${a##$b"$c"}>"

which should be identical.  Using " rather than ' to quote anything in
(any version of) a Bourne shell (derivative) should only really ever be
done when there is something in the quoted string which needs to be
expanded (as in the two instances in the arg to echo there), and
sometimes when there is a ' in the string.  Using ' wherever possible
avoids needing to think about when any included \ characters need to
be escaped and when they don't (they always can be, but almost no-one
writes "\\n" instead of "\n", though both translate into '\n' due to
the weird rule for \ inside "").

Alternatively, use $'' quoting when no expansions are required;
then the \ rule is clear: if a \ is intended in the result, it
must always be written as \\.
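A small check of the three quoting forms just discussed, as a sketch assuming a POSIX sh (x1..x3 and a1..c3 are illustrative names). $'' quoting was only standardised in POSIX.1-2024 and not every sh supports it yet, so the sketch probes for it rather than assuming it.

```shell
# Inside "...", \ is special only before $ ` " \ and newline, so
# "\n" keeps its backslash and "\\n" collapses \\ into a single \:
# both give the two characters \n, exactly like '\n'.
x1="\n"
x2="\\n"
x3='\n'
printf '%s|%s|%s\n' "$x1" "$x2" "$x3"   # prints \n|\n|\n

# The double-quoted assignments from the original test case give
# exactly the same strings as the single-quoted rewrite.
a1="\\*bc"; b1="\\"; c1="*"
a2='\*bc';  b2='\';  c2='*'
[ "$a1" = "$a2" ] && [ "$b1" = "$b2" ] && [ "$c1" = "$c2" ] &&
    echo 'double- and single-quoted forms are identical'

# Assumption: not every sh supports $'...'.  A shell without it
# reads $'\\' as a literal "$" followed by '\\', so probe first.
probe=$'\\'
if [ "$probe" = '\' ]; then
    # Inside $'...', every \ wanted in the result is written as \\.
    a3=$'\\*bc'; b3=$'\\'; c3=$'*'
    printf '%s|%s|%s\n' "$a3" "$b3" "$c3"   # prints \*bc|\|*
else
    echo "this shell has no \$'...' quoting"
fi
```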

Unfortunately many sh programmers are stuck with a C mentality, and
believe that ' is for characters, and " is for strings, or something
odd like that, and keep insisting on sticking "" around things that
would be better either left completely unquoted (when they contain nothing
requiring quoting) or quoted using ''.

Your longer test does that better, and that's good.

  | I started creating a testcase for it, covering a few more possibilities.
  |
  | ... and discovered that bash, in fact, is probably buggy:

I suspect you really mean there "doesn't do what I would expect might
happen in some cases", where the pattern given is meaningless, and actually
produces (and should produce) unspecified results.  That is, whatever the
shell produces is OK (glob patterns have a long history of never having
an error possibility: they just match or don't match, and if the pattern is
nonsense, the shell does whatever it likes).  For users, the correct
behaviour is to not write nonsense patterns in the first place.
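As a small illustration of patterns never being an error, here is a sketch (the check function is hypothetical). POSIX specifies that a [ which does not begin a valid bracket expression simply matches itself, so a pattern that looks malformed still just matches, or doesn't.

```shell
# Hypothetical helper: report whether $1 matches the pattern [a.
# That [ has no closing ], so it never starts a bracket expression;
# it matches a literal '[' rather than causing an error.
check() {
    case $1 in
        [a) echo match ;;
        *)  echo 'no match' ;;
    esac
}
m1=$(check '[a')   # match
m2=$(check 'a')    # no match
printf '%s / %s\n' "$m1" "$m2"
```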

I did a lot of work on \ escaping in glob patterns in the NetBSD sh
a few years ago.  The results we produce with your longer test are
below; there is just one case where we differ from bash:

a='\*bc'; b='\'; c='*';
echo '${a##$b} removes \:             '"|${a##$b}|"' - matches \'

where bash actually does what you believe ought to happen, and the NetBSD
sh does not.  The output from just that is:

${a##$b} removes \:             |\*bc| - matches \

That is another invalid pattern, which matches nothing, so the result
from the NetBSD sh is the same as "${a}".

I don't believe there are any *bugs* in bash here, just differences in
what particular behaviour it should produce when that behaviour is unspecified
because the input is garbage.

kre

Results of your test from the current NetBSD sh (I think these would be the
same for any version from the past few years):

a is '\*bc'
b is '\'
c is '*'
${a##?*} removes everything:    ||
${a##?"*"} removes \*:          |bc| - matches one char, then *
${a##\*} removes nothing:       |\*bc| - first char is not *
${a##\\*} removes everything:   || - matches \, then all
${a##\\\*} removes \*:          || - matches \, then *
${a##?$c} removes everything:   || - matches one char, then all
${a##?"$c"} removes \*:         |bc| - matches one char, then *
${a##\\$c} removes everything:  || - matches \, then all
${a##\\"$c"} removes \*:        |bc| - matches \, then *
${a##$b} removes \:             |\*bc| - matches \
${a##"$b"} removes \:           |*bc| - matches \

${a##$b?} removes \*:           |\*bc| - matches \, then one char
${a##$b*} removes everything:   |\*bc| - matches \, then all
${a##$b$c} removes everything:  |\*bc| - matches \, then all
${a##$b"$c"} removes \*:        |\*bc| - matches \, then *

${a##"$b"?} removes \*:         |bc| - matches \, then one char
${a##"$b"*} removes everything: || - matches \, then all
${a##"$b""?"} removes nothing:  |\*bc| - second char is not ?
${a##"$b""*"} removes \*:       |bc| - matches \, then *
${a##"$b"\*} removes \*:        |bc| - matches \, then *
${a##"$b"$c} removes everything:|| - matches \, then all
${a##"$b""$c"} removes \*:      |bc| - matches \, then *
${a##"$b?"} removes nothing:    |\*bc| - second char is not ?
${a##"$b*"} removes \*:         |bc| - matches \, then *
${a##"$b$c"} removes \*:        |bc| - matches \, then *
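For comparing shells, here is a sketch (POSIX sh assumed; r1..r7 are illustrative names) that reruns only the rows above in which the \ from $b is quoted, since those patterns are fully specified. It uses plain assignments instead of the "|...|" strings of the original test, to avoid nesting quotes inside "..."; the results should match the table in any POSIX shell.

```shell
a='\*bc'; b='\'; c='*'
r1=${a##"$b"?}      # literal \, then one char: bc
r2=${a##"$b"*}      # literal \, then anything: (empty)
r3=${a##"$b""?"}    # literal \? is not a prefix: \*bc
r4=${a##"$b""*"}    # literal \*: bc
r5=${a##"$b"$c}     # $c unquoted, so * is a wildcard: (empty)
r6=${a##"$b?"}      # whole pattern quoted, ? is literal: \*bc
r7=${a##"$b*"}      # whole pattern quoted, * is literal: bc
for r in "$r1" "$r2" "$r3" "$r4" "$r5" "$r6" "$r7"; do
    printf '|%s|\n' "$r"
done
```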

