    Date:        Sat, 9 Aug 2025 01:35:17 +0200
    From:        Denys Vlasenko <dvlas...@redhat.com>
    Message-ID:  <f7ec2956-dfeb-af4e-d314-937f36424...@redhat.com>

  | A year ago, dash had this fix:

I would call that a change, rather than a fix.

  |     expand: Fix naked backslah leakage

What should happen in such a case all depends upon what you think
might have been intended.   An unquoted \ in a glob pattern (at least
not inside [] - there things would be even trickier) is an escape char,
it is intended to escape what comes next.   What it means when followed
by something which doesn't need to be escaped, because it already is
quoted, is, I believe, unspecified, and you have entered the realm of
GIGO.

  | Test case:
  |
  |     a="\\*bc"; b="\\"; c="*"; echo "<${a##$b"$c"}>"
  |
  | Old result:
  |
  |     <>
  |
  | New result:
  |
  |     <bc>

FWIW, the NetBSD sh, another ash descendant, like dash is, produces the
same result as (current) bash does here

    <\*bc>

That is, the pattern after the ## is invalid, as the unquoted, unescaped
\ that comes from $b has nothing to escape, so matches nothing.

If you want to interpret such a \ as meaning the same as \\ that's OK,
what happens is unspecified after all, but if you really meant to say \\
that's what you should be writing in the code - ie: as an author of shell
code, you shouldn't be creating invalid patterns.   Don't demand that
other shells act the same way as you expect when you do though.

The test would be slightly less confusing incidentally, if written as

    a='\*bc'; b='\'; c='*'; echo "<${a##$b"$c"}>"

which should be identical - but using " to quote anything in (any version
of) a Bourne shell (derivative) rather than ' should only really ever be
done when there is something in the quoted string which needs to be
expanded (as in the two instances in the arg to echo there), and sometimes
when there is a ' in the string - using ' wherever possible avoids needing
to think about when any included \ characters need to be escaped and when
they don't (they always can be, but almost no-one writes "\\n" instead
of "\n" which both translate into '\n' due to the weird \ in "" rule).
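
As a minimal sketch of that \ in "" rule (the names s1, s2 and s3 are
made up for this illustration): inside "" a \ is only an escape when it
comes before $, `, ", \ or a newline, so both double-quoted spellings
below should produce the same two characters as the single-quoted one.

```shell
# Inside "", \ is only special before $ ` " \ and newline.
s1="\n"     # n is not special, so the \ is kept: backslash, n
s2="\\n"    # \\ collapses to a single \: backslash, n
s3='\n'     # inside '' nothing is special: backslash, n

# printf %s is used rather than echo, as some echo implementations
# would turn the \n into a newline.
printf '%s\n' "$s1" "$s2" "$s3"
```

All three lines printed should be identical, which is the reason the
single-quoted form is the one that needs no thought.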

Alternatively, use $'' quoting when no expansions are required, then the
\ rule is clear, if a \ is intended in the result, it must always be
written as \\.

Unfortunately many sh programmers are stuck with a C mentality, and
believe that ' is for characters, and " is for strings, or something odd
like that, and keep insisting on sticking "" around things that would be
better either unquoted completely (contain nothing requiring quoting), or
quoted using ''.

Your longer test does that better, and that's good.

  | I started creating a testcase for it, covering a few more possibilities.
  |
  | ... and discovered that bash, in fact, is probably buggy:

I suspect you really mean there "doesn't do what I would expect might
happen in some cases" where the pattern given is meaningless, and actually
produces (and should produce) unspecified results .. that is, whatever
the shell produces is OK (as glob patterns have a long history of never
having an error possibility - they just match, or don't match, if the
pattern is nonsense, the shell does whatever it likes - for users the
correct behaviour is to not write nonsense patterns in the first place).

I did a lot of work with the NetBSD sh with \ escaping in glob patterns
a few years ago, the results of what we produce with your longer test
are below, but there is just one case where we're different from bash:

    a='\*bc'; b='\'; c='*'; echo '${a##$b} removes \: '"|${a##$b}|"' - matches \'

where bash actually does what you believe ought happen, and the NetBSD sh
does not, the output from just that is:

    ${a##$b} removes \:              |\*bc| - matches \

-- another invalid pattern, matches nothing, so the result from the
NetBSD sh is the same as "${a}", the invalid pattern matches nothing.

I don't believe there are any *bugs* in bash here, just differences in
what particular behaviour it should produce when that behaviour is
unspecified because the input is garbage.
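
For comparison, a minimal sketch of the forms that are not garbage (the
names r1 to r4 are invented for this illustration, and the assignments
are used only to keep outer quoting out of the picture): when the
expansion of $b is quoted, or the \ is written out as \\, the pattern is
well defined, so any POSIX shell should give the same answers.   The
unquoted $b forms are deliberately left out, since their results are the
unspecified ones.

```shell
a='\*bc'; b='\'; c='*'

# A quoted expansion is literal inside a pattern.
r1=${a##"$b"}         # removes the literal \
r2=${a##"$b""$c"}     # removes the literal \ and the literal *
r3=${a##"$b"*}        # literal \, then an unquoted * matching the rest
r4=${a##\\}           # \\ written out is an escaped, literal \

printf '|%s|\n' "$r1" "$r2" "$r3" "$r4"
```

That should print |*bc|, |bc|, || and |*bc|, one per line.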

kre

Results of your test from the current NetBSD sh (I think these would be
the same for any version from the past few years):

a is '\*bc'
b is '\'
c is '*'
${a##?*} removes everything:     ||
${a##?"*"} removes \*:           |bc|   - matches one char, then *
${a##\*} removes nothing:        |\*bc| - first char is not *
${a##\\*} removes everything:    ||     - matches \, then all
${a##\\\*} removes \*:           ||     - matches \, then *
${a##?$c} removes everything:    ||     - matches one char, then all
${a##?"$c"} removes \*:          |bc|   - matches one char, then *
${a##\\$c} removes everything:   ||     - matches \, then all
${a##\\"$c"} removes \*:         |bc|   - matches \, then *
${a##$b} removes \:              |\*bc| - matches \
${a##"$b"} removes \:            |*bc|  - matches \
${a##$b?} removes \*:            |\*bc| - matches \, then one char
${a##$b*} removes everything:    |\*bc| - matches \, then all
${a##$b$c} removes everything:   |\*bc| - matches \, then all
${a##$b"$c"} removes \*:         |\*bc| - matches \, then *
${a##"$b"?} removes \*:          |bc|   - matches \, then one char
${a##"$b"*} removes everything:  ||     - matches \, then all
${a##"$b""?"} removes nothing:   |\*bc| - second char is not ?
${a##"$b""*"} removes \*:        |bc|   - matches \, then *
${a##"$b"\*} removes \*:         |bc|   - matches \, then *
${a##"$b"$c} removes everything: ||     - matches \, then all
${a##"$b""$c"} removes \*:       |bc|   - matches \, then *
${a##"$b?"} removes nothing:     |\*bc| - second char is not ?
${a##"$b*"} removes \*:          |bc|   - matches \, then *
${a##"$b$c"} removes \*:         |bc|   - matches \, then *