On 10/5/21 4:41 AM, Koichi Murase wrote:
> I have questions on the new feature ${var/pat/&} in the devel branch.
>
>> commit f188aa6a013e89d421e39354086eed513652b492 (upstream/devel)
>> Author: Chet Ramey <chet.ra...@case.edu>
>> Date:   Mon Oct 4 15:30:21 2021 -0400
>>
>>     enable support for using `&' in the pattern substitution replacement
>>     string
>>
>>     Any unquoted instances of & in STRING are replaced with the matching
>>     portion of PATTERN. Backslash is used to quote & in STRING; the
>>     backslash is removed in order to permit a literal & in the
>>     replacement string. Users should take care if STRING is
>>     double-quoted to avoid unwanted interactions between the backslash
>>     and double-quoting. Pattern substitution performs the check for &
>>     after expanding STRING; shell programmers should quote backslashes
>>     intended to escape the & and inhibit replacement so they survive any
>>     quote removal performed by the expansion of STRING.
>
> I would very much like this change introduced in the latest commit
> f188aa6a in devel as it would enable many more string manipulations
> with a simple construct, but I feel the current treatment of quoting
> has problems:
>
> 1. There is no way to specify an arbitrary string in replacement in a
>    way that is compatible with both bash 5.1 and 5.2.
It's a change that assigns meaning to a character that was previously
valid, not an error. It's probably going to require a shell option.

>
> 2. There is no way to insert a backslash before the matched part
>    (which I'd think would be one of the typical usages of &).

This is quite reasonable, and a minor change. If the replacement function
treats backslash specially by allowing it to quote `&', it should also
allow it to escape a backslash.

>
> I below describe the details of each, followed by my suggestion or
> discussion on an alternative design.
>
> ----------------------------------------------------------------------
> 1. How to specify an arbitrary string in replacement compatibly with
> both bash 5.1 and 5.2?
>
> Currently any & in the replacement is replaced by the matched part
> regardless of whether & is quoted in the parameter-expansion context
> or not. Even the result of the parameter expansions and other
> substitutions are subject to the special treatment of &, which makes
> it non-trivial to specify an arbitrary string to the replacement
> ${var/pat/rep}.

The documentation goes into this in some detail, including specifying the
expansions that REP undergoes.

>   $ str='X&Y&Z' pat='Y' rep='A&B'
>   $ echo ${str/$pat/XXXX}
>   X&A&B&Z
>
> where XXXX is some string that represents the literal "$rep" (i.e.,
> 'A&B'). A naive quoting of "$rep" does not work:
>
>   $ echo "1:${str/$pat/"$rep"}"
>   1:X&AYB&Z

Wouldn't it be better to treat it in the standard way a double-quoted
parameter expansion would be treated? The double-quoted expansion is
already well-specified. People know how to get a backslash through double
quoting, even in a context, like this one, where quote removal is
performed.

>
> I would have expected it to work because $pat will lose special
> meaning and be treated literally when it is quoted as "$pat". For
> example, the glob patterns *?[ etc. and anchors # and % in $pat will
> lose their special meaning when quoted:
>
>   $ v='A' p='?'; echo "${v/$p/B}"; echo "${v/"$p"/B}"
>   B
>   A
>   $ v='A' p='#'; echo "${v/$p/B}"; echo "${v/"$p"/B}"
>   BA
>   A
>   $ v='A' p='%'; echo "${v/$p/B}"; echo "${v/"$p"/B}"
>   AB
>   A
>
> Of course, if $rep is not quoted, & in $rep is replaced by the matched
> part.
>
>   $ echo "2:${str/$pat/$rep}"
>   2:X&AYB&Z
>
> * To properly specify an arbitrary string in the replacement, one
>   needs to replace all the characters.
>
>   $ echo "${str/$pat/${rep//&/\\\\&}}"
>
> * When the replacement is not stored in a variable, one needs to
>   create a variable for the replacement, i.e.,
>
>   $ echo "${str/$pat/$(something)}"
>
>   in Bash 5.1 needs to be converted to
>
>   $ tmp=$(something)
>   $ echo "${str/$pat/${tmp//&/\\\\&}}"
>
>   in Bash 5.2.
>
> * Also, there is no way of writing it so that it works in both Bash
>   5.1 and 5.2. To make it work, one needs to switch the code
>   depending on the bash version as:
>
>   if ((BASH_VERSINFO[0]*10000+BASH_VERSINFO[1]*100>=50200)); then
>     echo "${str/$pat/${rep//&/\\\\&}}"
>   else
>     echo "${str/$pat/$rep}"
>   fi
>
>   [ Note: this does not work for the devel branch because the devel
>   branch still has the version 5.1. ]
>
> ----------------------------------------------------------------------
> 2. How to insert a literal backslash before the matched part?
>
> Another problem is that one cannot put a literal backslash just before
> & without affecting the meaning of &. Currently if there is any
> backslash before &, & will lose the special meaning and the two
> characters '\&' become '&' after the replacement.

I agree that just as \& allows a literal `&', \\ should be a literal
backslash.
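
For problem 1 (a replacement that is always taken literally, whatever the
bash version), one possible stopgap is to switch the feature off around
the expansion instead of escaping every `&'. The following is only a
sketch under stated assumptions: it assumes the shell option mentioned
above exists and is named patsub_replacement (the name used in released
bash 5.2; no such option existed in devel when this message was written),
and replace_literal is a hypothetical helper name, not something bash
provides.

```shell
#!/usr/bin/env bash
# Sketch only: replace_literal is a hypothetical helper, not part of bash.
# Assumption: the option controlling `&' is named patsub_replacement.
replace_literal() {
  local str=$1 pat=$2 rep=$3 out restore=0

  # shopt -q fails on shells that do not know the option (the error
  # message is discarded), so older versions simply skip this branch.
  if shopt -q patsub_replacement 2>/dev/null; then
    shopt -u patsub_replacement
    restore=1
  fi

  # With the option off (or absent), & in $rep has no special meaning.
  out=${str/$pat/$rep}

  # Put the option back the way the caller had it.
  if ((restore)); then shopt -s patsub_replacement; fi
  printf '%s\n' "$out"
}

str='X&Y&Z' pat='Y' rep='A&B'
replace_literal "$str" "$pat" "$rep"
```

On bash 5.1, and on a bash that has the option, this should print
X&A&B&Z. It does not help with a build that treats `&' specially but has
no option to turn that off, which is the situation described in this
thread.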
> ----------------------------------------------------------------------
> Suggestion / Discussion
>
> I suggest that '&' has the meaning of the matched part only when it is
> not quoted in the parameter-expansion context ${...} [ Note that
> currently, '&' has the meaning of the matched part when it is not
> quoted by backslash in *the expanded result* ]. I expect the
> following interpretations with this suggestion:

The quoting outside the ${...} doesn't affect whether REP is quoted. This
is consistent with how POSIX specifies the pattern removal expansions, and
how bash has worked since bash-4.3.

So both of these, for instance, will expand to `&' *because of how bash
already works*, regardless of whether or not we attach meaning to `&' in
the replacement string.

>   $ echo "${var/$pat/&}"      # & represents the matched part
>   $ echo "${var/$pat/\&}"     # & is treated as a literal ampersand

This next one will expand to `\&' again due to existing behavior,
regardless of what we do with it, due to how quote removal works. And so
on.

>   $ echo "${var/$pat/\\&}"    # A literal backslash plus the matched part
>   $ echo "${var/$pat/'\'&}"   # A literal backslash plus the matched part
>   $ rep='A&B'
>   $ echo "${var/$pat/$rep}"   # 'A' plus the matched part plus 'B'
>   $ echo "${var/$pat/"$rep"}" # Literal 'A&B'

Rather than dance around behind the scenes trying to invisibly quote &,
but only in certain contexts where it would not otherwise be escaped by
double quoting, I would be more in favor of adding an option to enable the
feature and allowing the normal rules of double quoted strings to apply.

>
> Here is the rationale:
>
> * It is consistent with the treatment of the glob special characters
>   and anchors # and % in $pat of ${var/$pat}.

Yeah, doing that was probably a mistake, but we have to live with it now.
Those are really part of the pattern operator itself, not properties of
the pattern. But nevertheless.

> * One can intuitively quote & to make it a literal ampersand. The
>   distinction of the special & in ${var/$pat/&} and the literal
>   ampersand in ${var/$pat/\&} is more intuitive than ${var/$pat/&} vs
>   ${var/$pat/\\&}.

Not if you take into account the word expansions the replacement string
undergoes. For example, if you use ${var/$pat/\&} in bash-5.1, you're
going to get a `&' in the output, not `\&'. Now you invite the questions
of why bash expands things differently whether or not there is a `&' in
the replacement string, and since the non-special bash-5.1 expanded that
to `&', why should bash-5.2 not treat it as a replacement?

I guess the question is why not let the normal shell word expansion rules
apply, and work with the result.

> ----------------------------------------------------------------------
> Bash version of devel branch?
>
> By the way, when would the BASH_VERSINFO be updated? The devel
> version still has the Bash version 5.1. I would like to reference the
> version information to switch the implementation. In particular,
> since some incompatible changes are introduced in the devel branch
> (which are supposed to be released as Bash 5.2), I need to switch the
> implementation.

That's what I do when I need to.

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/