Date: Mon, 25 Feb 2019 17:38:07 -0500
From: Grisha Levit <grishale...@gmail.com>
Message-ID: <CAMu=broqi+u5m+spu-rjafs9k7ovy0jrm5df2obekgapwij...@mail.gmail.com>
First, apologies from me for missing this message from you. I don't know if my spam filters caught it (for some unknown reason) or whether it was delivered and I simply discarded it without noticing that it was real mail and not an example of spam that made it past my spam filters. I have cleaned out my spam folder since you sent this, so I can't simply go look there to find out (if it had been filtered, it would have gone there, but if it was my eyeballs failing, then it would have just been removed ...) I had no idea your reply existed until I saw Chet's reply to you, and went hunting (fortunately I keep many backups of all mail received, including spam, so I could use the message-id referenced in his message to find where your message appeared, and recover it.)

| Yup I was referencing the devel version that fixed the ${b+s ''} issue.

Oh. I didn't know there had been a fix for that. Great.

| I think I'm missing something but how can that be the case regarding
| the quoting?

It all relates to the way the original Bourne shell (1978/9 vintage, on a PDP-11 ... so very little code or data space) parsed quoting. The rule was basically very simple: each (unquoted) " (and the same for ') in the input byte stream simply toggled the "quoted" flag - and this was done very early in the input data processing. Every other character (except \ of course, which simply quoted the following char) was combined with the "quoted" flag to form the character that existed internally (the "quoted" flag was 0x80, and "combined" meant (ch|quoted)). (The few chars with special meaning inside "" were handled differently, but I don't remember exactly how right now.) The effect of this is that a quoted '*', for example, would not compare equal to an unquoted '*' (one is 0x2A, the other 0xAA), so quoting the magic chars for glob simply worked (glob looks only for 0x2A, an unquoted '*').
When comparing normal chars for equality the quote flag was simply stripped (if ((ch & 0x7f) == *p) ... or similar). Similarly, IFS can only contain unquoted characters (quote removal, which at the time meant &= 0x7F, happens before the value is assigned), so field splitting never worked on a quoted char, only on an unquoted one. This allowed all kinds of simplifications to the code, so it could be implemented in very little space.

Unfortunately it also meant that we ended up with the weird spec for "${var+w"or"d}" where the 'o' and 'r' are not quoted, but the 'w' and 'd' are - it is also what leads to "${var+w'or'd}" (when var is set) expanding to "w'or'd", as the ' characters are still quoted by the "" and are not quoting chars themselves.

When Korn first implemented ksh (so I am told, I have never seen a ksh this old) he fixed that, and made the quoting context for word in those 4 expansions, as well as the new substring extraction expansions he invented, be unrelated to the quoting context surrounding the expansion. But, so I have been told, he was convinced that was incompatible, and so changed the 4 original operators (- + = and ?) back, so they were processed the same way the original Bourne shell did them, but left the substring operators (% %% # and ##) the new way. Until relatively recently, that was the supposed state of the world, and what POSIX demanded. But in the interim, several shells have been convinced by their users (or perhaps, for some of them, never understood in the first place) that it is (was) supposed to work this way, and made the implementation more like what users expect, rather than what the original Bourne shell implemented. Eventually POSIX, which is supposed to be telling script writers what they should expect to work in the wild, had to relent, and make this an unspecified case, as ...

| For example "${x+" a b "}" expands to a single field in
| bash/dash/yash/zsh/netbsd sh (though not in ksh..)
is simply reality - the FreeBSD shell is in the ksh camp (they have a very POSIX conforming implementation, modulo the occasional bug of course). The effect is that if you want to write portable code, you simply cannot put quotes inside a quoted variable expansion using any of the older 4 operators (+ - = ?) ... but with anything newer it works fine.

And incidentally, after more research, I can no longer justify:

| > The second because there's no real agreement whether it should produce
| > 0 or 1 (different shells do different things for that one, and there's
| > no particularly good argument for one or the other, so posix, I believe,
| > makes that one unspecified as well.)

There's nothing I can find which makes that happen; it appears that the standard still expects nothing (no fields) to result from "${X+$@}" when X is set and there are no positional parameters, despite almost no shells (not even FreeBSD) implementing it that way. So that is another thing that will change - it will end up being unspecified as well (it is a fairly useless idiom to use; ${X+"$@"} is much more sane, and if you actually want a null string result, rather than nothing, simply append (or prepend) one, as in ${X+"$@"}'').

| Looks like when there are one or more positional parameters (and x
| unset) all shells listed above expand "${x-$@}" to the proper number
| of fields,

Yes, when there are positional params, it is easy. The harder case is whether or not the field should be deleted when there are none - shells can recognise "$@" and cause nothing to result if there are no params, but when the $@ is buried somewhere else it is much harder.

| > but that's because they don't
| > implement $_ (and good on them for that, stupid thing it is)
|
| Sorry, didn't mean to confuse the issue by using _, should've used a
| more portable example.

Not a problem.
For future similar cases, you can use $0 instead of $_ - $0 is always set, so while writing ${0+whatever} in a real script would be lunacy, for running sh tests of something where the parameter is known to be set it works just fine, and avoids needing extra code to make sure the variable used is set. ($? and $$ would also work, but are more likely to confuse the reader, who would wonder just what magic you're trying to achieve!)

In his reply, chet.ra...@case.edu said:
| There is an interpretation that somewhat decoupled quotes outside the
| expansion with quotes inside it, but I can't remember the specifics right
| now. It might be 888, but I seem to remember another.

It's not 888 (that's just $@), and if it was, it would be in the text now, and it isn't. But it has been done in something newer (decoupled, as in "made unspecified"), so now we're all free to implement sanity. One day perhaps even ksh will change again, and we might then be able to standardise sanity - but that will be decades away (most likely I won't live long enough to see that happen; I don't have that many decades left!)

chet.ra...@case.edu also said:
| Because ksh uses the open, open, close, close interpretation.

Actually it's just the opposite: that's what sane implementations do (a new quoting context after any operator in a ${var<op>word} expansion, not just after those other than the original 4); ksh (ksh93 anyway, mksh doesn't) is doing the old open close open close interpretation. So does the FreeBSD shell.

kre