    Date:        Sat, 9 Nov 2019 07:35:16 +0300
    From:        Oğuz <oguzismailuy...@gmail.com>
    Message-ID:  <cah7i3lr68civxlr9_hoogqa7vd-zyvz+fck-0k3uqptnsir...@mail.gmail.com>

| is correct, as "foo" does not contain a ']' which would be required
| > to match there (quoting the ':' means there is no character class,
| > hence we have instead (the negation of) a char class containing '[' ':'
| > 'l' 'o' 'w' 'e' 'r' (and ':' again), preceded by anything, and
| > followed by ']' and anything.   foo does not match.  f]oo would.
| >
|
| where exactly is this documented in the standard?

I'm not sure which part exactly you're looking for, but char sets in sh
are specified to be the same as in REs, except that ! replaces ^ as the
negation character (that's in XCU 2.13.1).

Char sets (bracket expressions) in REs are documented in XBD 9.3.5,
wherein it states

    A bracket expression is either a matching list expression or a
    non-matching list expression. It consists of one or more
    expressions: ordinary characters, collating elements, collating
    symbols, equivalence classes, character classes, or range
    expressions. The <right-square-bracket> (']') shall lose its
    special meaning and represent itself in a bracket expression if it
    occurs first in the list (after an initial <circumflex> ('^'), if
    any). Otherwise, it shall terminate the bracket expression,

That is, a ']' that occurs anywhere else terminates the bracket
expression except:

    unless it appears in a collating symbol (such as "[.].]")

(not relevant in the given example)

    or is the ending <right-square-bracket> for a collating symbol,
    equivalence class, or character class.

So the ']' that immediately follows the second ':' would not terminate
the bracket expression if it is the ending ']' for a character class
(collating symbols and equiv classes not being relevant to the example).
Of course, that can only happen if there is a character class to end.

There's also

    The special characters '.', '*', '[', and '\\' (<period>,
    <asterisk>, <left-square-bracket>, and <backslash>, respectively)
    shall lose their special meaning within a bracket expression.

whereupon if the [": sequence does not start a char class, the '[' there
is simply a literal char inside the bracket expression. Similarly, if
the bracket expression ends at the first ']' (the one immediately after
the second ':') the following ']' is simply a literal character, as ']'
chars are special only when following a '['.

So, all that's left to determine is whether the [": sequence can be
considered as beginning a char class. In a RE it certainly cannot -
quote chars (' and ") are not special in REs at all, and [": is no
different syntactically than [x: which no-one would treat as being the
introduction to a char class.

This is also, I believe (Chet can confirm, or refute, if he desires)
where bash gets the interpretation that "lower" (including the quotes)
is the name of the char class in [:"lower":] except that it cannot be,
as char class names cannot contain quote characters (which should lead
to the whole sub-expression not being treated as a char class at all;
instead bash treats it, I think, as if it were an unknown but valid
class name).

But when it comes from sh, quote chars are "different": instead of just
being characters, they affect the interpretation of the characters that
are quoted. See XCU 2.2:

    Quoting is used to remove the special meaning of certain characters
    or words to the shell. Quoting can be used to preserve the literal
    meaning of the special characters in the next paragraph [...]

    and the following may need to be quoted under certain circumstances.
    That is, these characters may be special depending on conditions
    described elsewhere in this volume of POSIX.1-2017:

        * ? [ # ~ = %

to which more chars have been added (as I recall) recently by some
Austin Group correction (which I think includes ! : - and ]), that is
to make it clear, that in sh

    [a'-'z]

is a bracket expression containing 3 chars 'a' '-' and 'z' (which form
of quoting is used to remove the specialness of the '-' is irrelevant).
and that

    "[a-z]"

isn't a bracket expression at all (neither of which is true in an RE -
though the role of \ in REs is being altered slightly, so if it had been
[a\-z] in a RE things are less clear.)

The effect of this is that in sh, in an expression like

    [![":lower":]]

the first ':' is not "special" and hence cannot form part of the magic
opening '[:' sequence for a character class. Hence this expression
contains no character class, and consequently the ':]' chars are simply
a ':' in the bracket expression, and then the terminating ']' - which
leaves the second ']' being just a literal character.

While here (these following parts are not relevant to your question,
I believe): when used in sh,

    [[:"lower":]]

should be treated just the same as

    [[:lower:]]

for the same reason that

    ["abc"]

is treated the same as

    [abc]

That is, quoted characters that are not special are no different than
the same character unquoted. That's universal in sh: quoting removes
special meaning (of lots of things), but where there was none the
quoting changes nothing at all, eg:

    "ls" \-'l'

is exactly the same as

    ls -l

and

    x="foo" y=''

is identical to

    x=foo y=

(though not all empty quoted strings are irrelevant that way).

There are other issues where it is less clear what should happen. If
your example had been

    [![:"lower:"]]

then we get into very murky water indeed.

XBD 9.3.5 says:

    The character sequences "[.", "[=", and "[:" (<left-square-bracket>
    followed by a <period>, <equals-sign>, or <colon>) shall be special
    inside a bracket expression

[aside: not related to my current point, the "shall be special" is what
enables sh quoting to stop that from happening, since quoting in the
shell prevents specialness from happening]

    and are used to delimit collating symbols, equivalence class
    expressions, and character class expressions.

That part (so far) is clear and non-controversial.

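The sh-side claims above about quoting inside patterns can be tried out
with a small script. This is only a sketch under stated assumptions:
glob_match and show are helper names invented for this illustration
(they are not part of any shell or of the standard), and eval is used
so that the quote characters in the pattern text are re-parsed by the
shell as real quoting rather than being passed along as literal
characters. Expected results are given only for the cases that the
text above describes as uncontroversial.

```shell
#!/bin/sh
# glob_match STRING PATTERN_SOURCE   (hypothetical helper, illustration only)
# PATTERN_SOURCE is shell source text: eval re-parses it, so any quote
# characters in it act as quoting inside the case pattern.
# Returns 0 if STRING matches the pattern, 1 otherwise.
glob_match() {
    eval "case \$1 in $2) return 0 ;; *) return 1 ;; esac"
}

# show STRING PATTERN_SOURCE: print what the running shell decides.
show() {
    if glob_match "$1" "$2"; then
        printf '%-8s matches         %s\n' "$1" "$2"
    else
        printf '%-8s does not match  %s\n' "$1" "$2"
    fi
}

# A quoted '-' is not a range operator: the set is just 'a', '-', 'z'.
show a "[a'-'z]"      # matches
show - "[a'-'z]"      # matches
show b "[a'-'z]"      # does not match

# A fully quoted "[a-z]" is not a bracket expression, only a literal string.
show b       '"[a-z]"'    # does not match
show '[a-z]' '"[a-z]"'    # matches

# Quoting characters that were never special changes nothing.
show b '["abc"]'      # matches, same as [abc]
show b '[abc]'        # matches

# The cases under discussion in this thread. No expected result is
# stated here, because shells are reported to differ on them.
show foo   '*[![":lower":]]*'
show 'f]oo' '*[![":lower":]]*'
show a     '[[:"lower":]]'
show a     '[[:lower:]]'
```

Running the same file under more than one shell (for example bash and
another sh implementation) and comparing the last four lines of output
shows whether they agree on the disputed patterns.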
    These symbols shall be followed by a valid expression and the
    matching terminating sequence ".]", "=]", or ":]", as described in
    the following items.

That's the part that is less clear. When a valid expression and the
terminating sequence appear, there is no issue, and all is fine - what
is less clear is what happens when one of those requirements is not met.

Some read this as purely a requirement on the application - what the
script writer is required to do - and when they don't, the
implementation (sh or RE library, or whatever) is free to interpret
things (which means the whole pattern) however it likes (often as not
being a pattern at all).

Personally I disagree - I believe it is a requirement on the application
if it desires the relevant sequence to be interpreted as a char class
(etc), and if the application does not include a valid expression or
terminating sequence, the implementation should be required to treat
the opening char sequence as if it did not begin a char class (etc) and
the [: were simply 2 chars contained in the bracket expression (they
must be in a bracket expression or the issue doesn't arise at all).

Unfortunately (for the world in general, in that more and more of this
is becoming unspecified, which makes it harder and harder to know what
any particular sequence of characters will do) it seems like the former
interpretation is the more likely to be adopted.

If I have not understood the "this" in your

    where exactly is this documented

please be more precise, and I will try to answer.

kre