On Wed, Aug 24, 2005 at 08:10:44AM +0300, era eriksson wrote:
> Package: perl
> Version: 5.8.7-4
> Tags: upstream
>
> When square or round brackets are used as regular expression delimiters,
> the expression apparently cannot contain a backslash-escaped literal
> opening delimiter bracket.

Summary: see perlop.pod, "Gory details of parsing quoted constructs".

> I see nothing in the documentation to suggest that this is intentional
> or expected behavior.
>
> vnix$ perl -ne 'print if m(\()' </dev/null
> Unmatched ( in regex; marked by <-- HERE in m/( <-- HERE / at -e line
> 1.

From perlop.pod, "Quote and Quote-like Operators":

    Non-bracketing delimiters use the same character fore and aft,
    but the four sorts of brackets (round, angle, square, curly) will
    all nest, which means that

        q{foo{bar}baz}

    is the same as

        'foo{bar}baz'

Further, in "Gory details of parsing quoted constructs":

    When searching for single-character delimiters, escaped delimiters
    and "\\" are skipped. For example, while searching for terminating
    "/", combinations of "\\" and "\/" are skipped. If the delimiters
    are bracketing, nested pairs are also skipped. For example, while
    searching for closing "]" paired with the opening "[", combinations
    of "\\", "\]", and "\[" are all skipped, and nested "[" and "]" are
    skipped as well.

This implies that m(() would be invalid syntax, and you need to quote
the opening bracket to get it through to the regexp at all if it doesn't
have a matching pair. So m(\() means m/(/. This is an invalid regexp
giving the error message above because '(' has special significance in
regexps.

Similarly

> vnix$ perl -ne 'print if m[\[]' </dev/null
> Unmatched [ in regex; marked by <-- HERE in m/[ <-- HERE / at -e line
> 1.

is the same as m/[/, which is an invalid regexp as well and the error
message is correct.

It would seem that getting the opening bracket through as a regexp
literal similar to m/\(/ or m/\[/ would need a double escaping when the
same bracket is the delimiter. However, further down in "Gory details
of parsing quoted constructs":

    The lack of processing of "\\" creates specific restrictions on the
    post-processed text. If the delimiter is "/", one cannot get the
    combination "\/" into the result of this step.
"/" will finish the regular expression, "\/" will be stripped to "/" on the previous step, and "\\/" will be left as is. Because "/" is equivalent to "\/" inside a regular expression, this does not matter unless the delimiter happens to be character special to the RE engine, such as in "s*foo*bar*", "m[foo]", or "?foo?"; or an alphanumeric char [...] which is precisely the case here. This can be worked around with the normal regexp quote escape \Q...\E, so that m/\[/ becomes m[\Q\[\E]. The result can be easily confirmed with 'debugperl -Dr -e 'm[\Q\[\E]'. > Note that in the error message, the backslash is missing. > > The closing square bracket works as expected: > > vnix$ perl -ne 'print if m[\]]' </dev/null This corresponds to m/]/, which works because ']' isn't special in a regexp on its own, so it needs no quoting. > But with the rounded parens, the closing paren too is mishandled: > > vnix$ perl -ne 'print if m(\))' </dev/null > Unmatched ) in regex; marked by <-- HERE in m/) <-- HERE / at -e line > 1. Again, m/)/ is invalid. m(\Q\)\E) works. > With square brackets, you get an error message even if there is an > escaped pair: > > vnix$ perl -ne 'print if m[\[\]]' </dev/null > Unmatched [ in regex; marked by <-- HERE in m/[ <-- HERE ]/ at -e line > 1. This means m/[]/, which is an invalid regexp because the closing bracket is part of the list and doesn't end the character class. From perlre.pod: If you want either "-" or "]" itself to be a member of a class, put it at the start of the list (possibly after a "^"), or escape it with a backslash. The desired behaviour can be achieved with m[\Q[]\E] or m[\Q[\E\]]. The former works because of the nesting delimiter rule. > If you remove the escape from the closing square bracket, you still get > the error: > > vnix$ perl -ne 'print if m[\[]]' </dev/null > Unmatched [ in regex; marked by <-- HERE in m/[ <-- HERE / at -e line > 1. 
That's because it means m/[/] so the regexp part is just the opening
bracket (and the closing bracket would cause a syntax error afterwards
even if the regexp could be compiled).

> Also note that the error message lacks the closing square bracket.
>
> With rounded brackets, matching pairs work as expected:
>
> vnix$ echo 'foo()' | perl -ne 'print if m(\(\))'
> foo()

That actually matches any string just like m/()/: the regexp is just an
empty group.

% echo 'foo' | perl -ne 'print if m(\(\))'
foo

For the sake of completeness, this needs m(\Q()\E) or m(\Q\(\)\E), and
the former again works because of the nesting rule.

> Curly brackets and brokets work fine:
>
> vnix$ perl -ne 'print if m{\{}' </dev/null
>
> vnix$ perl -ne 'print if m<\<>' </dev/null

Those aren't special in regexps, so they should.

Please let me know if I can close this bug. The documentation could
always be better but I think things are working as documented. The 'gory
details' title pretty much describes all the variations here :)

Cheers,
-- 
Niko Tyni   nt...@debian.org