op wrote:
Hello,
I thought I had understood regular expression grouping relatively
well, until I ran into the following behavior:
perl -e '
my $string = "<div name=\"abcd\" id=\"dontcare\" class=\"testvalue\" lang=\"hindi\">";
while ($string =~ /(lang)|(id)="(\S+)"/g) {
print "\$1->|$1|, \$2->|$2|, \$3->|$3|\n";
}
'
outputs:
$1->||, $2->|id|, $3->|dontcare|
$1->|lang|, $2->||, $3->||
Soon after I realised my mistake and replaced
(lang)|(id) with (lang|id),
getting the output I expected:
$1->|id|, $2->|dontcare|, $3->||
$1->|lang|, $2->|hindi|, $3->||
However, I still would have expected the first version to capture the
same strings, even with the superfluous parentheses. So, I obviously
haven't understood as much of regexp capturing as I had hoped, maybe
someone could enlighten me on this? I did browse through the parts on
capturing/grouping in perlre, perlreref and perlretut, but didn't find
anything that would have made me understand this.
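Your first regular expression /(lang)|(id)="(\S+)"/ says to match either
the pattern /(lang)/ or the pattern /(id)="(\S+)"/. The second one,
/(lang|id)="(\S+)"/, says to match either the pattern /lang="\S+"/ or the
pattern /id="\S+"/. Alternation has the lowest precedence in a regular
expression, so a | splits the whole pattern unless it is enclosed in
parentheses. That is why your second line of output shows only "lang":
the first alternative matched on its own, and $2 and $3 were left unset.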
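
To see the difference side by side, here is a small sketch that runs both patterns over the same string and records what each match actually consumed. The variable names @split and @grouped are only illustrative, and it assumes perl 5.10 or later for the // operator.

```perl
use strict;
use warnings;

my $string = '<div name="abcd" id="dontcare" class="testvalue" lang="hindi">';

# Original pattern: the top-level | splits it into /(lang)/ and /(id)="(\S+)"/.
my @split;
while ($string =~ /(lang)|(id)="(\S+)"/g) {
    # $& is the whole matched text; groups that did not take part are undef,
    # so they are shown here as empty strings.
    push @split, join ',', $&, map { $_ // '' } $1, $2, $3;
}

# Grouped pattern: the | is confined to the parentheses,
# so ="(\S+)" applies to both attribute names.
my @grouped;
while ($string =~ /(lang|id)="(\S+)"/g) {
    push @grouped, "$1=$2";
}

print "split:   $_\n" for @split;
print "grouped: $_\n" for @grouped;
```

With the original pattern, the second match consumes only the bare word lang and never looks at ="hindi", whereas the grouped pattern picks up both attribute values.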
John
--
Perl isn't a toolbox, but a small machine shop where you
can special-order certain sorts of tools at low cost and
in short order. -- Larry Wall