On Mon, 2008-08-04 at 03:28 -0700, op wrote:
> Hello,
> I thought I had understood regular expression grouping relatively
> well, until I ran into the following behavior:
>
> perl -e '
> my $string = "<div name=\"abcd\" id=\"dontcare\" class=\"testvalue\"
> lang=\"hindi\">";
> while ($string =~ /(lang)|(id)="(\S+)"/g) {
> print "\$1->|$1|, \$2->|$2|, \$3->|$3|\n";
> }
> '
> outputs:
> $1->||, $2->|id|, $3->|dontcare|
> $1->|lang|, $2->||, $3->||
>
> Soon after I realised my mistake and replaced
> (lang)|(id) with (lang|id),
> getting the output I expected:
> $1->|id|, $2->|dontcare|, $3->||
> $1->|lang|, $2->|hindi|, $3->||
>
> However, I still would have expected the first version to capture the
> same strings, even with the superfluous parentheses. So, I obviously
> haven't understood as much of regexp capturing as I had hoped, maybe
> someone could enlighten me on this? I did browse through the parts on
> capturing/grouping in perlre, perlreref and perlretut, but didn't find
> anything that would have made me understand this.
>
> Thanks in advance,
> -op
Perl numbers capture groups strictly from left to right: count the
opening parentheses from the left of the regular expression, and the
Nth one fills $N. For your first example:
/(lang)|(id)="(\S+)"/g
 ^      ^     ^
 $1     $2    $3
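The 'g' flag at the end makes the match repeat, each time resuming where
the previous one ended. Since 'id' appears in $string before 'lang', the
first match gives: $1 = undef, $2 = 'id', $3 = 'dontcare'. The second
match gives: $1 = 'lang', $2 = undef, $3 = undef. Note that alternation
has the lowest precedence of any regex operator, so the '|' splits the
whole pattern into two branches: (lang) on one side and (id)="(\S+)" on
the other. The ="(\S+)" part belongs only to the 'id' branch, which is
why $2 and $3 are undef when 'lang' is matched.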
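
To see the two branches at work, here is a small sketch of the same loop
as a script. The @rows array and the literal word 'undef' in the output
are my own additions for illustration; the pattern and $string are yours.

```perl
use strict;
use warnings;

my $string = '<div name="abcd" id="dontcare" class="testvalue" lang="hindi">';

# Alternation binds loosest, so this pattern is read as
#   (lang)   OR   (id)="(\S+)"
my @rows;    # hypothetical helper: one formatted line per match
while ($string =~ /(lang)|(id)="(\S+)"/g) {
    # Groups in the branch that did not match are undef; spell that out.
    push @rows, join ', ', map { defined($_) ? $_ : 'undef' } ($1, $2, $3);
}
print "$_\n" for @rows;
# prints:
#   undef, id, dontcare
#   lang, undef, undef
```

The 'lang' row never gets a value because ="(\S+)" is simply not part of
that branch.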
When you changed the pattern to:
/(lang|id)="(\S+)"/g
 ^          ^
 $1         $2
Only two numbered variables exist here, so $1 and $2 are filled
whichever alternative matches, and $3 is always undef. And again, the
matches come out in the order of appearance in $string.
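
As a rough sketch of how the corrected pattern might be used, the loop
below collects each attribute into a hash. The %attr name is just
something I made up for the example. Note that \S+ only works because
these values contain no whitespace; [^"]+ would be the safer choice for
real markup.

```perl
use strict;
use warnings;

my $string = '<div name="abcd" id="dontcare" class="testvalue" lang="hindi">';

# One group around the alternation, so the ="(\S+)" part applies to
# both 'lang' and 'id', and the value always lands in $2.
my %attr;    # hypothetical: attribute name => value
while ($string =~ /(lang|id)="(\S+)"/g) {
    $attr{$1} = $2;
}
print "$_ => $attr{$_}\n" for sort keys %attr;
# prints:
#   id => dontcare
#   lang => hindi
```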
--
Just my 0.00000002 million dollars worth,
Shawn
"Where there's duct tape, there's hope."
"Perl is the duct tape of the Internet."
Hassan Schroeder, Sun's first webmaster