From: Kevin Viel <[EMAIL PROTECTED]>
> Jenda Krynicky kindly provided:
>
> > use XML::Rules;
> >
> > my $parser = XML::Rules->new(
> > rules => [
> > Id => 'content',
> > Item => sub {$_[1]->{Name} => $_[1]->{_content}},
> > # from the <Item> tags we are interested in the content
> > # and want to use the Name attribute as the key to access
> > # that value. We ignore the Type attribute.
> > DocSum => sub {
> > # by now all the data from the <Item>s are in the %{$_[1]} hash
> >
> > if ($_[1]->{Chromosome} != 8
> > or $_[1]->{NomenclatureName} !~ /\bviral\b/) {
> > # skip every record that is not on chromosome 8 or whose
> > # NomenclatureName lacks the word 'viral'
> > return;
> > }
> >
> > # do something with the data
> > # or return the part of the data you want to keep, using whatever
> > # suits you best as the key
> > return $_[1]->{Name} => $_[1];
> > },
> > eSummaryResult => 'pass no content',
> > ]
> > );
> >
> > my $data = $parser->parse($the_xml_or_file);
> >
> > print $data->{MYC}{NomenclatureName}, "\n";
> > __END__
>
> I'd like to understand this better. It seems to be a reference
> (little arrow). Is that the same as using \@referenced_array, for
> instance?
Assuming you use the code above as is, you end up with a reference to
a hash of hashes (HoH) in $data. The first-level keys are the Names of
the genes (whatever the DocSum rule returns as the key), the second-level
keys are the values of the Name attributes from the <Item> tags.
You may want to run the script on a short XML file and print the
returned data structure with
    use Data::Dumper;
    print Dumper($data);
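To make the "little arrow" concrete, here is a minimal hand-built sketch of what such a structure might look like. It does not use the parser at all; the single MYC record is typed in by hand for illustration, and the names $genes_ref, $long and $short are made up for this example.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hand-built stand-in for what the parser might return (illustrative only).
my $data = {
    MYC => {
        Name        => 'MYC',
        Chromosome  => 8,
        Description => 'v-myc myelocytomatosis viral oncogene homolog (avian)',
    },
};

# $data holds a reference to a hash, so the arrow dereferences it.
# Between two adjacent subscripts the arrow is optional.
my $long  = $data->{MYC}->{Description};
my $short = $data->{MYC}{Description};

# The backslash takes a reference to an existing variable;
# the arrow then follows that reference.
my @genes     = sort keys %$data;
my $genes_ref = \@genes;
my $first     = $genes_ref->[0];

$Data::Dumper::Sortkeys = 1;    # stable key order in the dump
print Dumper($data);
```

So \@array and the arrow are two sides of the same thing: one creates a reference, the other uses it.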
> It seems to be a hash with the key "rules" and a four-item array as
> its value. The third item of this array is a hash with a subroutine,
> or anonymous function declaration, as its value.
The constructor of the XML::Rules object accepts several named
arguments, the most important being "rules". It is a reference to either
an array or a hash containing the "rules" to apply to the tags read
from the XML. Whenever a tag is fully parsed (including the </closing
tag>!) the module calls the specified subroutine (or builtin) to
massage/filter/process the data from the tag. Whatever the subroutine
returns is then made available to the subroutine specified for the
parent tag.
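On the "four-item array" reading: the => is only a comma that quotes the word on its left, so the array holds four tag/rule pairs, that is eight elements, and the third rule is a pair (tag name, subroutine reference) rather than a hash. A rough plain-Perl sketch of that, with no XML::Rules involved; %args, @flat and %by_tag are names invented for the example and the DocSum subroutine is shortened to a stub:

```perl
use strict;
use warnings;

# Same shape as the constructor arguments, with the DocSum rule stubbed out.
my %args = (
    rules => [
        Id             => 'content',
        Item           => sub { $_[1]->{Name} => $_[1]->{_content} },
        DocSum         => sub { return },
        eSummaryResult => 'pass no content',
    ],
);

my @flat   = @{ $args{rules} };    # eight elements: four tag/rule pairs
my %by_tag = @flat;                # a flat list of pairs loads into a hash

printf "%d elements, %d rules\n", scalar(@flat), scalar(keys %by_tag);
for my $tag (sort keys %by_tag) {
    my $rule = $by_tag{$tag};
    my $kind = ref($rule) eq 'CODE' ? 'subroutine' : "builtin '$rule'";
    printf "%-15s %s\n", $tag, $kind;
}
```

The first line printed is "8 elements, 4 rules".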
> I am wrong, correct?
>
> A) Correct, you were incorrect.
> B) Incorrect, you were correct.
> C) You're still buying beer.
>
> To start with specific questions, could someone explain:
>
> > Item => sub {$_[1]->{Name} => $_[1]->{_content}}
In this particular case, whenever an <Item ....>...</Item> is fully
parsed, this subroutine is called. It ignores the Type attribute and
returns just the value of the Name attribute and the tag content, in
such a way that the former becomes a key and the latter the value in
the attribute hash of the parent tag, in this case <DocSum>.
Later on, once the </DocSum> closing tag is parsed, all the values
from all the <Item> tags within that <DocSum> are available to
the subroutine specified for the <DocSum> tag, in the hash referenced
by $_[1], like this:
    $_[1]->{Name}        # the value will be "MYC"
    $_[1]->{Description} # = "v-myc myelocytomatosis viral oncogene
                         #    homolog (avian)"
etc.
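The data flow can be tried out without any XML by calling the two rules by hand. This is only a simulation of what the parser does for you, not how the module is implemented; the Type values and the NomenclatureName text below are assumptions made up for the sketch.

```perl
use strict;
use warnings;

# The two rules from the code above, as plain subroutines. The parser
# normally calls them; here they are called by hand to watch the data flow.
my $item_rule   = sub { $_[1]->{Name} => $_[1]->{_content} };
my $docsum_rule = sub {
    my ($tag, $attr) = @_;
    return if $attr->{Chromosome} != 8
           or $attr->{NomenclatureName} !~ /\bviral\b/;
    return $attr->{Name} => $attr;
};

# Stand-ins for three parsed <Item> tags (contents are assumptions).
my @items = (
    { Name => 'Name',       Type => 'String', _content => 'MYC' },
    { Name => 'Chromosome', Type => 'String', _content => '8' },
    { Name => 'NomenclatureName', Type => 'String',
      _content => 'v-myc myelocytomatosis viral oncogene homolog (avian)' },
);

# Each (key, value) pair returned by the Item rule lands in the parent's hash.
my %docsum_attr;
for my $item (@items) {
    my ($key, $value) = $item_rule->('Item', $item);
    $docsum_attr{$key} = $value;
}

# When </DocSum> closes, its rule sees the collected hash as $_[1].
my %result = $docsum_rule->('DocSum', \%docsum_attr);
print "$_\n" for sort keys %{ $result{MYC} };
```

Note that Type never shows up in the result, because the Item rule returns only the Name and the content.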
HTH, Jenda
===== [EMAIL PROTECTED] === http://Jenda.Krynicky.cz =====
When it comes to wine, women and song, wizards are allowed
to get drunk and croon as much as they like.
-- Terry Pratchett in Sourcery