From: "Mike Blezien" <[EMAIL PROTECTED]>
> we need to parse some very large XML files, approx. 900-1000 KB in size. A
> sample of a typical XML file that would be parsed can be viewed here:
> http://projects.thunder-rain.com/uploads/000001.xml
I'm probably coming late, but anyway ... this looks like a
perfect task for my XML::Rules. The URL doesn't work anymore, so I'm
guessing the structure of the XML.
Using XML::Rules, the code would look something like this:
#!perl
use strict;
use warnings;
use XML::Rules;

my $parser = XML::Rules->new(
    rules => [
        _default      => 'content',           # leaf tags: keep just the text
        tracks        => 'pass no content',   # hand the children up to <product>
        'track,sound' => 'no content array',  # collect repeated tags in arrays
        product       => sub {
            my ($tag, $attr) = @_;
            delete $attr->{_content};
            #use Data::Dumper;
            #print Dumper($attr);
            print <<"*END*";
article_number: $attr->{'article_number'}
distributor_number: $attr->{'distributor_number'}
distributor_name: $attr->{'distributor_name'}
artist: $attr->{'artist'}
ean_upc: $attr->{'ean_upc'}
set_total: $attr->{'set_total'}
*END*
            foreach my $track (@{$attr->{track}}) {
                print "  Track: $track->{trackno}. $track->{title} ($track->{setno})\n";
                foreach my $sound (@{$track->{sound}}) {
                    print "    Sound: $sound->{file}\n"
                        . "    Type: $sound->{sound_type} (Codec: $sound->{codec})\n";
                }
            }
            print "\n";
            return;    # return nothing, so the product is not kept around
        },
    ],
);

$parser->parse(\*DATA);
__DATA__
<products>
  <product>
    <article_number>Blah blah</article_number>
    <distributor_number>Blah blah</distributor_number>
    <distributor_name>Blah blah</distributor_name>
    <artist>Blah blah</artist>
    <ean_upc>Blah blah</ean_upc>
    <set_total>Blah blah</set_total>
    <tracks>
      <number_of_tracks>2</number_of_tracks>
      <track>
        <title>Blah blah</title>
        <trackno>1</trackno>
        <setno>Blah blah</setno>
        <sound>
          <sound_type>Blah blah</sound_type>
          <codec>Blah blah</codec>
          <file>Blah blah</file>
        </sound>
      </track>
      <track>
        <title>YDFbibusdf</title>
        <trackno>2</trackno>
        <setno>Blah blah</setno>
        <sound>
          <sound_type>Blah blah</sound_type>
          <codec>Blah blah</codec>
          <file>Blah blah</file>
        </sound>
      </track>
    </tracks>
  </product>
</products>
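I believe this will be even more efficient than XML::Twig: each
<product> is handled as soon as its closing tag has been parsed, and
because the handler returns nothing, its data is thrown away afterwards,
so the whole document never has to sit in memory at once.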
http://xmltwig.com/article/ways_to_rome/ways_to_rome.html#todo
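
If you would rather collect the data than print it straight away, here is a small sketch using the same rules. It assumes the guessed structure above is right; the helper name summarize_products and the keys of the summary hashes (artist, track_count, titles) are made up for this example and are not part of XML::Rules.

```perl
#!perl
use strict;
use warnings;
use XML::Rules;

# Hypothetical helper (not part of XML::Rules): parse $source, which may be
# an XML string or an open filehandle, and return one small hash per product.
sub summarize_products {
    my ($source) = @_;
    my @summaries;

    my $parser = XML::Rules->new(
        rules => [
            _default      => 'content',
            tracks        => 'pass no content',
            'track,sound' => 'no content array',
            product       => sub {
                my ($tag, $attr) = @_;
                my $tracks = $attr->{track} || [];
                # Keep only the bits we need from this product.
                push @summaries, {
                    artist      => $attr->{artist},
                    track_count => scalar @$tracks,
                    titles      => [ map { $_->{title} } @$tracks ],
                };
                return;    # nothing is handed up to <products>
            },
        ],
    );

    $parser->parse($source);
    return @summaries;
}

# Tiny in-memory document following the guessed structure.
my $xml = <<'XML';
<products>
  <product>
    <artist>Some Artist</artist>
    <tracks>
      <track><title>First</title></track>
      <track><title>Second</title></track>
    </tracks>
  </product>
</products>
XML

for my $p (summarize_products($xml)) {
    print "$p->{artist}: $p->{track_count} track(s): @{ $p->{titles} }\n";
}
```

For a real file you would open it yourself and pass the filehandle, the same way the script above passes \*DATA. Only the summaries pile up in memory, not the full products.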
HTH, Jenda
===== [EMAIL PROTECTED] === http://Jenda.Krynicky.cz =====
When it comes to wine, women and song, wizards are allowed
to get drunk and croon as much as they like.
-- Terry Pratchett in Sourcery