From: "Beginner" <[EMAIL PROTECTED]>
> Hi,
>
> I have to do some sanity checks on a large xml file of addresses (snip
> below). I have been using XML::LibXML and seem to have started ok but
> I am struggling to navigate around a record.
>
> In the sample data below you'll see some addresses with "DO NOT..."
> in. I can locate them easily enough but I am struggling to navigate
> back up the DOM to access the code so I can record the code with
> faulty addresses.
A bit late and again using a different module:
use XML::Rules;

# $xml holds the XML text
# find the tags and print <code>
my $parser_find = XML::Rules->new(
    rules => [
        _default => '',
        line => sub { $_[1]->{_content} . "\n\t" },
        'code,lines' => 'content',
        address => sub {
            if ($_[1]->{lines} =~ /\s+NOT\s+/) {
                print $_[1]->{code} . "\n";
            }
            return;    # add nothing to the parent tag
        },
    ],
);
$parser_find->parse($xml);

# filter the <address> tags
my $parser_remove = XML::Rules->new(
    rules => [
        _default => 'raw',
        line => sub {
            my ($tag, $attrs, $context, $parents) = @_;
            if ($attrs->{_content} =~ /\s+NOT\s+/) {
                # skip the <lines> and set the attribute
                # directly in <address>
                $parents->[-2]{_remove} = 1;
            }
            return [$tag => $attrs];
        },
        address => sub {
            return $_[0] => $_[1] unless ($_[1]->{_remove});
            return;
        },
    ],
    style => 'filter',
);

my $result;
open my $FH, '>', \$result or die "Cannot open in-memory filehandle: $!";
$parser_remove->filter($xml, $FH);
close $FH;
print $result;
__END__

The advantage is that this doesn't keep the whole XML document in
memory; it processes each piece as it is read and parsed, which can
make a big difference with huge files.
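
To make the memory point concrete, here is a minimal sketch that reads
straight from a file instead of a string. It assumes the same element
names as the rules above (address, code, lines, line). The sub name
flagged_codes and the file name in the usage note are hypothetical,
made up for illustration.

```perl
use strict;
use warnings;
use XML::Rules;

# Return the <code> of every <address> that has a "... NOT ..." line.
# flagged_codes is a hypothetical helper name, not part of XML::Rules.
sub flagged_codes {
    my ($file) = @_;
    my @codes;

    my $parser = XML::Rules->new(
        rules => [
            _default     => '',
            line         => sub { $_[1]->{_content} . "\n" },
            'code,lines' => 'content',
            address      => sub {
                my ($tag, $attrs) = @_;
                push @codes, $attrs->{code}
                    if defined $attrs->{lines}
                    && $attrs->{lines} =~ /\s+NOT\s+/;
                return;    # keep nothing once the address is handled
            },
        ],
    );

    # parsefile reads the document from disk; each <address> is
    # handed to the rule as soon as its closing tag has been parsed
    $parser->parsefile($file);
    return @codes;
}
```

Calling flagged_codes('addresses.xml') should give you the list of
codes to record. Only the codes themselves pile up in memory, because
the address rule returns nothing to its parent.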
Jenda
===== [EMAIL PROTECTED] === http://Jenda.Krynicky.cz =====
When it comes to wine, women and song, wizards are allowed
to get drunk and croon as much as they like.
-- Terry Pratchett in Sourcery