Re: XML::LibXML navigation

Jenda Krynicky Mon, 22 Jan 2007 08:34:03 -0800

From: "Beginner" <[EMAIL PROTECTED]>
> Hi,
> 
> I have to do some sanity checks on a large xml file of addresses (snip
> below). I have been using XML::LibXML and seem to have started ok but
> I am struggling to navigate around a record.
> 
> In the sample data below you'll see some addresses with "DO NOT..."
> in. I can locate them easily enough but I am struggling to navigate
> back up the DOM to access the code so I can record the code with
> faulty addresses.
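The navigation asked about here can also be done in XML::LibXML itself: find the offending <line> with XPath, then step back up with the ancestor axis. A minimal sketch, assuming a record layout of <address><code/><lines><line/>...</lines></address> (inferred from the tag names used in the XML::Rules code, not from the snipped sample); the data below is hypothetical.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample data; the layout is an assumption, not the original file.
my $xml = <<'XML';
<addresses>
  <address>
    <code>A001</code>
    <lines>
      <line>1 High Street</line>
      <line>London</line>
    </lines>
  </address>
  <address>
    <code>A002</code>
    <lines>
      <line>DO NOT DELIVER</line>
      <line>Leeds</line>
    </lines>
  </address>
</addresses>
XML

my $doc = XML::LibXML->load_xml(string => $xml);

my @bad_codes;
# find every <line> containing " NOT ", then walk back up to its <address>
for my $line ($doc->findnodes('//line[contains(., " NOT ")]')) {
    my ($address) = $line->findnodes('ancestor::address[1]');
    push @bad_codes, $address->findvalue('code');
}

print "$_\n" for @bad_codes;
```

`ancestor::address[1]` picks the nearest enclosing <address> whatever the nesting depth; for exactly this layout `$line->parentNode->parentNode` would reach the same element.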


A bit late and again using a different module:

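use strict;
use warnings;
use XML::Rules;

# assumption: the address XML is read from STDIN into $xml
my $xml = do { local $/; <STDIN> };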

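# find the <address> records whose lines contain "NOT" and print their <code>
my $parser_find = XML::Rules->new(
        rules => [
                _default => '',   # ignore everything not handled below
                # each <line> returns its text, which is appended to the
                # content of the enclosing <lines>
                line => sub {$_[1]->{_content}."\n\t"},
                # store the text content in the parent as {code} and {lines}
                'code,lines' => 'content',
                address => sub {
                        if ($_[1]->{lines} =~ /\s+NOT\s+/) {
                                print $_[1]->{code}."\n";
                        }
                        return;   # nothing needs to be kept for the parent
                }
        ],
);
$parser_find->parse($xml);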

# copy the XML, leaving out the <address> records whose lines contain "NOT"
my $parser_remove = XML::Rules->new(
        rules => [
                _default => 'raw',   # pass other tags through unchanged
                line => sub {
                        my ($tag, $attrs, $context, $parents) = @_;
                        if ($attrs->{_content} =~ /\s+NOT\s+/) {
                                # skip the <lines> and set the attribute
                                # directly in <address>
                                $parents->[-2]{_remove} = 1;
                        }
                        return [$tag => $attrs];   # keep the <line> as is
                },
                address => sub {
                        return $_[0] => $_[1] unless ($_[1]->{_remove});
                        return;   # drop the flagged <address>
                }
        ],
        style => 'filter',
);

my $result;
open my $FH, '>', \$result or die "Cannot open in-memory filehandle: $!";
$parser_remove->filter($xml, $FH);
close $FH;

print $result;

__END__

The plus is that this doesn't build a tree of the whole document in 
memory (the way XML::LibXML's DOM parser does), but instead processes 
the bits as soon as they have been read and parsed, which may make a 
big difference with huge files.
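The same record-at-a-time idea is available without XML::Rules. A minimal sketch, swapping in XML::LibXML::Reader (a pull parser shipped with XML::LibXML; not what this post uses), assuming the same <address>/<code>/<lines>/<line> layout implied by the rules above, with hypothetical sample data: only one <address> at a time is turned into a small DOM.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::Reader;

# Hypothetical sample data; the layout is an assumption, not the original file.
my $xml = <<'XML';
<addresses>
  <address>
    <code>A001</code>
    <lines><line>1 High Street</line><line>London</line></lines>
  </address>
  <address>
    <code>A002</code>
    <lines><line>DO NOT DELIVER</line><line>Leeds</line></lines>
  </address>
</addresses>
XML

# The reader walks the document node by node instead of building
# the whole tree first.
my $reader = XML::LibXML::Reader->new(string => $xml)
    or die "Cannot create reader\n";

my @bad_codes;
# nextElement returns 1 while another <address> start tag is found
while ($reader->nextElement('address') == 1) {
    # parse just this one <address> record into a small DOM
    my $record = XML::LibXML->load_xml(string => $reader->readOuterXml);
    my $lines  = join ' ', map { $_->textContent } $record->findnodes('//line');
    push @bad_codes, $record->findvalue('//code') if $lines =~ /\s+NOT\s+/;
}

print "$_\n" for @bad_codes;
```

For a real file, `location => $filename` can be passed to the constructor instead of `string => $xml`, so the raw text is never held in memory either.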

Jenda

===== [EMAIL PROTECTED] === http://Jenda.Krynicky.cz =====
When it comes to wine, women and song, wizards are allowed 
to get drunk and croon as much as they like.
        -- Terry Pratchett in Sourcery
