On Fri, Jun 4, 2010 at 12:23, Roman Makurin <[email protected]> wrote:
> Hi, here it is http://pastebin.org/307289
>
> On Fri, Jun 04, 2010 at 12:06:24PM -0400, Chas. Owens wrote:
>> On Fri, Jun 4, 2010 at 10:16, Roman Makurin <[email protected]> wrote:
>> > Hi all
>> >
>> > Lately I have had a big problem: I need to parse XML files
>> > that have invalid XML characters outside of CDATA, and the XML
>> > parser hangs every time on such files. Is there any way
>> > to parse such files?
>> snip
>>
>> Can you give an example of these invalid characters?
>>
>> --
>> Chas. Owens
>> wonkden.net
>> The most important skill a programmer can have is the ability to read.
>
> --
> If you think of MS-DOS as mono, and Windows as stereo,
> then Linux is Dolby Digital and all the music is free...
>
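Given that this is RSS, you should be able to get away with using a
regex to fix the links: percent-escape the contents of each <link>
element before parsing, then unescape them when you read the links
back out. This works for me: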

#!/usr/bin/perl

use strict;
use warnings;

use XML::RSS::Parser;
use URI::Escape qw/uri_escape uri_unescape/;

my $filename = shift;

# slurp the whole file into $xml
my $xml = do {
    open my $fh, "<", $filename
        or die "could not open $filename: $!";
    local $/;
    <$fh>;
};

# escape the contents of every <link> so the parser only sees safe characters
$xml =~ s{<link>(.*?)</link>}{"<link>" . uri_escape($1) . "</link>"}seg;

my $p = XML::RSS::Parser->new
    or die "could not create parser\n";

my $feed = $p->parse_string($xml)
    or die "could not parse $filename: ", $p->errstr, "\n";

for my $item ( $feed->query('//item') ) {
    my $title = $item->query('title')->text_content;
    # undo the escaping to recover the original link
    my $link  = uri_unescape $item->query('link')->text_content;
    printf "%60.60s: %s\n", $title, $link;
}

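If you want to see what the substitution does on its own, here is a
minimal sketch of just the escape/unescape round trip, without the
parser. The sample string and its example.com link are made up for
illustration (I am assuming a bare "&" in the link, which is one
common way a <link> stops being well-formed; your feed may differ):

```perl
#!/usr/bin/perl

use strict;
use warnings;

use URI::Escape qw/uri_escape uri_unescape/;

# hypothetical sample: the bare "&" in the link is not well-formed XML
my $original = 'http://example.com/?a=1&b=2';
my $xml      = "<item><title>t</title><link>$original</link></item>";

# same substitution as in the script above, applied to a copy
(my $fixed = $xml) =~ s{<link>(.*?)</link>}{"<link>" . uri_escape($1) . "</link>"}seg;

# pull the escaped link back out and undo the escaping
my ($escaped) = $fixed =~ m{<link>(.*?)</link>}s;
my $restored  = uri_unescape($escaped);

print "escaped:  $escaped\n";
print "restored: $restored\n";
```

The escaped form contains %26 in place of the "&", so the parser has
nothing to choke on, and uri_unescape gives back the original string.
Note that a link that was already percent-escaped in the feed would
come back fully unescaped as well, so this is only a workaround.
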
--
Chas. Owens
wonkden.net
The most important skill a programmer can have is the ability to read.