Package: libxml-parser-perl
Version: 2.34-4.2
Severity: normal

XML::Parser is not robust enough to handle all the broken RSS feeds out
there. The most common breakage it fails on is a feed that contains an
invalid UTF-8 sequence:

not well-formed (invalid token) at line 86, column 165, byte 4698 at
/usr/lib/perl5/XML/Parser.pm line 187

I've attached a copy of this feed.
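
To confirm where the bad bytes sit in a feed like this, a minimal sketch
along the following lines could be used. It assumes the feed is meant to be
UTF-8 and uses only the core Encode module. first_invalid_utf8_offset is a
hypothetical helper name, not part of XML::Parser, and the 0-based offset it
returns may not line up exactly with the byte count expat reports.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: return the 0-based byte offset of the first
# malformed UTF-8 sequence in $bytes, or undef if the input is valid.
sub first_invalid_utf8_offset {
    my ($bytes) = @_;

    # With FB_QUIET, decode() stops at the first malformed sequence and
    # overwrites its argument with the unprocessed remainder, so work on
    # a copy.
    my $rest = $bytes;
    Encode::decode('UTF-8', $rest, Encode::FB_QUIET());

    return length($rest) ? length($bytes) - length($rest) : undef;
}
```

Reading the attached feed in binary mode and passing its contents to this
helper should point at the neighbourhood of the byte mentioned in the error.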

The approach taken by XML parsers in other languages, such as Python's
feedparser, is to be as robust as possible and forgiving in what they
accept. They also set a bozo bit if a feed is not well-formed, so that
tools that care can detect this.
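
As a rough illustration of what a forgiving mode with a bozo bit could look
like on the Perl side, here is a minimal sketch. It is not an existing
XML::Parser feature: sanitize_feed_bytes and the bozo/bytes keys are
hypothetical names, it assumes the feed is meant to be UTF-8, and it only
covers the invalid-sequence case by substituting U+FFFD.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper, not part of XML::Parser. Returns a hash ref with
#   bozo  => 1 if the input contained malformed UTF-8, else 0
#   bytes => UTF-8 bytes with malformed sequences replaced by U+FFFD
sub sanitize_feed_bytes {
    my ($bytes) = @_;
    my $bozo = 0;

    # Strict attempt first: FB_CROAK makes decode() die on malformed input,
    # LEAVE_SRC keeps $bytes untouched.
    my $text = eval {
        Encode::decode('UTF-8', $bytes,
            Encode::FB_CROAK() | Encode::LEAVE_SRC());
    };

    if (!defined $text) {
        # Lenient fallback: the default CHECK substitutes U+FFFD for each
        # malformed sequence instead of dying.
        $bozo = 1;
        $text = Encode::decode('UTF-8', $bytes);
    }

    return { bozo => $bozo, bytes => Encode::encode('UTF-8', $text) };
}
```

A caller could then hand the cleaned bytes to XML::Parser's parse method
inside an eval, since other well-formedness errors would still make it die,
and consult the bozo flag to decide whether the feed should be trusted.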

-- System Information:
Debian Release: lenny/sid
  APT prefers unstable
  APT policy: (500, 'unstable')
Architecture: i386 (i686)

Kernel: Linux 2.6.20-1-686 (SMP w/1 CPU core)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash

Versions of packages libxml-parser-perl depends on:
ii  libc6                         2.5-2      GNU C Library: Shared libraries
ii  libexpat1                     1.95.8-3.4 XML parsing C library - runtime li
ii  liburi-perl                   1.35-2     Manipulates and accesses URI strin
ii  libwww-perl                   5.805-1    WWW client/server library for Perl
ii  perl                          5.8.8-7    Larry Wall's Practical Extraction 
ii  perl-base [perlapi-5.8.8]     5.8.8-7    The Pathologically Eclectic Rubbis

libxml-parser-perl recommends no packages.

-- no debconf information

-- 
see shy jo
