On Fri, Jun 29, 2007 at 08:59:37PM +0200, Lars Lindner wrote:
> You are correct this is an error message given by libxml2.
> 
> But you are totally wrong about handling invalid XML. The core
> idea of XML is to guarantee applications a correct content encoding
> by ensuring well-formedness and validity of the given data.
> 
> So suggesting to have weak XML parsing invalidates the idea of XML itself.
> Also what should a parser do with a file that contains for example partly
> UTF-8 content and partly Latin-1? There is no way to decide what to do
> with the byte mess.

You'll note that I carefully did not suggest Liferea should be
tolerant of the messed-up UTF-8; I was just complaining about it.  I
fixed that elsewhere by judicious use of iconv and outside knowledge.

An unbound prefix is a very different sort of error from invalid
UTF-8.
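
To make the distinction concrete, here is a minimal sketch.  It uses
Python's xml.etree.ElementTree (expat underneath) rather than libxml2,
purely for illustration; the sample documents and the parse_error
helper are made up for this example.  The first input is clean text
with an undeclared namespace prefix, the second is a byte-level
encoding failure.

```python
import xml.etree.ElementTree as ET


def parse_error(data: bytes) -> str:
    """Return the parser's error message, or '' if data is well-formed."""
    try:
        ET.fromstring(data)
    except ET.ParseError as exc:
        return str(exc)
    return ""


# Hypothetical sample: the "dc" prefix is used but never declared
# with an xmlns:dc="..." attribute.  Every byte is valid; only the
# namespace binding is missing.
unbound = b"<item><dc:creator>someone</dc:creator></item>"

# Hypothetical sample: declared as UTF-8, but 0xE9 is a Latin-1
# "e acute" and is not a valid UTF-8 sequence in this position.
bad_bytes = b'<?xml version="1.0" encoding="UTF-8"?><item>caf\xe9</item>'

unbound_msg = parse_error(unbound)      # namespace-level complaint
bad_bytes_msg = parse_error(bad_bytes)  # byte-level complaint

print("unbound prefix case:", unbound_msg)
print("invalid UTF-8 case: ", bad_bytes_msg)
```

Both inputs are rejected, but for different reasons: in the first the
parser can read every character and only lacks a declaration for the
prefix, while in the second it cannot reliably decode the text at all.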

> With XML the rule is applications should *ALWAYS* refuse non-wellformed
> content. Also when using a library for parsing the application has no
> way to force tolerant parsing. As for libxml2 I know for sure that the
> author clearly disagrees with applications wanting to do tolerant parsing.

In any case, previous versions of Liferea were able to display these
common entries without trouble.  I don't know if that means they did
not push article bodies through libxml2; I think it somewhat likely,
since this is the escaped contents of the <description>, not part of
the RSS feed proper.  Normally that's HTML, with all the attendant
sloppiness, rather than well-formed XML.
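
As a rough illustration of that point, the sketch below parses a
made-up RSS item whose <description> holds escaped HTML.  It again
uses Python's xml.etree.ElementTree rather than libxml2, and the feed
text and the is_well_formed helper are assumptions for the example,
not anything taken from Liferea.  The feed is well-formed; the
unescaped description body is not.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed: the description is escaped HTML with an
# unclosed <p> and a bare <br>, which is fine as HTML but not as XML.
feed = """<rss version="2.0"><channel><item>
<description>&lt;p&gt;First line&lt;br&gt;second line</description>
</item></channel></rss>"""

# The feed proper parses without complaint.
root = ET.fromstring(feed)

# The parser hands back the entity-decoded text of <description>.
body = root.find("./channel/item/description").text


def is_well_formed(fragment: str) -> bool:
    """Check whether a markup fragment parses as XML inside a wrapper."""
    try:
        ET.fromstring(f"<div>{fragment}</div>")
        return True
    except ET.ParseError:
        return False


print("description body:", body)
print("body is well-formed XML:", is_well_formed(body))
```

An application that feeds the description body to a strict XML parser
will reject entries like this one even though the feed around them is
valid.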

-- 
Daniel Jacobowitz
CodeSourcery
