On Fri, Jun 29, 2007 at 08:59:37PM +0200, Lars Lindner wrote:
> You are correct this is an error message given by libxml2.
>
> But you are totally wrong about handling invalid XML. The core
> idea of XML is to guarantee applications a correct content encoding
> by ensuring well-formedness and validity of the given data.
>
> So suggesting to have weak XML parsing invalidates the idea of XML itself.
> Also what should a parser do with a file that contains for example partly
> UTF-8 content and partly Latin-1? There is no way to decide what to do
> with the byte mess.

You'll note that I carefully did not suggest Liferea should be tolerant
of the messed-up UTF-8; I was just complaining about it. I fixed that
elsewhere by judicious use of iconv and outside knowledge. An unbound
prefix is a very different sort of error from invalid UTF-8.

> With XML the rule is applications should *ALWAYS* refuse non-wellformed
> content. Also when using a library for parsing the application has no
> way to force tolerant parsing. As for libxml2 I know for sure that the
> author clearly disagrees with applications wanting to do tolerant parsing.

In any case, previous versions of Liferea were able to display these
common entries without trouble. I don't know if that means they did not
push article bodies through libxml2; I think it somewhat likely, since
this is the escaped contents of the <description>, not part of the RSS
feed proper. Normally that's HTML, with all the attendant sloppiness,
rather than well-formed XML.

--
Daniel Jacobowitz
CodeSourcery
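
To make the distinction concrete, here is a rough sketch in Python. It
uses Python's standard-library parsers, not libxml2, and it is not
Liferea's code; the feed, the "o:" prefix, and the helper names are all
made up for illustration. The feed itself is well-formed XML, but the
unescaped <description> body uses an undeclared prefix, so a strict XML
parse of the body fails while a tolerant HTML parse does not.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical feed: <description> holds escaped HTML that uses an
# undeclared "o:" prefix, as sloppy HTML exports sometimes do.
FEED = """<rss version="2.0"><channel><item>
<title>Example</title>
<description>&lt;p&gt;Hello &lt;o:p&gt;world&lt;/o:p&gt;&lt;/p&gt;</description>
</item></channel></rss>"""


class TextCollector(HTMLParser):
    """Collects text nodes; tags, known or not, are otherwise ignored."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def body_as_xml_error(body):
    """Return the parser's message if body is not well-formed XML, else None."""
    try:
        ET.fromstring(body)
    except ET.ParseError as err:
        return str(err)
    return None


def body_as_html_text(body):
    """Parse body as (tolerant) HTML and return its concatenated text."""
    parser = TextCollector()
    parser.feed(body)
    parser.close()
    return "".join(parser.chunks)


# The outer feed parses strictly without complaint; the parser has
# already unescaped the description text for us.
body = ET.fromstring(FEED).findtext("channel/item/description")

print(repr(body))               # the embedded markup, now unescaped
print(body_as_xml_error(body))  # strict XML parse of the body: an error
print(body_as_html_text(body))  # tolerant HTML parse of the same body
```

The point is only that strictness about the feed and tolerance about
the embedded body are separate decisions; whether older Liferea
versions actually worked this way is a guess on my part.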