clone 500015 -1
block 500015 by -1
tag 500015 -wontfix
reassign -1 libxml2
retitle -1 Please support ASCII control chars in XML text content
severity -1 wishlist
thanks

On Wed, Sep 24, 2008 at 07:30:39PM -0700, Matt Kraai wrote:
> On Wed, Sep 24, 2008 at 10:12:41AM -0700, Rodrigo Gallardo wrote:
> > > The feed at
> > >
> > > http://jc.ngo.org.uk/~nik/use.perl.journals.rss
> > >
> > > currently contains a SOH character (i.e., the 0x01 character). When I
> > > click on it in Liferea, it displays the following error message:
> > >
> > > XML Parsing Error: reference to invalid character number
> > > Location: file:///
> > > Line Number 20, Column 45:
> > >
> > > <pre>Aha. On the line 580 of that we have a  character. Which leads
> > > me to
> > > --------------------------------------------^
> > >
> > > The feed has a UTF-8 encoding declaration and the SOH character is a
> > > valid Unicode character, so I think this error is in error.
> >
> > As a matter of fact, the XML spec says
> > (http://www.w3.org/TR/REC-xml/#dt-character)
> > that
> >
> > Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] |
> > [#x10000-#x10FFFF]
> >
> > so  is not a valid char for an XML document.
>
> I don't think this is a correct inference. In
> http://www.w3.org/TR/REC-xml/#charsets, it says
>
> Consequently, XML processors MUST accept any character in the range
> specified for Char. ]
> ...
> but it doesn't specify that it must accept *only* characters in that
> range.

It can be argued that it does, because it does not define them as a
"Char", and it only accepts "Char"s in text content. In a less strict
reading, it simply leaves unspecified whether or not to accept them,
making this not a bug, but a feature request in any case.

> In fact, the next paragraph states
>
> All XML processors MUST accept the UTF-8 and UTF-16 encodings of
> Unicode 3.1 ...

UTF-8 and UTF-16 are not definitions of character sets, but
specifications of how to encode them as bytes. Thus, that paragraph is
of no relevance.

In any case, liferea does not do its own XML parsing, but defers to
libxml2.
I'm cloning this bug to them as a feature request, they're way more
qualified than I to interpret the spec anyways.
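
For reference, the Char production quoted in this thread can be checked mechanically. What follows is only a rough sketch in Python, not anything taken from liferea or libxml2: the helper names is_xml_char and parses are made up for illustration, and the parser exercised is the expat one bundled with Python, so it says nothing definite about how libxml2 behaves. It also shows that SOH encodes to a perfectly ordinary UTF-8 byte, which is why the encoding declaration does not settle the question.

```python
import xml.etree.ElementTree as ET

# Ranges copied from the Char production of the XML 1.0 spec, as quoted above.
XML_CHAR_RANGES = (
    (0x9, 0xA),           # tab and line feed
    (0xD, 0xD),           # carriage return
    (0x20, 0xD7FF),
    (0xE000, 0xFFFD),
    (0x10000, 0x10FFFF),
)


def is_xml_char(code_point):
    """Hypothetical helper: True if code_point matches the Char production."""
    return any(lo <= code_point <= hi for lo, hi in XML_CHAR_RANGES)


def parses(document):
    """Hypothetical helper: True if Python's bundled expat parser accepts
    the document. This is an assumption-laden stand-in, not libxml2."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    print(is_xml_char(0x01))         # SOH is outside every Char range
    print(is_xml_char(0x09))         # tab is inside the first range
    print("\x01".encode("utf-8"))    # yet SOH has a valid one-byte UTF-8 encoding
    print(parses("<a>&#1;</a>"))     # a character reference to SOH
    print(parses("<a>&#9;</a>"))     # a character reference to tab
```

The encodability of SOH and its membership in Char are independent questions, which is the distinction drawn above between an encoding and a character set.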