On Wed 26 January 2005 03:47, Dirk Eddelbuettel wrote:
> In response to the mail by Pierre Habouzit dated 25 January 2005 at 14:10:
> | On Tuesday 25 January 2005 13:20, Dirk Eddelbuettel wrote:
> | > Package: akregator
> | > Version: 1.0-beta8-2
> | > Severity: important
> | >
> | > Salut Pierre,
> | >
> | > Thanks for looking after akregator -- a great rss reader.  Once in a
> | > while, and in particular on Debian Planet, someone puts a '&'
> | > into a story heading. Currently this can be seen on
> | > planet.debian.org in the story from Marga.
> | >
> | > Akregator then stops updating the feed. This is rather annoying. 
> | > I discussed this with the authors of the planet code, and their
> | > take is that their Python toolset, in particular the rss part, is
> | > robust -- it is the readers that are at fault.
> |
> |   They are wrong.
> |
> |   I'm not saying akregator shouldn't be more robust against badly
> | encoded entities, but RSS is specified as XML, and in XML you have
> | to write such things like this:
> |
> |  <tag>put an ampersand : &amp;</tag>
> |
> | OR
> |
> |  <tag><![CDATA[put an ampersand : &]]></tag>
> |
> | Any other form is NOT correct, so you can file a bug against Debian
> | Planet too, or against the Python module they use.
>
> Can't resist CCing Scott here as I pestered him (unsuccessfully)
> about this before.

And if he is not convinced, ask him to open Debian Planet in 
mozilla/firefox or even the latest konqueror. The verdict is 
unanimous: « xml error »

> |   the bug is well known ([1] : funny, it is a debian planet problem
> | too), and is Qt's fault, since akregator uses the Qt xml API that
> | has a very strict (but also correct) parser.
>
> Well, is someone going to fix it?

The problem is that we cannot easily change the Qt XML parser. Running 
something like preg_replace(/&\b/, /&amp;/, feed) over the feed will 
not work either, because it would also rewrite the ampersands inside 
<![CDATA[  ]]> sections, where they are legal. That's why nothing has 
been done. The only correct solutions I can see are either to write a 
pre-parser whose sole job is to repair the feed, which is overkill and 
would cost too much time, or to switch to another XML parser/API. But 
that is a lot of work since, as you can imagine, the XML API plays a 
major role in akregator :)
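
To make the CDATA problem concrete, here is a minimal sketch in Python 
(not akregator code, and not a proposal for how akregator should do 
it). The helper name escape_bare_ampersands is made up for this 
example. It skips whole CDATA sections and escapes only an '&' that 
does not start an entity or character reference. It is a heuristic: it 
ignores comments and processing instructions, which is part of why a 
real fix needs more than one regex.

```python
import re
import xml.etree.ElementTree as ET

# One pattern with two alternatives:
#   group 1: a whole CDATA section, which must be left untouched
#   else   : an '&' that is NOT the start of &name; &#123; or &#x1F;
_TOKEN = re.compile(
    r"(<!\[CDATA\[.*?\]\]>)"
    r"|&(?!(?:[A-Za-z_][\w.-]*|#\d+|#x[0-9A-Fa-f]+);)",
    re.DOTALL,
)


def escape_bare_ampersands(feed: str) -> str:
    """Hypothetical helper: escape stray '&' outside CDATA sections."""

    def fix(match: re.Match) -> str:
        # CDATA sections are copied verbatim; a bare '&' becomes '&amp;'.
        if match.group(1) is not None:
            return match.group(1)
        return "&amp;"

    return _TOKEN.sub(fix, feed)


broken = "<title>Bits & pieces <![CDATA[R&D]]> &amp; more</title>"
repaired = escape_bare_ampersands(broken)

# The strict parser rejects the original but accepts the repaired text.
try:
    ET.fromstring(broken)
except ET.ParseError as err:
    print("original feed rejected:", err)
print(ET.fromstring(repaired).text)
```

The point is only that the CDATA sections have to be recognised before 
any '&' is touched, which a plain search-and-replace cannot do.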

I guess we (I mean the akregator developers) should look at what 
konqueror uses ... I guess there are KXML classes, but that would mean 
a major rewrite.

> | Anyway, the bug is certainly not important, at most normal, since
> | it only stops the fetching of the faulty feed, not of the others.
>
> Don't disagree completely but reading Planet Debian is important to
> me, and I can't read it right now. As our lusers break the content,
> how about if both ends of the software strive to get better here?

I like to read DPlanet too. but ...

btw, the other Planets I read (KDE Planet, e.g.) do not suffer from 
the same problem.

And there is a saying in computing:

« be very strict in what you produce, and very tolerant in what you 
read », to which I like to add « but nobody can blame you for not being 
tolerant enough when the input is faulty ».

That means the first sentence is good practice, but it is not a 
philosophy that you HAVE to follow.

And here, I'm sorry to say it once more, DP *is* faulty. This is basic 
XML knowledge: in XML you have two choices for <, & and >: escape them 
as &lt;, &amp; and &gt;, OR put them unescaped in <![CDATA[ ]]> 
sections. DPlanet has delivered bad feeds too many times, so I think 
they should really improve their RSS output module.
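
For the producer side, a small Python sketch of those two choices (I do 
not know what the Planet code actually does, so the title string and 
the parses() helper are assumptions made up for the example). It 
builds the same title once escaped, once in CDATA and once raw, and 
feeds all three to a strict parser:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

title = "Tom & Jerry <live>"  # example story heading, not real feed data

# Choice 1: escape the special characters (& < > become entities).
escaped = f"<title>{escape(title)}</title>"

# Choice 2: wrap the raw text in a CDATA section.
# (This breaks if the text itself contains "]]>".)
cdata = f"<title><![CDATA[{title}]]></title>"

# What a faulty generator emits: the raw text with no protection.
raw = f"<title>{title}</title>"


def parses(doc: str):
    """Return the element text, or None if the document is not well-formed."""
    try:
        return ET.fromstring(doc).text
    except ET.ParseError:
        return None


for name, doc in [("escaped", escaped), ("cdata", cdata), ("raw", raw)]:
    print(name, "->", parses(doc))
```

Both correct forms give the reader back the original heading; the raw 
form is rejected before any reader code gets to see it.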

-- 
·O·  Pierre Habouzit
··O
OOO                                                http://www.madism.org

