Re: atom:name ... text or html?

Antone Roundy Thu, 23 Mar 2006 09:59:41 -0800


On Mar 23, 2006, at 9:48 AM, James Holderness wrote:

Hahaha! It's RSS all over again. In the words of Mark Pilgrim: "Here's something that might be HTML. Or maybe not. I can't tell you, and you can't guess." :-)
Seriously though, the atom:name element is described as "a human-readable name", so unless your name really is "Betrand Caf&eacute;" that can't be right. If RFC4287 had intended to allow markup in the element it would have used atomTextConstruct.

I agree with James here--if we had intended for the name to be able to include markup, we should have used the construct we created to allow that. This from RFC 4287 (section 3.2):


   element atom:name { text }

would have been this:

   element atom:name { atomTextConstruct }

if we had intended for it to be able to contain anything but literal text after XML un-escaping, right?


On Mar 23, 2006, at 9:57 AM, Eric Scheid wrote:

It's true that XML has only a half dozen or so entities defined, meaning most interesting entities from html can't exist in XML ... unless maybe they are wrapped in a CDATA block like above?

If they're wrapped in a CDATA block, then they don't trigger an XML parsing error, but wrapping something in CDATA isn't a license to enter data in a format other than what the RFC allows.
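
To illustrate the point, here is a minimal sketch using Python's standard xml.etree.ElementTree parser. The helper names (name_text, parses) and the bare, un-namespaced <name> element are simplifications of mine, not anything from the RFC. It shows that CDATA only gets the string past the parser; the consumer still receives the literal characters "&eacute;", not an accented letter.

```python
import xml.etree.ElementTree as ET


def name_text(xml_fragment):
    """Return the text content the XML parser reports for the root element."""
    return ET.fromstring(xml_fragment).text


def parses(xml_fragment):
    """Return True if the fragment is accepted by the XML parser."""
    try:
        ET.fromstring(xml_fragment)
        return True
    except ET.ParseError:
        return False


# Hypothetical fragments; a real feed would use the Atom namespace.
bare = "<name>Betrand Caf&eacute;</name>"
wrapped = "<name><![CDATA[Betrand Caf&eacute;]]></name>"

print(parses(bare))        # the HTML entity is undefined in plain XML
print(name_text(wrapped))  # CDATA passes the entity through as literal text
```

Since atom:name is plain text, a conforming consumer would display the wrapped value exactly as the parser returns it, ampersand and all.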

I'm getting the data by scraping an html page, so I'm expecting it to be acceptable html code, including html entities.

You, the producer, are getting the data from an HTML page, so you should certainly be prepared to handle HTML entities in it. But you, the Atom publisher, are responsible for making sure that you've made any changes to the data that are necessary for it to be proper Atom before you publish it. The consumer of the Atom feed doesn't know where you got the data, and thus can't be expected to decide how to process it based on where you got it.
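
As a rough sketch of what that producer-side cleanup could look like, assuming a Python scraper: decode the HTML entities to literal characters first, then apply only XML escaping when writing the element. The function name atom_name_from_html is hypothetical, and the sketch deliberately ignores namespaces and does not strip any HTML tags that might be in the scraped string.

```python
import html
from xml.sax.saxutils import escape


def atom_name_from_html(scraped):
    """Turn a scraped HTML string into a plain-text atom:name element.

    Assumes the scraped value contains HTML entities but no tags.
    """
    # Step 1: resolve HTML entities (&eacute;, &amp;, ...) to literal text.
    literal = html.unescape(scraped)
    # Step 2: escape only what XML requires (&, <, >) for element content.
    return "<name>" + escape(literal) + "</name>"


print(atom_name_from_html("Betrand Caf&eacute;"))
print(atom_name_from_html("AT&amp;T"))
```

With this ordering the consumer sees only literal text after XML un-escaping, which is all that `element atom:name { text }` permits.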
