On Mar 23, 2006, at 9:48 AM, James Holderness wrote:
Hahaha! It's RSS all over again. In the words of Mark Pilgrim:
"Here's something that might be HTML. Or maybe not. I can't tell
you, and you can't guess." :-)
Seriously though, the atom:name element is described as "a human-
readable name", so unless your name really is "Bertrand
Caf&eacute;" that can't be right. If RFC4287 had intended to allow
markup in the element it would have used atomTextConstruct.
I agree with James here--if we had intended for the name to be able
to include markup, we should have used the construct we created to
allow that. This from RFC 4287 (section 3.2):
element atom:name { text }
would have been this:
element atom:name { atomTextConstruct }
if we had intended for it to be able to contain anything but literal
text after XML un-escaping, right?
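To make "literal text after XML un-escaping" concrete, here is a rough sketch of what a consumer sees, using Python's standard xml.etree.ElementTree. The author_name helper and the sample feed fragments are made up for illustration; they are not from the RFC or from any particular feed.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"

def author_name(author_xml: str) -> str:
    """Return the text of atom:name exactly as the XML parser reports it.

    Hypothetical helper: no HTML decoding is attempted, because
    atom:name is declared as plain { text }.
    """
    author = ET.fromstring(author_xml)
    name = author.find(f"{{{ATOM_NS}}}name")
    return (name.text or "") if name is not None else ""

# An escaped HTML entity: after XML un-escaping the name literally
# contains the characters "&eacute;".
escaped = f'<author xmlns="{ATOM_NS}"><name>Bertrand Caf&amp;eacute;</name></author>'

# A numeric character reference: XML itself resolves this to U+00E9.
numeric = f'<author xmlns="{ATOM_NS}"><name>Bertrand Caf&#233;</name></author>'

print(author_name(escaped))
print(author_name(numeric))
```

Only the second form gives the consumer the name the publisher presumably meant.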
On Mar 23, 2006, at 9:57 AM, Eric Scheid wrote:
It's true that XML has only a half dozen or so entities defined,
meaning most interesting entities from html can't exist in XML ...
unless maybe they are wrapped like in CDATA block like above?
If they're wrapped in a CDATA block, then they don't trigger an XML
parsing error, but wrapping something in CDATA isn't a license to
enter data in a format other than what the RFC allows.
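A small sketch of that distinction, again with Python's standard xml.etree.ElementTree (the parses helper and the sample elements are hypothetical): the bare HTML entity is rejected by the XML parser, while the CDATA version parses fine but hands the consumer the entity as literal characters.

```python
import xml.etree.ElementTree as ET

def parses(xml_text: str) -> bool:
    """Report whether the string is accepted by the XML parser."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# &eacute; is not one of XML's predefined entities, so this is an error.
bare = "<name>Bertrand Caf&eacute;</name>"

# CDATA hides the ampersand from the parser, so this is well-formed...
cdata = "<name><![CDATA[Bertrand Caf&eacute;]]></name>"

print(parses(bare))
print(parses(cdata))

# ...but the content is still the literal string "Caf&eacute;",
# not "Café". CDATA changes the escaping, not the meaning.
print(ET.fromstring(cdata).text)
```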
I'm getting the data by scraping an html page, so I'm expecting it
to be acceptable html code, including html entities.
You, the producer, are getting the data from an HTML page, so you
should certainly be prepared to handle HTML entities in it. But you,
the Atom publisher, are responsible for making sure that you've made
any changes to the data that are necessary for it to be proper Atom
before you publish it. The consumer of the Atom feed doesn't know
where you got the data, and thus can't be expected to decide how to
process it based on where you got it.
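One way a publisher might do that conversion, sketched with Python's standard html and xml.sax.saxutils modules. The atom_name_element function is a hypothetical helper, and it assumes the scraped string contains only text and HTML entities (no tags); real scraped data may need more cleanup than this.

```python
import html
from xml.sax.saxutils import escape

def atom_name_element(scraped_html_text: str) -> str:
    """Turn scraped HTML text into a plain-text atom:name element.

    Assumption: the input has no HTML tags, only text and entities.
    """
    # Resolve HTML entities (&eacute;, &amp;, ...) to literal characters.
    literal = html.unescape(scraped_html_text)
    # Re-escape only what XML requires (&, <, >) for element content.
    return f"<name>{escape(literal)}</name>"

print(atom_name_element("Bertrand Caf&eacute;"))
print(atom_name_element("Smith &amp; Sons"))
```

After this step the consumer needs no knowledge of where the data came from: un-escaping the XML yields the name itself.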