Re: atom:name ... text or html?

David Powell Thu, 23 Mar 2006 09:38:32 -0800


Thursday, March 23, 2006, 4:57:11 PM, you wrote:

> On 24/3/06 3:21 AM, "Anne van Kesteren" <[EMAIL PROTECTED]> wrote:

>>> <author><name><![CDATA[Bertrand Caf&eacute;]]></name></author>
>>> 
>> Even if it was "HTML" you couldn't really use the entity, could you? I think
>> you have to use a character reference or the actual character instead, yes.
>> 

> It's true that XML has only a half dozen or so entities defined, meaning
> most interesting entities from html can't exist in XML ... unless maybe they
> are wrapped in a CDATA block like above?

atom:name is not intended to contain HTML, and the spec for it doesn't
mention HTML. It is no more correct to put HTML in it than it is to
put a base64'd PDF in there.

> I'm getting the data by scraping an html page, so I'm expecting it to be
> acceptable html code, including html entities.

Your HTML parser should decode the entities for you and return a
string. Your Atom generator should encode the string or escape it
using numeric character references (such as &#233;).
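
As a rough illustration of that decode-then-escape step, here is a minimal
Python sketch. It assumes the scraped value is plain text with HTML
entities and no tags; the helper name to_atom_text is made up for this
example and is not part of any Atom library.

```python
import html
from xml.sax.saxutils import escape


def to_atom_text(scraped: str) -> str:
    """Hypothetical helper: turn scraped HTML text into XML-safe ASCII."""
    # Decode HTML entities such as &eacute; into real characters.
    text = html.unescape(scraped)
    # Escape the XML-significant characters &, < and >.
    escaped = escape(text)
    # Replace every non-ASCII character with a numeric character reference.
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


name = to_atom_text("Bertrand Caf&eacute;")
print(f"<author><name>{name}</name></author>")
```

The final encode step is optional if the feed is served as UTF-8; it is
shown only because numeric references survive any encoding.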

If you really need to use HTML entities directly, then you could put:

<!DOCTYPE feed [
<!ENTITY eacute "&#233;">
]>

at the top of your feed and get rid of that CDATA. XML processors are
REQUIRED [1] to process internal DTD subsets.
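
A quick way to check the internal-subset approach is to parse such a feed
with an ordinary XML parser. The sketch below uses Python's
xml.etree.ElementTree; the feed is a stripped-down example assumed for
illustration, not a complete valid Atom document.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"

# Minimal example feed: the internal DTD subset declares &eacute;.
FEED = """<!DOCTYPE feed [
<!ENTITY eacute "&#233;">
]>
<feed xmlns="http://www.w3.org/2005/Atom">
  <author><name>Bertrand Caf&eacute;</name></author>
</feed>"""

root = ET.fromstring(FEED)
# The parser expands the declared entity while reading the document.
author_name = root.find(f"{ATOM_NS}author/{ATOM_NS}name").text
print(author_name)
```

Without the DOCTYPE block the same parser should reject the feed, since
eacute is not one of XML's predefined entities.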

[Hmm, internal DTD subsets completely fail in IE7's feed reader,
throwing up a "friendly error message"]

[1] <http://www.w3.org/TR/2004/REC-xml-20040204/#proc-types>

-- 
Dave
