XhtmlContentDivConformanceTests

David Powell Wed, 28 Jun 2006 14:58:02 -0700


Wednesday, June 28, 2006, 9:55:29 PM, James Snell wrote:


> David,

> you're right, ideally the xhtml container div would be nothing but the
> div, but if it's not, we still need to be prepared to handle it.  Silent
> data loss sucks, if it's silly data :-)

I'm just looking at it from the perspective of the producer and the
consumer.

In my consumer implementation, I take the resolved base URI of the div
(including any xml:base on it) and the language context of the div,
discard the div itself, and store both values out-of-band from the
content. The content is stored with its namespace prefixes inline.
That's probably good enough. Some post-processing then converts the
stored data into a form that can be safely embedded in an HTML page.
I've been trying XSLT for that (with TagSoup for HTML content).
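
To make the unwrapping step concrete, here is a rough sketch in Python
using the standard library's ElementTree. It is not my actual
implementation; the function name unwrap_div, its parameters, and the
example.org URLs are made up for illustration, and it assumes the
caller already knows the base URI and language inherited from the
entry.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin
from xml.sax.saxutils import escape

XML_NS = "http://www.w3.org/XML/1998/namespace"
XHTML_NS = "http://www.w3.org/1999/xhtml"


def unwrap_div(content, inherited_base, inherited_lang=None):
    """Return (base, lang, inner_markup) for an xhtml content element.

    `content` is the parsed atom:content element. The names here are
    hypothetical; this is a sketch, not a reference implementation.
    """
    # xml:base / xml:lang on the content element itself come first.
    base = urljoin(inherited_base, content.get(f"{{{XML_NS}}}base", ""))
    lang = content.get(f"{{{XML_NS}}}lang", inherited_lang)

    div = content.find(f"{{{XHTML_NS}}}div")
    if div is None:
        raise ValueError("xhtml content has no div child")

    # If the div carries xml:base or xml:lang, use them rather than
    # silently dropping them.
    base = urljoin(base, div.get(f"{{{XML_NS}}}base", ""))
    lang = div.get(f"{{{XML_NS}}}lang", lang)

    # Discard the div but keep its children. ElementTree re-declares the
    # namespace on each serialized child, so prefixes stay inline.
    parts = [escape(div.text or "")]
    for child in div:
        parts.append(ET.tostring(child, encoding="unicode"))
    return base, lang, "".join(parts)


doc = ET.fromstring(
    '<content xmlns="http://www.w3.org/2005/Atom" type="xhtml"'
    ' xml:base="http://example.org/blog/">'
    '<div xmlns="http://www.w3.org/1999/xhtml" xml:base="2006/"'
    ' xml:lang="en">Hi <a href="post">there</a>!</div></content>'
)
base, lang, inner = unwrap_div(doc, "http://example.org/")
```

The base and language come back as separate values, so they can be
stored next to the content rather than inside it.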

I don't think that the div should have xml:lang or xml:base attached,
but if they are there, it is better to use them than to ignore them,
because they are likely there for a reason. I wouldn't produce feeds
like that though.

If people start using CSS links in feeds (or even just CSS styling in
aggregators), discarding the div could be important.

If you're going to supply an API for extracting usable
[X]HTML, there are a number of features that consumers might want in
some combination:

* Forcing the XHTML to use a blank namespace prefix to make it DTD
  compatible, and removing unused prefixes.

* Resolving relative references (which will inevitably be a lossy
  process)

* Removing XSS risks (intentionally lossy)
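
As a rough illustration of how those options could be combined, here
is a sketch in Python with ElementTree. The function export_xhtml, its
flags, and the tag, attribute and scheme lists are all hypothetical and
deliberately incomplete; in particular this is nowhere near a real XSS
filter.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin, urlsplit

XHTML_NS = "http://www.w3.org/1999/xhtml"
# Illustrative lists only, not exhaustive.
URI_ATTRS = {"href", "src"}
UNSAFE_TAGS = {"script", "iframe", "object", "embed"}
SAFE_SCHEMES = {"", "http", "https", "mailto"}


def local_name(tag):
    """Strip the '{namespace}' part from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def export_xhtml(markup, base=None, blank_prefix=False, strip_xss=False):
    """Return `markup` (one well-formed root element) with options applied."""
    root = ET.fromstring(markup)

    if strip_xss:
        # Drop unsafe elements. Any tail text after them is lost too,
        # which is part of why this step is lossy.
        for parent in list(root.iter()):
            for child in list(parent):
                if local_name(child.tag).lower() in UNSAFE_TAGS:
                    parent.remove(child)

    for elem in root.iter():
        for name in list(elem.attrib):
            value = elem.attrib[name].strip()
            if strip_xss and name.lower().startswith("on"):
                del elem.attrib[name]  # event handlers such as onclick
            elif name in URI_ATTRS:
                scheme = urlsplit(value).scheme.lower()
                if strip_xss and scheme not in SAFE_SCHEMES:
                    del elem.attrib[name]  # e.g. javascript: URIs
                elif base is not None:
                    # Lossy: the original relative form is not kept.
                    elem.set(name, urljoin(base, value))

    if blank_prefix:
        # Process-wide side effect: later serializations in this
        # process also use the blank prefix for XHTML.
        ET.register_namespace("", XHTML_NS)
    # ElementTree only declares namespaces that are actually used, so
    # unused prefixes from the input disappear on output.
    return ET.tostring(root, encoding="unicode")


src = (
    '<h:div xmlns:h="http://www.w3.org/1999/xhtml"'
    ' xmlns:unused="urn:example:unused">'
    '<h:a href="post" onclick="evil()">link</h:a>'
    '<h:script>alert(1)</h:script>'
    '<h:a href="javascript:alert(1)">bad</h:a></h:div>'
)
out = export_xhtml(src, base="http://example.org/blog/",
                   blank_prefix=True, strip_xss=True)
```

Each option is a separate flag because a consumer may well want, say,
the namespace fix without the lossy steps.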

I still keep the original content in a reasonably accurate form
though.

-- 
Dave

Re: http://www.intertwingly.net/wiki/pie/XhtmlContentDivConformanceTests
