Wednesday, June 28, 2006, 9:55:29 PM, James Snell wrote:
> David,
> you're right, ideally the xhtml container div would be nothing but the
> div, but if it's not, we still need to be prepared to handle it. Silent
> data loss sucks, if it's silly data :-)

I'm just looking at it from the perspective of the producer and the consumer.

In my consumer implementation, I take the resolved base URI of the div (including any xml:base there) and the language context of the div, discard the div, and store them both out-of-band of the content, with namespace prefixes inline. That's probably good enough. Some post-processing is used to convert the data in the store into a form that allows it to be safely embedded in an HTML page. I've been trying XSLT (with TagSoup for HTML content).

I don't think that the div should have lang or base attached, but if they are there, it is better to use them than ignore them, because they are likely there for a reason. I wouldn't produce feeds like that though.

If people start using CSS links in feeds (or even just CSS styling in aggregators), discarding the div could be important.

If you're going to supply an API for extracting usable [X]HTML, there are a number of features that consumers might want in some combination:

* Forcing the XHTML to use a blank namespace prefix to make it DTD compatible, and removing unused prefixes.
* Resolving relative references (which will inevitably be a lossy process)
* Removing XSS risks (intentionally lossy)

I still keep the original content in a reasonably accurate form though.

-- 
Dave
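
For illustration only, here is a rough Python sketch of the consumer-side handling described above: capture the div's resolved base URI and language, discard the div, and keep the content with its namespace prefixes inline. It is not the implementation mentioned in the message. The function names `unwrap_xhtml_div` and `resolve_references`, the returned dictionary layout, and the choice of `href` and `src` as the attributes to resolve are all assumptions made for the example.

```python
import copy
import xml.etree.ElementTree as ET
from urllib.parse import urljoin
from xml.sax.saxutils import escape

XHTML_DIV = "{http://www.w3.org/1999/xhtml}div"
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def unwrap_xhtml_div(div, inherited_base, inherited_lang=None):
    """Discard the container div, keeping its base URI and language out-of-band.

    Hypothetical helper: the name and the dict it returns are assumptions.
    """
    if div.tag != XHTML_DIV:
        raise ValueError("expected an XHTML div container")
    # An xml:base on the div is resolved against the base already in scope.
    base = urljoin(inherited_base, div.get(XML_BASE, ""))
    # An xml:lang on the div overrides the inherited language.
    lang = div.get(XML_LANG, inherited_lang)
    # Serialize only the div's content. Each child carries its own namespace
    # declarations, so the prefixes stay inline in the stored markup.
    parts = [escape(div.text or "")]
    parts.extend(ET.tostring(child, encoding="unicode") for child in div)
    return {"base": base, "lang": lang, "markup": "".join(parts)}


def resolve_references(div, base, attrs=("href", "src")):
    """Return a copy of the div with relative href/src values made absolute.

    Lossy: the relative form is gone in the copy, and nested xml:base
    attributes are ignored in this simplified version.
    """
    resolved = copy.deepcopy(div)
    for element in resolved.iter():
        for name in attrs:
            if name in element.attrib:
                element.set(name, urljoin(base, element.get(name)))
    return resolved
```

Resolution is done on a copy so that the original content stays available in its unresolved form, in line with keeping it "in a reasonably accurate form". Forcing a blank namespace prefix and removing XSS risks are left out of this sketch.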
