[issue1390] toxml generates output that is not well formed
Changes by Thomas Conway:
--
components: Library (Lib)
nosy: drtomc
severity: normal
status: open
title: toxml generates output that is not well formed
type: behavior
versions: Python 2.5
__
Tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue1390>
__
___
Python-bugs-list mailing list
Unsubscribe:
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue1390] toxml generates output that is not well formed
New submission from Thomas Conway:
The attached script yields a non-well-formed xml document.
Added file: http://bugs.python.org/file8692/bug.py
from xml.dom.minidom import parseString
d = parseString("wibble")
d.documentElement.appendChild(d.createComment("-->"))
print d.toxml()
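The attached script can be turned into a small round-trip check. The following is only a sketch: the root element `<wibble/>` is an assumption (the archived script's argument to parseString() is left as it appears above), and `roundtrip_status` is a hypothetical helper name. It distinguishes three outcomes: the serializer refuses the comment, the output cannot be parsed or parses to something different, or the round trip succeeds. Interpreters that adopted the ValueError approach discussed later in this thread should take the first branch for a comment containing "-->".

```python
from xml.dom.minidom import parseString
from xml.parsers.expat import ExpatError


def roundtrip_status(comment_text):
    """Serialize a document holding one comment and report what happened."""
    # Assumption: a minimal root element, chosen only for illustration.
    doc = parseString("<wibble/>")
    doc.documentElement.appendChild(doc.createComment(comment_text))
    try:
        xml = doc.toxml()
    except ValueError:
        return "rejected"  # the serializer refused the comment
    try:
        reparsed = parseString(xml)
    except ExpatError:
        return "unparseable"  # toxml() succeeded but its output does not parse
    children = reparsed.documentElement.childNodes
    if (len(children) != 1
            or children[0].nodeType != children[0].COMMENT_NODE
            or children[0].data != comment_text):
        return "altered"  # parses, but no longer describes the same tree
    return "ok"


if __name__ == "__main__":
    for text in ("harmless", "-->"):
        print(repr(text), roundtrip_status(text))
```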
[issue1390] toxml generates output that is not well formed
Thomas Conway added the comment:
Either this is a bug in the DOM implementation, which should reject comments containing -->; a bug in toxml(), which should refuse to serialize unserializable documents; or a bug in the documentation.
cheers,
Tom
[issue1390] toxml generates output that is not well formed
Thomas Conway added the comment:
Hi Martin,
You write:
> It's not a bug in toxml, which should always serialize the
> DOM tree if possible.
Right! In this case it is *not* possible. The generated serialization is
not a well formed XML document.
Having just consulted the DOM technical reports, I see that
createComment is specified as not generating any exceptions, so although
it would be quite a sensible place to check that one was not creating an
insane document, probably one should not do the check there. I think
you're right that this *is* a bug in DOM, and I will report it there.
Having said that, I still think that toxml should throw an exception. In
general, if toxml succeeds, I would expect to be able to parse the result.
I can propose a doco change, but I think such a change would only be a partial
solution. Something like the following addition to the description of
createComment:
    Note that comments containing the string C{-->} may make the document
    unserializable.
cheers,
Tom
[issue1390] toxml generates output that is not well formed
Thomas Conway added the comment:
Hi Martin,
toxml() is not part of the DOM, so it could be changed to throw an exception. However, I suggest doing nothing for the moment: I've posted to the DOM mailing list at W3, so I'll see what wisdom we get from its members.
cheers,
Tom
[issue1390] toxml generates output that is not well formed
Thomas Conway added the comment:
The W3 guys had some information that helps. The DOM3 Core specification contains the following:
    No lexical check is done on the content of a comment and it is therefore possible to have the character sequence "--" (double-hyphen) in the content, which is illegal in a comment per section 2.5 of [XML 1.0]. The presence of this character sequence must generate a fatal error during serialization.
This suggests that toxml does not comply with DOM3, at any rate.
cheers,
Tom
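The rule quoted from DOM3 Core can be applied as a check before serialization. This is a sketch, not part of minidom: `comment_is_serializable` and `find_bad_comments` are hypothetical names, and the `<wibble/>` root is an assumption. The grammar in XML 1.0 section 2.5 forbids "--" inside a comment and also forbids a comment body that ends in "-", so both conditions are tested.

```python
from xml.dom import Node
from xml.dom.minidom import parseString


def comment_is_serializable(data):
    """True if data may appear between '<!--' and '-->' (XML 1.0, section 2.5)."""
    return "--" not in data and not data.endswith("-")


def find_bad_comments(node):
    """Yield each comment node at or below `node` that cannot be serialized."""
    if node.nodeType == Node.COMMENT_NODE and not comment_is_serializable(node.data):
        yield node
    for child in node.childNodes:
        yield from find_bad_comments(child)


def build_example():
    """Build a document with one acceptable and one unacceptable comment."""
    doc = parseString("<wibble/>")  # assumed root element
    doc.documentElement.appendChild(doc.createComment("fine"))
    doc.documentElement.appendChild(doc.createComment("-->"))
    return doc


if __name__ == "__main__":
    for bad in find_bad_comments(build_example()):
        print("cannot serialize comment:", repr(bad.data))
```

Running the walk once per serialization keeps createComment free of checks, which matches the division of responsibility the specification describes.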
[issue1390] toxml generates output that is not well formed
Thomas Conway added the comment:
FWIW, the DOM guys considered mandating a check in createComment, but decided that the performance penalty was too great. I'm not sure I agree with them, but there you have it.
Here are links to my query about the issue:
http://lists.w3.org/Archives/Public/www-dom/2007OctDec/0017.html
http://lists.w3.org/Archives/Public/www-dom/2007OctDec/0018.html
[issue1390] toxml generates output that is not well formed
Thomas Conway added the comment:
I think the specification is reasonably clear: createComment may not throw an exception. The serializer must throw an exception. (Personally, I think they have it round the wrong way: every time you write a serializer you have to write code to do the check; if it were in createComment, you'd only have to do it once. Never mind!)
The problem of compatibility is, as always, a nasty one: whether or not to potentially break code that previously worked. In this case, I think modifying toxml (and the other serializing functions in the same library) to throw an exception is pretty unlikely to break existing code. The *only* way to trigger the error is to call createComment with bad text. Moreover, the programs which "succeeded" before and would now fail were almost certainly producing wrong output before, which, if it did not break downstream processing, would at least produce strange bits of extra character data. If the library is changed to throw an exception, at least it will alert the author/maintainer to the problem. I would estimate the expected number of programs to be broken by such a change to be about 0. :-)
This is certainly not the first time in the history of software development that the break-or-not-to-break issue has come up. Is there a precedent in the Python libraries for how to deal with this kind of issue? Can we add a quickAndBuggy = True default parameter to toxml, then in a couple of releases make it mandatory, then in a couple of further releases remove it and the old behaviour?
cheers,
Tom
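The staged transition suggested above might look roughly like the wrapper below. Everything here is hypothetical: `toxml_staged` and `_has_bad_comment` are illustrative names, minidom has no `quickAndBuggy` parameter, and the warning text is invented. In the first stage the flag defaults to True and only a DeprecationWarning is issued; passing False opts in to the strict behaviour early.

```python
import warnings
from xml.dom import Node
from xml.dom.minidom import parseString


def _has_bad_comment(node):
    """True if any comment at or below `node` contains '--'."""
    if node.nodeType == Node.COMMENT_NODE and "--" in node.data:
        return True
    return any(_has_bad_comment(child) for child in node.childNodes)


def toxml_staged(doc, quickAndBuggy=True):
    """Hypothetical wrapper illustrating the proposed deprecation path."""
    if _has_bad_comment(doc):
        if not quickAndBuggy:
            raise ValueError("comment contains '--' and cannot be serialized")
        warnings.warn(
            "comment contains '--'; a later release would raise ValueError",
            DeprecationWarning,
            stacklevel=2,
        )
    # Note: if the underlying toxml() already rejects such comments,
    # this call raises ValueError even when quickAndBuggy is True.
    return doc.toxml()


if __name__ == "__main__":
    doc = parseString("<wibble><!--fine--></wibble>")  # assumed document
    print(toxml_staged(doc, quickAndBuggy=False))
```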
[issue1390] toxml generates output that is not well formed
Thomas Conway added the comment:
On Feb 13, 2008 6:27 AM, Virgil Dupras <[EMAIL PROTECTED]> wrote:
> CDATASection.writexml() already raises ValueError when finding invalid data,
> so it seems consistent to me to extend the behavior to Comment.writexml()
That looks fine to me.
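The CDATA precedent mentioned in the quoted message can be observed directly. A minimal sketch, assuming a `<wibble/>` root; `cdata_doc` and `serializes` are hypothetical helper names. A CDATA section whose text contains "]]>" is accepted when the node is created but refused at serialization time, which is the same split between creation and serialization proposed for comments.

```python
from xml.dom.minidom import parseString


def cdata_doc(text):
    """Return a document whose root holds a single CDATA section."""
    doc = parseString("<wibble/>")  # assumed root element
    doc.documentElement.appendChild(doc.createCDATASection(text))
    return doc


def serializes(doc):
    """True if toxml() completes, False if it raises ValueError."""
    try:
        doc.toxml()
    except ValueError:
        return False
    return True


if __name__ == "__main__":
    for text in ("plain text", "a ]]> b"):
        print(repr(text), serializes(cdata_doc(text)))
```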
[issue1390] toxml generates output that is not well formed
Thomas Conway <[EMAIL PROTECTED]> added the comment:
On Thu, Mar 20, 2008 at 8:26 AM, Sean Reifschneider <[EMAIL PROTECTED]> wrote:
>
> Sean Reifschneider <[EMAIL PROTECTED]> added the comment:
>
> Martin: What do you think of this patch?
Looks fine.
