[issue18850] xml.etree.ElementTree accepts control chars.

2013-09-02 Thread Eli Bendersky
Eli Bendersky added the comment: As Serhiy points out, this is a duplicate of #5166.
superseder: -> ElementTree and minidom don't prevent creation of not well-formed XML

[issue18850] xml.etree.ElementTree accepts control chars.

2013-09-02 Thread Eli Bendersky
Changes by Eli Bendersky: resolution: -> duplicate; stage: needs patch -> committed/rejected; status: open -> closed

[issue18850] xml.etree.ElementTree accepts control chars.

2013-09-02 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Isn't this a duplicate of issue5166?

[issue18850] xml.etree.ElementTree accepts control chars.

2013-09-01 Thread Eli Bendersky
Eli Bendersky added the comment: Can this be transformed into a new issue that succinctly summarizes what the new requested feature is, and why it's useful?

[issue18850] xml.etree.ElementTree accepts control chars.

2013-08-29 Thread Stefan Behnel
Stefan Behnel added the comment:
> As an advice I hope you do not take as insult, saying
> "in section {section} the spec says {argument}"
> is much more constructive than
> "read the spec on that", "{extremely_obvious_link}",
> at least to people not familiar with the spec and asking for the s

[issue18850] xml.etree.ElementTree accepts control chars.

2013-08-29 Thread Michele Orrù
Michele Orrù added the comment:
> Is that your actual use case? That you *want* to store binary data in XML,
> instead of getting it properly rejected as non well-formed content?
No, Stefan. What I was saying in my last message was just "you're right, the user shall always use repr() when prin

[issue18850] xml.etree.ElementTree accepts control chars.

2013-08-28 Thread Stefan Behnel
Stefan Behnel added the comment: Is that your actual use case? That you *want* to store binary data in XML, instead of getting it properly rejected as non well-formed content? Then I suggest going the canonical route of passing it through base64 first, or any of the other binary-to-characters e
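
For readers following along, the base64 route suggested above might look roughly like the sketch below. The tag name "blob" and the sample payload are made up for illustration; only the standard base64 and xml.etree.ElementTree modules are used.

```python
import base64
import xml.etree.ElementTree as ET

# Hypothetical payload containing bytes that XML 1.0 does not allow as characters.
payload = b"\x00\x01\x1b]0;title\x07 raw bytes"

# Encode the bytes as ASCII text before storing them in the tree.
elem = ET.Element("blob")  # "blob" is an illustrative tag name
elem.text = base64.b64encode(payload).decode("ascii")

serialized = ET.tostring(elem)

# The serialised document contains only base64 characters, so it parses back,
# and decoding the element text restores the original bytes.
restored = base64.b64decode(ET.fromstring(serialized).text)
```

The cost is roughly a one-third size increase in the stored data, in exchange for a document that any conforming XML parser will accept.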

[issue18850] xml.etree.ElementTree accepts control chars.

2013-08-28 Thread Antoine Pitrou
Changes by Antoine Pitrou: nosy: +christian.heimes

[issue18850] xml.etree.ElementTree accepts control chars.

2013-08-28 Thread Michele Orrù
Michele Orrù added the comment: Just pointed out by a friend - I suppose this is insanely used to put binary blobs inside xml until "only the CDEnd string is recognized as markup". That's what I needed. Amen.

[issue18850] xml.etree.ElementTree accepts control chars.

2013-08-28 Thread Michele Orrù
Michele Orrù added the comment:
> I said that (serialised) XML is defined as a sequence of bytes.
> Read the spec on that.
And I'm saying that's inexact. I have expectations that control chars are escaped in the serialized xml, because the spec I'm reading says so, and because the documentation

[issue18850] xml.etree.ElementTree accepts control chars.

2013-08-28 Thread Stefan Behnel
Stefan Behnel added the comment: We are talking about two different things here. I said that (serialised) XML is defined as a sequence of bytes. Read the spec on that. What you are talking about is the Infoset, or the parsed/generated in-memory XML tree. That's obviously not bytes, it's defin

[issue18850] xml.etree.ElementTree accepts control chars.

2013-08-28 Thread Michele Orrù
Michele Orrù added the comment: That does not look to me like just a "byte string" where you can put binary data. Hence, I expect the XML tree to escape or reject those. Hence, this is not an enhancement but a bug, unless we just want to document this. (Not going to change the metadata, otherwise we'll end up

[issue18850] xml.etree.ElementTree accepts control chars.

2013-08-28 Thread Michele Orrù
Michele Orrù added the comment: """ Document authors are encouraged to avoid "compatibility characters", as defined in section 2.3 of [Unicode]. The characters defined in the following ranges are also discouraged. They are either control characters or permanently undefined Unicode characters:

[issue18850] xml.etree.ElementTree accepts control chars.

2013-08-28 Thread Michele Orrù
Michele Orrù added the comment:
>>> XML is *defined* as a stream of bytes.
>> Can you *paste* the *source* proving what you are arguing, please?
> http://www.w3.org/TR/REC-xml/
""" The first two suggestions are directly derived from the rules given for identifiers in Standard Annex #31 (UAX #31)

[issue18850] xml.etree.ElementTree accepts control chars.

2013-08-28 Thread Stefan Behnel
Stefan Behnel added the comment:
>> XML is *defined* as a stream of bytes.
> Can you *paste* the *source* proving what you are arguing, please?
http://www.w3.org/TR/REC-xml/
> python3 works with ElementTree(bytes(unicode))
What does this sentence mean?

[issue18850] xml.etree.ElementTree accepts control chars.

2013-08-28 Thread Michele Orrù
Michele Orrù added the comment:
> XML is *defined* as a stream of bytes.
Can you *paste* the *source* proving what you are arguing, please?
> Regarding the API side in ElementTree, Py2 accepts byte strings and Py3
> requires Unicode strings.
"accepts"? python3 works with ElementTree(bytes(uni

[issue18850] xml.etree.ElementTree accepts control chars.

2013-08-28 Thread Michele Orrù
Michele Orrù added the comment:
> Incidentally I read today
> http://blastedbio.blogspot.co.uk/2012/05/blast-tabular-missing-descriptions.html
> mentioning ^A being used.
> Maybe that would stop working?
I don't see any problem in any xml output. Indeed: "You can't put a nasty non-printing ASC

[issue18850] xml.etree.ElementTree accepts control chars.

2013-08-28 Thread Stefan Behnel
Stefan Behnel added the comment:
> I think the point here is clarifying whether xml expect text or just a byte
> string. In case that's a stream of byte, I agree with you, is more a
> "behaviour" problem.
XML is *defined* as a stream of bytes. Regarding the API side in ElementTree, Py2 accept

[issue18850] xml.etree.ElementTree accepts control chars.

2013-08-28 Thread Michele Orrù
Michele Orrù added the comment:
> The parser *is* rejecting control characters. It's an XML parser. See the
> example in the link you posted.
Ehrm, my apologies.
> That's not an XML specific issue. You are printing a byte string here, so
> repr() would be the right thing to use (and is actuall

[issue18850] xml.etree.ElementTree accepts control chars.

2013-08-27 Thread Martin Mokrejs
Martin Mokrejs added the comment: Incidentally I read today http://blastedbio.blogspot.co.uk/2012/05/blast-tabular-missing-descriptions.html mentioning ^A being used. Maybe that would stop working?
nosy: +mmokrejs

[issue18850] xml.etree.ElementTree accepts control chars.

2013-08-27 Thread Stefan Behnel
Stefan Behnel added the comment: Or maybe even to "enhancement". The behaviour that it writes out what you give it isn't exactly wrong, it's just inconvenient that you have to take care yourself that you pass it well-formed XML content.

[issue18850] xml.etree.ElementTree accepts control chars.

2013-08-27 Thread Stefan Behnel
Stefan Behnel added the comment:
> The parser is *not* rejecting control chars.
The parser *is* rejecting control characters. It's an XML parser. See the example in the link you posted.
> assume you have a script that simply stores each message it receives (from
> stdin, from a tcp stream, w

[issue18850] xml.etree.ElementTree accepts control chars.

2013-08-27 Thread Michele Orrù
Michele Orrù added the comment:
> Michele, could you elaborate how you would exploit this issue as a security
> risk?
Sure. What I meant in my message is: assume you have a script that simply stores each message it receives (from stdin, from a tcp stream, whatever) inside an xml tree like '{m

[issue18850] xml.etree.ElementTree accepts control chars.

2013-08-27 Thread Stefan Behnel
Stefan Behnel added the comment: Michele, could you elaborate how you would exploit this issue as a security risk? I mean, I can easily create a (non-)XML-document with control characters manually, and the parser would reject it. What part of the create-to-serialise process exactly is a probl
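
To make the asymmetry under discussion concrete, here is a small sketch, assuming the ElementTree behaviour reported in this thread: the serialiser writes a control character through unchanged, while the parser refuses the resulting bytes. The tag name "msg" is illustrative only.

```python
import xml.etree.ElementTree as ET

# Building and serialising a tree that holds a control character succeeds...
root = ET.Element("msg")  # illustrative tag name
root.text = "hello\x01world"
serialized = ET.tostring(root)

# ...but the resulting bytes are not well-formed XML, so parsing them fails.
try:
    ET.fromstring(serialized)
    parsed_ok = True
except ET.ParseError as exc:
    parsed_ok = False
    error = str(exc)  # parser's description of the failure
```

So the create-to-serialise path is the only place where the control character goes unnoticed; anything that later parses the output with a conforming parser will reject it.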

[issue18850] xml.etree.ElementTree accepts control chars.

2013-08-27 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: See also issue7727. Almost any other XML generation code (xml.sax.saxutils.XMLGenerator, xml.dom.minidom.Element.writexml(), etc., but not plistlib.PlistWriter) has the same problem. The problem with filtering control characters is that it will significantly

[issue18850] xml.etree.ElementTree accepts control chars.

2013-08-27 Thread R. David Murray
R. David Murray added the comment: In that case, the fix needs to be applied to 3.2 and 2.6 as well. Or at least considered for application. It could be that this will break working (though dangerous) programs. I'll leave it to folks more knowledgeable in this particular area than I to deci

[issue18850] xml.etree.ElementTree accepts control chars.

2013-08-27 Thread Michele Orrù
Michele Orrù added the comment: I suppose it is, David, if in 2 minutes flat I can change your terminal name.
Added file: http://bugs.python.org/file31484/inject.py

[issue18850] xml.etree.ElementTree accepts control chars.

2013-08-27 Thread R. David Murray
R. David Murray added the comment: Unless it is a security issue, this seems like the kind of fix that shouldn't be applied to maintenance releases.
nosy: +r.david.murray

[issue18850] xml.etree.ElementTree accepts control chars.

2013-08-27 Thread Stefan Behnel
Stefan Behnel added the comment: Go for it. That's usually the fastest way to get things done.

[issue18850] xml.etree.ElementTree accepts control chars.

2013-08-27 Thread Michele Orrù
Michele Orrù added the comment: Do you mind if I try to provide a patch and unit test myself in the next few days?

[issue18850] xml.etree.ElementTree accepts control chars.

2013-08-27 Thread Stefan Behnel
Stefan Behnel added the comment: This is a bit tricky in ET because it generally allows you to stick anything into the Element properties (and that's a feature). So catching this at tree building time (as lxml.etree does) isn't really possible. However, at least catching it in the serialiser s
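
One way to picture a serialiser-side check is the sketch below. It is not the patch discussed here, and check_well_formed and safe_tostring are hypothetical helper names, not part of ElementTree. It validates text, tail and attribute values against the Char production from section 2.2 of the XML 1.0 spec before calling ET.tostring.

```python
import re
import xml.etree.ElementTree as ET

# Anything outside the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_ILLEGAL_XML_CHARS = re.compile(
    "[^\u0009\u000a\u000d\u0020-\ud7ff\ue000-\ufffd\U00010000-\U0010ffff]"
)


def check_well_formed(root):
    """Hypothetical helper: raise ValueError if any text, tail or
    attribute value in the tree contains a character XML 1.0 forbids."""
    for elem in root.iter():
        values = [elem.text, elem.tail, *elem.attrib.values()]
        for value in values:
            if isinstance(value, str) and _ILLEGAL_XML_CHARS.search(value):
                raise ValueError(
                    "illegal XML character in <%s>: %r" % (elem.tag, value)
                )


def safe_tostring(root):
    """Hypothetical wrapper: validate the tree, then serialise it."""
    check_well_formed(root)
    return ET.tostring(root)
```

Because the check runs only at serialisation time, Element properties can still hold arbitrary values while the tree is being built, which keeps the flexibility mentioned above. Tag and attribute names are not checked in this sketch.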

[issue18850] xml.etree.ElementTree accepts control chars.

2013-08-27 Thread Serhiy Storchaka
Changes by Serhiy Storchaka: nosy: +eli.bendersky, scoder, serhiy.storchaka; versions: +Python 2.7, Python 3.3, Python 3.4

[issue18850] xml.etree.ElementTree accepts control chars.

2013-08-27 Thread Michele Orrù
Changes by Michele Orrù: components: +Library (Lib), XML; type: -> behavior

[issue18850] xml.etree.ElementTree accepts control chars.

2013-08-27 Thread Michele Orrù
New submission from Michele Orrù: Got this from IRC: a Python bug in xml.etree.ElementTree, from version 2.7 to 3.2: http://www.reddit.com/r/Python/comments/1l6cta/python_bug_in_xmletreeelementtree/ I think we should stay consistent with lxml and forbid control chars in advance.