[issue5752] xml.dom.minidom does not escape CR, LF and TAB characters within attribute values

2009-07-18 Thread Tomalak

Tomalak  added the comment:

@devon: Thanks for pointing & linking back here.

--

___
Python tracker 
<http://bugs.python.org/issue5752>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue5752] xml.dom.minidom does not handle newline characters in attribute values

2009-04-14 Thread Tomalak

New submission from Tomalak :

Current behavior upon toxml() is:



Upon reading the document again, the new line is normalized and
collapsed into a space (according to the XML spec, section 3.3.3), which
means that it is lost.
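
A minimal sketch of the normalization described above. The element name `elem` and attribute name `attrib` are made up for illustration; only `xml.dom.minidom.parseString()` and `getAttribute()` are used.

```python
from xml.dom import minidom

# A literal line feed inside an attribute value in the source text.
source = '<elem attrib="new\nline"/>'

doc = minidom.parseString(source)
value = doc.documentElement.getAttribute("attrib")

# Attribute-value normalization turns the literal line feed into a
# single space before the value reaches the DOM.
print(repr(value))
```

This shows only the reading side: a newline written literally into the serialized attribute cannot be recovered by a conforming parser.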

Better behavior would be something like this (within attribute values only):



--
components: XML
messages: 85964
nosy: Tomalak
severity: normal
status: open
title: xml.dom.minidom does not handle newline characters in attribute values
versions: Python 2.4, Python 2.5, Python 2.6, Python 2.7, Python 3.0, Python 3.1




[issue5752] xml.dom.minidom does not handle newline characters in attribute values

2009-04-14 Thread Tomalak

Changes by Tomalak :


--
type:  -> behavior




[issue5752] xml.dom.minidom does not handle newline characters in attribute values

2009-05-06 Thread Tomalak

Tomalak  added the comment:

@Francesco Sechi: Would it not just require a minimal change to the
_write_data() method? Something along the lines of (sorry, no Python
expert, maybe I am way off):

def _write_data(writer, data, is_attrib=False):
    "Writes datachars to writer."
    if is_attrib:
        data = data.replace("\r", "&#13;").replace("\n", "&#10;")
    data = data.replace("&", "&amp;").replace("<", "&lt;")
    data = data.replace("\"", "&quot;").replace(">", "&gt;")
    writer.write(data)

and in Element.writexml():

#[...]
for a_name in a_names:
    writer.write(" %s=\"" % a_name)
    _write_data(writer, attrs[a_name].value, True)
#[...]

--




[issue5752] xml.dom.minidom does not handle newline characters in attribute values

2009-05-06 Thread Tomalak

Tomalak  added the comment:

Of course it should be:

def _write_data(writer, data, is_attrib=False):
    "Writes datachars to writer."
    data = data.replace("&", "&amp;").replace("<", "&lt;")
    data = data.replace("\"", "&quot;").replace(">", "&gt;")
    if is_attrib:
        data = data.replace("\r", "&#13;").replace("\n", "&#10;")
    writer.write(data)
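
Why the order of the replacements matters can be seen in a standalone sketch. The two helper names below are hypothetical and are not part of minidom; the decimal references `&#13;` and `&#10;` are an assumption about the spelling used.

```python
def escape_attr_wrong_order(data):
    # The newline becomes "&#10;" first ...
    data = data.replace("\r", "&#13;").replace("\n", "&#10;")
    # ... and then the "&" of that reference is escaped a second time.
    data = data.replace("&", "&amp;").replace("<", "&lt;")
    return data.replace('"', "&quot;").replace(">", "&gt;")


def escape_attr(data):
    # Escape markup characters first, so the references added
    # afterwards are left untouched.
    data = data.replace("&", "&amp;").replace("<", "&lt;")
    data = data.replace('"', "&quot;").replace(">", "&gt;")
    return data.replace("\r", "&#13;").replace("\n", "&#10;")


print(escape_attr_wrong_order("a\nb"))  # a&amp;#10;b
print(escape_attr("a\nb"))              # a&#10;b
```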

--




[issue5752] xml.dom.minidom does not handle newline characters in attribute values

2009-05-08 Thread Tomalak

Tomalak  added the comment:

Hmm... I thought toxml() is the part that needs to be fixed, not the
parsing/reading. I mentioned the reading only to outline the data loss
that occurs eventually.

My point is: toxml() (i.e. _write_data) *actually writes* the literal
newline to the output. And within attribute values, it just shouldn't.

--




[issue5752] xml.dom.minidom does not handle newline characters in attribute values

2009-05-08 Thread Tomalak

Tomalak  added the comment:

Attaching a patch that fixes the problem.

--
keywords: +patch
Added file: http://bugs.python.org/file13919/minidom.patch




[issue5752] xml.dom.minidom does not handle newline characters in attribute values

2009-05-08 Thread Tomalak

Tomalak  added the comment:

Attaching a test file that outlines the problem. Output on my system
(Windows / Python 3.0) is:

Without the patch:
C:\Python30>python.exe c:\minidom_test.py
False
1 -->"multiline
value"
2 -->"multiline value"

With the patch:
C:\Python30>python.exe c:\minidom_test.py
True
1 -->"multiline
value"
2 -->"multiline
value"
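
A round-trip check in the spirit of that output might look like the sketch below. It is not the attached test file; the function name `attribute_round_trips` and the element and attribute names are assumptions. What it prints for the multiline value depends on whether the minidom in use escapes the newline.

```python
from xml.dom import minidom


def attribute_round_trips(value):
    """Return True if value survives DOM -> toxml() -> parseString()."""
    doc = minidom.Document()
    elem = doc.createElement("elem")
    elem.setAttribute("attrib", value)
    doc.appendChild(elem)

    # Serialize, parse the result again, and compare the attribute.
    reparsed = minidom.parseString(doc.toxml())
    return reparsed.documentElement.getAttribute("attrib") == value


print(attribute_round_trips("multiline\nvalue"))
```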

--
Added file: http://bugs.python.org/file13920/toxml_test.py




[issue5752] xml.dom.minidom does not handle newline characters in attribute values

2009-05-08 Thread Tomalak

Changes by Tomalak :


Removed file: http://bugs.python.org/file13920/toxml_test.py




[issue5752] xml.dom.minidom does not handle newline characters in attribute values

2009-05-08 Thread Tomalak

Changes by Tomalak :


Added file: http://bugs.python.org/file13921/minidom_test.py




[issue5752] xml.dom.minidom does not handle newline characters in attribute values

2009-05-10 Thread Tomalak

Tomalak  added the comment:

Francesco, I think you are missing the point. :-) The problem has two sides.

If I create an XML document using the DOM (not by parsing it from a
string!), then I can put newline characters into an attribute value. This
is allowed and conforms to the XML spec. 

However, *literal* newlines in an attribute value (i.e. when the
document is parsed from a string) have no meaning. The parser treats
them as if they were insignificant whitespace -- they are converted to a
single space. This is also valid and conforms to the XML spec.

The catch: This leads to an actual data loss if I *wanted* to store
newline characters in an attribute -- unless the newline characters are
properly encoded. Encoding the newline characters is also valid and
conforms to the spec, so the DOM implementation should do it. 

In other words - the parsing process you refer to is actually working
fine. If an attribute contains a literal newline, it is indeed okay to
collapse it into a space. It's only the document serializing that is broken.

Minidom is clearly missing functionality here, and it does not conform
to the XML spec. If I store a string of data in an XML document, it must
be ensured that upon reading the document again, I get the *same* data
back. This is what I check with my test script.

--




[issue5752] xml.dom.minidom does not escape newline characters within attribute values

2009-05-10 Thread Tomalak

Changes by Tomalak :


--
title: xml.dom.minidom does not handle newline characters in attribute values 
-> xml.dom.minidom does not escape newline characters within attribute values




[issue5752] xml.dom.minidom does not escape newline characters within attribute values

2009-05-13 Thread Tomalak

Tomalak  added the comment:

Francesco,

> if you want to encode the newline character, 
> this should be done by both parseString and 
> setAttribute methods. Otherwise, the 
> behaviour is not symmetric.

I believe you still don't see the issue. The behaviour is not symmetric
*now*. You store a '\n' in an attribute value with setAttribute(), save
the document to XML, load it again and out comes a space where the '\n'
should have been.

The point is that parseString() behaves correctly, but serializing does
not. There is only one side to fix, because only one side is broken.

> If you want to encode the newline in different 
> manner, you should develop a patch that
> introduces this kind of encoding in both 
> parseString and setAttribute methods.

It would be pointless to do the encoding in setAttribute(). The valid
ways to XML-encode a '\n' character are '&#10;', '&#xA;' or '&#x0A;'. Doing
so in setAttribute() would produce doubly encoded output, like this:
'&amp;#10;'. This is even more wrong.
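
The double encoding is easy to reproduce. In this sketch the caller pre-encodes the newline before calling setAttribute(); the element and attribute names are made up for illustration.

```python
from xml.dom import minidom

doc = minidom.Document()
elem = doc.createElement("elem")
# The DOM now holds the five characters "&#10;", not a line feed.
elem.setAttribute("attrib", "multiline&#10;value")
doc.appendChild(elem)

# The serializer escapes the "&", which yields "&amp;#10;" in the output.
xml_text = doc.toxml()
print(xml_text)

# Parsing that output gives back the literal text "&#10;",
# still not a line feed.
value = minidom.parseString(xml_text).documentElement.getAttribute("attrib")
print(repr(value))
```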

However, if parseString() encounters a '&#10;' in the input, it correctly
translates this to '\n' in the DOM. As I said, there is nothing to fix
in parsing, this exercise is about getting minidom to actually *output*
a '&#10;' where appropriate. :-)
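
The parsing side can be checked directly. The sketch below feeds three spellings of the character reference for a line feed to parseString(); the element and attribute names are assumptions.

```python
from xml.dom import minidom

values = []
for ref in ("&#10;", "&#xA;", "&#x0A;"):
    # The reference is part of the XML source text, inside the attribute.
    doc = minidom.parseString('<elem attrib="a%sb"/>' % ref)
    values.append(doc.documentElement.getAttribute("attrib"))

# Each spelling is resolved to a real line feed and is not collapsed
# into a space.
print(values)
```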

--




[issue5752] xml.dom.minidom does not escape newline characters within attribute values

2009-05-13 Thread Tomalak

Tomalak  added the comment:

Daniel Diniz: 

The proposed behaviour is correct:
http://www.w3.org/TR/2000/WD-xml-c14n-2119.html#charescaping

"In attribute values, the character information items 
TAB (#x9), newline (#xA), and carriage-return (#xD) 
are represented by "&#x9;", "&#xA;", and "&#xD;" respectively."

Since the behaviour is correct, it is also desirable. :-)
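
As a sketch of the quoted rule, the standard library's `xml.sax.saxutils.escape()` can be given an extra mapping for these characters. The names `C14N_ATTR_ENTITIES` and `escape_attr_c14n` are hypothetical, and this is not a canonicalization implementation: `escape()` also replaces ">", which is harmless here.

```python
from xml.dom import minidom
from xml.sax.saxutils import escape

# Extra replacements applied after escape() has handled &, < and >.
C14N_ATTR_ENTITIES = {
    '"': "&quot;",
    "\t": "&#x9;",
    "\n": "&#xA;",
    "\r": "&#xD;",
}


def escape_attr_c14n(value):
    """Escape value for use inside a double-quoted attribute."""
    return escape(value, C14N_ATTR_ENTITIES)


original = "tab\there\nnext\rend"
escaped = escape_attr_c14n(original)
print(escaped)

# Parsing a document built from the escaped value restores the original.
doc = minidom.parseString('<elem attrib="%s"/>' % escaped)
restored = doc.documentElement.getAttribute("attrib")
print(restored == original)
```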

I don't think that this change could cause existing solutions to break,
since the current inconsistency in handling these characters makes it
impossible to rely on this anyway.

Thanks for putting up the unit test diff.

--




[issue5752] xml.dom.minidom does not escape newline characters within attribute values

2009-05-13 Thread Tomalak

Changes by Tomalak :


Removed file: http://bugs.python.org/file13919/minidom.patch




[issue5752] xml.dom.minidom does not escape newline characters within attribute values

2009-05-13 Thread Tomalak

Changes by Tomalak :


Added file: http://bugs.python.org/file13977/minidom.patch




[issue5752] xml.dom.minidom does not escape newline characters within attribute values

2009-05-13 Thread Tomalak

Tomalak  added the comment:

I changed the patch to include support for TAB characters, which were
also left unencoded before.

Also I switched encoding from '&#10;' etc. to '&#xA;'. This is
equivalent, but the spec uses the latter variant.

--




[issue5752] xml.dom.minidom does not escape CR, LF and TAB characters within attribute values

2009-05-13 Thread Tomalak

Changes by Tomalak :


--
title: xml.dom.minidom does not escape newline characters within attribute 
values -> xml.dom.minidom does not escape CR, LF and TAB characters within 
attribute values
