[issue5036] xml.parsers.expat make a dictionary which keys are broken if buffer_text is False.
New submission from Takeshi Matsuyama:

When I build a dictionary by parsing "legacy-icon-mapping.xml" (part of icon-naming-utils [http://tango.freedesktop.org/Tango_Icon_Library]) with the following script, three of the dictionary's keys come out truncated if the "buffer_text" attribute is False.

=
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from __future__ import with_statement
import sys
from xml.parsers.expat import ParserCreate
import codecs

class Database:
    """Make a dictionary which is accessible by Database.dict"""
    def __init__(self, buffer_text):
        self.cnt = None
        self.name = None
        self.data = None
        self.dict = {}
        p = ParserCreate()
        p.buffer_text = buffer_text
        p.StartElementHandler = self.start_element
        p.EndElementHandler = self.end_element
        p.CharacterDataHandler = self.char_data
        with open("/usr/share/icon-naming-utils/legacy-icon-mapping.xml", 'r') as f:
            p.ParseFile(f)

    def start_element(self, name, attrs):
        if name == 'context':
            self.cnt = attrs["dir"]
        if name == 'icon':
            self.name = attrs["name"]

    def end_element(self, name):
        if name == 'link':
            self.dict[self.data] = (self.cnt, self.name)

    def char_data(self, data):
        self.data = data.strip()

def print_set(aset):
    for e in aset:
        print '\t' + e

if __name__ == '__main__':
    sys.stdout = codecs.getwriter('utf_8')(sys.stdout)
    map_false_dict = Database(False).dict
    map_true_dict = Database(True).dict
    print "The keys which exist if buffer_text=False but don't exist if buffer_text=True are"
    print_set(set(map_false_dict.keys()) - set(map_true_dict.keys()))
    print "The keys which exist if buffer_text=True but don't exist if buffer_text=False are"
    print_set(set(map_true_dict.keys()) - set(map_false_dict.keys()))
=

The result of running this script is

==
The keys which exist if buffer_text=False but don't exist if buffer_text=True are
    rt-descending
    ock_text_right
    lc
The keys which exist if buffer_text=True but don't exist if buffer_text=False are
    stock_text_right
    gnome-mime-application-vnd.stardivision.calc
    gtk-sort-descending
==

I confirmed it in Python-2.5.2 on Fedora 10.

--
components: XML
messages: 80398
nosy: tksmashiw
severity: normal
status: open
title: xml.parsers.expat make a dictionary which keys are broken if buffer_text is False.
type: behavior
versions: Python 2.5

___
Python tracker <http://bugs.python.org/issue5036>
___
___
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue5036] xml.parsers.expat make a dictionary which keys are broken if buffer_text is False.
Takeshi Matsuyama added the comment:

Thanks for the reply!

>If the xml file is small enough, could you attach it to the issue? Or
>provide a download location?

Sorry, I found it here:
http://webcvs.freedesktop.org/icon-theme/icon-naming-utils/legacy-icon-mapping.xml?revision=1.75&content-type=text%2Fplain&pathrev=1.75

>(Note that Python 2.5 only gets security fixes now, so unless this
>still fails with 2.6 or later, this issue is likely to be closed)

I roughly confirmed the same problem with python-3.0 on MS Windows two weeks ago, but I need to verify it more carefully...
[issue5036] xml.parsers.expat make a dictionary which keys are broken if buffer_text is False.
Takeshi Matsuyama added the comment:

Hi kawai.
I got the correct output by modifying the code as you suggested, but I still cannot understand why this happens. Could you tell me more briefly, or point me to any documents about it?
I can't find any note in the official documentation saying that a CharacterDataHandler should append the strings it receives rather than just assign them.
Does everyone already know and understand this? Am I the only one who doesn't? (;;)
[issue5036] xml.parsers.expat make a dictionary which keys are broken if buffer_text is False.
Takeshi Matsuyama added the comment:

A correction to my previous message: briefly -> in detail

>Please read "The ContentHandler.characters() callback is missing data!"
>http://www.saxproject.org/faq.html

I was just reading the site above, and it is now very clear to me.
Thanks kawai, and I'm sorry for taking up your time, gagenellina.
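For readers following the thread, here is a minimal sketch of the behaviour the SAX FAQ entry describes, applied to xml.parsers.expat. It assumes Python 3 (the script in the original report is Python 2), and the one-line sample document and the helper name collect_chunks are made up for illustration. The sketch uses an entity reference to provoke the split; in the report the split most likely came from the file being read in blocks, which is a different trigger for the same effect.

```python
from xml.parsers.expat import ParserCreate


def collect_chunks(xml_text, buffer_text):
    """Return every string the parser hands to CharacterDataHandler."""
    chunks = []
    parser = ParserCreate()
    parser.buffer_text = buffer_text
    # Record each call instead of overwriting a single attribute.
    parser.CharacterDataHandler = chunks.append
    parser.Parse(xml_text, True)
    return chunks


# Hypothetical sample: one text node containing an entity reference.
SAMPLE = "<link>a&amp;b</link>"

unbuffered = collect_chunks(SAMPLE, False)
buffered = collect_chunks(SAMPLE, True)

print("buffer_text=False:", unbuffered)
print("buffer_text=True: ", buffered)
# The pieces always add up to the same text, however they are delivered.
print("same text:", "".join(unbuffered) == "".join(buffered))
```

A handler that keeps only the last string it received, as char_data() in the report does, would keep only the tail of the text whenever the parser delivers it in more than one piece.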
[issue5036] xml.parsers.expat make a dictionary which keys are broken if buffer_text is False.
Takeshi Matsuyama added the comment:

From msg80438:
>You should reset it by self.data = '' at end_element().

It seems that we should instead reset it in start_element(), like this:

=
def start_element(self, name, attrs):
    ...abbr...
    if name == 'link':
        self.data = ''
=

Otherwise unwanted spaces, tabs, and newlines get mixed into "self.data".
That's all, thanks.
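Putting the advice from this thread together, the handlers could look roughly like the sketch below. It is only an illustration in Python 3, not the exact code posted in msg80438: the class name LinkCollector, the helper parse_in_pieces, the in_link flag, and the small sample document (including the icon name) are all assumptions made for the example. Text is collected only while inside a link element, the collection is reset in start_element(), and the pieces are joined in end_element().

```python
from xml.parsers.expat import ParserCreate


class LinkCollector:
    """Map the text of each <link> to its (context dir, icon name) pair."""

    def __init__(self):
        self.context = None
        self.icon = None
        self.in_link = False
        self.chunks = []
        self.mapping = {}

    def start_element(self, name, attrs):
        if name == "context":
            self.context = attrs["dir"]
        elif name == "icon":
            self.icon = attrs["name"]
        elif name == "link":
            # Reset here so whitespace between elements is never collected.
            self.chunks = []
            self.in_link = True

    def char_data(self, data):
        # Append: one text node may arrive in several calls.
        if self.in_link:
            self.chunks.append(data)

    def end_element(self, name):
        if name == "link":
            key = "".join(self.chunks).strip()
            self.mapping[key] = (self.context, self.icon)
            self.in_link = False


def parse_in_pieces(xml_text, piece_size):
    """Feed the document to the parser piece_size characters at a time."""
    collector = LinkCollector()
    parser = ParserCreate()
    parser.buffer_text = False
    parser.StartElementHandler = collector.start_element
    parser.EndElementHandler = collector.end_element
    parser.CharacterDataHandler = collector.char_data
    for start in range(0, len(xml_text), piece_size):
        parser.Parse(xml_text[start:start + piece_size], False)
    parser.Parse("", True)  # signal the end of the document
    return collector.mapping


# Hypothetical stand-in for legacy-icon-mapping.xml.
SAMPLE = (
    '<mapping>\n'
    '  <context dir="actions">\n'
    '    <icon name="view-sort-descending">\n'
    '      <link>gtk-sort-descending</link>\n'
    '      <link>stock_sort-descending</link>\n'
    '    </icon>\n'
    '  </context>\n'
    '</mapping>\n'
)

if __name__ == "__main__":
    for key, value in sorted(parse_in_pieces(SAMPLE, 7).items()):
        print(key, value)
```

Feeding the document in small pieces stands in for the block-wise reads that ParseFile() performs on a real file. Because the handler appends and joins, the resulting keys do not depend on where the pieces happen to be cut.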
[issue5036] xml.parsers.expat make a dictionary which keys are broken if buffer_text is False.
Takeshi Matsuyama added the comment:

Could someone close this?