[Tutor] XML parsing when elements contain foreign characters

Garry Bettle Thu, 09 Jan 2014 00:52:35 -0800

Howdy all,

Have you heard the news? Happy New Year!


Hope someone can help. I know this is a tutor list so please feel free to
send me somewhere else.

I'm trying to parse some XML and I'm struggling to reference elements that
contain foreign characters.

Code so far:

# -*- coding: utf-8 -*-

from xml.dom import minidom

xmldoc = minidom.parse('Export.xml')
products = xmldoc.getElementsByTagName('product')
print '%s Products' % len(products)

row_cnt = 0
titles = {}
stocklevel = {}
for product in products:
  row_cnt+=1
  title=product.getElementsByTagName('Titel')[0].firstChild.nodeValue
  stock=product.getElementsByTagName('AntalPåLager')[0].firstChild.nodeValue
  if title not in titles:
    titles[title]=1
  else:
    titles[title]+=1
  if stock not in stocklevel:
    stocklevel[stock]=1
  else:
    stocklevel[stock]+=1

Traceback (most recent call last):
  File "C:\Python27\Testing Zizzi.py", line 16, in <module>
    stock=product.getElementsByTagName('AntalPÃ¥Lager')[0].firstChild.nodeValue
IndexError: list index out of range

I've tried to encode the string before giving it to getElementsByTagName
but no joy.

Any ideas?

Many thanks!

Cheers,

Garry
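
P.S. One thing that may be worth checking, offered as a rough sketch rather than a confirmed fix: under Python 2, 'AntalPåLager' in a UTF-8 source file is a byte string, while minidom holds tag names as unicode, so the lookup can come back empty and [0] then raises the IndexError. Passing a unicode literal (u'AntalPåLager') should let the names compare equal. The inline XML below is a made-up stand-in for Export.xml, and the product titles and stock values are invented for illustration only.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function
from xml.dom import minidom

# Hypothetical sample data standing in for Export.xml (not the real file).
XML = u'''<?xml version="1.0" encoding="utf-8"?>
<products>
  <product><Titel>Kjole</Titel><AntalPåLager>3</AntalPåLager></product>
  <product><Titel>Bluse</Titel><AntalPåLager>0</AntalPåLager></product>
  <product><Titel>Kjole</Titel><AntalPåLager>3</AntalPåLager></product>
</products>'''.encode('utf-8')

xmldoc = minidom.parseString(XML)

# Unicode literal: matches the unicode tag names that minidom stores.
stock_tag = u'AntalPåLager'

stocklevel = {}
for product in xmldoc.getElementsByTagName('product'):
    matches = product.getElementsByTagName(stock_tag)
    # Skip products where the element is missing or empty instead of
    # indexing blindly and raising IndexError.
    if not matches or matches[0].firstChild is None:
        continue
    stock = matches[0].firstChild.nodeValue
    stocklevel[stock] = stocklevel.get(stock, 0) + 1

print(stocklevel)
```

The guard on an empty result is only a convenience for this sketch; with the real file it may be preferable to let a missing element fail loudly.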

_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
