Hi
I have been trying to get a script that works on Linux Mint to work on Windows
as well. The key blocker has been UTF-8 errors, most of which I have solved.
For the last error I am trying to overcome, the solution appears to be to use
.decode('windows-1252') to correct an ASCII error.
However, I am using lxml to read my content, and its parser does not support
decode. Are there any known ways to read with lxml and fix Unicode faults?
The key part of my script is:

    for content in roots:
        utf8_parser = etree.XMLParser(encoding='utf-8')
        fix_ascii = utf8_parser.decode('windows-1252')
        mytree = etree.fromstring(
            content.read().encode('utf-8'), parser=fix_ascii)

Without the added .decode, my code looks like this:

    for content in roots:
        utf8_parser = etree.XMLParser(encoding='utf-8')
        mytree = etree.fromstring(
            content.read().encode('utf-8'), parser=utf8_parser)

However, doing it this way raises this error:

    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

I found this Stack Overflow answer for it,
http://stackoverflow.com/a/29217546/461887, but cannot seem to implement it
with lxml.
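A byte 0xff at position 0 is often the first byte of a UTF-16 little-endian byte order mark (FF FE). If that is what these files are, decoding them as windows-1252 would not raise, but it would silently produce garbage. A minimal sketch of one possible approach, assuming the files can be read as raw bytes (open(..., 'rb')): look for a BOM first, otherwise try UTF-8, and only then fall back to windows-1252, re-encoding the result to UTF-8 before it reaches lxml. The helper name to_utf8_bytes is hypothetical, and the actual encoding of the files is a guess, not something confirmed above.

```python
import codecs

# BOMs are checked longest-first: the UTF-32-LE BOM (FF FE 00 00) begins with
# the UTF-16-LE BOM (FF FE), so UTF-32 has to be tested before UTF-16.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def to_utf8_bytes(raw: bytes, fallback: str = "windows-1252") -> bytes:
    """Hypothetical helper: return the raw file bytes re-encoded as UTF-8.

    Assumes the input is UTF-8, BOM-marked UTF-16/32, or a Windows "ANSI"
    (windows-1252) file; any other encoding is outside this sketch.
    """
    for bom, codec in _BOMS:
        if raw.startswith(bom):
            # These BOM-aware codecs consume the BOM while decoding.
            return raw.decode(codec).encode("utf-8")
    try:
        raw.decode("utf-8")  # already valid UTF-8: pass it through unchanged
        return raw
    except UnicodeDecodeError:
        # Last resort: treat the bytes as windows-1252. Note that a few bytes
        # (e.g. 0x81, 0x8D) are undefined in that codec and will still raise.
        return raw.decode(fallback).encode("utf-8")
```

In the loop, the raw bytes would then go to lxml through the existing parser, roughly mytree = etree.fromstring(to_utf8_bytes(raw), parser=utf8_parser). Because utf8_parser is created with encoding='utf-8', it should override an XML declaration that still names the file's original encoding.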
Ideas?
Sayth
--
https://mail.python.org/mailman/listinfo/python-list