Thanks - it was exactly as you said.
--- Kent Johnson <[EMAIL PROTECTED]> wrote:
> Most likely your XML file is 16-bit Unicode (UTF-16), not
> UTF-8. When ASCII text
> is represented as 16-bit Unicode, every other byte will be
> a null byte. That is
> the extra character that shows up as a space or box
> depen
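
The null-byte effect described in the quote above can be checked with a minimal sketch. It assumes Python 3 and a made-up three-character sample string; the little-endian codec without a byte order mark is chosen only to keep the byte dump short.

```python
# Hypothetical sample: a short piece of pure-ASCII text.
text = "<a>"

# Encode as UTF-16, little-endian, without a byte order mark.
raw = text.encode("utf-16-le")

# Every ASCII character takes two bytes, and the second one is a null byte.
print(raw)

# Decoding with the wrong (8-bit) codec keeps the null bytes as stray characters,
# which a console or editor may render as spaces or boxes.
misread = raw.decode("latin-1")

# Decoding with the matching codec recovers the original text.
decoded = raw.decode("utf-16-le")
```

Here `misread` has twice as many characters as `text`, while `decoded` is identical to it.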
Ben Vinger wrote:
> Found the following (which solved the problem, though
> not on the console) at
> http://www.jorendorff.com/articles/unicode/python.html
>
> import codecs
> # Open a UTF-8 file in read mode
> infile = codecs.open("infile.txt", "r", "utf-8")
> # Read its contents as one l
Found the following (which solved the problem, though
not on the console) at
http://www.jorendorff.com/articles/unicode/python.html
import codecs
# Open a UTF-8 file in read mode
infile = codecs.open("infile.txt", "r", "utf-8")
# Read its contents as one large Unicode string.
text = infi
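
Since the file in this thread turned out to be 16-bit Unicode rather than UTF-8, the same read might look roughly like the sketch below. It assumes Python 3, where the built-in open() accepts an encoding argument and stands in for codecs.open; the helper name read_utf16, the temporary file, and the sample contents are all made up for illustration.

```python
import os
import tempfile


def read_utf16(path):
    """Read a whole UTF-16 file as one string (hypothetical helper)."""
    # The "utf-16" codec uses the byte order mark at the start of the file
    # to pick the byte order, then strips it from the result.
    with open(path, "r", encoding="utf-16") as infile:
        return infile.read()


# Made-up sample contents standing in for the real XML file.
sample = "<root>hello</root>"

# Write the sample to a temporary file as UTF-16, then read it back.
fd, path = tempfile.mkstemp(suffix=".xml")
os.close(fd)
try:
    with open(path, "w", encoding="utf-16") as outfile:
        outfile.write(sample)
    text = read_utf16(path)
finally:
    os.remove(path)

print(text)
```

If the file were opened with "utf-8" instead, the read would typically fail or produce the stray null characters discussed earlier in the thread.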