> The example is written assuming the console encoding is utf-8. Yours
> seems to be cp437. Try this:
> In [1]: import sys
>
> In [2]: sys.stdout.encoding
> Out[2]: 'cp437'

That is indeed the result that I get as well.

> But there is another problem - \u2013 is an em dash which does not
> appear in cp437, so even giving the correct encoding doesn't work. Try
> this:
> In [6]: x = u"abc\u2591"
>
> In [7]: print x.encode('cp437')
> ------> print(x.encode('cp437'))
> abc░

So does this mean that my python install is incapable of encoding the
en/em dash?

For the time being, I've gone with treating the symptom rather than the
root problem and created a translate function.

    def translate_code(text):
        text = text.replace("‘","'")
        text = text.replace("’","'")
        text = text.replace("“",'"')
        text = text.replace("”",'"')
        text = text.replace("–","-")
        text = text.replace("—","--")
        return text

Which of course has led to a new problem. I'm first using Fredrik
Lundh's code to extract random html gobbledygook, then running my
translate function over the file to replace the windows-1252 encoded
characters. But for some reason, I can't seem to get my translate_code
function to work inside the same loop as Mr. Lundh's html cleanup code.
Below is the problem code:

    infile = open('test.txt','rb')
    outfile = open('test_cleaned.txt','wb')

    for line in infile:
        try:
            newline = strip_html(line)
            cleanline = translate_code(newline)
            outfile.write(cleanline)
        except:
            newline = "NOT CLEANED: %s" % line
            outfile.write(newline)

    infile.close()
    outfile.close()

The strip_html function, documented here
(http://effbot.org/zone/re-sub.htm#unescape-html), returns a text string
as far as I can tell. I'm confused why I wouldn't be able to further
manipulate the string with the "translate_code" function and store the
result in the "cleanline" variable. When I try this approach, none of
the translations succeed and I'm left with the same HTML gook in the
"outfile".

Is there some way to combine these functions so I can perform all the
processing in one pass?
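
One thing worth checking in the loop above is the bare "except:", which hides whatever exception is actually raised. A possible cause, not confirmed here, is a mix of byte strings and unicode strings between strip_html and translate_code. The sketch below is only a diagnostic aid under stated assumptions: the names PUNCTUATION and clean_lines are made up for illustration, the input is assumed to be windows-1252, strip_html is passed in as an argument rather than imported, and the code is written so that it should also run on Python 3. It decodes each line first, works on unicode text throughout, and prints the traceback instead of swallowing it.

```python
import traceback

# Hypothetical table: smart punctuation -> ASCII, keyed by code point so
# the encoding of this source file does not matter.
PUNCTUATION = {
    0x2018: u"'",   # left single quotation mark
    0x2019: u"'",   # right single quotation mark
    0x201C: u'"',   # left double quotation mark
    0x201D: u'"',   # right double quotation mark
    0x2013: u"-",   # en dash
    0x2014: u"--",  # em dash
}


def translate_code(text):
    """Replace smart punctuation in a unicode string with ASCII."""
    return text.translate(PUNCTUATION)


def clean_lines(lines, strip_html, encoding="cp1252"):
    """Decode, strip and translate each raw line in a single pass.

    `lines` is an iterable of byte strings (e.g. a file opened in 'rb'
    mode); `strip_html` is any callable taking and returning text.
    The cp1252 default is an assumption about the input file.
    """
    cleaned = []
    for raw in lines:
        try:
            text = raw.decode(encoding)
            cleaned.append(translate_code(strip_html(text)))
        except Exception:
            # Show the real error instead of hiding it.
            traceback.print_exc()
            cleaned.append(u"NOT CLEANED: %r" % (raw,))
    return cleaned
```

To write the result out, each returned string would need to be encoded again (for example with .encode('ascii', 'replace')) before going to a file opened in 'wb' mode. If the traceback shows a UnicodeDecodeError, that would point at the byte/unicode mix rather than at the order of the two function calls.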
_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor