2009/6/18 Serdar Tumgoren <zstumgo...@gmail.com>:
>> In [7]: print x.encode('cp437')
>> ------> print(x.encode('cp437'))
>> abc░
>>
> So does this mean that my python install is incapable of encoding the
> en/em dash?
No, the problem is with the print, not the encoding. Your console, as
configured, is incapable of displaying the em dash.

> But for some reason, I can't seem to get my translate_code function to
> work inside the same loop as Mr. Lundh's html cleanup code. Below is
> the problem code:
>
> infile = open('test.txt','rb')
> outfile = open('test_cleaned.txt','wb')
>
> for line in infile:
>     try:
>         newline = strip_html(line)
>         cleanline = translate_code(newline)
>         outfile.write(cleanline)
>     except:
>         newline = "NOT CLEANED: %s" % line
>         outfile.write(newline)
>
> infile.close()
> outfile.close()
>
> The strip_html function, documented here
> (http://effbot.org/zone/re-sub.htm#unescape-html), returns a text
> string as far as I can tell. I'm confused why I wouldn't be able to
> further manipulate the string with the "translate_code" function and
> store the result in the "cleanline" variable. When I try this
> approach, none of the translations succeed and I'm left with the same
> HTML gook in the "outfile".

Your try/except is hiding the problem. What happens if you take it out?
What error do you get? My guess is that strip_html() is returning unicode
and translate_code() is expecting byte strings (str), but I'm not sure
without seeing the error.

Kent
_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
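On the em dash question: encoding the character is fine; it is the console's code page that has no slot for it. A minimal Python 3 sketch of that distinction (in Python 2 the same calls apply to `unicode` objects). It assumes the console uses code page 437, as the `cp437` in the quoted session suggests; `can_encode` is a hypothetical helper, not part of any library.

```python
# Python 3 sketch: the em/en dash can be encoded, but cp437 has no slot for them.
EM_DASH = "\u2014"
EN_DASH = "\u2013"


def can_encode(text, codec):
    """Hypothetical helper: True if every character of text exists in codec."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


for codec in ("utf-8", "cp1252", "cp437"):
    print(codec, can_encode(EM_DASH + EN_DASH, codec))
# utf-8 True
# cp1252 True
# cp437 False

# On a cp437 console, substitute '?' instead of raising an error:
print((EM_DASH + EN_DASH).encode("cp437", errors="replace"))  # b'??'
```

Writing the UTF-8 bytes to a file and opening it in a UTF-8-aware editor is one way to confirm that the string itself was never the problem.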
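To see what the bare `except:` is swallowing, the simplest step is to delete the try/except and read the traceback. If the "NOT CLEANED" fallback should stay, a minimal alternative is to catch `Exception` and record the exception's type and message with the line. The Python 3 sketch below uses hypothetical stand-ins for both functions: `strip_html` returns text (as the effbot recipe returns unicode under Python 2) and `translate_code` expects bytes, which is the mismatch suspected above. The UTF-8 input encoding is also an assumption.

```python
import html


def strip_html(raw):
    """Hypothetical stand-in: decode bytes (assumed UTF-8) and unescape
    HTML entities, returning text (str)."""
    return html.unescape(raw.decode("utf-8"))


def translate_code(data):
    """Hypothetical stand-in that expects bytes: replaces the UTF-8
    encoding of an em dash with '--'."""
    return data.replace(b"\xe2\x80\x94", b"--")


def clean_line(raw):
    try:
        return translate_code(strip_html(raw))
    except Exception as exc:
        # Unlike a bare `except:`, this reports which exception occurred,
        # and it does not swallow KeyboardInterrupt (not an Exception subclass).
        return "NOT CLEANED (%s: %s): %r" % (type(exc).__name__, exc, raw)


# Prints a NOT CLEANED line naming TypeError: str.replace() got bytes arguments.
print(clean_line(b"one &#8212; two"))
```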
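If the error does confirm a unicode/str mismatch, one common fix is to pick a single string type for the whole loop: decode when reading, work on text throughout, and encode when writing. A minimal Python 3 sketch, assuming the input is UTF-8 and that `translate_code` maps dashes to ASCII; the thread shows neither, so `TRANSLATIONS` and both function bodies are hypothetical.

```python
import html
import io

# Hypothetical mappings: the thread never shows what translate_code does.
TRANSLATIONS = str.maketrans({
    "\u2014": "--",  # em dash
    "\u2013": "-",   # en dash
})


def strip_html(text):
    """Hypothetical stand-in: unescape HTML entities in text (str)."""
    return html.unescape(text)


def translate_code(text):
    """Operates on text (str), the same type strip_html returns."""
    return text.translate(TRANSLATIONS)


def clean_file(infile, outfile):
    """Both files are text-mode: decoding and encoding happen at the file
    boundary, so every value inside the loop is str."""
    for line in infile:
        outfile.write(translate_code(strip_html(line)))


src = io.StringIO("one &#8212; two &ndash; three\n")
dst = io.StringIO()
clean_file(src, dst)
print(dst.getvalue(), end="")  # one -- two - three
```

With real files this would be `open('test.txt', encoding='utf-8')` and `open('test_cleaned.txt', 'w', encoding='utf-8')`, with UTF-8 again being an assumption. In Python 2 the equivalent is `codecs.open(filename, mode, encoding)` or calling `line.decode(...)` on each line before processing.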