On Mon, Jul 22, 2013 at 11:27 AM, Jim Mooney <cybervigila...@gmail.com> wrote:
> Okay, I'm getting there, but this should be translating A umlaut to an old
> DOS box character, according to my ASCII table, but instead it's printing
> small 'u':
>
> def main():
>     zark = ''
>     for x in "ÀÄÄÄ":
>         print(unichr(ord(u'x')-3), end=' ')
>
> result: u u u u
>

When you type "Ä" in a Python string (without specifying which encoding you're trying to represent), it doesn't necessarily have the same ordinal value as the line-drawing character that gets mistakenly displayed as "Ä" in your text editor. Depending on which Python version you happen to be using at the moment (and therefore on the default encoding), "Ä" might be a Unicode Latin Capital Letter A With Diaeresis (U+00C4), or it might be character code 0x8E, or it might be 0xC4...

For a quick visualization of what I'm talking about, just fire up the Character Map program and find "Ä" in the following fonts: Arial, Terminal, and Roman. Hover your mouse cursor over it each time to see the character code associated with it.

If you insist on parsing the output of TREE (instead of letting Python do things in a modern, Unicode-aware way), here's how I would do it:

    inFileName = "/Users/Marc/Desktop/rsp/tree.txt"
    with open(inFileName, 'r') as inFile:
        # Python 2: read() returns a byte string; decode it as DOS code page 437
        inString = inFile.read().decode('cp437')
    print inString

This printed out the line-drawing characters just fine; my test Cyrillic filename remained a string of question marks, because TREE itself had trashed that filename and there wasn't anything for .decode() to decode.
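To make the code-page point concrete, here is a rough Python 3 sketch (not the Python 2 code used above). The helper names byte_values and fix_tree_text are made up for illustration, and it assumes the editor displayed the TREE output as cp1252, which may not match your setup. It also shows why the quoted loop printed 'u' every time: ord(u'x') takes the ordinal of the literal letter x, not of the loop variable.

```python
import unicodedata

A_UMLAUT = "\u00c4"  # LATIN CAPITAL LETTER A WITH DIAERESIS


def byte_values(ch):
    """Return the single byte that each code page uses for ch (hypothetical helper)."""
    return {cp: ch.encode(cp)[0] for cp in ("cp437", "cp1252", "latin-1")}


def fix_tree_text(mangled, shown_as="cp1252", really="cp437"):
    """Undo the mis-display: turn the text back into bytes using the code page
    it was wrongly shown in, then decode those bytes with the code page TREE
    actually used. Both defaults are assumptions."""
    return mangled.encode(shown_as).decode(really)


# The same letter maps to different byte values in different code pages.
print({cp: hex(b) for cp, b in byte_values(A_UMLAUT).items()})

# "ÀÄÄÄ" (written with escapes) recovered as line-drawing characters;
# print the Unicode names so the output is safe on any console.
fixed = fix_tree_text("\u00c0\u00c4\u00c4\u00c4")
print([unicodedata.name(c) for c in fixed])

# The literal 'x' is code 120, and 120 - 3 = 117, which is 'u'.
print(chr(ord("x") - 3))  # prints: u
```

In Python 3 the file itself can be read directly with open(inFileName, encoding='cp437'), which makes the explicit .decode() step unnecessary.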