Re: [Tutor] close, but no cigar

Marc Tompkins Mon, 22 Jul 2013 13:49:00 -0700

On Mon, Jul 22, 2013 at 11:27 AM, Jim Mooney <cybervigila...@gmail.com> wrote:


> Okay, I'm getting there, but this should be translating A umlaut to an old
> DOS box character, according to my ASCII table, but instead it's print
> small 'u':
>
> def main():
>     zark = ''
>     for x in "ÀÄÄÄ":
>         print(unichr(ord(u'x')-3), end=' ')
>
> result: u u u u
>

When you type "Ä" in a Python string (without specifying which encoding
you're trying to represent), it doesn't necessarily have the same ordinal
value as the line-drawing character that gets mistakenly displayed as "Ä"
in your text editor.  Depending on which Python version you happen to be
using at the moment (and therefore on the default encoding), "Ä"
might be a Unicode Latin Capital Letter A With Diaeresis (U+00C4), or it
might be character code 0x8E (its position in code page 437), or it might
be 0xC4 (its position in Windows-1252)...
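
To make that concrete, here is a small sketch of the same character and the
same bytes seen through two code pages.  It assumes Python 3 (where str is
Unicode and bytes is a separate type) and assumes your editor is showing the
file as Windows-1252; the variable names are just for illustration.

```python
# Assumes Python 3; "cp1252" as the editor's encoding is an assumption.
a_umlaut = "\u00c4"  # LATIN CAPITAL LETTER A WITH DIAERESIS

code_point = ord(a_umlaut)             # Unicode code point of the letter
as_cp437 = a_umlaut.encode("cp437")    # the byte the DOS code page uses for it
as_cp1252 = a_umlaut.encode("cp1252")  # the byte Windows-1252 uses for it

# Four bytes of the kind TREE writes for a corner followed by a line.
raw = b"\xc0\xc4\xc4\xc4"
as_dos = raw.decode("cp437")       # box-drawing characters, as TREE intended
as_windows = raw.decode("cp1252")  # accented letters, as the editor shows them

print(hex(code_point), as_cp437, as_cp1252)
# Print code points rather than glyphs so the output survives any console.
print([hex(ord(c)) for c in as_dos])
print([hex(ord(c)) for c in as_windows])
```

The same four bytes come out as U+2514 and U+2500 (line drawing) under
cp437, but as U+00C0 and U+00C4 (accented letters) under cp1252, which is
why the "Ä" you pasted is not the character TREE wrote.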

For a quick visualization of what I'm talking about, just fire up the
Character Map program and find "Ä" in the following fonts: Arial, Terminal,
and Roman.  Float your mouse cursor over it each time to see the character
code associated with it.

If you insist on parsing the output of TREE (instead of letting Python do
things in a modern, Unicode-aware way), here's how I would do it:

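    # Python 2: read() returns a byte string; decode() turns it into unicode.
    inFileName = "/Users/Marc/Desktop/rsp/tree.txt"
    with open(inFileName, 'r') as inFile:
        # TREE writes its line-drawing characters in code page 437.
        inString = inFile.read().decode('cp437')
        print inString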

This printed out the line-drawing characters just fine; my test Cyrillic
filename remained a string of question marks, because TREE itself had
trashed that filename and there wasn't anything for .decode() to decode.
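
The snippet above is Python 2.  If you are on Python 3, a rough equivalent
is to hand the encoding to open() and let it do the decoding.  This is only
a sketch: read_tree_output is a name I made up, and the sample bytes written
to a temporary file stand in for real TREE output.

```python
import os
import tempfile


def read_tree_output(path):
    """Read a file of TREE output, decoding it as code page 437."""
    # Python 3: open() decodes for us, so no separate .decode() call.
    with open(path, "r", encoding="cp437") as in_file:
        return in_file.read()


# Stand-in for TREE output: a corner, three line segments, a folder name.
sample_bytes = b"\xc0\xc4\xc4\xc4rsp\r\n"

# Write the sample to a temporary file so the example is self-contained.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(sample_bytes)
    sample_path = tmp.name

try:
    decoded = read_tree_output(sample_path)
finally:
    os.remove(sample_path)

# Show code points instead of glyphs so any console can print them.
print([hex(ord(c)) for c in decoded])
```

With a real file you would just call read_tree_output() on the path to your
saved TREE output.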

