Prasad, Ramit wrote:
I don't know what they are from but they are both the same value,
one in hex and one in octal.

0xC9 == 0311
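A quick check that the two notations name the same number. One caveat for Python 3: a bare leading-zero literal such as 0311 is a SyntaxError there, and octal literals need the 0o prefix.

```python
# 0xC9 (hex) and 0o311 (octal) are both the integer 201.
# Python 3 spells octal literals with a "0o" prefix; "0311" is a SyntaxError there.
hex_value = 0xC9
octal_value = 0o311

print(hex_value == octal_value)  # True
print(hex_value)                 # 201
print(hex(201), oct(201))        # 0xc9 0o311
```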

As for the encoding mechanisms, I'm afraid I can't help there!

Nice catch! Yeah, I am stuck on the encoding mechanism as well. I
know how to encode/decode... but not which encoding to use. Is there a
reference I can look up to find which encoding that byte
corresponds to? I know what the character looks like, if that helps. I
know that Python sometimes displays the correct character, but I'm
not sure when or why.

In general, no. The same byte value (0xC9) maps to different characters in different encodings, so you *must* know what the encoding is in order to decode the bytes correctly.
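To make that concrete, here is a small sketch that decodes the single byte 0xC9 with a handful of codecs from Python's standard library. The particular encodings chosen are just an illustrative sample, not a list of likely candidates for your data.

```python
# Decode the same single byte under several encodings to show that
# 0xC9 has no meaning on its own: each codec maps it to its own character.
data = bytes([0xC9])

encodings = ["latin-1", "cp1252", "cp437", "mac_roman", "cp1251", "koi8_r"]

results = {}
for enc in encodings:
    results[enc] = data.decode(enc)
    print(f"{enc:10} -> {results[enc]!r}")

# As a lone byte, 0xC9 is not valid UTF-8: it announces a two-byte
# sequence, but nothing follows it.
try:
    data.decode("utf-8")
except UnicodeDecodeError as exc:
    print("utf-8      -> error:", exc.reason)
```

On a terminal that can display them, the six lines show É, É, ╔, …, Й and и, and the UTF-8 attempt reports "unexpected end of data".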

Think about it this way... if I gave you a block of data as hex bytes:

240F91BC03...FF90120078CD45

and then asked you whether that was a bitmap image, a sound file or something else, how could you tell? It's just *bytes*; it could be anything.

All is not quite lost, though. You could try decoding the bytes with each likely encoding and see whether the result makes sense. Start with ASCII, Latin-1, UTF-8, UTF-16 and any other encodings in common use. (This would be like pretending the bytes were a bitmap, looking at it, and trying to decide whether it looked like an actual picture or a bunch of random pixels. Hopefully it wasn't meant to look like a bunch of random pixels.)
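A minimal sketch of that trial-and-error approach. Keep in mind that Latin-1 assigns a character to every byte value, so it never raises an error; a decode that "succeeds" is not proof that it is the right one. The helper `encodings_producing` is hypothetical and only illustrates the case where, as above, you already know what the character should look like.

```python
def try_decodings(data, encodings=("ascii", "latin-1", "utf-8", "utf-16")):
    """Attempt to decode `data` with each candidate encoding.

    Returns a dict mapping encoding name -> decoded text, or None if that
    codec raised UnicodeDecodeError. Whether the text "makes sense" is
    still up to a human reader.
    """
    results = {}
    for enc in encodings:
        try:
            results[enc] = data.decode(enc)
        except UnicodeDecodeError:
            results[enc] = None
    return results


def encodings_producing(data, expected_text, encodings):
    """Hypothetical helper: candidate encodings that decode `data` to `expected_text`."""
    return [enc for enc, text in try_decodings(data, encodings).items()
            if text == expected_text]


sample = b"caf\xc3\xa9"  # the UTF-8 bytes for "café"
for enc, text in try_decodings(sample).items():
    print(f"{enc:8} -> {text!r}")
```

Here ASCII fails on the non-ASCII bytes, UTF-16 fails because the data has an odd length, UTF-8 gives 'café', and Latin-1 happily produces the plausible-looking but wrong 'cafÃ©'. Calling `encodings_producing(bytes([0xC9]), "É", ["latin-1", "cp1252", "cp437", "mac_roman"])` narrows the candidates to Latin-1 and cp1252.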

Web browsers such as Internet Explorer and Mozilla will try to guess the encoding by doing frequency analysis of the bytes. Mozilla's encoding guesser has been ported to Python as the chardet library:

http://chardet.feedparser.org/
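A minimal sketch of calling it, assuming the chardet package is installed (e.g. with pip). Its `detect()` function takes a bytes object and returns a dict that includes `encoding` and `confidence` keys. The ImportError fallback and the `guess_encoding` wrapper are illustrative additions so the snippet still runs without chardet.

```python
def guess_encoding(data):
    """Return (encoding_name_or_None, confidence) for a bytes object.

    Uses chardet's statistical detector if it is installed; otherwise
    reports no guess. Treat the answer as a hint, not a fact.
    """
    try:
        import chardet
    except ImportError:
        return None, 0.0
    result = chardet.detect(data)
    return result.get("encoding"), float(result.get("confidence") or 0.0)


# Longer inputs give the detector more statistics to work with.
text = "Ça coûte cher, très cher, à Montréal et à Genève. " * 10
encoding, confidence = guess_encoding(text.encode("utf-8"))
print(encoding, confidence)
```

The confidence score is exactly why this remains a guess: on short inputs, such as a single byte like 0xC9, there is too little to go on.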

But any sort of guessing algorithm is just a nasty hack. You are always better off ensuring that you accurately know the encoding.
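In Python 3 terms, knowing the encoding mostly means stating it whenever data crosses the bytes/text boundary, for instance by passing `encoding=` to `open()` instead of relying on the platform default. A small sketch; the temporary file and its contents are purely illustrative:

```python
import os
import tempfile

# Illustrative file in a fresh temporary directory; the point is that the
# encoding is named explicitly on both write and read.
path = os.path.join(tempfile.mkdtemp(), "example.txt")

with open(path, "w", encoding="utf-8") as f:
    f.write("Élan")

with open(path, "rb") as f:
    raw = f.read()           # the bytes actually stored on disk

with open(path, encoding="utf-8") as f:
    text = f.read()          # decoded with the same, known encoding

print(raw)             # b'\xc3\x89lan'
print(text == "Élan")  # True
```

Note that É, which Latin-1 stores as the single byte 0xC9, takes two bytes (0xC3 0x89) in UTF-8; reading this file back as Latin-1 would silently give 'Ã\x89lan' instead.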


--
Steven
_______________________________________________
Tutor maillist  -  Tutor@python.org
