On 14/08/13 02:49, Marc Tompkins wrote:
On Tue, Aug 13, 2013 at 8:59 AM, Amit Saha <amitsaha...@gmail.com> wrote:
What does it mean (and will it always work?) when I don't specify any
encoding:
bytearray(ssid).decode()
u'BigPond679D85'
If you don't specify an encoding, then the default encoding is used; as you
point out a bit later, your local default is ascii.
Will it always work? NO. If there are any non-ASCII characters in the input
stream (the SSID in this case), .decode will fail (probably with
UnicodeDecodeError, but I can't test it at the moment).
Careful -- you are confusing two distinct concepts here. ssid does not contain
characters. It contains bytes. There are exactly 256 possible bytes, which are numbers 0,
1, ... 255. They may *represent* characters, or sounds, or images, or motion video, or
any other form of data you like, in which case you have to ask (e.g.) "how is the
sound encoded into bytes? is it a WAV file, or MP3, or OGG, or something else?"
In this case, the ssid represents characters, but it contains bytes, and the
same question applies -- how are the characters A, B, C, ... encoded into
bytes? Unless you know which encoding is used, you have to guess. If you guess
wrong, you'll get errors. If you're lucky you will get an exception, and know
that you guessed wrong, but if you're unlucky you'll just get garbage
characters.
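To make the two failure modes concrete, here is a minimal sketch. The byte string is made up for illustration (it is the UTF-8 encoding of "Café"), and the variable names are mine, not anything from the original code:

```python
# Hypothetical SSID bytes: the UTF-8 encoding of u"Caf\xe9" ("Cafe" with an
# acute accent on the e). Invented for this example.
ssid = b"Caf\xc3\xa9"

# Right guess: the bytes decode to the intended four characters.
right = ssid.decode("utf-8")

# Wrong guess, lucky case: ASCII only covers bytes 0..127, so the decoder
# raises an exception and you know the guess was wrong.
try:
    ssid.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# Wrong guess, unlucky case: Latin-1 maps every byte to some character, so
# there is no exception, just five characters of garbage instead of four
# characters of text.
garbage = ssid.decode("latin-1")
```

Here `right` holds the intended text, `ascii_failed` ends up True, and `garbage` holds two wrong characters where the accented "e" should be.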
Fortunately, there are a few decent guesses you can make which will often
be correct, at least in Western European countries, Australia, the USA, and
similar:
UTF-8
ASCII
Latin-1
Latin-1 should be considered the "last resort" encoding, since it will never
fail. But it can return garbage. UTF-8 should be considered your first guess, since it is
the standard encoding that everyone should use. (Any application that doesn't use UTF-8
by default in the 21st century is, in my opinion, buggy.)
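One way to put that ordering into practice is sketched below. The function name decode_ssid is hypothetical, and I am assuming the raw SSID is something bytearray() accepts, as in the original snippet. ASCII does not need its own step, because any pure-ASCII byte string is also valid UTF-8 and decodes to the same text:

```python
def decode_ssid(raw):
    """Decode raw SSID bytes to text: UTF-8 first, Latin-1 as a last resort.

    Hypothetical helper for illustration only.
    """
    data = bytearray(raw)
    try:
        # First guess: UTF-8 (this also handles plain ASCII).
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # Last resort: Latin-1 never raises, but the result may be garbage
        # if the bytes were really in some other encoding.
        return data.decode("latin-1")
```

For example, decode_ssid(b"BigPond679D85") returns the same text as the plain .decode() call did, while a byte string such as b"Caf\xe9" is not valid UTF-8 and so falls through to Latin-1.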
I don't know the WiFi spec well enough to know whether you're ever going to
run into non-ASCII characters in an SSID;
A little bit of googling shows that it definitely happens, and that UTF-8 is
the standard encoding to use.
--
Steven