On 23/05/13 04:14, Citizen Kant wrote:
Does anybody know if there's a Python method that gives or stores the
complete list of ascii characters or unicode characters? The list of every
single character available would be perfect.


There are only 128 ASCII characters (code points 0 through 127), so getting a
list of them is trivial:

ascii = map(chr, range(128))  # Python 2
ascii = list(map(chr, range(128)))  # Python 3


or if you prefer a string:

ascii = ''.join(map(chr, range(128)))


If you don't like map(), you can use a list comprehension:

[chr(i) for i in range(128)]

The string module already defines some useful subsets of them:

py> import string
py> string.printable
'0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ \t\n\r\x0b\x0c'
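
If you want to see how the other ready-made subsets relate to the full ASCII
range, something along these lines should do. It is only a sketch: the names
all_ascii and subsets are mine, not part of the string module.

```python
import string

# All ASCII characters, built with chr() over range(128).
all_ascii = set(map(chr, range(128)))

# A few of the ready-made subsets in the string module.
subsets = {
    "ascii_letters": string.ascii_letters,
    "digits": string.digits,
    "punctuation": string.punctuation,
    "whitespace": string.whitespace,
    "printable": string.printable,
}

for name, chars in subsets.items():
    # Every character in each subset should be plain ASCII.
    assert set(chars) <= all_ascii
    print(name, len(chars))
```

Anything in all_ascii that is not in string.printable is a control character.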


There are 1114112 possible Unicode code points (numbered 0 through 0x10FFFF,
which is 1114111 in decimal), but most of them are currently unassigned. Of
those that are assigned, many are reserved as non-characters or for special
purposes, and most fonts do not display anything even close to the full range
of assigned characters.
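
You can get a rough feel for how much of the range is unassigned by asking
unicodedata for the general category of every code point. This is a Python 3
sketch, and the exact counts depend on which version of the Unicode database
your Python ships with, so I haven't quoted any output:

```python
import sys
import unicodedata
from collections import Counter

# sys.maxunicode is the highest code point; they are numbered from 0.
total = sys.maxunicode + 1

# Tally the Unicode "general category" of every code point.
# 'Cn' = unassigned, 'Co' = private use, 'Cs' = surrogate.
categories = Counter(unicodedata.category(chr(i)) for i in range(total))

unassigned = categories["Cn"]
print("code points:", total)
print("unassigned: ", unassigned)
print("private use:", categories["Co"])
print("surrogates: ", categories["Cs"])
```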

If you spend some time on the Unicode web site, you will find lists of 
characters which are defined:

www.unicode.org

but beware, it is relatively heavy going. Wikipedia has a page showing all 
currently assigned characters, but font support is still lousy and many of them 
display as boxes:

http://en.wikipedia.org/wiki/List_of_Unicode_characters

You can generate the entire list yourself, using the same technique as for 
ASCII above:


# Python 2 (needs a "wide" build; on a narrow build unichr() stops at 0xFFFF):
unicode = ''.join(map(unichr, xrange(1114112)))

# Python 3:
unicode = ''.join(map(chr, range(1114112)))


although it will take a few seconds to generate the entire range. You can then 
get the name for each one using something like this:

import unicodedata
for c in unicode:
    try:
        print(c, unicodedata.name(c))
    except ValueError:
        # no name: unassigned, control, surrogate, or non-character
        pass


but remember that there are well over 100,000 defined characters in Unicode,
and your terminal will probably not be able to print most of them. Expect to
see a lot of boxes.
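
If the full dump is too much, one option is to look at a small slice at a
time. Here is a Python 3 sketch. The helper named_chars is something I made up
for illustration, and the Greek range is just an arbitrary example. It passes
a default to unicodedata.name() so there is no need to catch ValueError:

```python
import unicodedata

def named_chars(start, stop):
    """Return (character, name) pairs for code points in range(start, stop).

    Code points with no name (unassigned, control characters, and so on)
    are skipped.
    """
    pairs = []
    for i in range(start, stop):
        c = chr(i)
        name = unicodedata.name(c, None)  # None instead of ValueError
        if name is not None:
            pairs.append((c, name))
    return pairs

# Greek and Coptic block, U+0370 to U+03FF.
greek = named_chars(0x370, 0x400)
for c, name in greek[:5]:
    # Print the code point rather than the character itself, so this
    # works even on a terminal that cannot display Greek.
    print("U+%04X" % ord(c), name)
```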




--
Steven
_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
