Kent Johnson wrote:
> Barnaby Scott wrote:
>> Can anyone explain the following: I was getting string.uppercase
>> returning an unexpected number of characters, given that the Python
>> Help says that it should normally be A-Z. Being locale-dependent, I
>> checked that my locale was not set to something exotic, and sure
>> enough it is only what I expected - see below:
>>
>>
>> IDLE 1.1      ==== No Subprocess ====
>> >>> import locale, string
>> >>> locale.getlocale()
>> ['English_United Kingdom', '1252']
>> >>> print string.uppercase
>> ABCDEFGHIJKLMNOPQRSTUVWXYZŠŒŽŸÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞ
>> >>> print string.lowercase
>> abcdefghijklmnopqrstuvwxyzƒšœžßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ
>> >>>
>>
>> What am I missing here? Surely for UK English, I really should just be
>> getting A-Z and a-z. In case it is relevant, the platform is Windows
>> 2000.
>
> Interesting. Here is what I get:
> >>> import locale, string
> >>> locale.getlocale()
> (None, None)
> >>> string.uppercase
> 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
>
> Somehow the locale for your system has changed from the 'C' locale. If I
> set the default locale I get similar results to yours:
> >>> locale.setlocale(locale.LC_ALL, '')
> 'English_United States.1252'
> >>> locale.getlocale()
> ('English_United States', '1252')
> >>> print string.uppercase
> ABCDEFGHIJKLMNOPQRSTUVWXYZèîă└┴┬├─┼╞╟╚╔╩╦╠═╬╧╨╤╥╙╘╒╓╪┘┌█▄▌▐
>
> which doesn't print correctly because my console encoding is actually
> cp437 not cp1252.
>
> It looks like string.uppercase is giving you all the characters which
> are uppercase in the current encoding, which seems reasonable. You can
> use string.ascii_uppercase if you want just A-Z.
>
> Kent
>

Thanks, but this raises various questions:

Why would my locale have 'changed' - and from what?

What *would* be the appropriate locale given that I am in the UK and use
English, and how would I set it?

Why on earth does the ['English_United Kingdom', '1252'] locale setting
consider ŠŒŽŸÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞ to be appropriate?

Is this less to do with Python than the operating system?

Where can I read more on the subject?

Sorry for all the open-ended questions, but I am baffled by this and can
find no information. Sadly, just using string.ascii_uppercase is not a
solution because I am trying to develop something for different locales,
but only want the actual letters that a particular language uses to be
returned - e.g. English should be A-Z only, Swedish should be A-Z + ÅÄÖ
(only) etc. The thing I really want to avoid is having to hard-code for
every language on the planet - surely this is the whole point of locale
settings, and locale-dependent functions and constants?

Thanks

Barnaby Scott

_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
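
A rough way to check Kent's explanation without touching the locale at
all: the sketch below (Python 3, not the Python 2 used in the sessions
above) walks every byte of code page 1252 and keeps the ones that
Unicode classifies as uppercase. The function name uppercase_in_codepage
is made up for illustration, and str.isupper() follows Unicode rules
rather than the Windows C runtime that string.uppercase consults, so
treat this as an approximation of what the locale reports, not as a
description of how Python builds the constant.

```python
import string


def uppercase_in_codepage(codepage="cp1252"):
    """Return the characters of a single-byte code page that Unicode
    treats as uppercase, in byte order (hypothetical helper)."""
    found = []
    for byte in range(256):
        # Bytes the code page leaves undefined decode to '' here,
        # and ''.isupper() is False, so they are skipped.
        char = bytes([byte]).decode(codepage, errors="ignore")
        if char.isupper():
            found.append(char)
    return "".join(found)


if __name__ == "__main__":
    letters = uppercase_in_codepage("cp1252")
    print(letters)
    # Everything beyond plain A-Z comes from the upper half of the code page.
    extra = len(letters) - len(string.ascii_uppercase)
    print(extra, "non-ASCII uppercase letters")
```

For cp1252 this should give A-Z followed by the same accented capitals
as in the IDLE session, which suggests the list reflects the whole code
page (shared by many Western European languages) rather than the
alphabet of any one language.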