Re: Problem with lower() for unicode strings in russian

2008-10-05 Thread Alexey Moskvin
Martin, thanks for the fast reply, now everything is OK!
On Oct 6, 1:30 am, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> > I have a set of strings (all letters are capitalized) at utf-8,
>
> That's the problem. If these are really utf-8 encoded byte strings,
> then .lower likely won't work. It uses the C library's tolower API,
> which works on a byte level, i.e. can't work for multi-byte encodings.
>
> What you need to do is to operate on Unicode strings. I.e. instead
> of
>
>   s.lower()
>
> do
>
>   s.decode("utf-8").lower()
>
> or (if you need byte strings back)
>
>   s.decode("utf-8").lower().encode("utf-8")
>
> If you find that you write the latter, I recommend that you redesign
> your application. Don't use byte strings to represent text, but use
> Unicode strings all the time, except at the system boundary (where
> you decode/encode as appropriate).
>
> There are some limitations with Unicode .lower also, but I don't
> think they apply to Russian (specifically, SpecialCasing.txt is
> not considered).
>
> HTH,
> Martin
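
For the archive, a minimal sketch of the decode / lower / encode round trip
quoted above. The helper name lower_utf8 and the sample word are made up for
illustration, and the u"" literal is used so that the snippet should run on
both Python 2 and Python 3.3+:

```python
# -*- coding: utf-8 -*-

def lower_utf8(raw):
    """Lowercase UTF-8 encoded bytes by going through a Unicode string.

    Hypothetical helper, for illustration only.
    """
    text = raw.decode("utf-8")            # byte string -> Unicode string
    return text.lower().encode("utf-8")   # lowercase as text, then back to bytes


raw = u"ПРИВЕТ".encode("utf-8")           # made-up sample; stands in for data read from outside
lowered = lower_utf8(raw)
print(lowered.decode("utf-8"))
```

In line with the advice above, a helper like this would only belong at the
system boundary; inside the application the text would stay a Unicode string
and plain .lower() would be enough.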

--
http://mail.python.org/mailman/listinfo/python-list


Problem with lower() for unicode strings in russian

2008-10-05 Thread Alexey Moskvin
Hi!
I have a set of strings (all letters are capitalized), UTF-8 encoded,
in Russian. I need to convert them to lowercase, but
my_string.lower() doesn't work.
See sample script:
# -*- coding: utf-8 -*-
[skip]
s1 = self.title
s2 = self.title.lower()
print s1 == s2

prints True.
I have no problems with lower() for English letters, or with
something like this:
u'russian_letters_here'.lower(), but I don't need constants, I need to
modify variables, and nothing changes when I apply lower()
to my strings.
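
For reference, the behaviour described above can be reproduced with a short
self-contained sketch. The variable names and the sample word are made up
(self.title is not shown in the script, so a literal stands in for it), and
the snippet assumes the default C locale on Python 2, or any Python 3.3+:

```python
# -*- coding: utf-8 -*-

title_text = u"ПРИВЕТ"                     # Unicode string (made-up sample)
title_bytes = title_text.encode("utf-8")   # the same word as a UTF-8 byte string

# lower() on the byte string only touches ASCII letters,
# so the comparison with the original is expected to hold.
print(title_bytes.lower() == title_bytes)

# lower() on the Unicode string knows about Cyrillic letters,
# so the result is expected to differ from the original.
print(title_text.lower() == title_text)
```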