how to get size of unicode string/string in bytes ?

2006-08-01 Thread pattreeya
Hello,

  How can I get the number of bytes in a string in Python?
len(string) does not give the size in bytes when I have a unicode
string, only its length in characters (the two match only for
ASCII/Latin-1). My data structure has to store unicode strings in many
languages, and I must know exactly how many bytes each stored string
occupies so that I can read it back later.

Many thanks for any suggestion.

cheers!
pattreeya.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: how to get size of unicode string/string in bytes ?

2006-08-01 Thread pattreeya
For example, I use UTF-8 for encoding/decoding:
s = "ทดสอบ"
u = s.decode("utf-8")
How can I get the size of u in bytes?
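
A minimal sketch of one way to answer this, written for Python 3 (an
assumption; the snippet above is Python 2, where the decode() step is
needed first). The helper name byte_size is hypothetical. The point it
illustrates is that a unicode string has no single size in bytes: the
count depends on the encoding chosen for storage.

```python
# Python 3 sketch: str is already Unicode, so no decode() step is needed.
s = "ทดสอบ"

def byte_size(text, encoding="utf-8"):
    """Return the number of bytes `text` occupies once encoded (hypothetical helper)."""
    return len(text.encode(encoding))

char_count = len(s)                      # code points, not bytes
utf8_size = byte_size(s)                 # each of these Thai letters takes 3 bytes in UTF-8
utf16_size = byte_size(s, "utf-16-le")   # 2 bytes each, no byte-order mark

print(char_count, utf8_size, utf16_size)  # 5 15 10
```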





[EMAIL PROTECTED] wrote:

> Hello,
>
>   how can I get the number of byte of the string in python?
> with "len(string)", it doesn't work to get the size of the string in
> bytes if I have the unicode string but just the length. (it only works
> fine for ascii/latin1) In data structure, I have to store unicode
> string for many languages and must know exactly how big of my string
> which is stored so I can read back later.
> 
> Many thanks for any suggestion.
> 
> cheers!
> pattreeya.


Re: how to get size of unicode string/string in bytes ?

2006-08-01 Thread pattreeya
I got the answer. What I needed was simple, but I could not see it at
that moment: encode the unicode string and take len() of the result.
Thanks for any suggestions!




>>> f = open("test.csv", "rb")
>>> t1 = f.readline()
>>> t2 = t1.decode("iso-8859-9") # test with Turkish
>>> t2
u'Dur-kalk trafi\u011fi, t\u0131kan\u0131kl\u0131k tehlikesi\n'
>>> print t2
Dur-kalk trafiği, tıkanıklık tehlikesi

>>> len(t2) # length in characters
39
>>> u1 = t2.encode("utf-8")
>>> u1
'Dur-kalk trafi\xc4\x9fi, t\xc4\xb1kan\xc4\xb1kl\xc4\xb1k tehlikesi\n'
>>> print u1
Dur-kalk trafiği, tıkanıklık tehlikesi

>>> len(u1) # length in bytes once encoded as UTF-8
43
>>>
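
For the storage goal stated in the original question (knowing how big
each stored string is so it can be read back), one common approach is
to write the encoded byte length in front of the data. The following
is only a sketch under assumptions: it is Python 3, the function names
write_string and read_string are hypothetical, and the 4-byte
little-endian length prefix is an illustrative choice, not something
taken from this thread.

```python
import io
import struct

def write_string(stream, text, encoding="utf-8"):
    """Write a 4-byte length prefix followed by the encoded text (hypothetical helper)."""
    data = text.encode(encoding)
    stream.write(struct.pack("<I", len(data)))  # byte count, not character count
    stream.write(data)
    return len(data)

def read_string(stream, encoding="utf-8"):
    """Read one length-prefixed string written by write_string (hypothetical helper)."""
    (size,) = struct.unpack("<I", stream.read(4))
    return stream.read(size).decode(encoding)

# An in-memory buffer stands in for a real file opened in binary mode.
buf = io.BytesIO()
line = "Dur-kalk trafi\u011fi, t\u0131kan\u0131kl\u0131k tehlikesi\n"
stored = write_string(buf, line)   # 43 bytes of UTF-8 for 39 characters

buf.seek(0)
restored = read_string(buf)
print(len(line), stored, restored == line)
```

The prefix records the length of the encoded bytes, which is the same
quantity as len(u1) in the session above, so the reader knows exactly
how many bytes to consume before decoding.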



Thanks!
