On 6-Sep-07, at 12:13 PM, Yonik Seeley wrote:
On 9/6/07, Brian Carmalt <[EMAIL PROTECTED]> wrote:
Try it with title.encode('utf-8').
As in:
    kw = {'id': '12', 'title': title.encode('utf-8'),
          'system': 'plone', 'url': 'http://www.google.de'}
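For readers on Python 3, where every str is already Unicode text, the same suggestion looks roughly like this. It is a minimal sketch: the field values are the ones quoted above, and whether solr.py expects bytes or text here is an assumption, not something the thread confirms.

```python
# Python 3 sketch: str is Unicode text; .encode() turns it into UTF-8 bytes.
title = "Übersicht"

kw = {
    "id": "12",
    "title": title.encode("utf-8"),  # bytes, since U+00DC encodes as C3 9C
    "system": "plone",
    "url": "http://www.google.de",
}

print(kw["title"])  # b'\xc3\x9cbersicht'
```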
It seems like the client library should be responsible for encoding,
not the user.
So try changing
title="Übersicht"
into a unicode string via
title=u"Übersicht"
That should, hopefully, get your test program working.
If it doesn't, it's probably a bug in solr.py and should be fixed there.
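A sketch of what "the client library does the encoding" could mean in practice: the caller passes plain Unicode text and the library encodes exactly once, when it serializes the request body. The `<add><doc><field name="...">` layout follows Solr's XML update message format; `build_add_xml` is a hypothetical helper, not part of solr.py's API.

```python
from xml.sax.saxutils import escape, quoteattr


def build_add_xml(fields):
    """Hypothetical client helper: takes Unicode text, returns UTF-8 bytes.

    The caller never calls .encode(); the library does it once, at the
    point where the request body is serialized.
    """
    parts = ["<add><doc>"]
    for name, value in fields.items():
        # quoteattr() adds the surrounding quotes; escape() handles &, <, >.
        parts.append(
            "<field name=%s>%s</field>" % (quoteattr(name), escape(str(value)))
        )
    parts.append("</doc></add>")
    return "".join(parts).encode("utf-8")


body = build_add_xml({"id": "12", "title": "Übersicht", "system": "plone"})
print(body.decode("utf-8"))
```

The design point is that encoding happens at the boundary (bytes going out over HTTP), so user code only ever handles text.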
It may or may not work, depending on the vagaries of the encoding his
text editor uses.
What Python receives when you enter u'é' is the byte sequence for é in
whatever encoding your editor (or terminal) uses. For instance, my
terminal is set to UTF-8, so typing é is equivalent to entering the
bytes C3 A9:
In [5]: 'é'
Out[5]: '\xc3\xa9'
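The same observation in Python 3 terms, as a small sketch: encoding the single character é as UTF-8 yields exactly the two bytes C3 A9 that the terminal sends. This assumes the script file itself is saved as UTF-8, which is Python 3's default source encoding.

```python
# Python 3 sketch: the bytes a UTF-8 terminal sends when you type "é".
raw = "é".encode("utf-8")

print(raw)        # b'\xc3\xa9'
print(raw.hex())  # c3a9
print(len(raw))   # 2 bytes for 1 character
```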
Prepending u does not work, because you are telling Python to treat each
of those two bytes as a separate Unicode character. Note that this could
be fixed by setting Python's default encoding to match.
In [1]: u'é'
Out[1]: u'\xc3\xa9'
In [11]: print u'é'
é
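A Python 3 sketch that reproduces the mistake explicitly. Treating "one byte, one character" as a Latin-1 decode is an assumption about what the interpreter did with the u'' literal, but it gives the same u'\xc3\xa9' result shown above. The last step shows how such a string can be repaired after the fact.

```python
raw = b"\xc3\xa9"  # the bytes a UTF-8 terminal sends for "é"

# Wrong: one code point per byte, i.e. a Latin-1 decode.
wrong = raw.decode("latin-1")
print(repr(wrong), len(wrong))  # 'Ã©' 2

# Repair: re-encode with the wrong codec, then decode with the right one.
fixed = wrong.encode("latin-1").decode("utf-8")
print(repr(fixed), len(fixed))  # 'é' 1
```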
The proper thing to do is to interpret the byte sequence using the
encoding it was actually written in:
In [3]: 'é'.decode('utf-8')
Out[3]: u'\xe9'
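In Python 3 the equivalent decode is called on a bytes object. A short sketch; it also shows that guessing a codec that cannot represent the bytes, here ASCII, fails loudly instead of silently garbling the text.

```python
raw = b"\xc3\xa9"  # bytes typed at a UTF-8 terminal

text = raw.decode("utf-8")
print(repr(text), len(text), hex(ord(text)))  # 'é' 1 0xe9

# A codec that cannot represent these bytes raises instead of guessing.
try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    print("ascii:", exc.reason)  # ascii: ordinal not in range(128)
```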
or enter the desired Unicode character directly by its code point:
>>> u'\u00e9'
u'\xe9'
>>> print u'\u00e9'
é
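Python 3 offers several spellings that avoid non-ASCII bytes in the source entirely, so the editor's or terminal's encoding no longer matters. A small sketch comparing them:

```python
import unicodedata

a = "\u00e9"                               # escape by code point
b = "\N{LATIN SMALL LETTER E WITH ACUTE}"  # escape by Unicode name
c = chr(0xE9)                              # build from the integer value

print(a == b == c)          # True
print(unicodedata.name(a))  # LATIN SMALL LETTER E WITH ACUTE
```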
This is less complicated in the usual case of reading data from a
file, because the encoding should be known (terminal encoding issues
are much trickier). Use codecs.open() to get a text stream that
returns unicode objects.
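A minimal sketch of the file case, assuming the data is known to be UTF-8. It writes a throwaway temporary file so the example is self-contained. codecs.open() still exists in Python 3, though the built-in open() with an encoding argument is the usual choice there; both are shown.

```python
import codecs
import os
import tempfile

# Write a small UTF-8 file to read back (stand-in for real input data).
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write("Übersicht\n".encode("utf-8"))

# codecs.open() decodes as it reads, so lines come back as text, not bytes.
with codecs.open(path, "r", encoding="utf-8") as f:
    lines = f.readlines()

# The Python 3 built-in open() does the same decoding.
with open(path, "r", encoding="utf-8") as f:
    same = f.read()

os.remove(path)
print(lines)  # ['Übersicht\n']
```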
-Mike