On 6-Sep-07, at 12:13 PM, Yonik Seeley wrote:

On 9/6/07, Brian Carmalt <[EMAIL PROTECTED]> wrote:
Try it with title.encode('utf-8').
As in:
kw = {'id': '12', 'title': title.encode('utf-8'), 'system': 'plone', 'url': 'http://www.google.de'}

It seems like the client library should be responsible for encoding,
not the user.
So try changing
title="Übersicht"
  into a unicode string via
title=u"Übersicht"

And that should hopefully get your test program working.
If it doesn't, it's probably a solr.py bug and should be fixed there.

It may or may not, depending on the vagaries of the encoding in his text editor.

What Python receives when you enter u'é' is the byte sequence that your editor's encoding produces. For instance, my terminal is set to UTF-8, so typing é is equivalent to entering the bytes C3 A9:

In [5]: 'é'
Out[5]: '\xc3\xa9'

Prepending u does not help here, because it tells Python to treat each of these two bytes as a separate Unicode character (U+00C3 and U+00A9). Note that this could be fixed by setting Python's default encoding to match.

In [1]: u'é'
Out[1]: u'\xc3\xa9'
In [11]: print u'é'
é

The proper thing to do is to interpret the byte sequence given the proper encoding:

In [3]: 'é'.decode('utf-8')
Out[3]: u'\xe9'

or enter the desired unicode character directly:

>>> u'\u00e9'
u'\xe9'
>>> print u'\u00e9'
é
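
To make the difference concrete, here is a minimal sketch that starts from the two UTF-8 bytes C3 A9 and compares decoding them properly with treating each byte as its own character. The variable names are only illustrative, and the code is written so that it should run on Python 2.6+ as well as Python 3:

```python
# The raw bytes a UTF-8 terminal or editor produces for e-acute.
raw = b'\xc3\xa9'

# Correct: interpret the bytes using the encoding they were written in.
decoded = raw.decode('utf-8')

# Incorrect: treat each byte as a separate code point,
# which is what the literal u'\xc3\xa9' means.
mangled = raw.decode('latin-1')

# A mangled string can be repaired by reversing the wrong step.
repaired = mangled.encode('latin-1').decode('utf-8')

print(len(decoded))   # one character: U+00E9
print(len(mangled))   # two characters: U+00C3 and U+00A9
```

The repair step only works when you know which wrong interpretation was applied (latin-1 is assumed here); decoding correctly in the first place is the more robust option.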

This is less complicated in the usual case of reading data from a file, because the encoding should be known (terminal encoding issues are much trickier). Use codecs.open() with the file's encoding to get a text stream that returns unicode strings.
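
As a rough sketch of the file case: the snippet below writes a small UTF-8 file and reads it back through codecs.open(). The temporary file, its name, and the variable names are assumptions made for the example, not part of the original test program:

```python
import codecs
import os
import tempfile

# Write a small UTF-8 file to read back (a stand-in for real input data).
path = os.path.join(tempfile.mkdtemp(), 'title.txt')
with open(path, 'wb') as f:
    f.write(b'\xc3\x9cbersicht\n')  # "Ubersicht" with U-umlaut, as UTF-8 bytes

# codecs.open decodes on read, so the program only ever sees unicode text.
with codecs.open(path, 'r', encoding='utf-8') as f:
    title = f.read().strip()

# Encode explicitly only at the boundary, e.g. just before sending bytes out.
payload = title.encode('utf-8')
```

Keeping the text as unicode inside the program and encoding once at the edge avoids guessing later which encoding a byte string is in.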

-Mike 
