Re: solr.py problems with german "Umlaute"

Mike Klaas Thu, 06 Sep 2007 13:30:19 -0700


On 6-Sep-07, at 12:13 PM, Yonik Seeley wrote:

On 9/6/07, Brian Carmalt <[EMAIL PROTECTED]> wrote:

Try it with title.encode('utf-8').
As in: kw =
{'id':'12','title':title.encode('utf-8'),'system':'plone','url':'http://www.google.de'}


It seems like the client library should be responsible for encoding,
not the user.
So try changing
title="Übersicht"
  into a unicode string via
title=u"Übersicht"

And that should hopefully get your test program working.
If it doesn't it's probably a solr.py bug and should be fixed there.

It may or may not, depending on the vagaries of the encoding in his text editor.

What python gets when you enter u'é' is the byte sequence corresponding to the encoding of your editor. For instance, my terminal is set to utf-8 and when I type in é it is equivalent to entering the bytes C3 A9:


In [5]: 'é'
Out[5]: '\xc3\xa9'

Prepending u does not work, because you are telling python that you want these two bytes as unicode characters. Note that this could be fixed by setting python's default encoding to match.


In [1]: u'é'
Out[1]: u'\xc3\xa9'
In [11]: print u'é'
Ã©

The proper thing to do is to interpret the byte sequence given the proper encoding:


In [3]: 'é'.decode('utf-8')
Out[3]: u'\xe9'

or enter the desired unicode character directly:

>>> u'\u00e9'
u'\xe9'
>>> print u'\u00e9'
é
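
To make the difference concrete, here is a minimal sketch that starts from the raw bytes rather than from whatever the terminal or editor sends. The variable names are only illustrative, and Latin-1 is used as a stand-in for "treat each byte as its own character", which is an assumption about what happened in the u'é' case above.

```python
# The two bytes a utf-8 terminal produces for "é".
raw = b'\xc3\xa9'

# Wrong: each byte becomes its own character (Latin-1 maps bytes 1:1
# to code points), which gives the two-character garbage seen above.
mojibake = raw.decode('latin-1')

# Right: decode with the encoding the bytes were actually written in.
text = raw.decode('utf-8')

# Two characters versus one: U+00C3 U+00A9 versus U+00E9.
assert len(mojibake) == 2
assert len(text) == 1
assert text == u'\u00e9'
```

Writing the bytes as escapes keeps the example independent of the editor's encoding, which is the whole source of the confusion here.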

This is less complicated in the usual case of reading data from a file, because the encoding should be known (terminal encoding issues are much trickier). Use codecs.open() to get a unicode-output text stream.
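
A rough sketch of the file case, assuming the file is known to be utf-8. The file name and its contents are made up for the example, and the final encode step only illustrates handing bytes to a client; it is not a claim about what solr.py expects.

```python
import codecs
import os
import tempfile

# Write a small utf-8 file to read back (hypothetical sample data).
path = os.path.join(tempfile.mkdtemp(), 'titles.txt')
with open(path, 'wb') as f:
    f.write(b'\xc3\x9cbersicht\n')  # "Übersicht" encoded as utf-8

# codecs.open() decodes while reading, so the line is already unicode.
with codecs.open(path, 'r', encoding='utf-8') as f:
    title = f.readline().rstrip(u'\n')

# Encode only at the boundary, when bytes are actually required.
payload = title.encode('utf-8')

assert title == u'\u00dcbersicht'
assert payload == b'\xc3\x9cbersicht'
```

The point of the design is that everything between reading and sending works on unicode, so the encoding is named exactly twice: once on the way in and once on the way out.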


-Mike
