On 14-Jun-07, at 4:30 AM, vanderkerkoff wrote:


Hi Brian

I've now set the mysqldb to default charset utf8, and everything else is
utf8 too (collation, etc.).

I think I know what the problem is, and it's a really old one and I feel
foolish now for not realising it earlier.

Our content people are copying and pasting sh*t from Word into the content.

:-)

Now that the database is utf8, I'd like to write something to change the crap from Word into a readable value before it gets into the database. I'm using Python, so I suppose this is more of a Python question than a Solr
one.

Anyone got any tips anyway?

I've dealt with tons of issues with Python and Unicode, but I need more information before I can offer tips.

Specifically, what is the format of the "shit" being copied and pasted into your app, and what Python data type is handling it? I suspect it is encoded somehow, which could be problematic. Is it going through a web browser? How is it getting into MySQL?
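
In the meantime, here is a rough sketch of the kind of cleanup I have in mind. It rests on assumptions I can't confirm yet: that the pasted text reaches your code either as already-decoded text or as bytes that are valid UTF-8 or Windows-1252 (the usual encoding behind Word's curly quotes and dashes), and that you want plain ASCII punctuation rather than the typographic characters. The names to_text, clean_word_paste and WORD_PUNCTUATION are made up for illustration, and the syntax is Python 3.

```python
# Hypothetical mapping of common Word "smart" punctuation to ASCII.
# Extend it with whatever characters actually show up in your content.
WORD_PUNCTUATION = {
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark / apostrophe
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "--",   # em dash
    "\u2026": "...",  # horizontal ellipsis
    "\u00a0": " ",    # non-breaking space
}

# Build the translation table once; str.translate accepts
# multi-character replacement strings as values.
_TABLE = str.maketrans(WORD_PUNCTUATION)


def to_text(raw):
    """Return str from str or bytes.

    Assumption: bytes are UTF-8, or failing that Windows-1252.
    Bytes that cp1252 leaves undefined become U+FFFD.
    """
    if isinstance(raw, str):
        return raw
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("cp1252", errors="replace")


def clean_word_paste(raw):
    """Decode if needed, then swap Word punctuation for ASCII."""
    return to_text(raw).translate(_TABLE)
```

Calling clean_word_paste on the field value just before it is saved would be one place to hook this in. If you would rather keep the curly quotes and only make sure they are stored as real UTF-8, to_text on its own may be enough. Either way, the decoding guess should be replaced once you know what the browser and framework are really sending.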

-Mike

