On 14-Jun-07, at 4:30 AM, vanderkerkoff wrote:
Hi Brian
I've now set the MySQL database's default charset to utf8, and
everything else is utf8 too (collation, etc.).
I think I know what the problem is, and it's a really old one, and I
feel foolish now for not realising it earlier.
Our content people are copying and pasting sh*t from Word into the
content.
:-)
Now that the database is utf8, I'd like to write something to change
the crap from Word into a readable value before it gets into the
database.
Using Python, so I suppose this is more of a Python question than a
Solr one.
Anyone got any tips anyway?
I've dealt with tons of issues with Python and Unicode, but I need
more information before I can offer tips.
Specifically, what is the format of the "shit" being copied and
pasted into your app, and what Python datatype is handling it? I
suspect it is encoded somehow, which could be problematic. Is it
going through a web browser? How is it getting into MySQL?
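For illustration only, here is a minimal sketch of the kind of cleanup
being asked about. It assumes the pasted text is either already a
Unicode string or arrives as Windows-1252 bytes (a common encoding for
text pasted from Word, but only a guess here), and it is written for
Python 3. The function name clean_word_text and the mapping table are
hypothetical, and the table covers only a handful of common "smart"
punctuation characters.

```python
# Hypothetical mapping of common Word "smart" punctuation to plain ASCII.
# Extend it as needed; it is not an exhaustive list.
WORD_PUNCTUATION = {
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark / apostrophe
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "-",    # em dash
    "\u2026": "...",  # horizontal ellipsis
    "\u00a0": " ",    # non-breaking space
}

# str.maketrans accepts a dict of single characters to replacement strings.
_TABLE = str.maketrans(WORD_PUNCTUATION)


def clean_word_text(text, encoding="cp1252"):
    """Return text with Word-style punctuation replaced by ASCII.

    If bytes are passed in, they are decoded first. The default encoding
    (cp1252) is an assumption; undecodable bytes become U+FFFD.
    """
    if isinstance(text, bytes):
        text = text.decode(encoding, errors="replace")
    return text.translate(_TABLE)


if __name__ == "__main__":
    print(clean_word_text("\u201cHello\u201d \u2013 it\u2019s Word\u2026"))
    print(clean_word_text(b"\x93Hello\x94"))
```

Whether replacing these characters is the right fix depends on the
answers to the questions above. If everything really is utf8 end to
end, the curly quotes can be stored as they are and the actual bug is
more likely a wrong decode somewhere on the way in.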
-Mike