Hi,

How do you post your data to Solr? If you post XML, it should be properly encoded in UTF-8 (which is the XML default), regardless of what is in the DB (which can be a mystery with MySQL).
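To make the indexing side concrete, here is a minimal Python sketch of that idea: decode the bytes coming out of the database with whatever encoding the database really uses, then serialize the Solr add message as UTF-8. The function name, the field names "id" and "text", and the Latin-1 default are assumptions for illustration, not taken from your setup.

```python
import xml.etree.ElementTree as ET


def build_add_xml(doc_id, raw_text, db_encoding="latin-1"):
    """Build a Solr <add> message as UTF-8 bytes.

    raw_text is the byte string read from the database; db_encoding is
    the encoding those bytes are really in (assumed Latin-1 here).
    """
    # Decode to a Unicode string first, so the DB encoding stops mattering.
    text = raw_text.decode(db_encoding)

    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    # "id" and "text" are hypothetical field names.
    ET.SubElement(doc, "field", name="id").text = doc_id
    ET.SubElement(doc, "field", name="text").text = text

    # Serialize to bytes; non-ASCII characters are written as UTF-8 octets.
    return ET.tostring(add, encoding="utf-8")


payload = build_add_xml("1", b"caf\xe9")  # Latin-1 bytes for "café"
```

The resulting bytes would then be sent as the body of the update request, with a content type that declares UTF-8.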
At query time, if the XML writer is used, the response is encoded in UTF-8. If the JSON one is used, I think it's the same, because JSON is Unicode compliant by nature (JavaScript).

From what you describe, I would bet on a PHP problem. It seems PHP takes the correct UTF-8 octets from Solr and displays them as if they were Latin-1 (hence the strange characters). You need to either:

- output your pages in UTF-8, or
- decode the octets given by Solr to a Unicode string and let it be encoded as Latin-1 for output (with the risk of losing characters that Latin-1 cannot encode).

I hope it helps.

J.

2009/11/4 Jonathan Hendler <jonathan.hend...@gmail.com>:
> Hi Peter,
>
> I have the same set of issues and will look for a response here.
>
> Sometimes those other chars can be created at the time of input (like
> extraction from a Microsoft Office doc by a third-party tool, for example).
> But MySQL looking OK in the browser might be because the encoding of MySQL
> was not the same as the original text. Say for example that the collation of
> MySQL is Latin, and the document was UTF-8. When a browser renders, it might
> assume chars are UTF-8, but SOLR might be taking the table type literally in
> the DIH (Latin1 Swedish for example). Could also be the way PHP doesn't
> handle UTF-8 well and it depends on your client.
>
> Don't think it has anything to do with Jetty - I use Resin.
>
> Hope that helps,
>
> - Jonathan
>
>
> On Nov 4, 2009, at 8:48 AM, Peter Hedlund wrote:
>
>> I'm having a problem with character encoding. The data that I'm indexing
>> with SOLR is being pulled from a MySQL database and then the index is being
>> integrated into a PHP application. When I display the text from the SOLR
>> index it's full of strange characters (–, é, etc...). However, when I
>> bypass SOLR and access the data from the MySQL table directly and write to
>> the browser I don't see any problems with em-dashes and accented characters.
>>
>> Is this a JETTY issue or a SOLR issue or something else?
>> (It's not simply an issue of including <meta http-equiv="Content-Type"
>> content="text/html;charset=UTF-8"> either)
>>
>> Thanks for any help.
>>
>> Peter Hedlund
>>
>>
>
>

--
Jerome Eteve.
http://www.eteve.net
jer...@eteve.net
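
For reference, the diagnosis in the reply (UTF-8 octets displayed as Latin-1) and the second suggested fix can be reproduced in a few lines. This is only a sketch in Python rather than PHP, and the function names are made up for illustration. In PHP the re-encoding step would presumably go through something like mb_convert_encoding() or iconv(), but I have not tested that against this setup.

```python
def garble(text):
    """Reproduce the symptom: correct UTF-8 octets read as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def to_latin1(utf8_octets):
    """Decode Solr's UTF-8 octets to a Unicode string, then re-encode
    as Latin-1 for output.

    Characters that Latin-1 cannot represent (an en dash, for example)
    are replaced with '?', which is the loss mentioned in the reply.
    """
    return utf8_octets.decode("utf-8").encode("latin-1", errors="replace")


accented = "\u00e9"                      # e with acute accent
print(garble(accented))                  # two characters instead of one
print(to_latin1(accented.encode("utf-8")))
print(to_latin1("a\u2013b".encode("utf-8")))  # en dash is lost
```

The first fix, serving the pages as UTF-8, avoids the loss entirely, so it is usually the better choice when the application allows it.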