This is what i see in your original email...

>>> I am attempting to import documents to Solr from MySQL using DIH. One
>>> of the field contains the text - =E2=80=9CFuture of Mobile Value Added
>>> Service=s (VAS) in Australia=E2=80=9D .Notice the character =E2=80=9C
>>> and =E2=80=9D.
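
To check what those =E2=80=9C and =E2=80=9D sequences actually are, here is a minimal Python sketch. It assumes the sample string below is the quoted title with the stray "=" soft line break dropped; the variable names are only illustrative. It decodes the quoted-printable escapes, reads the result as UTF-8, and compares it with how Windows-1252 stores the same two characters.

```python
import quopri

# Quoted title from the email, with the stray "=" soft line break removed
# (an assumption made only to keep the sample valid quoted-printable).
qp_text = b"=E2=80=9CFuture of Mobile Value Added Services (VAS) in Australia=E2=80=9D"

# Undo the quoted-printable escaping to get the raw bytes.
raw = quopri.decodestring(qp_text)

# Read correctly as UTF-8, the first and last characters are the curly
# quotes U+201C and U+201D.
text = raw.decode("utf-8")

# The same two characters as single bytes in Windows-1252.
left_cp1252 = "\u201c".encode("cp1252")
right_cp1252 = "\u201d".encode("cp1252")

# Reading the UTF-8 bytes of the opening quote as Windows-1252 instead
# produces three unrelated characters (mojibake).
mojibake = raw[:3].decode("cp1252")

print(raw[:3].hex(), raw[-3:].hex())
print(repr(text[0]), repr(text[-1]))
print(left_cp1252, right_cp1252)
print(repr(mojibake))
```

If the text stored in MySQL shows the three-character form rather than a single curly quote, that points at a charset mismatch somewhere between the source data and the connection.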
"E2 80 9C" and "E2 80 9D" are the UTF-8 bytes for the curly "smart quotes" (0x93 and 0x94 in Windows-1252), and garbage showing up in their place is a classic symptom of a Windows-1252 / UTF-8 mixup...

   http://www.i18nqa.com/debug/utf8-debug.html
   https://en.wikipedia.org/wiki/Windows-1252

So i'm pretty sure the root of your problem is that your source data is messed up.

: The output of Show variables goes like this. I have verified with the hex
: values and they are different in MySQL and Solr.
:
: | Variable_name            | Value                      |
: +--------------------------+----------------------------+
: | character_set_client     | latin1                     |
: | character_set_connection | latin1                     |
: | character_set_database   | latin1                     |
: | character_set_filesystem | binary                     |
: | character_set_results    | latin1                     |
: | character_set_server     | latin1                     |
: | character_set_system     | utf8                       |
: | character_sets_dir       | /usr/share/mysql/charsets/ |
:
:
: *Pranav Prakash*
:
: "temet nosce"
:
:
: On Wed, Sep 26, 2012 at 6:45 PM, Gora Mohanty <g...@mimirtech.com> wrote:
:
: > On 21 September 2012 11:19, Pranav Prakash <pra...@gmail.com> wrote:
: >
: > > I am seeing the garbage text in browser, Luke Index Toolbox and everywhere
: > > it is the same. My servlet container is Jetty which is the out-of-box one.
: > > Many other special chars are getting indexed and stored properly, only few
: > > characters causes pain.
: > >
: >
: > Could you double-check the encoding on the mysql side?
: > What is the output of
: >
: > mysql> SHOW VARIABLES LIKE 'character\_set\_%';
: >
: > Regards,
: > Gora
: >
:

-Hoss