Hi, I've checked the MySQL configuration with "mysql> SHOW VARIABLES LIKE 'character_set%';": all character_set variables are set to UTF-8.
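For anyone hitting the same symptom: a value such as "SolÃ¨ne" is what the UTF-8 bytes of "Solène" look like when something between the database and Solr decodes them as ISO-8859-1. Below is a minimal sketch of how such a string could be repaired in plain Java. It is not the transformer used in this thread. The class name CharsetFixTransformer and the helper fixMojibake are made up for illustration, and the transformRow(Map) method only mirrors the shape that DataImportHandler is assumed to call on a custom transformer.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical class name. To plug it into DIH it would presumably need to be
// a public class in its own file; it is package-private here only to keep the
// example in a single file.
class CharsetFixTransformer {

    // Undo "UTF-8 bytes decoded as ISO-8859-1" for a single string.
    public static String fixMojibake(String s) {
        // A string with characters outside ISO-8859-1 cannot be the result of
        // that mis-decoding, so leave it untouched.
        if (!StandardCharsets.ISO_8859_1.newEncoder().canEncode(s)) {
            return s;
        }
        byte[] raw = s.getBytes(StandardCharsets.ISO_8859_1);
        String decoded = new String(raw, StandardCharsets.UTF_8);
        // U+FFFD appears when the bytes were not valid UTF-8, which suggests
        // the input was already correct; keep the original in that case.
        return decoded.indexOf('\uFFFD') >= 0 ? s : decoded;
    }

    // Assumed entry point: repair every String value in the row in place.
    public Object transformRow(Map<String, Object> row) {
        for (Map.Entry<String, Object> entry : row.entrySet()) {
            if (entry.getValue() instanceof String) {
                entry.setValue(fixMojibake((String) entry.getValue()));
            }
        }
        return row;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("name", "Sol\u00C3\u00A8ne"); // mis-decoded form of "Solène"
        row.put("id", 42);                    // non-String values are left alone
        new CharsetFixTransformer().transformRow(row);
        System.out.println(row.get("name").equals("Sol\u00E8ne")); // prints true
    }
}
```

The two guards are a design choice: without them, a string that is already correct (or that contains characters outside ISO-8859-1) would be damaged by the round trip. Fixing the encoding at the source, for example in the JDBC connection settings, is usually preferable to repairing strings afterwards.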
I think the DataImportHandler reads the data as ISO-8859-1, so I just wrote a custom transformer that changes the row's charset from ISO-8859-1 to UTF-8, and now it works.

--> Noble Paul: I use a Solr 1.4 nightly build from 2009-03-18. Do I have to download the latest one to apply your patch?


Noble Paul നോബിള് नोब्ळ् wrote:
>
> Maybe there is an issue with the recent changes in SOLR-973.
> I have given a new patch on SOLR-973.
> aerox, is it possible to confirm if that is the problem?
>
>
> On Fri, Mar 20, 2009 at 6:52 PM, Grant Ingersoll <gsing...@apache.org> wrote:
>> Usually, when I see characters like this, it means you aren't
>> viewing/handling the UTF-8 correctly when bringing it into Java. I would
>> first check that your DB or JDBC driver is getting the chars out right.
>> It may even be the case that they did not go into the DB correctly in the
>> first place.
>>
>> On Mar 20, 2009, at 4:36 AM, aerox7 wrote:
>>
>>>
>>> ==> where are you seeing it as "SolÃ¨ne" as opposed to the
>>> correct way of solène?
>>>
>>> I have "SolÃ¨ne" in my MySQL database! So I don't know if this is
>>> correct or not. I guess that "SolÃ¨ne" is solène in UTF-8?!
>>>
>>> I've tried the analysis page at http://localhost:8983/solr/admin/analysis.jsp.
>>> When I try with solène everything is OK! But when I try with SolÃ¨ne
>>> (like what I have in the DB), the analysis converts Ã to A and deletes ¨,
>>> so I get SolAne!!!
>>>
>>> I think that ISOLatin1AccentFilterFactory only accepts strings in the
>>> ISO-8859-1 charset.
>>>
>>> So is there any solution to transform my string to ISO-8859-1 before the
>>> indexing process? Maybe by creating a transformer in DataImportHandler?
>>> (I have never coded in Java :( )
>>>
>>> Thank you all.
>>>
>>>
>>> Koji Sekiguchi-2 wrote:
>>>>
>>>> aerox7 wrote:
>>>>>
>>>>> Hi,
>>>>> I have a MySQL database in UTF-8. I have a row with "SolÃ¨ne"
>>>>> (solène). I want to transform this to solene, so I use Solr's
>>>>> ISOLatin1AccentFilterFactory to perform this task, but it doesn't work?!
>>>>>
>>>>> I guess that "SolÃ¨ne" is "solène" in UTF-8?! I also set Tomcat to
>>>>> UTF-8, so normally ISOLatin1AccentFilterFactory should replace the
>>>>> accent...
>>>>>
>>>>> Any ideas?
>>>>>
>>>>> I use DataImportHandler.
>>>>>
>>>>
>>>> If a mapping rule "Ã¨" to "e" is always true in your field, you can try
>>>> to use MappingCharFilter instead of ISOLatin1AccentFilter. Add the
>>>> following line to mapping-ISOLatin1Accent.txt:
>>>>
>>>> "Ã¨" => "e"
>>>>
>>>> and add the following fieldType:
>>>>
>>>> <fieldType name="textCharNorm" class="solr.TextField"
>>>>            positionIncrementGap="100">
>>>>   <analyzer>
>>>>     <charFilter class="solr.MappingCharFilterFactory"
>>>>                 mapping="mapping-ISOLatin1Accent.txt"/>
>>>>     <tokenizer class="solr.CharStreamAwareWhitespaceTokenizerFactory"/>
>>>>   </analyzer>
>>>> </fieldType>
>>>>
>>>> MappingCharFilter and mapping-ISOLatin1Accent.txt are in the nightly build.
>>>>
>>>> Koji
>>>>
>>>
>>
>
> --
> --Noble Paul
>

--
View this message in context: http://www.nabble.com/Problem-with-UTF-8-and-Solr-ISOLatin1AccentFilterFactory-tp22607642p22633051.html
Sent from the Solr - User mailing list archive at Nabble.com.