I guess you can set the JdbcDataSource property characterEncoding="UTF8", and that should help.
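
For example, a rough sketch of how that could look in data-config.xml. This assumes, as I understand it, that attributes JdbcDataSource does not recognize itself are handed to the JDBC driver as connection properties. The driver class is the usual MySQL Connector/J one; the URL, database name, user and password are placeholders, not values from this thread:

```xml
<dataConfig>
  <!-- Placeholder connection details: replace host, database, user and password with your own. -->
  <!-- characterEncoding is assumed to be passed through to the MySQL driver as a connection property. -->
  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/mydb"
              user="db_user"
              password="db_pass"
              characterEncoding="UTF8"/>
  <!-- The existing <document> and <entity> definitions stay as they are. -->
</dataConfig>
```

With MySQL Connector/J the same setting can usually also go into the JDBC URL as useUnicode=true&amp;characterEncoding=UTF-8 (inside an XML attribute the & has to be written as &amp;).
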
On Sat, Mar 21, 2009 at 10:58 AM, aerox7 <amyne.berr...@me.com> wrote:
>
> Hi,
> I've checked the MySQL conf with "mysql> SHOW VARIABLES LIKE 'character_set%';":
> all character_set variables are in UTF-8.
>
> I think that the dataimporter gets the data in ISO, so I just wrote a custom
> transformer to change the row's charset from ISO to UTF, and now it works.
>
> --> Noble Paul: I use the SOLR 1.4 nightly 2009-03-18 build. Do I have to
> download the latest one to apply your patch?
>
>
> Noble Paul നോബിള് नोब्ळ् wrote:
>>
>> Maybe there is an issue with the recent changes in SOLR-973.
>> I have given a new patch on SOLR-973.
>> aerox, is it possible to confirm if that is the problem?
>>
>>
>> On Fri, Mar 20, 2009 at 6:52 PM, Grant Ingersoll <gsing...@apache.org>
>> wrote:
>>> Usually, when I see characters like this, it means you aren't
>>> viewing/handling the UTF-8 correctly when bringing it into Java. I would
>>> first check that your DB or JDBC driver is getting the chars out right.
>>> It may even be the case that they did not go into the DB correctly in
>>> the first place.
>>>
>>> On Mar 20, 2009, at 4:36 AM, aerox7 wrote:
>>>
>>>>
>>>> ==> where are you seeing it as "Solène" as opposed to the
>>>> correct way of solène?
>>>>
>>>> I have "Solène" in my MySQL database! So I don't know if this is
>>>> correct or not. I guess that "Solène" is solène in UTF-8?!
>>>>
>>>> I've tried the analysis page at
>>>> http://localhost:8983/solr/admin/analysis.jsp. When I try with solène
>>>> everything is OK, but when I try with Solène (like what I have in the
>>>> DB) the analysis converts à to A and deletes ¨, so I get SolAne!
>>>>
>>>> I think that ISOLatin1AccentFilterFactory takes only strings with
>>>> charset ISO-8859-1.
>>>>
>>>> So is there any solution to transform my string to ISO-8859-1 before
>>>> the indexing process? Maybe by creating a transformer in
>>>> DataImportHandler? (I have never coded in Java :( )
>>>>
>>>> Thank you all.
>>>>
>>>>
>>>> Koji Sekiguchi-2 wrote:
>>>>>
>>>>> aerox7 wrote:
>>>>>>
>>>>>> Hi,
>>>>>> I have a MySQL database in UTF-8. I have a row with "Solène"
>>>>>> (solène). I want to transform this to solene, so I use Solr's
>>>>>> ISOLatin1AccentFilterFactory to perform this task, but it doesn't
>>>>>> work?!
>>>>>>
>>>>>> I guess that "Solène" is "solène" in UTF-8?! I also set Tomcat to
>>>>>> UTF-8, so normally ISOLatin1AccentFilterFactory should replace the
>>>>>> accent...
>>>>>>
>>>>>> Any ideas?
>>>>>>
>>>>>> I use DataImportHandler.
>>>>>>
>>>>>
>>>>> If a mapping rule "è" to "e" is always true in your field, you can try
>>>>> to use MappingCharFilter instead of ISOLatin1AccentFilter. Add the
>>>>> following line to mapping-ISOLatin1Accent.txt:
>>>>>
>>>>> "è" => "e"
>>>>>
>>>>> and add the following fieldType:
>>>>>
>>>>> <fieldType name="textCharNorm" class="solr.TextField"
>>>>>            positionIncrementGap="100">
>>>>>   <analyzer>
>>>>>     <charFilter class="solr.MappingCharFilterFactory"
>>>>>                 mapping="mapping-ISOLatin1Accent.txt"/>
>>>>>     <tokenizer class="solr.CharStreamAwareWhitespaceTokenizerFactory"/>
>>>>>   </analyzer>
>>>>> </fieldType>
>>>>>
>>>>> MappingCharFilter and mapping-ISOLatin1Accent.txt are in the nightly
>>>>> build.
>>>>>
>>>>> Koji
>>>>>
>>>>
>>>> --
>>>> View this message in context:
>>>> http://www.nabble.com/Problem-with-UTF-8-and-Solr-ISOLatin1AccentFilterFactory-tp22607642p22616220.html
>>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>>
>>>
>>
>> --
>> --Noble Paul
>>
>
> --
> View this message in context:
> http://www.nabble.com/Problem-with-UTF-8-and-Solr-ISOLatin1AccentFilterFactory-tp22607642p22633051.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>

--
--Noble Paul