Maybe there is an issue with the recent changes in SOLR-973. I have posted a new patch on SOLR-973. aerox, is it possible to confirm whether that is the problem?

On Fri, Mar 20, 2009 at 6:52 PM, Grant Ingersoll <gsing...@apache.org> wrote:
> Usually, when I see characters like this, it means you aren't
> viewing/handling the UTF-8 correctly when bringing it into Java. I would
> first check that your DB or JDBC driver is getting the chars out right. It
> may even be the case that they did not go into the DB correctly in the first
> place.
>
> On Mar 20, 2009, at 4:36 AM, aerox7 wrote:
>
>>
>> ==> where are you seeing it as "Solène" as opposed to the
>> correct way of solène?
>>
>> I have "Solène" in my MySQL database! So I don't know if this is
>> correct or not. I guess that "Solène" is solène in UTF-8?!
>>
>> I've tried the analysis page at http://localhost:8983/solr/admin/analysis.jsp.
>> When I try with solène everything is OK! But when I try with Solène
>> (like what I have in the DB), the analysis converts à to A and deletes ¨,
>> so I get SolAne!!!
>>
>> I think that ISOLatin1AccentFilterFactory only takes strings with charset
>> ISO-8859-1.
>>
>> So is there any solution to transform my string to ISO-8859-1 before the
>> indexing process? Maybe by creating a transformer in DataImportHandler?
>> (I have never coded in Java :( )
>>
>> Thank you all.
>>
>>
>> Koji Sekiguchi-2 wrote:
>>>
>>> aerox7 wrote:
>>>>
>>>> Hi,
>>>> I have a MySQL database in UTF-8. I have a row with "Solène" (solène).
>>>> I want to transform this to solene, so I use Solr's
>>>> ISOLatin1AccentFilterFactory to perform this task, but it doesn't work?!!
>>>>
>>>> I guess that "Solène" is "solène" in UTF-8?! I also set Tomcat to UTF-8,
>>>> so normally ISOLatin1AccentFilterFactory has to replace the accent...
>>>>
>>>> Any ideas?
>>>>
>>>> I use DataImportHandler.
>>>>
>>>
>>> If a mapping rule "è" to "e" is always true in your field, you can try
>>> to use MappingCharFilter instead of ISOLatin1AccentFilter.
>>> Add the following line to mapping-ISOLatin1Accent.txt:
>>>
>>> "è" => "e"
>>>
>>> and add the following fieldType:
>>>
>>> <fieldType name="textCharNorm" class="solr.TextField"
>>>            positionIncrementGap="100" >
>>>   <analyzer>
>>>     <charFilter class="solr.MappingCharFilterFactory"
>>>                 mapping="mapping-ISOLatin1Accent.txt"/>
>>>     <tokenizer class="solr.CharStreamAwareWhitespaceTokenizerFactory"/>
>>>   </analyzer>
>>> </fieldType>
>>>
>>> MappingCharFilter and mapping-ISOLatin1Accent.txt are in the nightly build.
>>>
>>> Koji
>>>
>>
>> --
>> View this message in context:
>> http://www.nabble.com/Problem-with-UTF-8-and-Solr-ISOLatin1AccentFilterFactory-tp22607642p22616220.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>
> --

--Noble Paul
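
For anyone reading this thread later, here is a minimal Java sketch of the symptom Grant describes. It assumes the value really is UTF-8 that was decoded as ISO-8859-1 somewhere between MySQL and Solr, which the thread does not confirm. The class and method names (MojibakeDemo, misdecode, repair, stripAccents) are made up for illustration and are not part of Solr or DataImportHandler. The accent stripping uses java.text.Normalizer as a stand-in, not ISOLatin1AccentFilter itself.

```java
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

// Hypothetical helper class, for illustration only (not a Solr API).
final class MojibakeDemo {

    // Simulates the suspected fault: UTF-8 bytes read back as ISO-8859-1.
    static String misdecode(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Reverses the fault. Only safe when every char is <= U+00FF and the
    // wrong charset really was ISO-8859-1 (windows-1252 differs in 0x80-0x9F).
    static String repair(String s) {
        return new String(s.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    // Decomposes accented letters (NFD) and drops the combining marks.
    static String stripAccents(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
    }

    public static void main(String[] args) {
        String clean = "Sol\u00e8ne";            // "Solène", escaped to avoid source-encoding issues
        String garbled = misdecode(clean);       // U+00C3 U+00A8 in place of U+00E8

        System.out.println(garbled.equals("Sol\u00c3\u00a8ne"));               // true
        System.out.println(repair(garbled).equals(clean));                     // true
        System.out.println(stripAccents(repair(garbled)).equals("Solene"));    // true
    }
}
```

If the two-character form is what is actually stored in the database, an accent filter has no "è" to work on, which would be consistent with the SolAne result above. Repairing strings after the fact is a workaround; checking the connection encoding first, as Grant suggests, is probably the better route (with MySQL Connector/J that usually means the useUnicode and characterEncoding connection properties).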