I'm using DataImportHandler to send my data to Solr! So you mean it's possible to apply a transformer in db-config.xml with a Perl script?
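To check that I understand the transcoding idea from the message quoted below, here is a minimal sketch in Python (not Perl, and not a DataImportHandler transformer). It assumes the value really is UTF-8 bytes that were read as ISO-8859-1 somewhere between the database and Solr. The function names `fix_mojibake` and `fold_accents` are made up for illustration only.

```python
import unicodedata


def fix_mojibake(text: str) -> str:
    """Undo UTF-8 bytes that were wrongly decoded as ISO-8859-1.

    If the text is not that kind of mojibake, return it unchanged.
    """
    try:
        # Recover the original bytes, then decode them as the UTF-8 they were.
        return text.encode("iso-8859-1").decode("utf-8")
    except UnicodeError:
        # Either not representable in Latin-1 or not valid UTF-8: leave as is.
        return text


def fold_accents(text: str) -> str:
    """Strip combining accents, roughly what an accent filter is expected to do."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    broken = "Sol\u00c3\u00a8ne"  # what the database row looks like: "Solène"
    repaired = fix_mojibake(broken)
    print(repaired)                # Solène
    print(fold_accents(repaired))  # Solene
```

If the repaired string comes out as solène, the row was double-encoded and the accent filter never saw a real "è", which would explain why the analysis page gives SolAne.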

Óscar Marín Miró wrote:
>
> Hi,
>
> My guess is that *although* your DB is in UTF-8, the database engine sends
> you the rows in ISO-Latin1, so before doing *anything* after receiving the
> data, you should transcode from ISO-Latin1 to UTF-8 and then send that to
> Solr. I'm no Java expert, but in Perl (MySQL DB in UTF-8) I have to do this
> with any row:
>
> $row=decode("iso-8859-1",$row);
>
> ... and before building the XML to invoke Solr and add the document:
>
> $row=encode("utf8",$row);
>
> On Fri, Mar 20, 2009 at 10:55 AM, aerox7 <amyne.berr...@me.com> wrote:
>
>>
>> I added:
>> "è" => "e" to mapping-ISOLatin1Accent.txt
>>
>> and added the following fieldType:
>>
>> <fieldType name="textCharNorm" class="solr.TextField"
>>            positionIncrementGap="100">
>>   <analyzer>
>>     <charFilter class="solr.MappingCharFilterFactory"
>>                 mapping="mapping-ISOLatin1Accent.txt"/>
>>     <tokenizer class="solr.CharStreamAwareWhitespaceTokenizerFactory"/>
>>   </analyzer>
>> </fieldType>
>>
>> But I still have the same problem! It only works when I store the ISO string
>> in the UTF-8 database (ex: store solène not solène)... :,(
>>
>>
>> aerox7 wrote:
>> >
>> > ==> where are you seeing it as "Solène" as opposed to the
>> > correct way of solène?
>> >
>> > I have "Solène" in my MySQL database! So I don't know if this is
>> > correct or not. I guess that "Solène" is solène in UTF-8?!
>> >
>> > I've tried the analysis page at
>> > http://localhost:8983/solr/admin/analysis.jsp. When I try with solène,
>> > everything is OK! But when I try with Solène (like what I have in the
>> > DB), the analysis converts à to A and deletes ¨, so I get SolAne!!!
>> >
>> > I think that ISOLatin1AccentFilterFactory only handles strings in the
>> > ISO-8859-1 charset.
>> >
>> > So is there any way to transform my string to ISO-8859-1 before the
>> > indexing process? Maybe by creating a transformer in DataImportHandler?
>> > (I have never coded in Java :( )
>> >
>> > Thank you all.
>> >
>> >
>> > Koji Sekiguchi-2 wrote:
>> >>
>> >> aerox7 wrote:
>> >>> Hi,
>> >>> I have a MySQL database in UTF-8. I have a row with "Solène" (solène).
>> >>> I want to transform this to solene, so I use Solr's
>> >>> ISOLatin1AccentFilterFactory to perform this task, but it doesn't work?!!
>> >>>
>> >>> I guess that "Solène" is "solène" in UTF-8?! I also set Tomcat to UTF-8,
>> >>> so normally ISOLatin1AccentFilterFactory should replace the accent...
>> >>>
>> >>> Any ideas?
>> >>>
>> >>> I use DataImportHandler.
>> >>>
>> >>
>> >> If a mapping rule "è" to "e" is always true in your field, you can try
>> >> to use MappingCharFilter instead of ISOLatin1AccentFilter. Add the
>> >> following line to mapping-ISOLatin1Accent.txt:
>> >>
>> >> "è" => "e"
>> >>
>> >> and add the following fieldType:
>> >>
>> >> <fieldType name="textCharNorm" class="solr.TextField"
>> >>            positionIncrementGap="100">
>> >>   <analyzer>
>> >>     <charFilter class="solr.MappingCharFilterFactory"
>> >>                 mapping="mapping-ISOLatin1Accent.txt"/>
>> >>     <tokenizer class="solr.CharStreamAwareWhitespaceTokenizerFactory"/>
>> >>   </analyzer>
>> >> </fieldType>
>> >>
>> >> MappingCharFilter and mapping-ISOLatin1Accent.txt are in the nightly build.
>> >>
>> >> Koji
>> >>
>> >
>>
>> --
>> View this message in context:
>> http://www.nabble.com/Problem-with-UTF-8-and-Solr-ISOLatin1AccentFilterFactory-tp22607642p22617278.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>
> --
> “I may not believe in myself, but I believe in what I'm doing.”
>
> -- Jimmy Page
>

--
View this message in context: http://www.nabble.com/Problem-with-UTF-8-and-Solr-ISOLatin1AccentFilterFactory-tp22607642p22618085.html
Sent from the Solr - User mailing list archive at Nabble.com.