I'm using DataImportHandler to send my data to Solr! So do you mean it's possible
to apply a transformer in db-config.xml with a Perl script?
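
For the DataImportHandler side, here is a minimal Java sketch of the repair step. It is written in Java because DIH itself runs on Java.

The sketch assumes the rows reach DIH as UTF-8 bytes that were decoded as ISO-8859-1 somewhere between MySQL and Solr. That is the step that turns "solène" into "SolÃ¨ne". Encoding such a string back to ISO-8859-1 recovers the original bytes, and decoding those bytes as UTF-8 gives "solène".

Some caveats:
- The class `MojibakeRepair` is hypothetical.
- Its `transformRow(Map<String, Object>)` method only mirrors the shape of a DIH transformer. How to register such a class in db-config.xml should be checked against the DataImportHandler documentation.
- As far as I know, DIH's ScriptTransformer runs scripts through Java's scripting support (JavaScript out of the box), not Perl.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch (hypothetical class): undo text garbled by decoding UTF-8 bytes as
 * ISO-8859-1, e.g. U+00E8 (e-grave) arriving as U+00C3 U+00A8 (A-tilde, diaeresis).
 */
public class MojibakeRepair {

    /** Re-encodes as ISO-8859-1 to get the original bytes back, then decodes them as UTF-8. */
    public static String repair(String s) {
        if (s == null) {
            return null;
        }
        // A char above U+00FF cannot come from an ISO-8859-1 decode,
        // so this string is not that kind of garbling: leave it alone.
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return s;
            }
        }
        byte[] original = s.getBytes(StandardCharsets.ISO_8859_1);
        String decoded = new String(original, StandardCharsets.UTF_8);
        // Invalid UTF-8 is replaced with U+FFFD; in that case the input
        // was most likely correct already, so keep it unchanged.
        return decoded.indexOf('\uFFFD') >= 0 ? s : decoded;
    }

    /**
     * Applies repair() to every String column of a row. The method shape only
     * mirrors a DataImportHandler transformer (assumption: check the DIH docs
     * for how to register a transformer class in db-config.xml).
     */
    public Object transformRow(Map<String, Object> row) {
        for (Map.Entry<String, Object> entry : row.entrySet()) {
            if (entry.getValue() instanceof String) {
                entry.setValue(repair((String) entry.getValue()));
            }
        }
        return row;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        row.put("name", "Sol\u00C3\u00A8ne"); // hypothetical column holding the mis-decoded value
        new MojibakeRepair().transformRow(row);
        System.out.println(row.get("name").equals("Sol\u00E8ne")); // true
    }
}
```

The two guards leave a value untouched in two cases: when it contains characters above U+00FF, and when its bytes are not valid UTF-8. A correct "solène" therefore passes through unchanged. A genuine ISO-8859-1 value whose bytes happen to form valid UTF-8 would still be altered. Fixing the encoding where the data is stored or read, if that is possible, is the more robust option.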


Óscar Marín Miró wrote:
> 
> Hi,
> 
> My guess is that *although* your DB is in UTF-8, the database engine sends
> you the rows in ISO-Latin1, so before doing *anything* after receiving the
> data, you should transcode from ISO-Latin1 to UTF-8 and then send that to
> Solr. I'm no Java expert, but in Perl (MySQL DB in UTF-8) I have to do this
> with every row:
> 
> use Encode qw(decode encode);
> $row = decode("iso-8859-1", $row);
> 
> ... and before building the XML that adds the document to Solr:
> 
> $row=encode("utf8",$row);
> 
> On Fri, Mar 20, 2009 at 10:55 AM, aerox7 <amyne.berr...@me.com> wrote:
> 
>>
>> I added:
>> "Ã¨" => "e" to mapping-ISOLatin1Accent.txt
>>
>> and added the following fieldType:
>>
>> <fieldType name="textCharNorm" class="solr.TextField"
>> positionIncrementGap="100" >
>>  <analyzer>
>>    <charFilter class="solr.MappingCharFilterFactory"
>> mapping="mapping-ISOLatin1Accent.txt"/>
>>    <tokenizer class="solr.CharStreamAwareWhitespaceTokenizerFactory"/>
>>  </analyzer>
>> </fieldType>
>>
>> But I still have the same problem! It only works when I store an ISO string
>> in the UTF-8 database (e.g. store solène, not SolÃ¨ne)... :,(
>>
>>
>>
>>
>> aerox7 wrote:
>> >
>> > ==> where are you seeing it as "SolÃ¨ne" as opposed to the
>> > correct way of solène?
>> >
>> > I have "SolÃ¨ne" in my MySQL database! So I don't know whether this is
>> > correct or not. I guess that "SolÃ¨ne" is solène in UTF-8?!
>> >
>> > I've tried the analysis page at http://localhost:8983/solr/admin/analysis.jsp:
>> > when I try with solène everything is OK! But when I try with SolÃ¨ne
>> > (as I have it in the DB), the analysis converts Ã to A and deletes ¨,
>> > so I get SolAne!!!
>> >
>> > I think that ISOLatin1AccentFilterFactory only accepts strings in the
>> > ISO-8859-1 charset.
>> >
>> > So is there any way to transform my string to ISO-8859-1 before the
>> > indexing process? Maybe by creating a transformer in DataImportHandler?
>> > (I've never coded in Java :( )
>> >
>> > Thank you all.
>> >
>> >
>> > Koji Sekiguchi-2 wrote:
>> >>
>> >> aerox7 wrote:
>> >>> Hi,
>> >>> I have a MySQL database in UTF-8. I have a row with "SolÃ¨ne" (solène).
>> >>> I want to transform this to solene, so I use Solr's
>> >>> ISOLatin1AccentFilterFactory to perform this task, but it doesn't work?!!
>> >>>
>> >>> I guess that "SolÃ¨ne" is "solène" in UTF-8?! I also set Tomcat to UTF-8,
>> >>> so normally ISOLatin1AccentFilterFactory should replace the accent...
>> >>>
>> >>> Any ideas?
>> >>>
>> >>> I use DataImportHandler.
>> >>>
>> >>
>> >> If a mapping rule "Ã¨" to "e" is always true in your field, you can try
>> >> to use MappingCharFilter instead of ISOLatin1AccentFilter. Add the
>> >> following line to mapping-ISOLatin1Accent.txt:
>> >>
>> >> "Ã¨" => "e"
>> >>
>> >> and add the following fieldType:
>> >>
>> >> <fieldType name="textCharNorm" class="solr.TextField"
>> >> positionIncrementGap="100" >
>> >>   <analyzer>
>> >>     <charFilter class="solr.MappingCharFilterFactory"
>> >> mapping="mapping-ISOLatin1Accent.txt"/>
>> >>     <tokenizer class="solr.CharStreamAwareWhitespaceTokenizerFactory"/>
>> >>   </analyzer>
>> >> </fieldType>
>> >>
>> >> MappingCharFilter and mapping-ISOLatin1Accent.txt are in the nightly build.
>> >>
>> >> Koji
>> >>
>> >>
>> >>
>> >>
>> >
>> >
>>
>> --
>> View this message in context:
>> http://www.nabble.com/Problem-with-UTF-8-and-Solr-ISOLatin1AccentFilterFactory-tp22607642p22617278.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
> 
> -- 
> “I may not believe in myself, but I believe in what I'm doing.”
> 
> -- Jimmy Page
> 
> 
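
On the question quoted above of whether "SolÃ¨ne" is solène in UTF-8, here is a small sketch (hypothetical class `MojibakeDemo`). It encodes solène as UTF-8 and then decodes those bytes as ISO-8859-1.

The single character è (U+00E8) is stored as the two bytes 0xC3 0xA8. ISO-8859-1 reads those bytes as Ã (U+00C3) and ¨ (U+00A8). This is consistent with what the analysis page showed: the accent filter sees Ã and ¨ as two separate characters rather than one è, hence SolAne.

```java
import java.nio.charset.StandardCharsets;

/**
 * Sketch (hypothetical class): shows how an e-grave turns into two characters
 * when UTF-8 bytes are decoded as ISO-8859-1.
 */
public class MojibakeDemo {

    /** Encodes with UTF-8, then (wrongly) decodes the same bytes as ISO-8859-1. */
    public static String misdecode(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    /** Lists code points as U+XXXX, so the output does not depend on the console encoding. */
    public static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String original = "sol\u00E8ne";      // 6 characters, e-grave is U+00E8
        String garbled = misdecode(original);  // 7 characters
        System.out.println(codePoints(original));
        // U+0073 U+006F U+006C U+00E8 U+006E U+0065
        System.out.println(codePoints(garbled));
        // U+0073 U+006F U+006C U+00C3 U+00A8 U+006E U+0065
    }
}
```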

-- 
View this message in context: 
http://www.nabble.com/Problem-with-UTF-8-and-Solr-ISOLatin1AccentFilterFactory-tp22607642p22618085.html
Sent from the Solr - User mailing list archive at Nabble.com.
