Re: Problem with UTF-8 and Solr ISOLatin1AccentFilterFactory

Noble Paul നോബിള്‍ नोब्ळ् Fri, 20 Mar 2009 10:38:01 -0700

Maybe there is an issue with the recent changes in SOLR-973.
I have posted a new patch on SOLR-973.
aerox, is it possible to confirm whether that is the problem?
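
For anyone following along, here is a minimal Java sketch of the encoding mix-up Grant describes in the quoted message: the UTF-8 bytes of "Solène" read back as ISO-8859-1 give exactly "SolÃ¨ne". The class and method names (MojibakeDemo, misdecode, repair) are made up for illustration and are not part of Solr or DataImportHandler, and the sketch assumes the value was mis-decoded exactly once.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Solr.
class MojibakeDemo {

    // Encode as UTF-8, then wrongly decode the bytes as ISO-8859-1.
    static String misdecode(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Reverse the mistake: recover the raw bytes via ISO-8859-1, decode as UTF-8.
    // Assumption: the input was garbled exactly once, in exactly this way.
    static String repair(String s) {
        return new String(s.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "Sol\u00e8ne";           // "Solène"
        String garbled = misdecode(original);      // è (bytes C3 A8) becomes Ã (U+00C3) + ¨ (U+00A8)
        System.out.println(garbled.equals("Sol\u00c3\u00a8ne"));
        System.out.println(repair(garbled).equals(original));
    }
}
```

If the database really does hold the garbled form, the same re-decoding step could in principle be applied to the field before indexing, but it is probably safer to first check the JDBC connection charset as Grant suggests.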



On Fri, Mar 20, 2009 at 6:52 PM, Grant Ingersoll <gsing...@apache.org> wrote:
> Usually, when I see characters like this, it means you aren't
> viewing/handling the UTF-8 correctly when bringing it into Java.  I would
> first check that your DB or JDBC driver is getting the chars out right.  It
> may even be the case that they did not go into the DB correctly in the first
> place.
>
> On Mar 20, 2009, at 4:36 AM, aerox7 wrote:
>
>>
>> ==> where are you seeing it as "SolÃ¨ne" as opposed to the
>> correct way of solène?
>>
>> I have "SolÃ¨ne" in my MySQL database, so I don't know whether this is
>> correct or not. I guess that "SolÃ¨ne" is solène in UTF-8?
>>
>> I've tried the analysis page at http://localhost:8983/solr/admin/analysis.jsp.
>> When I try with solène everything is OK, but when I try with SolÃ¨ne (like
>> what I have in the DB) the analysis converts Ã to A and deletes ¨, so I get
>> SolAne!
>>
>> I think that ISOLatin1AccentFilterFactory only accepts strings in the
>> ISO-8859-1 charset.
>>
>> So is there any solution to transform my string to ISO-8859-1 before the
>> indexing process? Maybe by creating a transformer in DataImportHandler? (I
>> have never coded in Java :( )
>>
>> Thank you all.
>>
>>
>> Koji Sekiguchi-2 wrote:
>>>
>>> aerox7 wrote:
>>>>
>>>> Hi,
>>>> I have a MySQL database in UTF-8. I have a row with "SolÃ¨ne" (solène). I
>>>> want to transform this to solene, so I use Solr's
>>>> ISOLatin1AccentFilterFactory to perform this task, but it doesn't work.
>>>>
>>>> I guess that "SolÃ¨ne" is "solène" in UTF-8? I also set Tomcat to UTF-8,
>>>> so normally ISOLatin1AccentFilterFactory should replace the accent.
>>>>
>>>> Any ideas?
>>>>
>>>> I use DataImportHandler.
>>>>
>>>
>>> If mapping "Ã¨" to "e" is always correct for your field, you can try
>>> using MappingCharFilter instead of ISOLatin1AccentFilter. Add the
>>> following line to mapping-ISOLatin1Accent.txt:
>>>
>>> "Ã¨" => "e"
>>>
>>> and add the following fieldType:
>>>
>>> <fieldType name="textCharNorm" class="solr.TextField"
>>> positionIncrementGap="100" >
>>>  <analyzer>
>>>   <charFilter class="solr.MappingCharFilterFactory"
>>> mapping="mapping-ISOLatin1Accent.txt"/>
>>>   <tokenizer class="solr.CharStreamAwareWhitespaceTokenizerFactory"/>
>>>  </analyzer>
>>> </fieldType>
>>>
>>> MappingCharFilter and mapping-ISOLatin1Accent.txt are in the nightly build.
>>>
>>> Koji
>>>
>>>
>>>
>>>
>>
>> --
>> View this message in context:
>> http://www.nabble.com/Problem-with-UTF-8-and-Solr-ISOLatin1AccentFilterFactory-tp22607642p22616220.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>
>



-- 
--Noble Paul
