Re: Problem with UTF-8 and Solr ISOLatin1AccentFilterFactory

aerox7 Sat, 21 Mar 2009 00:06:18 -0700

Hi,
I've checked the MySQL configuration with "mysql> SHOW VARIABLES LIKE 'character_set%';":
all character_set variables are set to UTF-8.


I think that the DataImportHandler gets the data as ISO-8859-1, so I just wrote a
custom transformer to change the row's charset from ISO-8859-1 to UTF-8, and now it works.
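
A minimal sketch of what such a charset fix can look like in Java. It is not the transformer mentioned above, whose code is not shown in this thread. It assumes the strings arrive with UTF-8 bytes that were wrongly decoded as ISO-8859-1, which is what "SolÃ¨ne" looks like. The class name CharsetRepair and the methods fixMojibake and fixRow are hypothetical, and the wiring into DataImportHandler is left out.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

class CharsetRepair {
    // Recover the original bytes by encoding as ISO-8859-1,
    // then decode those bytes as the UTF-8 they really were.
    static String fixMojibake(String s) {
        byte[] raw = s.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    // Apply the fix to every String value of a row; other types are untouched.
    static Map<String, Object> fixRow(Map<String, Object> row) {
        for (Map.Entry<String, Object> e : row.entrySet()) {
            if (e.getValue() instanceof String) {
                e.setValue(fixMojibake((String) e.getValue()));
            }
        }
        return row;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        // "Sol\u00C3\u00A8ne" is the garbled form "SolÃ¨ne".
        row.put("name", "Sol\u00C3\u00A8ne");
        fixRow(row);
        System.out.println(row.get("name"));
    }
}
```

This conversion is only safe on values that really are garbled in this way. Applied to a string that is already correct, such as "solène", it would corrupt the accented character, so fixing the encoding at the database or JDBC level is preferable when that is possible.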

--> Noble Paul: I use the Solr 1.4 nightly build from 2009-03-18. Do I have to
download the latest one to apply your patch?


Noble Paul നോബിള്‍  नोब्ळ् wrote:
> 
> Maybe there is an issue with the recent changes in SOLR-973.
> I have posted a new patch on SOLR-973.
> aerox, is it possible to confirm whether that is the problem?
> 
> 
> On Fri, Mar 20, 2009 at 6:52 PM, Grant Ingersoll <gsing...@apache.org>
> wrote:
>> Usually, when I see characters like this, it means you aren't
>> viewing/handling the UTF-8 correctly when bringing it into Java.  I would
>> first check that your DB or JDBC driver is getting the chars out right.  It
>> may even be the case that they did not go into the DB correctly in the first
>> place.
>>
>> On Mar 20, 2009, at 4:36 AM, aerox7 wrote:
>>
>>>
>>> ==> where are you seeing it as "SolÃ¨ne" as opposed to the
>>> correct way of solène?
>>>
>>> I have "SolÃ¨ne" in my MySQL database! So I don't know whether this is
>>> correct or not. I guess that "SolÃ¨ne" is solène in UTF-8?
>>>
>>> I've tried the analysis page at
>>> http://localhost:8983/solr/admin/analysis.jsp. When I try with solène,
>>> everything is OK. But when I try with SolÃ¨ne (as it is in the DB), the
>>> analysis converts Ã to A and deletes ¨, so I get SolAne.
>>>
>>> I think that ISOLatin1AccentFilterFactory only takes strings in the
>>> ISO-8859-1 charset.
>>>
>>> So is there any solution to transform my string to ISO-8859-1 before the
>>> indexing process? Maybe by creating a transformer in DataImportHandler?
>>> (I have never coded in Java :( )
>>>
>>> Thank you all.
>>>
>>>
>>> Koji Sekiguchi-2 wrote:
>>>>
>>>> aerox7 wrote:
>>>>>
>>>>> Hi,
>>>>> I have a MySQL database in UTF-8. I have a row with "SolÃ¨ne" (solène).
>>>>> I want to transform this to solene, so I use Solr's
>>>>> ISOLatin1AccentFilterFactory to perform this task, but it doesn't work.
>>>>>
>>>>> I guess that "SolÃ¨ne" is "solène" in UTF-8? I also set Tomcat to UTF-8,
>>>>> so normally ISOLatin1AccentFilterFactory should replace the accent.
>>>>>
>>>>> Any ideas?
>>>>>
>>>>> I use DataImportHandler.
>>>>>
>>>>
>>>> If a mapping rule "Ã¨" to "e" is always true in your field, you can try
>>>> to use MappingCharFilter instead of ISOLatin1AccentFilter. Add the
>>>> following line to mapping-ISOLatin1Accent.txt:
>>>>
>>>> "Ã¨" => "e"
>>>>
>>>> and add the following fieldType:
>>>>
>>>> <fieldType name="textCharNorm" class="solr.TextField"
>>>>            positionIncrementGap="100">
>>>>   <analyzer>
>>>>     <charFilter class="solr.MappingCharFilterFactory"
>>>>                 mapping="mapping-ISOLatin1Accent.txt"/>
>>>>     <tokenizer class="solr.CharStreamAwareWhitespaceTokenizerFactory"/>
>>>>   </analyzer>
>>>> </fieldType>
>>>>
>>>> MappingCharFilter and mapping-ISOLatin1Accent.txt are in the nightly build.
>>>>
>>>> Koji
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
> 
> 
> 
> -- 
> --Noble Paul
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Problem-with-UTF-8-and-Solr-ISOLatin1AccentFilterFactory-tp22607642p22633422.html
Sent from the Solr - User mailing list archive at Nabble.com.
