Usually, when I see characters like this, it means the UTF-8 is not
being viewed or handled correctly when it is brought into Java. I
would first check that your DB or JDBC driver is getting the chars
out right. It may even be the case that they did not go into the DB
correctly in the first place.
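
To make the check above concrete, here is a minimal Java sketch (not from the thread; the class and method names are made up for illustration). It assumes the garbling comes from UTF-8 bytes being decoded as ISO-8859-1, and it prints code points so you can see what a string read from the DB really contains.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Decode the UTF-8 bytes of s as ISO-8859-1 (the assumed cause of the garbling).
    public static String garble(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Render each code point as U+XXXX so the real content of a string is visible.
    public static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String correct = "sol\u00E8ne"; // "solène", written with an escape to avoid source-encoding issues
        String garbled = garble(correct);
        System.out.println(codePoints(correct)); // U+0073 U+006F U+006C U+00E8 U+006E U+0065
        System.out.println(codePoints(garbled)); // U+0073 U+006F U+006C U+00C3 U+00A8 U+006E U+0065
    }
}
```

If a value fetched through JDBC shows U+00C3 U+00A8 where a single U+00E8 is expected, the text was already mis-decoded before it reached Solr.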
On Mar 20, 2009, at 4:36 AM, aerox7 wrote:
==> Where are you seeing it as "SolÃ¨ne" as opposed to the
correct way of solène?
I have "SolÃ¨ne" in my MySQL database, so I don't know whether this
is correct or not. I guess that "SolÃ¨ne" is solène in UTF-8?
I've tried the analysis page at
http://localhost:8983/solr/admin/analysis.jsp. When I try with solène,
everything is OK, but when I try with SolÃ¨ne (like what I have in
the DB), the analysis converts Ã to A and deletes ¨, so I get SolAne!
I think that ISOLatin1AccentFilterFactory only accepts strings in the
ISO-8859-1 charset.
So is there any solution to transform my string to ISO-8859-1 before
the indexing process? Maybe by creating a transformer in
DataImportHandler? (I have never coded in Java :( )
Thank you all.
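
As a rough sketch of the kind of repair a transformer could apply before indexing (not an answer from the thread, and not wired to the DataImportHandler API): the class below works on a plain Map row, and the names RowRepair, repair and repairRow are hypothetical. It assumes the values are UTF-8 bytes that were mis-read as ISO-8859-1, and it leaves a value alone unless the reverse round trip decodes cleanly. A string that legitimately contains a sequence such as "Ã¨" would also be rewritten, so this is a stopgap rather than a substitute for fixing the DB or JDBC encoding.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class RowRepair {
    // Try to undo "UTF-8 read as ISO-8859-1"; return the input unchanged if that does not apply.
    public static String repair(String s) {
        if (!StandardCharsets.ISO_8859_1.newEncoder().canEncode(s)) {
            return s; // has characters outside ISO-8859-1, so not this kind of garbling
        }
        byte[] bytes = s.getBytes(StandardCharsets.ISO_8859_1);
        try {
            // Strict decoding: malformed input raises an exception instead of being replaced.
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            return s; // bytes are not valid UTF-8, so the string was probably fine already
        }
    }

    // Return a copy of the row with every String value passed through repair().
    public static Map<String, Object> repairRow(Map<String, Object> row) {
        Map<String, Object> out = new HashMap<>(row);
        out.replaceAll((k, v) -> v instanceof String ? repair((String) v) : v);
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        row.put("name", "Sol\u00C3\u00A8ne"); // the garbled form, written with escapes
        row.put("id", 42);                    // non-string values pass through untouched
        System.out.println(repairRow(row));
    }
}
```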
Koji Sekiguchi-2 wrote:
aerox7 wrote:
Hi,
I have a MySQL database in UTF-8. I have a row with "SolÃ¨ne"
(solène). I want to transform this to solene, so I use Solr's
ISOLatin1AccentFilterFactory to perform this task, but it doesn't
work. I guess that "SolÃ¨ne" is "solène" in UTF-8? I also set Tomcat
to UTF-8, so normally ISOLatin1AccentFilterFactory should replace the
accent.
Any ideas?
I use DataImportHandler.
If a mapping rule "Ã¨" to "e" is always true in your field, you can
try to use MappingCharFilter instead of ISOLatin1AccentFilter. Add the
following line to mapping-ISOLatin1Accent.txt:
"Ã¨" => "e"
and add the following fieldType:
<fieldType name="textCharNorm" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.MappingCharFilterFactory"
                mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.CharStreamAwareWhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>
MappingCharFilter and mapping-ISOLatin1Accent.txt are in the nightly
build.
Koji
--
View this message in context:
http://www.nabble.com/Problem-with-UTF-8-and-Solr-ISOLatin1AccentFilterFactory-tp22607642p22616220.html
Sent from the Solr - User mailing list archive at Nabble.com.