I'm sorry to bother you again, but this still doesn't work. I have written this configuration in my schema.xml file:

<charFilter class="solr.HTMLStripCharFilterFactory"/>
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.SnowballPorterFilterFactory" language="Spanish"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>

But it still doesn't convert the HTML entity to the correct character. For instance, Espa&ntilde;a should be converted to España, but it remains Espa&ntilde;a. I have attached the results from the analysis.jsp page to this email.

Any help would be really appreciated.

Regards,
Ariel

On 6/16/11, Steven A Rowe <sar...@syr.edu> wrote:
> Hi Ariel,
>
> As Shawn says, char filters come before tokenizers.
>
> You need to use a <charFilter> tag instead of <filter> tag.
>
> I've updated the HTMLStripCharFilter documentation on the Solr wiki to
> include this information:
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.HTMLStripCharFilterFactory
>
> Steve
>
>> -----Original Message-----
>> From: Shawn Heisey [mailto:s...@elyograg.org]
>> Sent: Thursday, June 16, 2011 1:32 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: How to index correctly a text save with tinyMCE
>>
>> On 6/16/2011 11:12 AM, Ariel wrote:
>> > Thanks for your answer, I have just put the filter in my schema.xml but it
>> > doesn't work I am using solr 1.4 and my conf is:
>> >
>> > <code>
>> > <analyzer type="index">
>> > <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>> > <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>> > <filter class="solr.LowerCaseFilterFactory"/>
>> > <filter class="solr.HTMLStripCharFilterFactory"/>
>> > <filter class="solr.SnowballPorterFilterFactory" language="Spanish"/>
>> > <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>> > </analyzer>
>> > </code>
>> >
>> > But it doesn't work in tomcat 6 logs I get this error:
>> >
>> > java.lang.ClassCastException:
>> > org.apache.solr.analysis.HTMLStripCharFilterFactory cannot be cast to
>> > org.apache.solr.analysis.TokenFilterFactory
>>
>> According to the wiki, the output of that filter must be passed to
>> either another CharFilter or a Tokenizer. Try moving it before
>> WhitespaceTokenizerFactory.
>>
>> Shawn
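
For reference, the ordering advice quoted above (char filters first, then the tokenizer, then token filters) can be sketched as a complete field type. This is only an illustration under assumptions: the field type name `text_html` and the `positionIncrementGap` value are hypothetical and do not come from this thread; the analyzer contents are the ones already posted.

```xml
<!-- Hypothetical field type; the name "text_html" is an assumption for illustration. -->
<fieldType name="text_html" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- Char filters work on the raw character stream, so they are declared
         with a charFilter tag and placed before the tokenizer. -->
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <!-- Exactly one tokenizer splits the filtered text into tokens. -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Token filters then run in the order listed. -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Spanish"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
```

Declaring HTMLStripCharFilterFactory with a `<filter>` tag is what produced the ClassCastException quoted above, since that factory is not a TokenFilterFactory.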
analysis.rar
Description: application/rar