Hi Ariel,

As Shawn says, char filters come before tokenizers.

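HTMLStripCharFilterFactory is a char filter, not a token filter, so you need to 
declare it with a <charFilter> tag instead of a <filter> tag. Declaring it as a 
<filter> is what causes the ClassCastException in your log.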
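
For example, here is a rough sketch of your index analyzer with that change. 
It is untested, and it assumes the rest of your field type stays exactly as 
you posted it:

<code>
<analyzer type="index">
     <!-- char filters work on the raw text, so they go before the tokenizer -->
     <charFilter class="solr.HTMLStripCharFilterFactory"/>
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <!-- token filters run on the tokenizer's output, in the order listed -->
     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.SnowballPorterFilterFactory" language="Spanish"/>
     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
</code>

Since this changes index-time analysis, documents that are already indexed 
would most likely need to be reindexed before the stripped text shows up.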

I've updated the HTMLStripCharFilter documentation on the Solr wiki to include 
this information: 
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.HTMLStripCharFilterFactory

Steve

> -----Original Message-----
> From: Shawn Heisey [mailto:s...@elyograg.org]
> Sent: Thursday, June 16, 2011 1:32 PM
> To: solr-user@lucene.apache.org
> Subject: Re: How to index correctly a text save with tinyMCE
> 
> On 6/16/2011 11:12 AM, Ariel wrote:
> > Thanks for your answer, I have just put the filter in my schema.xml but it
> > doesn't work I am using solr 1.4 and my conf is:
> >
> > <code>
> > <analyzer type="index">
> >      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >      <filter class="solr.StopFilterFactory" ignoreCase="true"
> > words="stopwords.txt"/>
> >      <filter class="solr.LowerCaseFilterFactory"/>
> >      <filter class="solr.HTMLStripCharFilterFactory"/>
> >      <filter class="solr.SnowballPorterFilterFactory" language="Spanish"/>
> >      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
> >   </analyzer>
> > </code>
> >
> >
> > But it doesn't work in tomcat 6 logs I get this error:
> >
> >   java.lang.ClassCastException:
> > org.apache.solr.analysis.HTMLStripCharFilterFactory cannot be cast to
> > org.apache.solr.analysis.TokenFilterFactory
> 
> According to the wiki, the output of that filter must be passed to
> either another CharFilter or a Tokenizer.  Try moving it before
> WhitespaceTokenizerFactory.
> 
> Shawn
