I'm trying to use the HTML-stripping tokenizer factory to strip HTML tags from my description field at index time.

I added this fieldtype:

   <fieldtype name="text_html" class="solr.TextField" positionIncrementGap="100">
     <analyzer type="index">
         <tokenizer class="solr.HTMLStripWhitespaceTokenizerFactory"/>
         <filter class="solr.StopFilterFactory" ignoreCase="true"/>
         <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
                 catenateWords="1" catenateNumbers="1" catenateAll="0"/>
         <filter class="solr.LowerCaseFilterFactory"/>
         <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
         <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
     </analyzer>
     <analyzer type="query">
         <tokenizer class="solr.HTMLStripWhitespaceTokenizerFactory"/>
         <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
         <filter class="solr.StopFilterFactory" ignoreCase="true"/>
         <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
                 catenateWords="0" catenateNumbers="0" catenateAll="0"/>
         <filter class="solr.LowerCaseFilterFactory"/>
         <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
         <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
     </analyzer>
   </fieldtype>


Then in my schema I have this:

<field name="description" type="text_html" indexed="true" stored="true"/>



When I index a document, nothing seems to happen: the HTML is still there. For example, when I run a query, this is the response for a test description:

<str name="description">

<br>hi<br>my<br>name<br>is<br>topper<br>and this <b>&nbsp;blahblah</b> is a <b>test</b>

</str>




Any ideas?

-Mike
