Hello, I am trying to use the solr.HTMLStripWhitespaceTokenizerFactory tokenizer, with no luck.
I have a field `content` that contains the following:

    <field name="content"><![CDATA[test <a href="test">link</a> post]]></field>

When I do a search, I get the following:

    <result name="response" numFound="1" start="0">
      <doc>
        <str name="content">test <a href="test">link</a> post</str>
        <str name="id">po_1_NL</str>
        <str name="keywords">post</str>
        <str name="titlesearch">This is a test</str>
      </doc>
    </result>

Is this normal? Shouldn't the HTML markup and the whitespace be removed from the field?

This is my config in schema.xml:

    <fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.HTMLStripWhitespaceTokenizerFactory"/>
      </analyzer>
    </fieldType>

    <field name="content" type="text_ws" indexed="true" stored="true" omitNorms="false"/>

Can someone help me with this?
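For comparison, here is a sketch of the same `text_ws` field type written with a char filter. It assumes Solr 1.4 or later, where `solr.HTMLStripCharFilterFactory` followed by `solr.WhitespaceTokenizerFactory` is the usual replacement for the deprecated `HTMLStripWhitespaceTokenizerFactory`. Check the exact class names against the Solr version in use.

```xml
<fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Remove HTML markup from the input text before it reaches the tokenizer -->
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <!-- Split the remaining text into tokens on whitespace -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>
```

With either configuration, the analyzer only changes the tokens that are indexed and searched. A field declared with `stored="true"` still returns the original text in search results, markup included.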