: I created a field type:
:
: <fieldType name="htmlTxt" class="solr.TextField" positionIncrementGap="100">
...
: Everything works (the div tags, p tags are removed) but some
: <strong>nnn</strong> or <br/> tags are still in the text after indexing.

i cut/pasted that fieldtype into the example schema.xml, and experimented
with the analysis tool (http://localhost:8983/solr/admin/analysis.jsp), and
both of those examples were correctly stripped.

do you have a more specific example of something that doesn't work?

Hmm... it seems like maybe the problem is examples like this...

    blahblah<strong>nnn</strong>

...if the tag is directly adjacent to other text, it may not get stripped
off ... i'm not sure if that's specific to the HtmlWhitespaceTokenizer.

-Hoss
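
To narrow down the adjacency theory, it may help to have a baseline of what "correctly stripped" should look like for a few inputs, with tags both separated from and directly touching other text. The sketch below is not Solr's implementation: it is a naive regex stripper, and it assumes each tag is replaced by a space before splitting on whitespace. The sample strings and the `expected_tokens` helper are hypothetical. The same strings can be pasted into the analysis tool and the resulting tokens compared against this output.

```python
import re

# Hypothetical test inputs: tags separated from text by whitespace,
# and tags directly adjacent to other text.
SAMPLES = [
    "blahblah <strong>nnn</strong> more",
    "blahblah<strong>nnn</strong>more",
    "line one<br/>line two",
]

# Naive tag pattern; an assumption for illustration, not a real HTML parser.
TAG = re.compile(r"<[^>]+>")


def expected_tokens(text):
    """Replace each tag with a space, then split on whitespace."""
    return TAG.sub(" ", text).split()


if __name__ == "__main__":
    for sample in SAMPLES:
        print(repr(sample), "->", expected_tokens(sample))
```

If the tokens from the analysis tool still contain tag fragments for the adjacent cases but not for the whitespace-separated ones, that would point at the tokenizer's handling of adjacent tags.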