: In general, I don't recommend indexing HTML content straight to Solr. None of
: the Solr contributors do this so the use case hasn't received a lot of love.

I second that comment ... the HTML stripping code was never intended to be an "HTML parser". It was designed as a workaround for dealing with "dirty data", where people had unwanted HTML tags in what should be plain text. Indexing HTML as is with some analyzers would result in words like "script", "strong", and "class" matching lots of docs where those words never really appear in the text. If you have well-formed HTML documents, use an HTML parser to extract the real content.

-Hoss
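
To illustrate the "parse first, then index" advice, here is a minimal sketch using Python's standard-library html.parser module. The names TextExtractor and extract_text are hypothetical, made up for this example, and the step that actually sends the text to Solr is left out. It is one possible way to do this, not something taken from Solr itself.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collect visible text, skipping <script>/<style> bodies."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0  # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        # Tag names and attributes (e.g. class="...") are never added to the text.
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join chunks with a space and collapse runs of whitespace.
        # Assumption: a space between adjacent text nodes is acceptable,
        # even though it can split a word that is broken up by inline tags.
        return " ".join(" ".join(self._chunks).split())


def extract_text(html):
    """Return the plain text of an HTML string, ready to be put in a Solr field."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

With something like this, only the output of extract_text(page) would go into the indexed field, so markup words such as "script" or "class" only match documents where they appear in the visible text.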