> Yes, I asked the wrong question. What I was subconsciously getting at is
> this: how are you avoiding the possibility of getting hits in the HTML
> elements? Is that accomplished by putting tag names in your stopwords, or
> by some other mechanism?

HTMLStripCharFilter removes the HTML tags before the text reaches the tokenizer, so only the textual content remains and tag names are never indexed.
It has the same effect as extracting the text from HTML/XML.
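
For illustration, here is a minimal sketch of a field type in schema.xml that uses the char filter. The field type name "text_html" is made up, and the tokenizer and lowercase filter are only assumptions about a typical chain; your own analyzer will likely differ.

```xml
<!-- Hypothetical field type; the name "text_html" is an assumption. -->
<fieldType name="text_html" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Char filters run first, so the tags are stripped before tokenization. -->
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <!-- Tokenizer and filter below are illustrative choices, not requirements. -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With a chain like this, no stopword entries for tag names should be needed, because the tags are gone before any tokens are produced.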

admin/analysis.jsp is a great tool for visualizing the analysis chain. You can try it there and see what each stage outputs.

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.HTMLStripCharFilterFactory
