> Yes, I asked the wrong question. What I was subconsciously getting at is
> this: how are you avoiding the possibility of getting hits in the HTML
> elements? Is that accomplished by putting tag names in your stopwords, or
> by some other mechanism?
HTMLStripCharFilter removes the HTML tags before the text reaches the tokenizer, so only the textual content remains. Tag names never become tokens, so there is no need to put them in your stopwords. It is the same as extracting the text from HTML/XML.

admin/analysis.jsp is a great tool for visualizing the analysis chain. You can try it there.

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.HTMLStripCharFilterFactory
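To illustrate the idea outside of Solr, here is a rough Python sketch of "strip the markup first, tokenize afterwards". It is not Solr's implementation: the `TextOnly` class and the `strip_html` and `tokenize` helpers are made-up names for this example, and the whitespace handling is simplified (text chunks are joined with a space, which the real filter may not do).

```python
from html.parser import HTMLParser
import re


class TextOnly(HTMLParser):
    """Collects character data only; tags and attributes are discarded."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        # Called for text between tags, never for tag or attribute names.
        self.chunks.append(data)


def strip_html(markup):
    """Hypothetical stand-in for the char filter step: markup in, text out."""
    parser = TextOnly()
    parser.feed(markup)
    parser.close()
    # Simplification: separate chunks with a space.
    return " ".join(parser.chunks)


def tokenize(text):
    """Hypothetical stand-in for the tokenizer plus lowercasing."""
    return re.findall(r"\w+", text.lower())


doc = '<div class="title"><b>Solr</b> strips tags</div>'
tokens = tokenize(strip_html(doc))
print(tokens)
```

Because the tokenizer only ever sees the stripped text, words such as "div", "class" or "title" from the markup cannot produce hits.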