Looking at it again, there appears to be only one HTML stripper. Your alternative is to use the regex-based PatternReplace filters with some custom patterns. Or make a stopword list of all HTML keywords.
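
To make the trade-off between those two alternatives concrete, here is a rough sketch in plain Python. It is not Solr code and not how Solr implements anything; the pattern, the stopword list, and the function names are all made up for illustration.

```python
import re

# Hypothetical pattern: anything that looks like a tag, e.g. <p>, </div>, <a href="...">.
TAG_PATTERN = re.compile(r"<[^>]+>")

# Hypothetical and deliberately tiny list of HTML keywords.
HTML_STOPWORDS = {"html", "head", "body", "div", "span", "p", "a", "href", "br"}


def strip_tags(text):
    """Regex approach: replace each tag with a space so adjacent words do not merge."""
    return TAG_PATTERN.sub(" ", text)


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def drop_html_stopwords(tokens):
    """Stopword approach: index the raw markup, then discard HTML keywords."""
    return [t for t in tokens if t not in HTML_STOPWORDS]


sample = '<div><p>Indexing <a href="x.html">HTML</a> in Solr</p></div>'

regex_tokens = tokenize(strip_tags(sample))
stopword_tokens = drop_html_stopwords(tokenize(sample))

print(regex_tokens)     # ['indexing', 'html', 'in', 'solr']
print(stopword_tokens)  # ['indexing', 'x', 'in', 'solr']
```

In this sketch the stopword route leaks the attribute value "x" into the tokens and also throws away the legitimate word "HTML", while the regex route keeps only the visible text. A simple tag pattern like this one will still trip over comments, scripts, and a ">" inside an attribute value.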

On Thu, Jun 10, 2010 at 8:00 AM, Blargy <zman...@hotmail.com> wrote:
>
> Do I even need to tidy/clean up the html if I use the
> HTMLStripCharFilterFactory?
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Indexing-HTML-tp884497p885797.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>

--
Lance Norskog
goks...@gmail.com