Looking at it again, there appears to be only one HTML stripper. Your
alternative is to use the regex PatternReplace filters with some custom
patterns, or to make a stopword list of all HTML keywords.
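
To give a rough idea of what a custom pattern might look like, here is a
minimal sketch in Python, not Solr configuration. The pattern and the
strip_tags helper are my own illustrative assumptions; they only show the
regex idea (replace anything tag-shaped with a space) and will not handle
comments, scripts, or entities the way a real HTML stripper does.

```python
import re

# Hypothetical pattern: matches anything that looks like an HTML/XML tag.
# This is an assumption for illustration, not a pattern shipped with Solr.
TAG_PATTERN = re.compile(r"<[^>]+>")


def strip_tags(text: str) -> str:
    """Replace each tag with a space, then collapse whitespace runs."""
    without_tags = TAG_PATTERN.sub(" ", text)
    return re.sub(r"\s+", " ", without_tags).strip()


if __name__ == "__main__":
    print(strip_tags("<p>Hello <b>Solr</b> users</p>"))
```

The same tag-matching expression should be usable as the pattern in a
regex replace step, with a single space as the replacement.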

On Thu, Jun 10, 2010 at 8:00 AM, Blargy <zman...@hotmail.com> wrote:
>
> Do I even need to tidy/clean up the html if I use the
> HTMLStripCharFilterFactory?
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Indexing-HTML-tp884497p885797.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Lance Norskog
goks...@gmail.com
