I found a bug in the HTML Standard Strip filter: it doesn't place word
boundaries at HTML tags that mark the end of a block.

I've just discovered that if I index some text like this:

<h2>title</h2><p>some text</p>

it is stripped and indexed as "titlesome" and "text". Putting a space or
newline between the tags fixes the problem, but our CMS often generates
HTML like this, so I don't always have easy control over it.
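
As a stopgap, the markup could be padded with spaces before it is sent to
the indexer. The following is only a minimal sketch in Python, not part of
Solr: the names BLOCK_TAGS, pad_block_tags and naive_strip are hypothetical,
the tag list is an assumed, non-exhaustive set of block-level elements, and
naive_strip is a simple stand-in that reproduces the fused-word behaviour
described above rather than the filter's real implementation.

```python
import re

# Assumed, non-exhaustive list of block-level HTML elements (hypothetical).
BLOCK_TAGS = (
    "address", "blockquote", "br", "dd", "div", "dl", "dt",
    "h1", "h2", "h3", "h4", "h5", "h6", "hr", "li", "ol",
    "p", "pre", "table", "td", "th", "tr", "ul",
)

# Matches an opening or closing block-level tag, including any attributes.
_BLOCK_TAG_RE = re.compile(
    r"(</?(?:%s)\b[^>]*>)" % "|".join(BLOCK_TAGS),
    re.IGNORECASE,
)


def pad_block_tags(html):
    """Insert a space after every block-level tag so that text from
    adjacent blocks cannot fuse once the tags are stripped."""
    return _BLOCK_TAG_RE.sub(r"\1 ", html)


def naive_strip(html):
    """Stand-in for a tag stripper that removes tags without leaving
    any whitespace behind (not the actual Solr filter)."""
    return re.sub(r"<[^>]+>", "", html)


if __name__ == "__main__":
    doc = "<h2>title</h2><p>some text</p>"
    print(naive_strip(doc).split())                  # ['titlesome', 'text']
    print(naive_strip(pad_block_tags(doc)).split())  # ['title', 'some', 'text']
```

Inline tags such as <b> or <em> are deliberately left out of the list, so a
word like "un<b>believable</b>" would still be kept as one token.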

Where do I file a bug report?

-Matt
-- 
View this message in context: 
http://www.nabble.com/HTML-Standard-Strip-filter-word-boundary-bug-tp18865749p18865749.html
Sent from the Solr - User mailing list archive at Nabble.com.
