Re: HTML entities being missed by DIH HTMLStripTransformer

2013-04-03 Thread Steve Rowe

Re: HTML entities being missed by DIH HTMLStripTransformer

2013-04-03 Thread Ashok
Hi Steve, Fabulous suggestion! Yup, that is it! Using the HTMLStripTransformer twice did the trick. I am using Solr 4.1. Thank you very much! - ashok
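A rough illustration of why a second pass can help, assuming the database text contains markup that was itself entity-escaped (the thread does not show the actual data, so this is a guess). The sketch below is plain Python, not Solr code: `strip_pass` is a hypothetical stand-in that removes literal tags and then decodes entities, which only approximates what HTMLStripTransformer does, and the sample string is invented.

```python
import html
import re

# Hypothetical stand-in for a single strip pass. This is NOT Solr's
# implementation; it only mimics "remove literal tags, decode entities".
TAG_RE = re.compile(r"<[^>]+>")


def strip_pass(text: str) -> str:
    """Remove anything that looks like a tag, then decode HTML entities."""
    without_tags = TAG_RE.sub("", text)
    return html.unescape(without_tags)


# Invented sample: markup stored in entity-escaped form.
raw = "&lt;b&gt;Tom &amp; Jerry&lt;/b&gt;"

# Pass 1 finds no literal tags, but decoding the entities exposes them.
once = strip_pass(raw)

# Pass 2 now sees real tags and removes them.
twice = strip_pass(once)

print(repr(once))
print(repr(twice))
```

Under these assumptions, one pass turns the entities into characters and leaves the newly exposed tags in place, and the second pass removes them, which matches the "twice did the trick" result reported above.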

Re: HTML entities being missed by DIH HTMLStripTransformer

2013-04-03 Thread Steve Rowe
Hi Ashok, HTMLStripTransformer uses HTMLStripCharFilter under the hood, and HTMLStripCharFilter converts all HTML entities to their corresponding characters. What version of Solr are you using? My guess is that it only appears that nothing is happening, since when they are presented in a brow

Re: HTML entities being missed by DIH HTMLStripTransformer

2013-04-03 Thread Alexandre Rafalovitch
> [...] sometimes with HTML entities and at other times with html tags. I have no control over the process that populates the database tables with info.

Re: HTML entities being missed by DIH HTMLStripTransformer

2013-04-03 Thread Ashok
Well, the database field has text, sometimes with HTML entities and at other times with html tags. I have no control over the process that populates the database tables with info.

Re: HTML entities being missed by DIH HTMLStripTransformer

2013-04-03 Thread Gora Mohanty
On 4 April 2013 00:30, Ashok wrote:
[...]
> Two questions.
>
> (1) Is this the expected behavior of DIH HTMLStripTransformer?

Yes, I believe so.

> (2) If yes, is there an another transformer that I can employ first to turn
> these html entities into their usual symbols that can then be removed b

HTML entities being missed by DIH HTMLStripTransformer

2013-04-03 Thread Ashok
[...] that can then be removed by the DIH HTMLStripTransformer? Thanks - ashok