> View this message in context:
> http://lucene.472066.n3.nabble.com/HTML-entities-being-missed-by-DIH-HTMLStripTransformer-tp4053582p4053609.html
> Sent from the Solr - User mailing list archive at Nabble.com.
Hi Steve,
Fabulous suggestion! Yup, that is it! Using the HTMLStripTransformer twice
did the trick. I am using Solr 4.1.
Thank you very much!
- ashok
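
For anyone finding this thread later, here is a rough sketch of why a second pass can help when the stored text holds entity-encoded markup such as `&lt;b&gt;`. It is plain Python standing in for the idea, not Solr or DIH code; `strip_html` is a hypothetical helper, and the assumption is only that one pass both drops real tags and decodes entities to characters.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Keeps text nodes, drops tags, and decodes entities to characters."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        # With convert_charrefs=True, data arrives with entities already decoded.
        self.parts.append(data)


def strip_html(text):
    """One strip pass (hypothetical helper, not the Solr implementation)."""
    collector = TextCollector()
    collector.feed(text)
    collector.close()
    return "".join(collector.parts)


# Entity-encoded markup: the input contains no real tags, only entities.
raw = "&lt;b&gt;bold&lt;/b&gt; text"

once = strip_html(raw)    # entities decoded, which now look like tags
twice = strip_html(once)  # second pass removes those tags

print(repr(once))
print(repr(twice))
```

After the first pass the entities have turned into literal angle brackets, so the output still appears to contain markup; the second pass treats them as tags and removes them.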
Hi Ashok,
HTMLStripTransformer uses HTMLStripCharFilter under the hood, and
HTMLStripCharFilter converts all HTML entities to their corresponding
characters.
What version of Solr are you using?
My guess is that it only appears that nothing is happening, since when they are
presented in a brow[...]
> [...] sometimes with HTML entities and at other
> times with html tags. I have no control over the process that populates the
> database tables with info.
Well, the database field has text, sometimes with HTML entities and at other
times with html tags. I have no control over the process that populates the
database tables with info.
On 4 April 2013 00:30, Ashok wrote:
[...]
> Two questions.
>
> (1) Is this the expected behavior of DIH HTMLStripTransformer?
Yes, I believe so.
> (2) If yes, is there another transformer that I can employ first to turn
> these html entities into their usual symbols that can then be removed by the
> DIH HTMLStripTransformer?
[...] that can then be removed by the
DIH HTMLStripTransformer?
Thanks
- ashok
--
View this message in context:
http://lucene.472066.n3.nabble.com/HTML-entities-being-missed-by-DIH-HTMLStripTransformer-tp4053582.html
Sent from the Solr - User mailing list archive at Nabble.com.