Cool, glad I was able to help.
On Apr 3, 2013, at 4:18 PM, Ashok wrote:
> Hi Steve,
>
> Fabulous suggestion! Yup, that is it! Using the HTMLStripTransformer twice
> did the trick. I am using Solr 4.1.
>
> Thank you very much!
>
> - ashok
>
>
>
> --
> View this message in context:
> http:/
Hi Steve,
Fabulous suggestion! Yup, that is it! Using the HTMLStripTransformer twice
did the trick. I am using Solr 4.1.
Thank you very much!
- ashok
--
View this message in context:
http://lucene.472066.n3.nabble.com/HTML-entities-being-missed-by-DIH-HTMLStripTransformer-tp4053582p4053609.h
Hi Ashok,
HTMLStripTransformer uses HTMLStripCharFilter under the hood, and
HTMLStripCharFilter converts all HTML entities to their corresponding
characters.
What version of Solr are you using?
My guess is that it only appears that nothing is happening, since when they are
presented in a brow
Then, I would say, you have a bigger problem.
However, you can probably run a RegEx filter and replace those known escapes
with real characters before you run your HTMLStrip filter. Or run
HTMLStrip, RegEx, and HTMLStrip again.
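
For readers following the thread, here is a rough sketch of why a second pass (or a RegEx step first) can help when the stored text contains entity-escaped markup. It is written in Python purely for illustration and does not use Solr or Lucene. The function names strip_pass and unescape_known and the sample string are hypothetical, and the single pass is only an approximation of what an HTML-stripping filter does, not the actual HTMLStripCharFilter implementation.

```python
import html
import re

# Simplified stand-in for ONE HTML-stripping pass: remove tags, then decode
# entities. This is an assumption-laden approximation, not Lucene's code.
TAG_RE = re.compile(r"<[^>]+>")


def strip_pass(text: str) -> str:
    """Remove literal tags, then turn entities into their characters."""
    without_tags = TAG_RE.sub("", text)
    return html.unescape(without_tags)


# Hypothetical list of "known escapes" to replace before stripping.
KNOWN_ESCAPES = {"&lt;": "<", "&gt;": ">"}


def unescape_known(text: str) -> str:
    """RegEx step: replace only the known escapes with real characters."""
    return re.sub(r"&(?:lt|gt);", lambda m: KNOWN_ESCAPES[m.group(0)], text)


# Sample field value mixing escaped tags, real tags, and an entity.
raw = "&lt;b&gt;Hello&lt;/b&gt; <i>world</i> &amp; more"

# One pass removes the real <i> tags, but decoding the entities leaves
# literal <b> tags behind in the output.
once = strip_pass(raw)

# A second pass removes the tags that the first pass uncovered.
twice = strip_pass(once)

# Alternative: replace the known escapes first, then strip once.
regex_then_strip = strip_pass(unescape_known(raw))

print(once)
print(twice)
print(regex_then_strip)
```

In this sketch the second pass and the RegEx-first variant end up with the same text. Whether that holds for real data depends on which escapes actually occur in the database field.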
Regards,
Alex.
Personal blog: http://blog.outerthoughts.com/
Lin
Well, the database field has text, sometimes with HTML entities and at other
times with HTML tags. I have no control over the process that populates the
database tables with info.
--
View this message in context:
http://lucene.472066.n3.nabble.com/HTML-entities-being-missed-by-DIH-HTMLStripTra
On 4 April 2013 00:30, Ashok wrote:
[...]
> Two questions.
>
> (1) Is this the expected behavior of DIH HTMLStripTransformer?
Yes, I believe so.
> (2) If yes, is there another transformer that I can employ first to turn
> these html entities into their usual symbols that can then be removed b