Hi Ashok,

HTMLStripTransformer uses HTMLStripCharFilter under the hood, and HTMLStripCharFilter converts all HTML entities to their corresponding characters.

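As a rough illustration of why entity-escaped markup can look untouched after one stripping pass, here is a small Python analogy. It uses only the standard library and is not Solr's actual code; the helper name strip_pass and the sample string are made up for this sketch, and it assumes a pass removes literal tags and decodes entities without re-scanning the decoded result.

```python
import html
import re

# Hypothetical helper, NOT part of Solr: one "strip pass" that removes
# literal tags and then decodes entities, without re-scanning the result.
TAG_RE = re.compile(r"<[^>]+>")

def strip_pass(text: str) -> str:
    without_tags = TAG_RE.sub("", text)  # remove literal <...> constructs
    return html.unescape(without_tags)   # turn &lt;, &amp;, etc. into characters

# Made-up sample: markup stored in entity-escaped form.
raw = "&lt;b&gt;Hello&lt;/b&gt; &amp; welcome"

once = strip_pass(raw)    # entities decoded, so the tags are now literal
twice = strip_pass(once)  # second pass removes the now-literal tags

print(once)
print(twice)
```

In this sketch the first pass finds no literal tags to remove and only decodes the entities, so the tags are still present afterwards; the second pass is the one that removes them.
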
What version of Solr are you using?

My guess is that it only appears that nothing is happening, since when the entities are presented in a browser, they show up as the characters they represent.

I think (never done this myself) that if you apply the HTMLStripTransformer twice, it will first convert the entities to characters, and then on the second pass, remove the HTML constructs.

From <http://wiki.apache.org/solr/DataImportHandler#Transformer>:

-----
The entity transformer attribute can consist of a comma separated list of transformers (say transformer="foo.X,foo.Y"). The transformers are chained in this case and they are applied one after the other in the order in which they are specified. What this means is that after the fields are fetched from the datasource, the list of entity columns are processed one at a time in the order listed inside the entity tag and scanned by the first transformer to see if any of that transformers attributes are present. If so the transformer does it's thing! When all of the listed entity columns have been scanned the process is repeated using the next transformer in the list.
-----

Steve

On Apr 3, 2013, at 3:30 PM, Alexandre Rafalovitch <arafa...@gmail.com> wrote:

> Then, I would say, you have a bigger problem....
>
> However, you can probably run RegEx filter and replace those known escapes
> with real characters before you run your HTMLStrip filter. Or run,
> HTMLStrip, RegEx and HTMLStrip again.
>
> Regards,
> Alex.
>
> Personal blog: http://blog.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all at
> once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)
>
>
> On Wed, Apr 3, 2013 at 3:19 PM, Ashok <ash...@qualcomm.com> wrote:
>
> >> Well, the database field has text, sometimes with HTML entities and at
> >> other times with html tags. I have no control over the process that
> >> populates the database tables with info.