Hi, I am using DIH to index some database fields. These fields contain html formatted text in them. I use the 'HTMLStripTransformer' to remove that markup. This works fine when the text is like for example:
<li>Item One</li> or *This is in Bold* However when the text has HTML entity names like in: <li>Item One</> or <b>This is in Bold</b> NOTHING HAPPENS. Two questions. (1) Is this the expected behavior of DIH HTMLStripTransformer? (2) If yes, is there an another transformer that I can employ first to turn these html entities into their usual symbols that can then be removed by the DIH HTMLStripTransformer? Thanks - ashok -- View this message in context: http://lucene.472066.n3.nabble.com/HTML-entities-being-missed-by-DIH-HTMLStripTransformer-tp4053582.html Sent from the Solr - User mailing list archive at Nabble.com.