When I index an HTML page, the attr_content field shows "//<![CDATA..." text (part of a <script> tag in the original HTML page). I'm sure the problem is with my solrconfig.xml. Here is the section I think I need to adjust.
<requestHandler name="/update/extract" startup="lazy" class="solr.extraction.ExtractingRequestHandler" >
  <lst name="defaults">
    <str name="lowernames">true</str>
    <str name="fmap.meta">ignored_</str>
    <str name="fmap.content">_text_</str>
    <str name="fmap.script">ignored_</str> <!-- my change -->
    <str name="captureAttr">true</str> <!-- my change -->
  </lst>
</requestHandler>

The reference manual also mentions <script> and CDATA in connection with HTMLStripCharFilterFactory, but the page does not explain how to apply it in the configuration file. Google led me to a class called "HTMLStripReader". I think I'd like to apply that to the "attr_content" field, but I don't know how to apply it either.

Version: Solr 5.5 (the previous version behaves the same way)

Any insights would be appreciated.
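
For reference, as far as I can tell a char filter is declared inside a field type's analyzer in the schema rather than in solrconfig.xml. Below is an untested sketch of what I think that would look like. The field type name "text_html_stripped" is made up for illustration, the tokenizer and filter choices are just placeholders, and I'm assuming attr_content can be given an explicit field definition like this.

```xml
<!-- Goes in the schema (managed-schema or schema.xml), not solrconfig.xml. -->
<!-- "text_html_stripped" is a hypothetical name chosen for this example. -->
<fieldType name="text_html_stripped" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Strips HTML markup from the character stream before tokenizing -->
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Assumption: attr_content is mapped to the new type explicitly -->
<field name="attr_content" type="text_html_stripped" indexed="true" stored="true" multiValued="true"/>
```

If I understand correctly, a char filter only changes the indexed terms, not the stored value returned in query results, so this alone may not remove the CDATA text I'm seeing in the field.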