Solr newbie question: doubts about HTML content
My current problem is finding the best approach to handle content that contains HTML code. Some of my docs have HTML tags and some do not.

In my first attempt, I defined a field "text" in my schema.xml:

<field name="text" type="text" indexed="true" stored="true"/>

and sent the content as-is:

<field name="texto"><br><p> A Brasil Telecom … <br/><br/><br/></field>

But the docs that contain HTML code threw an error when I tried to send them to Solr.

In my second attempt, I wrapped the content in a CDATA section, "<![CDATA[<br><p> A Brasil Telecom … <br/><br/><br/>]]>". This time I could send the docs to Solr, and a search for "<br>" retrieved the doc. But when I looked at the source of the result page, the HTML code had been "changed":

<str name="text">&lt;br&gt;&lt;p&gt; A Brasil Telecom ... </str>

My third approach is to create two fields in my schema:
- one with the original content;
- one without HTML code, which will be indexed.

But I don't know how to preserve the HTML content in the new field.

My question is: how do I put these docs into Solr, search them, and retrieve the original HTML content?

Thanks for your attention.
BR,
Marcio
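A minimal client-side sketch of the two-field idea, assuming the docs are posted to Solr as an XML `<add>` message. The field name `text_html` is hypothetical: it would need to be declared in schema.xml as stored (it does not have to be indexed), while the existing `text` field receives the tag-free text for searching. The "changed" HTML in the result page is ordinary XML escaping (`<` becomes `&lt;`). Any XML parser turns it back into the original markup, and the same escaping is what lets raw HTML travel inside the update message without CDATA.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the character data of an HTML fragment, dropping tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(fragment: str) -> str:
    """Return the visible text of an HTML fragment, whitespace collapsed.

    Text chunks separated by tags are joined with a space, so
    "<p>a</p><p>b</p>" becomes "a b" (inline tags inside a word will
    also introduce a space; acceptable for a sketch)."""
    parser = _TextExtractor()
    parser.feed(fragment)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def build_add_xml(doc_id: str, html: str) -> str:
    """Build a Solr <add> update message with two fields:
    'text_html' (hypothetical name) keeps the original HTML for display,
    'text' holds the tag-free text for searching.

    ElementTree escapes '&', '<' and '>' in element text automatically,
    so the raw HTML cannot break the XML message."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    fields = (("id", doc_id), ("text_html", html), ("text", strip_html(html)))
    for name, value in fields:
        field = ET.SubElement(doc, "field", name=name)
        field.text = value
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    html = "<br><p> A Brasil Telecom ... <br/><br/><br/>"
    message = build_add_xml("1", html)
    print(message)  # the HTML appears escaped as &lt;br&gt;&lt;p&gt; ...

    # Parsing the XML back restores the original markup unchanged.
    restored = ET.fromstring(message).find("doc/field[@name='text_html']").text
    assert restored == html
```

Stripping on the client keeps the example independent of Solr's configuration. As an alternative, Solr also provides the `solr.HTMLStripCharFilterFactory` char filter, which removes markup during analysis while the stored value of the same field stays exactly as it was sent, so a single field can serve both searching and display.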