Hi, I want users to add content to my site using tinyMCE, which generates HTML. When I tried adding the data to Solr, Solr refused to add it (or at least generated an error):

SEVERE: org.xmlpull.v1.XmlPullParserException: parser must be on START_TAG or TEXT to read text (position: START_TAG seen ...<field name="text"><p>... @4:39)
    at org.xmlpull.mxp1.MXParser.nextText(MXParser.java:1071)
    at org.apache.solr.core.SolrCore.readDoc(SolrCore.java:910)
    at org.apache.solr.core.SolrCore.update(SolrCore.java:685)
    at org.apache.solr.servlet.SolrUpdateServlet.doPost(SolrUpdateServlet.java:52)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:709)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:252)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)
    at org.apache.catalina.valves.RequestFilterValve.process(RequestFilterValve.java:275)
    at org.apache.catalina.valves.RemoteAddrValve.invoke(RemoteAddrValve.java:80)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:869)
    at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:664)
    at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:527)
    at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:80)
    at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:684)
    at java.lang.Thread.run(Thread.java:595)

So I searched the archives to resolve this issue, since I didn't want to
strip out the HTML entirely. The solution proved to be wrapping the HTML text in a <![CDATA[ ... ]]> section, like so:

<add><doc>
  <field name="text"><![CDATA[#{field.text}]]></field>
</doc></add>

This also drew my attention to another problem: characters like <, > and & are all 'invalid' characters between XML tags. Does that mean I have to put <![CDATA[ ... ]]> around every field I want to index? I don't know, and can't control, what my users will input.

Is this the only solution, or is there a way for Solr to handle these 'invalid' characters in the indexed text by itself, without generating errors?

Kind regards,
Nick
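
For reference, one alternative to CDATA that might work is to escape the special characters on the client side before building the update message. The following is only a minimal sketch under assumptions: Ruby is assumed because of the #{...} interpolation in the snippet above, and the helper names xml_escape and solr_add_xml are hypothetical, not part of Solr or of any Solr client library.

```ruby
# Hypothetical helpers for illustration; not part of Solr or any client library.

# Characters that are special in XML text and attribute values,
# mapped to their predefined entity references.
XML_ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;" }.freeze

# Replace each special character with its entity reference.
def xml_escape(value)
  value.to_s.gsub(/[&<>"]/, XML_ESCAPES)
end

# Build an <add> message from a hash of field name => field value.
def solr_add_xml(fields)
  body = fields.map do |name, value|
    %(<field name="#{xml_escape(name)}">#{xml_escape(value)}</field>)
  end.join
  "<add><doc>#{body}</doc></add>"
end

puts solr_add_xml("text" => "<p>Fish & chips</p>")
```

Because an XML parser turns the entity references back into the original characters, the text that reaches the index should be the same HTML the user entered. Unlike CDATA, this approach also does not break if the user's text happens to contain the sequence ]]>.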