On 6/9/2020 12:44 AM, Hup Chen wrote:
Thanks for your reply. This is one of the examples where it fails. POSTing with 
charset=utf-8 or another charset didn't help; the CTRL-CHAR "^" error is still found in 
the title field. I hope Solr can simply skip this record and go on to index the rest 
of the data.

<add>
<doc>
  <field name="id">9780373773244</field>
  <field name="isbn13">9780373773244</field>
<field name="title">Missing: Innocent By Association^Zachary's Law (Hqn Romance) 
</field>
  <field name="author">Lisa_Jackson </field>
</doc>
</add>

curl "http://localhost:7070/solr/searchinfo/update?update.chain=tolerant-chain&maxErrors=100" \
  -H 'Content-Type: text/xml; charset=utf-8' -d @data


<?xml version="1.0" encoding="UTF-8"?>
<response>

<lst name="responseHeader">
   <arr name="errors"/>
   <int name="maxErrors">100</int>
   <int name="status">400</int>
   <int name="QTime">0</int>
</lst>
<lst name="error">
   <lst name="metadata">
     <str name="error-class">org.apache.solr.common.SolrException</str>
     <str name="root-error-class">com.ctc.wstx.exc.WstxUnexpectedCharException</str>
   </lst>
   <str name="msg">Illegal character ((CTRL-CHAR, code 26))
  at [row,col {unknown-source}]: [1,225]</str>
   <int name="code">400</int>
</lst>
</response>

I tried your example XML as it is shown in your original message, saved to a file named "foo.xml", and didn't have any trouble. I wasn't even using the tolerant update processor. I just fired up the techproducts example on a solr-8.3.0 download I already had, added a field named "isbn13" (string type) so the schema was compatible, and tried the following command:

curl "http://localhost:8983/solr/techproducts/update" -H 'Content-Type: text/xml; charset=utf-8' -d @foo.xml

I then tried it again with the ^Z (which is two characters) replaced by an actual Ctrl-Z character. When I did that, I got exactly the same error you did.

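A Ctrl-Z character (ASCII code 26, U+001A) is *NOT* a valid character in XML 1.0, which permits only tab, line feed, and carriage return among the control characters below U+0020. That is why you're getting the error.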
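
To find which records carry such characters before posting, something like the following Python sketch could help. It is only an illustration: the function name `find_illegal_xml_chars` is hypothetical, and the character class is transcribed from the XML 1.0 "Char" production rather than taken from anything Solr provides.

```python
import re

# Everything outside the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_ILLEGAL_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)

def find_illegal_xml_chars(text):
    """Return a list of (line, column, code point) for each illegal character.

    Lines are split on "\n" only, because str.splitlines() would also
    split on some of the control characters we are looking for.
    """
    hits = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        for match in _ILLEGAL_XML_CHARS.finditer(line):
            hits.append((line_no, match.start() + 1, ord(match.group())))
    return hits
```

Calling it on the decoded contents of the input file lists every offending position, so the affected `<doc>` entries can be repaired or dropped by hand. A literal caret followed by `Z` is ordinary text and is not reported.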

The tolerant update processor can't ignore errors in the actual format of the input ... it only ignores errors during *indexing*. This error occurred during the input parsing, not during indexing, so the update processor could not ignore it.
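
Since the parse error has to be avoided on the client side, one option is to clean the file before sending it. The sketch below is a minimal example under assumptions: the helper `clean_file` and the output file name `data.clean` are hypothetical, the input is assumed to be UTF-8, and replacing each illegal character with a space is just one possible policy.

```python
import re
from pathlib import Path

# Characters that the XML 1.0 "Char" production does not allow.
_ILLEGAL_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)

def clean_file(src, dst, replacement=" "):
    """Copy src to dst, replacing characters XML 1.0 forbids.

    Returns the number of characters that were replaced.
    """
    raw = Path(src).read_text(encoding="utf-8")
    cleaned, count = _ILLEGAL_XML_CHARS.subn(replacement, raw)
    Path(dst).write_text(cleaned, encoding="utf-8")
    return count

if __name__ == "__main__":
    # "data" is the file name from the curl command; "data.clean" is made up.
    replaced = clean_file("data", "data.clean")
    print(f"replaced {replaced} illegal character(s)")
```

The cleaned file can then be posted with the same curl command by pointing `-d` at `@data.clean`.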

Thanks,
Shawn
