Re: Fw: TolerantUpdateProcessorFactory not functioning

Shawn Heisey Tue, 09 Jun 2020 00:19:15 -0700

On 6/9/2020 12:44 AM, Hup Chen wrote:

Thanks for your reply. This is one of the examples where it fails. POSTing with
charset=utf-8 or another charset didn't get rid of the CTRL-CHAR "^" error found in
the title field. I hope Solr can simply skip this record and go ahead and index the rest
of the data.


<add>
<doc>
  <field name="id">9780373773244</field>
  <field name="isbn13">9780373773244</field>
<field name="title">Missing: Innocent By Association^Zachary's Law (Hqn Romance) 
</field>
  <field name="author">Lisa_Jackson </field>
</doc>
</add>

curl "http://localhost:7070/solr/searchinfo/update?update.chain=tolerant-chain&maxErrors=100" \
  -H 'Content-Type: text/xml; charset=utf-8' -d @data


<?xml version="1.0" encoding="UTF-8"?>
<response>

<lst name="responseHeader">
   <arr name="errors"/>
   <int name="maxErrors">100</int>
   <int name="status">400</int>
   <int name="QTime">0</int>
</lst>
<lst name="error">
   <lst name="metadata">
     <str name="error-class">org.apache.solr.common.SolrException</str>
     <str name="root-error-class">com.ctc.wstx.exc.WstxUnexpectedCharException</str>
   </lst>
   <str name="msg">Illegal character ((CTRL-CHAR, code 26))
  at [row,col {unknown-source}]: [1,225]</str>
   <int name="code">400</int>
</lst>
</response>

I tried your example XML as it is shown in your original message, saved to a file named "foo.xml", and didn't have any trouble. I wasn't even using the tolerant update processor. I just fired up the techproducts example on a solr-8.3.0 download I already had, added a field named "isbn13" (string type) so the schema was compatible, and tried the following command:

curl "http://localhost:8983/solr/techproducts/update" -H 'Content-Type: text/xml; charset=utf-8' -d @foo.xml

I then tried it again with the ^Z (which is two characters) replaced by an actual Ctrl-Z character. When I did that, I got exactly the same error you did.

A Ctrl-Z character (ASCII code 26) is *NOT* a valid character for XML, which is why you're getting the error.
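
One way to deal with this on the client side is to clean the data before it is sent. The following is a minimal Python sketch, not anything built into Solr: it assumes the field values are available as strings before the XML is generated, and the function names are made up for illustration. The character ranges come from the Char production of the XML 1.0 specification.

```python
import re

# Matches anything outside the XML 1.0 "Char" production, which allows only
# #x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD and #x10000-#x10FFFF.
_ILLEGAL_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_illegal_xml_chars(text):
    """Return (offset, code point) pairs for characters XML 1.0 forbids."""
    return [(m.start(), ord(m.group())) for m in _ILLEGAL_XML_CHARS.finditer(text)]


def strip_illegal_xml_chars(text, replacement=""):
    """Return text with every forbidden character replaced (removed by default)."""
    return _ILLEGAL_XML_CHARS.sub(replacement, text)


if __name__ == "__main__":
    # Hypothetical title containing a real Ctrl-Z (code 26), not the two
    # printable characters "^" and "Z".
    title = "Innocent By Association\x1aZachary's Law"
    print(find_illegal_xml_chars(title))
    print(strip_illegal_xml_chars(title, " "))
```

Whether to drop the character or replace it with a space depends on the data. In a title like the one above, the control character appears to sit between two separate titles, so a space is probably the better replacement.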

The tolerant update processor can't ignore errors in the actual format of the input ... it only ignores errors during *indexing*. This error occurred during the input parsing, not during indexing, so the update processor could not ignore it.
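
Because a parse failure rejects the whole request before any update processor runs, it can help to check that a file is well-formed before posting it. Here is a rough Python sketch of such a check. It uses the expat-based parser from Python's standard library rather than the Woodstox parser that Solr uses, so the two may disagree in edge cases, and the function name is only an illustration.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return None if xml_text parses, else a (line, column, message) tuple."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        line, column = err.position  # where the parser gave up
        return (line, column, str(err))
    return None


if __name__ == "__main__":
    good = '<add><doc><field name="id">9780373773244</field></doc></add>'
    # Same document with a real Ctrl-Z (code 26) inside the field value.
    bad = good.replace("9780373773244", "97803\x1a73773244")

    print(check_well_formed(good))
    print(check_well_formed(bad))
```

A file that fails this check will fail in Solr's parser for the same reason, so it needs to be cleaned up first. The tolerant update chain only comes into play once the input parses cleanly.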


Thanks,
Shawn
