On 6/9/2020 12:44 AM, Hup Chen wrote:
Thanks for your reply. This is one of the examples where it fails. POSTing with
charset=utf-8 or another charset didn't help; the CTRL-CHAR "^" error is still
reported for the title field. I hope Solr can simply skip this record and go on to
index the rest of the data.
<add>
<doc>
<field name="id">9780373773244</field>
<field name="isbn13">9780373773244</field>
<field name="title">Missing: Innocent By Association^Zachary's Law (Hqn Romance)
</field>
<field name="author">Lisa_Jackson </field>
</doc>
</add>
curl "http://localhost:7070/solr/searchinfo/update?update.chain=tolerant-chain&maxErrors=100" \
  -H 'Content-Type: text/xml; charset=utf-8' -d @data
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
<arr name="errors"/>
<int name="maxErrors">100</int>
<int name="status">400</int>
<int name="QTime">0</int>
</lst>
<lst name="error">
<lst name="metadata">
<str name="error-class">org.apache.solr.common.SolrException</str>
<str name="root-error-class">com.ctc.wstx.exc.WstxUnexpectedCharException</str>
</lst>
<str name="msg">Illegal character ((CTRL-CHAR, code 26))
at [row,col {unknown-source}]: [1,225]</str>
<int name="code">400</int>
</lst>
</response>
I tried your example XML as it is shown in your original message, saved
to a file named "foo.xml", and didn't have any trouble. I wasn't even
using the tolerant update processor. I just fired up the techproducts
example on a solr-8.3.0 download I already had, added a field named
"isbn13" (string type) so the schema was compatible, and tried the
following command:
curl "http://localhost:8983/solr/techproducts/update" \
  -H 'Content-Type: text/xml; charset=utf-8' -d @foo.xml
I then tried it again with the ^Z (which is two characters) replaced by
an actual Ctrl-Z character. When I did that, I got exactly the same
error you did.
A Ctrl-Z character (ASCII code 26) is *NOT* a valid character in XML 1.0,
which is why you're getting the error.
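For reference, the XML 1.0 specification allows only tab, line feed, carriage
return, and code points from U+0020 upward (minus the surrogate range and
U+FFFE/U+FFFF). A minimal Python sketch of that rule follows; the function names
are hypothetical helpers for illustration, not anything provided by Solr:

```python
def is_xml10_char(ch: str) -> bool:
    """Return True if ch is allowed by the XML 1.0 'Char' production."""
    cp = ord(ch)
    return (
        cp in (0x09, 0x0A, 0x0D)        # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF         # everything up to the surrogates
        or 0xE000 <= cp <= 0xFFFD       # private use area through U+FFFD
        or 0x10000 <= cp <= 0x10FFFF    # supplementary planes
    )


def find_illegal_xml_chars(text: str) -> list:
    """List (offset, code point) pairs for characters XML 1.0 forbids."""
    return [(i, ord(ch)) for i, ch in enumerate(text) if not is_xml10_char(ch)]
```

Running find_illegal_xml_chars() over the contents of the data file should
point at the Ctrl-Z in the title field, reported as code point 26.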
The tolerant update processor can't ignore errors in the actual format
of the input ... it only ignores errors during *indexing*. This error
occurred during the input parsing, not during indexing, so the update
processor could not ignore it.
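Since the parser rejects the whole request, one option is to clean the file
before sending it. The following is only a sketch, assuming the input is UTF-8
text and that replacing each illegal character with a space is acceptable for
your data; the function names and the output filename are made up for this
example:

```python
import re

# Matches any character outside the XML 1.0 'Char' production.
_ILLEGAL_XML10 = re.compile(
    "[^\t\n\r\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_illegal_xml_chars(text: str, replacement: str = " ") -> str:
    """Replace characters that XML 1.0 forbids (such as Ctrl-Z)."""
    return _ILLEGAL_XML10.sub(replacement, text)


def clean_file(src: str, dst: str) -> None:
    """Write a cleaned copy of src to dst; both are read/written as UTF-8."""
    with open(src, encoding="utf-8") as infile:
        cleaned = strip_illegal_xml_chars(infile.read())
    with open(dst, "w", encoding="utf-8") as outfile:
        outfile.write(cleaned)
```

For example, clean_file("data", "data.clean") followed by the same curl command
with -d @data.clean. Note that this alters the record rather than skipping it;
if the record really should be dropped, that filtering has to happen when the
XML is generated.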
Thanks,
Shawn