Hi Shawn/Erick,

This information has been very helpful. Thank you.

So I did some more investigation into our ETL process and verified that,
with the exception of the text I sent above, the rejected values are all
obviously invalid dates. For example, one field value had 00 for the day,
so I would guess that field had a non-printable character in it. So at
least in the case of a record where a field has an invalid date, the entire
import process is aborted. I'll adjust the ETL process to stop passing
invalid dates, but this does lead me to a question about failure modes for
importing large data sets into a collection. Is there any way to specify a
"continue on failure" mode such that Solr logs that it was unable to parse
a record, and why, and then continues on to the next record?
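
For reference, here is a rough sketch of the kind of ETL-side check I have
in mind, in Python. The function name check_solr_date and the strict
yyyy-MM-ddTHH:mm:ssZ format are my own assumptions for illustration, not
anything Solr provides, and the format is narrower than what Solr itself
accepts (no fractional seconds, for instance). It flags control and other
invisible characters first, then tries to parse the value as a date.

```python
import unicodedata
from datetime import datetime

# Assumed format for our date fields; narrower than what Solr accepts.
SOLR_DATE_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def check_solr_date(value):
    """Return (ok, reason) for a candidate date string (hypothetical helper)."""
    # Unicode categories starting with "C" cover control, format, and other
    # characters that normally do not show up when the value is printed.
    hidden = [c for c in value if unicodedata.category(c).startswith("C")]
    if hidden:
        codes = ", ".join(f"U+{ord(c):04X}" for c in hidden)
        return False, f"non-printable characters: {codes}"
    try:
        datetime.strptime(value, SOLR_DATE_FORMAT)
    except ValueError as exc:
        # Covers out-of-range parts such as a day of 00.
        return False, str(exc)
    return True, ""
```

Records that fail the check would be logged with the reason and skipped
before they are ever sent to Solr.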

Thanks,
Joe

On Sun, Feb 2, 2020 at 4:46 PM Shawn Heisey <apa...@elyograg.org> wrote:

> On 2/2/2020 8:47 AM, Joseph Lorenzini wrote:
> >          <autoSoftCommit>
> >              <maxTime>1000</maxTime>
> >              <maxDocs>10000</maxDocs>
> >          </autoSoftCommit>
>
> That autoSoftCommit setting is far too aggressive, especially for bulk
> indexing.  I don't know whether it's causing the specific problem you're
> asking about here, but it's still a setting that will cause problems,
> because Solr will constantly be doing commit operations while bulk
> indexing is underway.
>
> Erick mentioned this as well.  Greatly increasing the maxTime, and
> removing maxDocs, is recommended.  I would recommend starting at one
> minute.  The maxDocs setting should be removed from autoCommit as well.
>
> > So I turned off two solr nodes, leaving a single solr node up. When I ran
> > curl again, I noticed the import aborted with this exception.
> >
> > Error adding field 'primary_dob'='1983-12-21T00:00:00Z' msg=Invalid Date in
> > Date Math String:'1983-12-21T00:00:00Z
> > caused by: java.time.format.DateTimeParseException: Text
> > '1983-12-21T00:00:00Z' could not be parsed at index 0'
>
> That date string looks OK.  Which MIGHT mean there are characters in it
> that are not visible.  Erick said that the single quote is balanced in
> his message, which COULD mean that the character causing the problem is
> one that deletes things when it is printed.
>
> Thanks,
> Shawn
>
