I don’t quite know how TolerantUpdateProcessor interacts with importing CSV files; see: https://issues.apache.org/jira/browse/SOLR-445. That issue is about sending batches of docs to Solr, and frankly I don’t know what path your process will take. It’s worth a try though.
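If you want to experiment, the chain itself is simple to set up in solrconfig.xml. A minimal sketch, assuming Solr 6.1 or later; the chain name "tolerant-chain" and the unlimited maxErrors are my own placeholder choices, not anything from your setup:

  <updateRequestProcessorChain name="tolerant-chain">
    <!-- maxErrors=-1 means "never abort the request, just report failures" -->
    <processor class="solr.TolerantUpdateProcessorFactory">
      <int name="maxErrors">-1</int>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>

Then point the CSV load at that chain (collection name and file are placeholders), something like:

  curl 'http://localhost:8983/solr/yourcollection/update?update.chain=tolerant-chain&commit=true' \
    -H 'Content-type: text/csv' --data-binary @yourdata.csv

Whether the CSV path reports the skipped rows in a useful way is exactly the part I haven't verified.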
Otherwise, I typically go with SolrJ and send batches (sketch at the bottom of this message). That does combine with TolerantUpdateProcessor.

Best,
Erick

> On Feb 3, 2020, at 10:16 AM, Joseph Lorenzini <jalo...@gmail.com> wrote:
>
> Hi Shawn/Erick,
>
> This information has been very helpful. Thank you.
>
> So I did some more investigation into our ETL process and I verified that,
> with the exception of the text I sent above, the failing values are all
> obviously invalid dates. For example, one field value had 00 for the day,
> so I would guess that field had a non-printable character in it. So at
> least in the case of a record where a field has an invalid date, the
> entire import process is aborted. I'll adjust the ETL process to stop
> passing invalid dates, but this does lead me to a question about failure
> modes for importing large data sets into a collection. Is there any way to
> specify a "continue on failure" mode, such that Solr logs that it was
> unable to parse a record and why, and then continues on to the next
> record?
>
> Thanks,
> Joe
>
> On Sun, Feb 2, 2020 at 4:46 PM Shawn Heisey <apa...@elyograg.org> wrote:
>
>> On 2/2/2020 8:47 AM, Joseph Lorenzini wrote:
>>> <autoSoftCommit>
>>>   <maxTime>1000</maxTime>
>>>   <maxDocs>10000</maxDocs>
>>> </autoSoftCommit>
>>
>> That autoSoftCommit setting is far too aggressive, especially for bulk
>> indexing. I don't know whether it's causing the specific problem you're
>> asking about here, but it's still a setting that will cause problems,
>> because Solr will constantly be doing commit operations while bulk
>> indexing is underway.
>>
>> Erick mentioned this as well. Greatly increasing the maxTime, and
>> removing maxDocs, is recommended. I would recommend starting at one
>> minute. The maxDocs setting should be removed from autoCommit as well.
>>
>>> So I turned off two Solr nodes, leaving a single Solr node up. When I
>>> ran curl again, I noticed the import aborted with this exception:
>>>
>>> Error adding field 'primary_dob'='1983-12-21T00:00:00Z' msg=Invalid Date
>>> in Date Math String:'1983-12-21T00:00:00Z
>>> caused by: java.time.format.DateTimeParseException: Text
>>> '1983-12-21T00:00:00Z' could not be parsed at index 0'
>>
>> That date string looks OK. Which MIGHT mean there are characters in it
>> that are not visible. Erick said that the single quote is balanced in
>> his message, which COULD mean that the character causing the problem is
>> one that deletes things when it is printed.
>>
>> Thanks,
>> Shawn
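P.S. Here's roughly what the SolrJ batching approach looks like, wired to the hypothetical "tolerant-chain" sketched above. This is a sketch, not your ETL: the URL is a placeholder, and readRecords()/toDoc() are stand-in helpers you'd replace with your own CSV handling.

  import java.util.ArrayList;
  import java.util.List;
  import org.apache.solr.client.solrj.SolrClient;
  import org.apache.solr.client.solrj.impl.HttpSolrClient;
  import org.apache.solr.client.solrj.request.UpdateRequest;
  import org.apache.solr.common.SolrInputDocument;

  public class BulkLoader {
    private static final int BATCH_SIZE = 1000; // tune to your document size

    public static void main(String[] args) throws Exception {
      try (SolrClient client = new HttpSolrClient.Builder(
          "http://localhost:8983/solr/yourcollection").build()) {
        List<SolrInputDocument> batch = new ArrayList<>();
        for (String[] row : readRecords()) {
          batch.add(toDoc(row));
          if (batch.size() >= BATCH_SIZE) {
            send(client, batch);
            batch.clear();
          }
        }
        if (!batch.isEmpty()) {
          send(client, batch);
        }
        client.commit(); // one explicit commit at the end of the bulk load
      }
    }

    private static void send(SolrClient client, List<SolrInputDocument> docs)
        throws Exception {
      UpdateRequest req = new UpdateRequest();
      req.setParam("update.chain", "tolerant-chain"); // route through the tolerant chain
      req.add(docs);
      // With the tolerant chain, individual bad documents are reported in
      // the response instead of aborting the whole batch.
      req.process(client);
    }

    // Stand-in for your ETL's reader; replace with real CSV parsing.
    private static List<String[]> readRecords() {
      return new ArrayList<>();
    }

    private static SolrInputDocument toDoc(String[] row) {
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", row[0]);
      // A cheap ETL-side guard for the date problem in this thread:
      // java.time.Instant.parse() rejects anything that isn't a clean
      // ISO-8601 instant, including strings with invisible characters.
      doc.addField("primary_dob", java.time.Instant.parse(row[1]).toString());
      return doc;
    }
  }

And per Shawn's point, while the bulk load runs I'd also bump the commit settings to something like a 60000 ms maxTime, with no maxDocs element, for both autoCommit and autoSoftCommit.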