: It wasn't just a single file, it was dozens of files all having problems
: toward the end just before I killed the process.
...
: That is by no means all the errors, that is just a sample of a few.
: You can see they all threw HTTP 500 errors. What is strange is, nearly
: every file succeeded before about the 2200-files-mark, and nearly every
: file after that failed.
The root question is: do those files fail *only* if you have already
indexed ~2200 files, or do they also fail if you start up your server and
index them first?
There may be a resource issue (if it only happens after indexing ~2200
files), or it may just be that a large number of your PDFs have a problem
and your iteration code just happens to reach them at that point.
If it's the former, then there may be something buggy about how Solr is
using Tika that causes the problem; if it's the latter, then it's a
straight Tika parsing issue.
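One way to run that experiment is to restart Solr and send the previously failing PDFs before anything else. The sketch below assumes the example Solr 1.4 setup on localhost:8983 with the ExtractingRequestHandler mapped at /update/extract. The helper names (extract_url, failures_first, post_pdf) and the choice to POST the raw file bytes with a Content-Type header are illustrative assumptions, not part of the original thread.

```python
import urllib.parse
import urllib.request

# Hypothetical base URL: the Solr 1.4 example server listens on port 8983.
SOLR_BASE = "http://localhost:8983/solr"


def extract_url(base, doc_id, commit=False):
    """Build an /update/extract URL; literal.id supplies the unique key value."""
    params = urllib.parse.urlencode(
        {"literal.id": doc_id, "commit": str(commit).lower()}
    )
    return f"{base}/update/extract?{params}"


def failures_first(all_files, failed):
    """Move previously failing files to the front, keeping each group's order."""
    failed_set = set(failed)
    head = [f for f in all_files if f in failed_set]
    tail = [f for f in all_files if f not in failed_set]
    return head + tail


def post_pdf(base, path, doc_id):
    """POST one PDF's raw bytes (assumption); an HTTP 500 raises HTTPError."""
    with open(path, "rb") as fh:
        req = urllib.request.Request(
            extract_url(base, doc_id),
            data=fh.read(),
            headers={"Content-Type": "application/pdf"},
            method="POST",
        )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

If the reordered files index cleanly on a freshly started server, that points toward a resource problem that builds up over the run; if they still return HTTP 500, the PDFs themselves are the likelier culprit.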
: > now, commit is set to false to speed up the indexing, and I'm assuming that
: > Solr should be auto-committing as necessary. I'm using the default
: > solrconfig.xml file included in apache-solr-1.4.1\example\solr\conf. Once
Solr does not autocommit by default; you need to check the autoCommit
settings in your solrconfig.xml.
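If autoCommit is not enabled, another option is to send a single explicit commit after the indexing loop finishes. A minimal sketch, assuming the XML update handler at /update on the same example server; commit_request and send_commit are hypothetical helper names.

```python
import urllib.request


def commit_request(base):
    """Build (without sending) a POST of <commit/> to Solr's XML update handler."""
    return urllib.request.Request(
        f"{base}/update",
        data=b"<commit/>",
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


def send_commit(base):
    """Send the commit; returns the HTTP status code."""
    with urllib.request.urlopen(commit_request(base)) as resp:
        return resp.status
```

This keeps commit=false on each document for speed while still making the indexed documents visible once the whole batch is done.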
-Hoss