: It wasn't just a single file, it was dozens of files all having problems
: toward the end just before I killed the process.
	...
: That is by no means all the errors, that is just a sample of a few.
: You can see they all threw HTTP 500 errors.  What is strange is, nearly
: every file succeeded before about the 2200-files-mark, and nearly every
: file after that failed.

...the root question is: do those files *only* fail if you have already
indexed ~2200 files, or do they also fail if you start up your server and
index them first?

There may be a resource issue (if it only happens after indexing 2200
files), or it may just be a problem with a large number of your PDFs that
your iteration code happens to reach at that point.

If it's the former, then there may be something buggy about how Solr is
using Tika to cause the problem -- if it's the latter, then it's a straight
Tika parsing issue.

: > now, commit is set to false to speed up the indexing, and I'm assuming that
: > Solr should be auto-committing as necessary.  I'm using the default
: > solrconfig.xml file included in apache-solr-1.4.1\example\solr\conf.  Once

Solr does no autocommitting by default; you need to check your
solrconfig.xml.

-Hoss
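
For reference, a sketch of what enabling autocommit could look like. This
assumes the stock Solr 1.4.1 example solrconfig.xml, where an autoCommit
block ships commented out inside the updateHandler element. The thresholds
below are illustrative values only, not recommendations:

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Assumed values for illustration: commit once 10000 docs are pending,
       or 60 seconds after the first uncommitted add, whichever comes first. -->
  <autoCommit>
    <maxDocs>10000</maxDocs>
    <maxTime>60000</maxTime> <!-- milliseconds -->
  </autoCommit>
</updateHandler>
```

Without a block like this, or an explicit commit once the indexing run
finishes, documents added with commit=false should not be expected to show
up in searches.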