I'm trying to add a large number of documents (hundreds of thousands)
in a loop. I don't need these docs to appear in search results until
I'm done, though.
For a simple test, I call the post.sh script in a loop with the same
moderately sized XML file. Each pass adds a ~20KB doc and then
commits; repeat hundreds of thousands of times.
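Roughly, the test loop is just this (only a sketch; mydoc.xml stands
in for my real file, and I'm assuming the stock exampledocs/post.sh
behavior of POSTing the file to /update and then sending a commit):

  # sketch of the test loop; mydoc.xml is a stand-in for my real file
  for i in $(seq 1 500000); do
    ./post.sh mydoc.xml    # POSTs the XML to /update, then commits
  done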
This works fine for a while, but eventually (only about 10K docs in)
the Solr instance starts taking longer and longer to respond to each
<add> (I print the curl timing; near the end an add takes 10s), and
the web server (Resin 3.0) eventually dumps an "out of heap space"
error to its log (my max heap is 1GB on a 4GB machine).
I also see the "(Too many open files in system)" stack trace coming
from Lucene's SegmentReader during this test. My fs.file-max was
361990, which I bumped up to 2M, but I don't know how or why
Solr/Lucene would open that many.
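For reference, here is roughly how I checked and raised the limit
(the values are from my machine); I assume running lsof against the
Solr pid is the way to see what the process actually has open (the
pid placeholder below is hypothetical):

  cat /proc/sys/fs/file-max        # was 361990
  sysctl -w fs.file-max=2000000    # bumped to 2M
  lsof -p <solr_pid> | wc -l       # count files the Solr JVM has open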
My question is about best practices for this sort of "bulk add."
Since insert time is not a concern, I have some leeway. Should I
commit after every add? Should I optimize every so many commits? Is
there some background reaper thread or timer that I should give a
chance to run?
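For example, is the right shape something more like the following:
no per-add commits, one commit (and maybe an optimize) at the very
end? This is only a sketch of what I have in mind; the URL assumes
the default example port, and batches/*.xml is a hypothetical set of
files each holding many <doc> elements per <add>:

  # post each batch file without committing
  for f in batches/*.xml; do
    curl http://localhost:8983/solr/update \
         -H 'Content-type: text/xml; charset=utf-8' --data-binary @"$f"
  done
  # single commit (and optional optimize) once everything is in
  curl http://localhost:8983/solr/update \
       -H 'Content-type: text/xml; charset=utf-8' --data-binary '<commit/>'
  curl http://localhost:8983/solr/update \
       -H 'Content-type: text/xml; charset=utf-8' --data-binary '<optimize/>'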
Brian Whitman