On 5/21/2015 2:07 AM, Angel Todorov wrote:
> I'm crawling a file system folder and indexing 10 million docs, and I am
> adding them in batches of 5000, committing every 50,000 docs. The problem I
> am facing is that after each commit, the number of documents indexed per
> second gets lower and lower.
> 
> If I do not commit at all, I can index those docs very quickly and then
> commit once at the end, but once I start indexing docs _after_ that (for
> example, when new files are added to the folder), indexing also slows
> down a lot.
> 
> Is it normal that the SOLR indexing speed depends on the number of
> documents that are _already_ indexed? I think it shouldn't matter whether
> I start from scratch or index a document in a core that already has a
> couple of million docs. It looks like SOLR is either doing something in a
> linear fashion, or there is some magic config parameter that I am not aware
> of.
> 
> I've read all perf docs, and I've tried changing mergeFactor,
> autowarmCounts, and the buffer sizes - to no avail.
> 
> I am using SOLR 5.1

Have you changed the heap size?  If you start Solr with the bin/solr
script and don't raise the heap with the -m option or another method,
Solr 5.1 runs with a default heap of 512MB, which is *very* small.
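
As a quick test, you could restart with a larger heap via -m (the 4g
value below is just a starting point, not a tuned recommendation):

  bin/solr stop -all
  bin/solr start -m 4g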

I bet you are running into problems with frequent and then ultimately
constant garbage collection, as Java attempts to free up enough memory
to allow the program to continue running.  If that is what is happening,
then eventually you will see an OutOfMemoryError exception.  The
solution is to increase the heap size.  I would probably start with at
least 4G for 10 million docs.
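
If the larger heap helps and you want it to persist, you can set it in
the include script instead of passing -m every time.  A sketch, assuming
a default *nix install (on Windows the equivalent file is
bin\solr.in.cmd):

  # in bin/solr.in.sh
  SOLR_HEAP="4g"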

Thanks,
Shawn
