SolrJ bulk indexing documents - HttpSolrClient vs. ConcurrentUpdateSolrClient

Sebastian Riemer Fri, 18 Nov 2016 05:01:44 -0800

Hi all,

I am looking to improve indexing speed when loading many documents as part of 
an import. I am using the SolrJ client and currently add the documents 
one by one using HttpSolrClient and its method add(SolrInputDocument doc, int 
commitWithinMs).


My first step would be to change that to use add(Collection<SolrInputDocument> 
docs, int commitWithinMs) instead, which I expect would already improve 
performance.
Does it matter which method I use? Besides the method taking a 
Collection<SolrInputDocument> there is also one that takes an 
Iterator<SolrInputDocument> ... and what about ConcurrentUpdateSolrClient? 
Should I use it for bulk indexing instead of HttpSolrClient?
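
To make the batching idea concrete, here is a minimal sketch of how the 
documents could be grouped before sending. It uses only the JDK, so the actual 
SolrJ call is represented by a stand-in: BatchIndexer, BatchSink and 
indexInBatches are hypothetical names for illustration and are not part of 
SolrJ, and the batch size of 1000 is an arbitrary assumption, not a 
recommendation.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class BatchIndexer {

    /** Hypothetical stand-in for the call that sends one batch to Solr. */
    @FunctionalInterface
    public interface BatchSink<T> {
        void send(List<T> batch) throws Exception;
    }

    /**
     * Drains the iterator in chunks of at most batchSize documents and hands
     * each chunk to the sink. Returns the number of batches that were sent.
     */
    public static <T> int indexInBatches(Iterator<T> docs, int batchSize, BatchSink<T> sink)
            throws Exception {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int batches = 0;
        List<T> buffer = new ArrayList<>(batchSize);
        while (docs.hasNext()) {
            buffer.add(docs.next());
            if (buffer.size() == batchSize) {
                sink.send(buffer);
                batches++;
                // Start a fresh list so the sink may safely keep the old one.
                buffer = new ArrayList<>(batchSize);
            }
        }
        // Send whatever is left over after the last full batch.
        if (!buffer.isEmpty()) {
            sink.send(buffer);
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) throws Exception {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            docs.add("doc-" + i);
        }
        // Assumption: with SolrJ the sink would be something like
        //   batch -> client.add(batch, commitWithinMs)
        // where the elements are SolrInputDocument instances.
        int sent = indexInBatches(docs.iterator(), 1000,
                batch -> System.out.println("sending " + batch.size() + " documents"));
        System.out.println(sent + " batches");
    }
}
```

With a real client, the lambda would wrap add(Collection<SolrInputDocument> 
docs, int commitWithinMs), so each HTTP request carries a whole batch instead 
of a single document.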

Currently we are on version 5.5.0 of Solr, and we don't run SolrCloud, i.e. 
we have only a single instance.
Indexing 39657 documents (which results in a core size of approx. 127 MB) took 
about 10 minutes with the one-by-one approach.

Best regards and thanks for any suggestions,

Sebastian Riemer
