The limit on how many threads you can use to load data is driven primarily 
by your hardware: CPU, heap usage, I/O, and the like.  It is common for the 
Solr side of the equation to be able to handle more incoming data than can 
typically be pulled from the source repository.  You'll have to experiment a 
bit to find the limits, but if your hardware is sufficient you can likely 
load a great deal in parallel.
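
If you want a concrete starting point, here is a minimal sketch using SolrJ 
4.x's ConcurrentUpdateSolrServer, where the queue size and thread count are 
the knobs to experiment with; the URL, collection name, field names, and 
document count are placeholders, not anything from your setup:

import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class ParallelLoader {
    public static void main(String[] args) throws Exception {
        // queueSize and threadCount are the tuning knobs: raise them
        // until CPU, heap, or I/O becomes the bottleneck.
        ConcurrentUpdateSolrServer server = new ConcurrentUpdateSolrServer(
                "http://localhost:8983/solr/mycollection",
                10000 /* queueSize */, 4 /* threadCount */);
        for (int i = 0; i < 100000; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "doc-" + i);   // stands in for your auto-generated id
            doc.addField("title_s", "document " + i);
            server.add(doc);                  // buffered; sent by background threads
        }
        server.blockUntilFinished();          // wait for the queues to drain
        server.commit();
        server.shutdown();
    }
}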

As for commits, a commit will indeed apply everything added to Solr, 
regardless of which thread sent the update.  Keep this in mind if you have a 
rollback strategy, or if you're tracking your incremental load so you can 
restart after an error or failure.  If you want more control and you are 
multi-threading index updates, it may be useful to have a single delegate 
handle the commit process…or on a large data load, consider a single commit 
at the end.
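
To illustrate the delegate idea, here is a rough sketch (again SolrJ 4.x; 
the URL, thread count, and id scheme are invented for the example) in which 
the workers only add documents and one coordinator issues the single commit:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class CommitAtEnd {
    public static void main(String[] args) throws Exception {
        // HttpSolrServer is thread-safe, so the workers can share one instance.
        final HttpSolrServer server =
                new HttpSolrServer("http://localhost:8983/solr/mycollection");
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int t = 0; t < 4; t++) {
            final int worker = t;
            pool.submit(new Runnable() {
                public void run() {
                    try {
                        for (int i = 0; i < 1000; i++) {
                            SolrInputDocument doc = new SolrInputDocument();
                            doc.addField("id", "file-" + worker + "-row-" + i);
                            server.add(doc); // add only; workers never commit
                        }
                    } catch (Exception e) {
                        throw new RuntimeException(e);
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
        // One commit from the coordinator: a commit makes *every* pending
        // document visible regardless of which thread added it, so issuing
        // it once here keeps visibility and restart points predictable.
        server.commit();
        server.shutdown();
    }
}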


On Oct 14, 2013, at 6:44 AM, maephisto <my_sky...@yahoo.com> wrote:

> Hi,
> 
> I have a collection (numShards=3, replicationFactor=2) split on 2 machines.
> Since the amount of data I have to index is huge, I would like to start
> multiple instances of the same process that would index data into Solr.
> Is there any limitation or counter-indication in this area? 
> 
> The indexing client is custom-built by me and parses files (each instance
> parses a different file), and the uniqueId is auto-generated. 
> Would a commit in a process also commit the uncommitted changes created by
> another process?
> 
> 
> 
