Our use case is that we have 3 indexing machines pulling off a Kafka queue, and they are all sending individual updates.
On Fri, Jan 31, 2014 at 12:54 PM, Mark Miller <markrmil...@gmail.com> wrote:

> Just make sure parallel updates is set to true.
>
> If you want to load even faster, you can use the bulk add methods, or if
> you need more fine-grained responses, use the single add from multiple
> threads (though bulk add can also be done via multiple threads if you
> really want to try and push the max).
>
> - Mark
>
> http://about.me/markrmiller
>
> On Jan 31, 2014, at 3:50 PM, Software Dev <static.void....@gmail.com>
> wrote:
>
> > Which, if any, of these settings would be beneficial when bulk uploading?
> >
> > On Fri, Jan 31, 2014 at 11:05 AM, Mark Miller <markrmil...@gmail.com>
> > wrote:
> >
> >> On Jan 31, 2014, at 1:56 PM, Greg Walters <greg.walt...@answers.com>
> >> wrote:
> >>
> >>> I'm assuming you mean CloudSolrServer here. If I'm wrong please ignore
> >>> my response.
> >>>
> >>>> -updatesToLeaders
> >>>
> >>> Only send documents to shard leaders while indexing. This saves
> >>> cross-talk between slaves and leaders which results in more efficient
> >>> document routing.
> >>
> >> Right, but recently this has less of an effect because CloudSolrServer can
> >> now hash documents and directly send them to the right place. This option
> >> has become more historical. Just make sure you set the correct id field on
> >> the CloudSolrServer instance for this hashing to work (I think it defaults
> >> to "id").
> >>
> >>>> shutdownLBHttpSolrServer
> >>>
> >>> CloudSolrServer uses a LBHttpSolrServer behind the scenes to distribute
> >>> requests (that aren't updates directly to leaders). Where did you find
> >>> this? I don't see this in the javadoc anywhere but it is a boolean in the
> >>> CloudSolrServer class. It looks like when you create a new CloudSolrServer
> >>> and pass it your own LBHttpSolrServer the boolean gets set to false and the
> >>> CloudSolrServer won't shut down the LBHttpSolrServer when it gets shut down.
> >>>
> >>>> parallelUpdates
> >>>
> >>> The javadocs don't have any description for this one but I checked out
> >>> the code for CloudSolrServer and if parallelUpdates is set it looks like
> >>> it executes update statements to multiple shards at the same time.
> >>
> >> Right, we should def add some javadoc, but this sends updates to shards in
> >> parallel rather than with a single thread. Can really increase update
> >> speed. Still not as powerful as using CloudSolrServer from multiple
> >> threads, but a nice improvement nonetheless.
> >>
> >> - Mark
> >>
> >> http://about.me/markrmiller
> >>
> >>> I'm no dev but I can read so please excuse any errors on my part.
> >>>
> >>> Thanks,
> >>> Greg
> >>>
> >>> On Jan 31, 2014, at 11:40 AM, Software Dev <static.void....@gmail.com>
> >>> wrote:
> >>>
> >>>> Can someone clarify what the following options are:
> >>>>
> >>>> - updatesToLeaders
> >>>> - shutdownLBHttpSolrServer
> >>>> - parallelUpdates
> >>>>
> >>>> Also, I remember in older versions of Solr there was an efficient format
> >>>> that was used between SolrJ and Solr that is more compact. Does this
> >>>> still exist in the latest version of Solr? If so, is it the default?
> >>>>
> >>>> Thanks
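
For anyone reading this thread later, here is a rough sketch of the idea discussed above: hash each document id to a shard, group a bulk batch by shard, then send the per-shard batches in parallel instead of from a single thread. This is not SolrJ code and not how CloudSolrServer is actually implemented. The class and method names (ParallelShardUpdateSketch, shardFor, groupByShard, sendInParallel) are made up for illustration, String.hashCode() with a modulo is only a stand-in for Solr's real hash routing, and the "send" is a placeholder that just counts documents rather than making a network request.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelShardUpdateSketch {

    // Hypothetical stand-in for hash routing: derive a shard index from the id.
    // Assumption: a plain hashCode modulo, NOT the hashing Solr really uses.
    static int shardFor(String docId, int numShards) {
        return Math.floorMod(docId.hashCode(), numShards);
    }

    // Split one bulk batch of document ids into one batch per target shard.
    static Map<Integer, List<String>> groupByShard(List<String> docIds, int numShards) {
        Map<Integer, List<String>> batches = new TreeMap<>();
        for (String id : docIds) {
            batches.computeIfAbsent(shardFor(id, numShards), k -> new ArrayList<>()).add(id);
        }
        return batches;
    }

    // Submit every per-shard batch to a thread pool so the shards are updated
    // concurrently. Returns the total number of documents "sent".
    static int sendInParallel(Map<Integer, List<String>> batches) {
        if (batches.isEmpty()) {
            return 0;
        }
        ExecutorService pool = Executors.newFixedThreadPool(batches.size());
        try {
            List<Future<Integer>> pending = new ArrayList<>();
            for (List<String> batch : batches.values()) {
                // Placeholder for a request to the shard leader: it only
                // reports how many documents the batch contained.
                Callable<Integer> request = () -> batch.size();
                pending.add(pool.submit(request));
            }
            int sent = 0;
            for (Future<Integer> result : pending) {
                sent += result.get(); // wait for each shard's request to finish
            }
            return sent;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while sending", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a shard request failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("doc-1", "doc-2", "doc-3", "doc-4", "doc-5");
        Map<Integer, List<String>> batches = groupByShard(ids, 3);
        System.out.println("batches by shard: " + batches);
        System.out.println("documents sent: " + sendInParallel(batches));
    }
}
```

The grouping step is also why the id field matters in the discussion above: if the client hashes on the wrong field, documents end up in the wrong per-shard batch and have to be forwarded again on the server side.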