Re: ConcurrentUpdateSolrClient vs CloudSolrClient for bulk update to SolrCloud

2016-01-14 Thread Shivaji Dutta
Thanks Erick.

On 1/13/16, 10:55 AM, "Erick Erickson" wrote:
>My first thought is "yes, you're overthinking it" ;)
>
>Here's something to get you started for indexing
>through a Java program:
>https://cwiki.apache.org/confluence/display/solr/Using+SolrJ
>
>Of course you _could_ use Lucene to

Re: ConcurrentUpdateSolrClient vs CloudSolrClient for bulk update to SolrCloud

2016-01-13 Thread Erick Erickson
My first thought is "yes, you're overthinking it" ;)

Here's something to get you started for indexing through a Java program:
https://cwiki.apache.org/confluence/display/solr/Using+SolrJ

Of course you _could_ use Lucene to build your indexes and just copy them "to the right place", but there

Re: ConcurrentUpdateSolrClient vs CloudSolrClient for bulk update to SolrCloud

2016-01-13 Thread Toke Eskildsen
Shivaji Dutta wrote:
> If I have a repository of millions of documents, would it not make sense
> to just index them locally and then copy the index file over to Solr and
> have it read from it?

It is certainly possible and for some scenarios it will work well. We do it locally: Create a shard,

Re: ConcurrentUpdateSolrClient vs CloudSolrClient for bulk update to SolrCloud

2016-01-13 Thread Shivaji Dutta
Erick and Shawn, thanks for the input. In the process below we are posting the documents to Solr over an HTTP connection in batches. I am trying to solve the same problem in a different way: I have used Lucene back in the day, where I would index the documents locally on the disk and run search quer

Re: ConcurrentUpdateSolrClient vs CloudSolrClient for bulk update to SolrCloud

2016-01-13 Thread Erick Erickson
It's usually not all that difficult to write a multi-threaded client that uses CloudSolrClient, or even fire up multiple instances of the SolrJ client (assuming they can work on discrete sections of the documents you need to index). That avoids the problem Shawn alludes to. Plus other issues. If y
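
[Editor's sketch] The multi-threaded approach described above could look roughly like the following. This is a minimal illustration, not code from the thread: BatchSender and ParallelBatchIndexer are hypothetical names, and BatchSender stands in for a call such as CloudSolrClient.add(Collection<SolrInputDocument>) so that the example runs without SolrJ on the classpath. The batch size and thread count are placeholder values, not recommendations from the posters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical stand-in for a call such as CloudSolrClient.add(Collection<SolrInputDocument>). */
interface BatchSender<D> {
    void send(List<D> batch) throws Exception;
}

/** Hypothetical helper: splits documents into batches and sends them from a thread pool. */
final class ParallelBatchIndexer {
    private ParallelBatchIndexer() {}

    /** Sends all docs in batches of batchSize using a fixed pool; returns the number of docs sent. */
    static <D> long indexAll(List<D> docs, int batchSize, int threads, BatchSender<D> sender) {
        if (batchSize <= 0 || threads <= 0) {
            throw new IllegalArgumentException("batchSize and threads must be positive");
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> pending = new ArrayList<>();
            for (int start = 0; start < docs.size(); start += batchSize) {
                int end = Math.min(start + batchSize, docs.size());
                // Copy the slice so each task owns its batch independently of the source list.
                List<D> batch = new ArrayList<>(docs.subList(start, end));
                pending.add(pool.submit(() -> {
                    sender.send(batch);
                    return batch.size();
                }));
            }
            long sent = 0;
            for (Future<Integer> f : pending) {
                sent += f.get(); // surfaces the exception of any failed batch
            }
            return sent;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while indexing", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a batch failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

With SolrJ available, the sender would presumably be something like batch -> client.add(batch) on a shared CloudSolrClient instance. Note that this sketch queues every batch up front, so for billions of documents the input would have to be streamed in chunks rather than held in one list.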

Re: ConcurrentUpdateSolrClient vs CloudSolrClient for bulk update to SolrCloud

2016-01-12 Thread Shawn Heisey
On 1/12/2016 7:42 PM, Shivaji Dutta wrote:
> Now since with ConcurrentUpdateSolrClient I am able to use a queue and a pool
> of threads, which makes it more attractive to use over CloudSolrClient which
> will use a HttpSolrClient once it gets a set of nodes to do the updates.
>
> What is the recom

ConcurrentUpdateSolrClient vs CloudSolrClient for bulk update to SolrCloud

2016-01-12 Thread Shivaji Dutta
We have a customer that needs to update a few billion documents to SolrCloud. I know the suggested client to use is CloudSolrClient, for its load balancing feature. As per docs - CloudSolrClient SolrJ client class to communicate with SolrCloud. Instances of this class communicate with Zookeeper t