On 12/8/2017 2:40 AM, Sabeer Hussain wrote:
> I am using Solr 7.1 version and deployed it in standalone mode. I have
> created a scheduler in my application itself to perform delta-import
> operation based on a pre-configured frequency. I have used the following
> lines of code (in java) to invoke delta-import operation
When the language is Java, I would use SolrJ. The code tends to be easier
to write and easier to read than code like yours that uses the HTTP
functionality built into Java. The response objects have a lot of sugar
methods, and the entire response is available as a Java object that's easy
to use in code -- you don't have to worry about parsing the response into
Java objects yourself.

> Now, I want to deploy the application in SolrCloud mode and for each core,
> there will be 2 more replicas.

Most things in SolrCloud should be done at the collection level --
replacing "corename" with "collectionname" in the URL you have in your
code. But DIH (the dataimport handler) is not one of them. Using DIH at
the collection level is possible, but you'll find that the requests are
load-balanced across the cloud, so you are likely to get a status from a
different replica than the one you sent the import to.

So, even though most of the time I would recommend using CloudSolrClient
from SolrJ when running SolrCloud, for the dataimport handler you should
actually use HttpSolrClient.

If the index has only one shard, or you are using the compositeId router
for automatic distribution of data between multiple shards, then running
an import on *ANY* core in the collection will distribute and replicate
data as you would expect across the entire collection. If you're using
the implicit router and there are multiple shards, then things get a lot
trickier, but SolrCloud will still do all the replication for you. I'm
not going to go into detail about shards in this message.

Here's some SolrJ code to start an import and print the response. The
example code includes a possible core name for the "foo" collection in
SolrCloud. A specific core should be used for DIH so that you can be sure
that all requests are sent to the same place. The query I've built in the
example doesn't have all the parameters you included, but you should be
able to see how to add anything you need.
One thing I'm not clear on is whether the distrib=false parameter is
required to disable the load balancing.

import java.io.IOException;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

/*
 * By using a base URL without a core/collection name, one client
 * object can be used for requests to multiple indexes hosted on the
 * server side.
 */
String baseUrl = "http://host:port/solr";
String coreName = "foo_shard1_replica_n1";
SolrClient client = new HttpSolrClient.Builder(baseUrl).build();

// Build the delta-import request against the /dataimport handler.
SolrQuery startQuery = new SolrQuery();
startQuery.setRequestHandler("/dataimport");
startQuery.set("command", "delta-import");
startQuery.set("clean", "false");

try {
  // Send the request to the specific core and print the raw response.
  QueryResponse response = client.query(coreName, startQuery);
  System.out.println(response.getResponse().toString());
} catch (SolrServerException | IOException e) {
  e.printStackTrace();
}

As I mentioned above, for most types of requests against SolrCloud (other
than DIH), you should use CloudSolrClient, not HttpSolrClient, and send
requests to the collection instead of a specific core. The cloud client
is initialized using ZooKeeper info rather than a URL. It is fully aware
of the entire cloud at all times. For DIH, though, you don't want to send
things to the collection, because of SolrCloud's inherent load balancing.
The difficulties of getting a program to deal with a DIH status response
are a whole separate discussion.

Thanks,
Shawn
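For reference, a minimal sketch of the CloudSolrClient setup I described
for non-DIH requests might look like the following. The ZooKeeper host
string and the collection name "foo" are placeholders for your own
values, and the builder method shown is the one in the SolrJ 7.x line --
check the javadocs for your exact version, since the Builder API has
changed between releases.

```java
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class CloudQueryExample {
    public static void main(String[] args) throws Exception {
        // The cloud client is built from ZooKeeper connection info, not a
        // Solr URL; it reads the cluster state from ZK and always knows
        // which nodes host which replicas.  Placeholder hosts/chroot:
        String zkHost = "zkhost1:2181,zkhost2:2181,zkhost3:2181/solr";
        try (CloudSolrClient client =
                 new CloudSolrClient.Builder().withZkHost(zkHost).build()) {
            // Send requests to the collection, not an individual core.
            client.setDefaultCollection("foo");
            SolrQuery query = new SolrQuery("*:*");
            QueryResponse response = client.query(query);
            System.out.println(response.getResults().getNumFound());
        }
    }
}
```

Because the client routes requests itself using the cluster state, you
get automatic load balancing and failover for queries and updates -- the
same behavior that makes it the wrong choice for DIH commands.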