On 7/14/2013 6:42 AM, kowish.adamosh wrote:
> The problem is that I don't want to invoke data import on 8 server nodes but
> to choose only one for scheduling. Of course if this server will shut down
> then another one needs to take the scheduler role. I can see that there is
> task for scheduling https://issues.apache.org/jira/browse/SOLR-2305 . I hope
> they will take into account SolrCloud. And that's why I wanted to know if
> current node is *currently* elected as the leader. The leader would be the
> scheduler.
>
> In the meanwhile, any ideas of how to solve data import scheduling on
> SolrCloud architecture?

As Jack already replied, this is outside the scope of Solr.

SOLR-2305 has been around for a VERY long time. Adding scheduling capability to the dataimport handler is not very hard, but nobody has done so because we do not believe this is something Solr should be handling. Also, it's easy to get something wrong, so users can run into bugs that would break their scheduling.

Every operating system has scheduling capability. Windows has the task scheduler. On virtually all other operating systems, you'll find cron. These systems have had years of operation for their authors to work out the bugs, and they are VERY solid. We would not be able to make the same robustness guarantee if we included scheduling in Solr.

Additionally, we really want to be sure that Solr never does anything on its own that has not been specifically requested by a user or program, or through certain external events such as a hardware or software failure.

For my own multi-server Linux Solr installation, which doesn't use SolrCloud even though it's got two complete copies of the index and uses shards, I have worked out how to do clustered scheduling. I have a corosync/pacemaker cluster set up on my servers, which ensures that only one copy of my cronjobs is running on the cluster. If a server dies, it will start up the cronjobs on another server.

Thanks,
Shawn
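
For illustration only, the cron approach described above could look something like the following. This is a minimal sketch and not the setup from this message: it assumes the dataimport handler is registered at the conventional /dataimport path in solrconfig.xml, and the host, core name, script path and crontab entry are all hypothetical placeholders. The cluster manager (or whatever else guarantees a single active cron host) is outside the script.

```python
#!/usr/bin/env python3
"""Trigger a Solr DataImportHandler run; meant to be called from cron.

Hypothetical crontab entry (script path, host and core are placeholders):
    0 2 * * * /usr/local/bin/solr_import.py http://localhost:8983/solr collection1
"""
import sys
from urllib.parse import urlencode
from urllib.request import urlopen


def build_import_url(base_url, core, command="full-import", handler="dataimport"):
    """Build the request URL for the dataimport handler.

    Assumes the handler is registered as /dataimport for the given core.
    """
    query = urlencode({"command": command, "wt": "json"})
    return f"{base_url.rstrip('/')}/{core}/{handler}?{query}"


def trigger_import(base_url, core, timeout=30):
    """Send the request and return the HTTP status code.

    The handler starts the import and returns right away, so a 200 here
    means "accepted", not "finished"; poll with command=status if needed.
    """
    with urlopen(build_import_url(base_url, core), timeout=timeout) as resp:
        return resp.status


# Only contact Solr when both arguments are given on the command line.
if __name__ == "__main__" and len(sys.argv) == 3:
    status = trigger_import(sys.argv[1], sys.argv[2])
    sys.exit(0 if status == 200 else 1)
```

Solr itself stays passive here: it only acts when the cron job sends the request, and the non-zero exit code lets cron report a failed trigger.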