On 7/14/2013 6:42 AM, kowish.adamosh wrote:
> The problem is that I don't want to invoke data import on 8 server nodes but
> to choose only one for scheduling. Of course if this server will shut down
> then another one needs to take the scheduler role. I can see that there is
> task for scheduling https://issues.apache.org/jira/browse/SOLR-2305 . I hope
> they will take into account SolrCloud. And that's why I wanted to know if
> current node is *currently* elected as the leader. The leader would be the
> scheduler.
> 
> In the meanwhile, any ideas of how to solve data import scheduling on
> SolrCloud architecture?

As Jack already replied, this is outside the scope of Solr.

SOLR-2305 has been around for a VERY long time.  Adding scheduling
capability to the dataimport handler is not very hard, but nobody has
done so because we do not believe this is something Solr should be
handling.  Also, it's easy to get something wrong, so users can run into
bugs that would break their scheduling.

Every operating system has scheduling capability.  Windows has the task
scheduler.  On virtually all other operating systems, you'll find cron.
These systems have had years of operation for their authors to work out
the bugs, and they are VERY solid.
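
As a rough illustration of what cron (or the Windows task scheduler) could
run, here is a minimal Python sketch that asks the dataimport handler to
start an import over HTTP.  It assumes the handler is registered at the
conventional /dataimport path; the host, port, core name, and script path
shown are hypothetical placeholders, not values from this thread.

```python
#!/usr/bin/env python3
"""Ask a Solr dataimport handler to start an import; meant to be run by cron."""
import sys
import urllib.parse
import urllib.request


def build_import_url(base_url, core, command="full-import", clean=False):
    """Build the dataimport request URL for one core.

    Assumes the handler is registered at /dataimport in solrconfig.xml.
    """
    params = urllib.parse.urlencode({
        "command": command,
        "clean": "true" if clean else "false",
        "wt": "json",
    })
    return "{}/{}/dataimport?{}".format(base_url.rstrip("/"), core, params)


def trigger_import(url, timeout=30):
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status


def main(base_url, core):
    """Start a delta import; return a process exit code (0 on HTTP 200)."""
    url = build_import_url(base_url, core, command="delta-import")
    try:
        status = trigger_import(url)
    except OSError as exc:  # urllib.error.URLError is a subclass of OSError
        print("import request failed: {}".format(exc), file=sys.stderr)
        return 1
    return 0 if status == 200 else 1


# Runs only when a base URL and a core name are given on the command line.
if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(main(sys.argv[1], sys.argv[2]))
```

A crontab entry along these lines would then run it hourly on the one
machine chosen to do the scheduling (the paths and names are placeholders):

    0 * * * * /usr/bin/python3 /path/to/solr_import.py http://localhost:8983/solr collection1

Note that the handler returns as soon as the import has started, so a 200
response means the request was accepted, not that the import finished.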

We would not be able to make the same robustness guarantee if we
included scheduling in Solr.  Additionally, we really want to be sure
that Solr never does anything on its own that has not been specifically
requested by a user or program, or through certain external events such
as a hardware or software failure.

For my own multi-server Linux Solr installation, which doesn't use
SolrCloud even though it's got two complete copies of the index and uses
shards, I have worked out how to do clustered scheduling.  I have a
corosync/pacemaker cluster set up on my servers, which ensures that only
one copy of my cronjobs is running on the cluster.  If a server dies, it
will start up the cronjobs on another server.

Thanks,
Shawn
