SOLR zookeeper connection timeout during startup is hardcoded to 10000ms

2018-08-23 Thread Danny Shih
Hi,

During startup in cloud mode, the SOLR zookeeper connection timeout appears to 
be hardcoded to 10000ms:
https://github.com/apache/lucene-solr/blob/5eab1c3c688a0d8db650c657567f197fb3dcf181/solr/solrj/src/java/org/apache/solr/client/solrj/impl/ZkClientClusterStateProvider.java#L45

And it is not configurable via zkClientTimeout (solr.xml) or SOLR_WAIT_FOR_ZK 
(solr.in.sh).

Is there a way to configure this, and if not, should I open a bug?

Thanks,
Danny


Solr8 improvements to SolrCloud leader election

2020-06-02 Thread Danny Shih
Are there any significant (or not so significant) changes to SolrCloud leader 
election in Solr 8?  I have browsed the release notes and searched JIRA, but 
the latest news seems to be in 7.3 (where the old Leader-In-Recovery logic was 
replaced).

Context:
We are currently running Solr 7.4 in production.  In the past year, we’ve seen 
two cases where, during a rolling restart, one of the collections inexplicably 
ended up without a leader.  We have ways to mitigate: shut down all instances, 
then start up just one so that it becomes leader for everything.  We also just 
learned about the FORCELEADER API.  However, while this condition lasts, reads 
from that collection do not work, so there’s customer impact.
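
For reference, here is a minimal sketch of how the FORCELEADER call mentioned 
above could be built against the Collections API.  The base URL, collection 
name, and shard name are placeholders rather than values from our cluster, and 
the helper name force_leader_url is made up for illustration.  It only builds 
the request URL; actually sending it is left to the operator.

```python
from urllib.parse import urlencode


def force_leader_url(base_url, collection, shard):
    """Build a Collections API URL requesting FORCELEADER for one shard.

    base_url is assumed to be the Solr root, e.g. http://host:8983/solr.
    """
    # FORCELEADER takes the collection and the shard that has no leader.
    params = urlencode({
        "action": "FORCELEADER",
        "collection": collection,
        "shard": shard,
    })
    return f"{base_url.rstrip('/')}/admin/collections?{params}"


if __name__ == "__main__":
    # Placeholder host, collection, and shard names (assumptions).
    print(force_leader_url("http://localhost:8983/solr", "mycollection", "shard1"))
```

The resulting URL can be requested with curl or urllib.request.urlopen once 
the affected shard has been identified, for example from the cluster state.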

We are planning an upgrade to Solr 8.  The timing of that could be influenced 
by whether the upgrade would improve things in this particular area.

Thanks for any info!