Hi solr-users,

I'm seeing some confusing behaviour in Solr/ZooKeeper and hope you can shed some light on what's happening and how I can correct it.

We have two physical servers running automated builds of RedHat 6.4 and Solr 4.4.0, each hosting a separate Solr service. The first server (ld01) has 24 shards and hosts a collection called 'ukdomain'; the second server (ld02) also has 24 shards and hosts a different collection called 'ldwa'. It may be important to note that both physical servers previously provided the 'ukdomain' collection, but the ldwa server (ld02) has been rebuilt for the new collection.

When I start the ldwa Solr nodes with their ZooKeeper configuration (defined in /etc/sysconfig/solrnode* and with collection.configName set to 'ldwacfg') pointing to the development ZooKeeper ensemble, all nodes initially become shard leaders and then replicas, as I'd expect. But if I change the ldwa Solr nodes to point to the ZooKeeper ensemble also used for the ukdomain collection, all ldwa Solr nodes start on the same shard (that is, the first ldwa Solr node becomes the shard leader, and every other node becomes a replica of that shard). The significant point here is that no other ldwa shards gain leaders (or replicas).

The ukdomain collection uses a ZooKeeper collection.configName of 'ukdomaincfg', and prior to the creation of this ldwa service the configName 'ldwacfg' had never been used. So I'm confused as to why the ldwa service behaves differently when the only change is which ZooKeeper ensemble it points at (both ensembles are built automatically using ZooKeeper 3.4.5).

If anyone can explain why this is happening and how I can get the ldwa service to start correctly against the non-development ZooKeeper ensemble, I'd be very grateful! If more information or explanation is needed, just ask.

Thanks,
Gil

Gil Hoggarth
Web Archiving Technical Services Engineer
The British Library, Boston Spa, West Yorkshire, LS23 7BQ
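
P.S. In case it helps to see the shape of the setup: each node is started with system properties along the following lines. These are simplified placeholders to show the structure, not our exact hostnames, ports or paths.

  # /etc/sysconfig/solrnode1 (illustrative values only)
  SOLR_HOME=/opt/solrnode1/solr
  JETTY_PORT=8983
  ZK_HOST=zk1.example:2181,zk2.example:2181,zk3.example:2181
  NUM_SHARDS=24

  # Solr 4.4.0 started under Jetty with those values:
  java -Dsolr.solr.home=$SOLR_HOME \
       -Djetty.port=$JETTY_PORT \
       -DzkHost=$ZK_HOST \
       -DnumShards=$NUM_SHARDS \
       -Dbootstrap_confdir=$SOLR_HOME/ldwa/conf \
       -Dcollection.configName=ldwacfg \
       -jar start.jar

The only thing changed between the working and non-working cases is the ZK_HOST value, i.e. which ZooKeeper ensemble the nodes register with.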
I'm seeing some confusing behaviour in Solr/zookeeper and hope you can shed some light on what's happening/how I can correct it. We have two physical servers running automated builds of RedHat 6.4 and Solr 4.4.0 that host two separate Solr services. The first server (called ld01) has 24 shards and hosts a collection called 'ukdomain'; the second server (ld02) also has 24 shards and hosts a different collection called 'ldwa'. It's evidently important to note that previously both of these physical servers provided the 'ukdomain' collection, but the 'ldwa' server has been rebuilt for the new collection. When I start the ldwa solr nodes with their zookeeper configuration (defined in /etc/sysconfig/solrnode* and with collection.configName as 'ldwacfg') pointing to the development zookeeper ensemble, all nodes initially become shard leaders and then replicas as I'd expect. But if I change the ldwa solr nodes to point to the zookeeper ensemble also used for the ukdomain collection, all ldwa solr nodes start on the same shard (that is, the first ldwa solr node becomes the shard leader, then every other solr node becomes a replica for this shard). The significant point here is no other ldwa shards gain leaders (or replicas). The ukdomain collection uses a zookeeper collection.configName of 'ukdomaincfg', and prior to the creation of this ldwa service the collection.configName of 'ldwacfg' has never previously been used. So I'm confused why the ldwa service would differ when the only difference is which zookeeper ensemble is used (both zookeeper ensembles are automatedly built using version 3.4.5). If anyone can explain why this is happening and how I can get the ldwa services to start correctly using the non-development zookeeper ensemble, I'd be very grateful! If more information or explanation is needed, just ask. Thanks, Gil Gil Hoggarth Web Archiving Technical Services Engineer The British Library, Boston Spa, West Yorkshire, LS23 7BQ