Ah yes, I was about to mention that: -DnumShards is only actually used when the collection is being created for the first time. After that point (i.e. once the collection exists in ZK), passing it on the command line is redundant (Solr won't actually read it). As I understand it, the preferred mechanism for creating collections is the Collections API, in which case you never use -DnumShards at all. Having it on the command line can be confusing (we've fallen into that trap too!)
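For what it's worth, the Collections API call looks roughly like this (just a sketch, not copied from your setup: the host, port, webapp path and replicationFactor are placeholders to adjust, and it assumes the 'ldwa01cfg' config set has already been uploaded to ZooKeeper):

curl 'http://<solr-host>:<port>/solr/admin/collections?action=CREATE&name=ldwa01&numShards=24&replicationFactor=2&maxShardsPerNode=24&collection.configName=ldwa01cfg'

Because the shard count is recorded in ZK when the collection is created this way, the -DnumShards property on each node can then be dropped entirely.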
The only way to change the number of shards on a collection is to use the Collections API to split a shard. Currently you can only do that in steps of 2, so you'll need to do 1->2, 2->4, 4->8, 8->16; you can't get from 1 to 24 as it's not a power of 2 :( What you want is SOLR-5004 (https://issues.apache.org/jira/browse/SOLR-5004). Otherwise, you'll need to create a new collection and re-index everything into that. (There's a rough SPLITSHARD example at the very bottom of this message.)

On 24 October 2013 16:35, Hoggarth, Gil <gil.hogga...@bl.uk> wrote:

> I think my question is easier, because I think the problem below was caused by the very first startup of the 'ldwa01' collection/'ldwa01cfg' zk collection name not specifying the number of shards (and thus defaulting to 1).
>
> So, how can I change the number of shards for an existing collection/zk collection name, especially when the ZK ensemble in question is the production version and supports other Solr collections that I do not want to interrupt? (Which I think means that I can't just delete the clusterstate.json and restart the ZKs, as this would also lose the other Solr collection information.)
>
> Thanks in advance, Gil
>
> -----Original Message-----
> From: Hoggarth, Gil [mailto:gil.hogga...@bl.uk]
> Sent: 24 October 2013 10:13
> To: solr-user@lucene.apache.org
> Subject: RE: New shard leaders or existing shard replicas depends on zookeeper?
>
> Absolutely, the scenario I'm seeing does _sound_ like I've not specified the number of shards, but I think I have - the evidence is:
>
> - DnumShards=24 defined within the /etc/sysconfig/solrnode* files
>
> - DnumShards=24 seen on each 'ps' line (two nodes listed here):
> "tomcat 26135 1 5 09:51 ? 00:00:22 /opt/java/bin/java -Djava.util.logging.config.file=/opt/tomcat_instances/solrnode1/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Xms512m -Xmx5120m -Dsolr.solr.home=/opt/solrnode1 -Duser.language=en -Duser.country=uk -Dbootstrap_confdir=/opt/solrnode1/ldwa01/conf -Dcollection.configName=ldwa01cfg -DnumShards=24 -Dsolr.data.dir=/opt/data/solrnode1/ldwa01/data -DzkHost=zk01.solr.wa.bl.uk:9983,zk02.solr.wa.bl.uk:9983,zk03.solr.wa.bl.uk:9983 -Djava.endorsed.dirs=/opt/tomcat/endorsed -classpath /opt/tomcat/bin/bootstrap.jar:/opt/tomcat/bin/tomcat-juli.jar -Dcatalina.base=/opt/tomcat_instances/solrnode1 -Dcatalina.home=/opt/tomcat -Djava.io.tmpdir=/opt/tomcat_instances/solrnode1/tmp org.apache.catalina.startup.Bootstrap start
> tomcat 26225 1 5 09:51 ? 00:00:19 /opt/java/bin/java -Djava.util.logging.config.file=/opt/tomcat_instances/solrnode2/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Xms512m -Xmx5120m -Dsolr.solr.home=/opt/solrnode2 -Duser.language=en -Duser.country=uk -Dbootstrap_confdir=/opt/solrnode2/ldwa01/conf -Dcollection.configName=ldwa01cfg -DnumShards=24 -Dsolr.data.dir=/opt/data/solrnode2/ldwa01/data -DzkHost=zk01.solr.wa.bl.uk:9983,zk02.solr.wa.bl.uk:9983,zk03.solr.wa.bl.uk:9983 -Djava.endorsed.dirs=/opt/tomcat/endorsed -classpath /opt/tomcat/bin/bootstrap.jar:/opt/tomcat/bin/tomcat-juli.jar -Dcatalina.base=/opt/tomcat_instances/solrnode2 -Dcatalina.home=/opt/tomcat -Djava.io.tmpdir=/opt/tomcat_instances/solrnode2/tmp org.apache.catalina.startup.Bootstrap start"
>
> - The Solr node dashboard shows "-DnumShards=24" in its list of Args for each node
>
> And yet, the ldwa01 nodes are leader and replica of shard 17 and there are no other shard leaders created.
> Plus, if I only change the ZK ensemble declarations in /etc/sysconfig/solrnode* to the different dev ZK servers, all 24 leaders are created before any replicas are added.
>
> I can also mention that when I browse the Cloud view, I can see both the ldwa01 collection and the ukdomain collection listed, suggesting that this information comes from the ZKs - I assume this is as expected. Plus, the correct node addresses (e.g., 192.168.45.17:8984) are listed for ldwa01, but these addresses are also listed as 'Down' in the ukdomain collection (except for :8983, which only shows in the ldwa01 collection).
>
> Any help very gratefully received.
> Gil
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: 23 October 2013 18:50
> To: solr-user@lucene.apache.org
> Subject: Re: New shard leaders or existing shard replicas depends on zookeeper?
>
> My first impulse would be to ask how you created the collection. It sure _sounds_ like you didn't specify 24 shards and thus have only a single shard, one leader and 23 replicas....
>
> bq: ...to point to the zookeeper ensemble also used for the ukdomain collection...
>
> so my guess is that this ZK ensemble has the ldwa01 collection defined as having only one shard....
>
> I admit I pretty much skimmed your post though...
>
> Best,
> Erick
>
>
> On Wed, Oct 23, 2013 at 12:54 PM, Hoggarth, Gil <gil.hogga...@bl.uk> wrote:
>
> > Hi solr-users,
> >
> > I'm seeing some confusing behaviour in Solr/zookeeper and hope you can shed some light on what's happening/how I can correct it.
> >
> > We have two physical servers running automated builds of RedHat 6.4 and Solr 4.4.0 that host two separate Solr services. The first server (called ld01) has 24 shards and hosts a collection called 'ukdomain'; the second server (ld02) also has 24 shards and hosts a different collection called 'ldwa01'. It's evidently important to note that previously both of these physical servers provided the 'ukdomain' collection, but the 'ldwa01' server has been rebuilt for the new collection.
> >
> > When I start the ldwa01 solr nodes with their zookeeper configuration (defined in /etc/sysconfig/solrnode* and with collection.configName as 'ldwa01cfg') pointing to the development zookeeper ensemble, all nodes initially become shard leaders and then replicas as I'd expect. But if I change the ldwa01 solr nodes to point to the zookeeper ensemble also used for the ukdomain collection, all ldwa01 solr nodes start on the same shard (that is, the first ldwa01 solr node becomes the shard leader, then every other solr node becomes a replica for this shard). The significant point here is that no other ldwa01 shards gain leaders (or replicas).
> >
> > The ukdomain collection uses a zookeeper collection.configName of 'ukdomaincfg', and prior to the creation of this ldwa01 service the collection.configName of 'ldwa01cfg' has never previously been used. So I'm confused why the ldwa01 service would differ when the only difference is which zookeeper ensemble is used (both zookeeper ensembles are automatedly built using version 3.4.5).
> >
> > If anyone can explain why this is happening and how I can get the ldwa01 services to start correctly using the non-development zookeeper ensemble, I'd be very grateful! If more information or explanation is needed, just ask.
> >
> > Thanks, Gil
> >
> > Gil Hoggarth
> > Web Archiving Technical Services Engineer
> > The British Library, Boston Spa, West Yorkshire, LS23 7BQ
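PS: the SPLITSHARD call mentioned at the top looks roughly like this (again just a sketch with placeholder host/port; use the actual shard id from clusterstate.json, and each call splits one shard into two sub-shards covering halves of its hash range):

curl 'http://<solr-host>:<port>/solr/admin/collections?action=SPLITSHARD&collection=ldwa01&shard=shard1'

Repeating that across every shard gives the 1->2, 2->4, 4->8... doubling described above, which is why a jump straight from 1 to 24 isn't possible by splitting alone and re-indexing into a freshly created 24-shard collection is the more practical route here.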