It looks like the problem is related to tlog rotation on the follower replica. We did the following for a specific shard:
0. Start SolrCloud.
1. Replicas: solr-0 (leader), solr-1, solr-2.
2. Rebalance to make solr-1 the preferred leader (rough example commands at the end of this message).
3. Replicas: solr-0, solr-1 (leader), solr-2.

The tlog file on solr-0 kept growing without bound (hundreds of GBs) until we shut the cluster down and manually dropped all shards. The only way to "restart" tlog rotation on solr-0 (the follower) was to issue /admin/cores?action=RELOAD&core=xxxxx at least twice while the tlog was still small (a few MBs); see the example call after the output below. Also, if a rebalance is issued to make solr-0 the leader, the leader election never completes.

tlog directory sizes per node after step (3) above:

solr-0
2140856 ./data2/mydata_0_e0000000-ffffffff/tlog
2140712 ./data2/mydata_0_e0000000-ffffffff/tlog/tlog.0000000000000000021

solr-1 (leader)
35268 ./data2/mydata_0_e0000000-ffffffff/tlog
35264 ./data2/mydata_0_e0000000-ffffffff/tlog/tlog.0000000000000000055

solr-2
35256 ./data2/mydata_0_e0000000-ffffffff/tlog
35252 ./data2/mydata_0_e0000000-ffffffff/tlog/tlog.0000000000000000054
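For reference, a preferred-leader rebalance of the kind in step (2) is normally done with the Collections API: ADDREPLICAPROP to set the preferredLeader property, then REBALANCELEADERS. A rough sketch follows; HOST, COLLECTION, SHARD, and REPLICA are placeholders, not values from our cluster:

  # mark the replica hosted on solr-1 as the preferred leader
  curl 'http://HOST:8983/solr/admin/collections?action=ADDREPLICAPROP&collection=COLLECTION&shard=SHARD&replica=REPLICA&property=preferredLeader&property.value=true'
  # ask Solr to move shard leadership to the preferred leaders
  curl 'http://HOST:8983/solr/admin/collections?action=REBALANCELEADERS&collection=COLLECTION'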
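The RELOAD workaround mentioned above, spelled out as a curl call (host and port are placeholders; core=xxxxx stands for the actual core name, exactly as in the original command):

  # reload the core on the follower solr-0; we had to issue this at least twice
  curl 'http://HOST:8983/solr/admin/cores?action=RELOAD&core=xxxxx'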