How are you stopping Solr? Nodes should not go into recovery on startup unless Solr was killed un-gracefully (e.g. kill -9 or the like). If you use the bin/solr script to stop Solr and see a message about "killing XXX forcefully", you can lengthen the time the script waits for shutdown; there's a sysvar you can set (look in the script).
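Here's a sketch of that, assuming the SOLR_STOP_WAIT sysvar that the 6.x bin/solr script honors (180 seconds is the default):

    # in solr.in.sh (or exported before running bin/solr): give Solr
    # longer to shut down gracefully before the script falls back to
    # a forceful kill. bin/solr reads SOLR_STOP_WAIT; default is 180.
    SOLR_STOP_WAIT=600

    # then stop as usual
    bin/solr stop -all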
Actually I'll correct myself a bit. Shards _do_ go into recovery, but in the graceful-shutdown case it should be very short. Shards temporarily go into recovery as part of normal processing, just long enough to see that no recovery is necessary, and that should be measured in a few seconds. What it sounds like from "The shards go into recovery and start to utilize nearly all of their network" is that your nodes go into "full recovery", where the entire index is copied down because the replica thinks it's too far out of date. That indicates something weird about the state when the Solr nodes stopped.

Wild-shot-in-the-dark question: how big are your tlogs? If you don't hard commit very often, the tlogs can replay at startup for a very long time.

This makes no sense to me, I'm surely missing something:

> The process at this point is to start one node, find out the lock files,
> wait for it to come up completely (hours), stop it, delete the write.lock
> files, and restart. Usually this second restart is faster, but it still
> can take 20-60 minutes.

When you start one node, it may take a few minutes for leader election to kick in (the default is 180 seconds), but absent replication the node should just come up. Taking hours totally violates my expectations. What does Solr _think_ it's doing? What's in the logs at that point?

And if you stop Solr gracefully, there shouldn't be a problem with write.lock.

You could also try increasing the timeouts, and the HDFS directory factory has some parameters to tweak that are a mystery to me...

All in all, this is behavior that I find mystifying. I've put a few config sketches for the settings mentioned above after your quoted message below.

Best,
Erick

On Tue, Nov 21, 2017 at 5:07 AM, Joe Obernberger
<joseph.obernber...@gmail.com> wrote:
> Hi All - we have a system with 45 physical boxes running Solr 6.6.1 using
> HDFS as the index. The current index size is about 31TBytes. With 3x
> replication that takes up 93TBytes of disk. Our main collection is split
> across 100 shards with 3 replicas each. The issue that we're running into
> is when restarting the Solr 6 cluster. The shards go into recovery and
> start to utilize nearly all of their network interfaces. If we start too
> many of the nodes at once, the shards will go into a recovery, fail, and
> retry loop and never come up. The errors are related to HDFS not
> responding fast enough and warnings from the DFSClient. If we stop a node
> when this is happening, the script will force a stop (180 second timeout)
> and upon restart, we have lock files (write.lock) inside of HDFS.
>
> The process at this point is to start one node, find out the lock files,
> wait for it to come up completely (hours), stop it, delete the write.lock
> files, and restart. Usually this second restart is faster, but it still
> can take 20-60 minutes.
>
> The smaller indexes recover much faster (less than 5 minutes). Should we
> have not used so many replicas with HDFS? Is there a better way we should
> have built the Solr 6 cluster?
>
> Thank you for any insight!
>
> -Joe
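For the hard-commit point above: the interval is set in solrconfig.xml. A minimal sketch with illustrative values (the numbers are assumptions to tune against your ingest rate, not recommendations):

    <!-- solrconfig.xml: hard commit every 60 seconds or 100k docs,
         whichever comes first. openSearcher=false truncates the tlog
         without affecting search visibility, keeping startup replay
         short. -->
    <autoCommit>
      <maxTime>60000</maxTime>
      <maxDocs>100000</maxDocs>
      <openSearcher>false</openSearcher>
    </autoCommit>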
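The 180 seconds I mentioned for leader election is leaderVoteWait, which lives in the solrcloud section of solr.xml; the value shown here (in milliseconds) is the default:

    <!-- solr.xml: how long a core waits for the other replicas of its
         shard to show up before taking leadership. -->
    <solrcloud>
      <int name="leaderVoteWait">${leaderVoteWait:180000}</int>
      <!-- other solrcloud settings elided -->
    </solrcloud>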
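For clearing the write.lock files, plain hdfs dfs commands work. The paths below are placeholders for wherever your solr.hdfs.home points, and the affected Solr nodes should be stopped first:

    # find stale lock files left behind by a forced stop
    hdfs dfs -ls -R /solr | grep write.lock

    # remove them before restarting Solr (hypothetical path)
    hdfs dfs -rm /solr/mycollection/core_node1/data/index/write.lock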
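And the HDFS directory factory parameters I mentioned hang off the directoryFactory block in solrconfig.xml. The values here are illustrative; the Reference Guide's "Running Solr on HDFS" page lists the full set:

    <!-- solrconfig.xml: HDFS-backed index with the off-heap block cache
         enabled; each slab is 128MB. -->
    <directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
      <str name="solr.hdfs.home">hdfs://namenode:8020/solr</str>
      <bool name="solr.hdfs.blockcache.enabled">true</bool>
      <int name="solr.hdfs.blockcache.slab.count">1</int>
    </directoryFactory>

    <!-- HDFS requires the hdfs lock type; this is where write.lock
         files in HDFS come from. -->
    <lockType>${solr.lock.type:hdfs}</lockType>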