Here's what I copied from an explanation by Uwe Schindler, 'cause I believe most anything he has to say on this subject:
It is quite simple: Lucene locking and commits do not work correctly on NFS file systems because they are not fully POSIX conformant. Because of this you may produce corrupt indexes, as commits don't work and corrupt concurrently open files. You may also see JVM crashes (SIGSEGV) if memory-mapped files are unmapped because of network failures.

If you want to use Lucene on NFS mounts, you have 2 possibilities:

- Change to CIFS/Samba mounts (CIFS conforms to POSIX standards like delete-on-last-close and also supports correct locking with NativeFSLockFactory) -- or move to local disks!
- Use a special deletion policy (https://lucene.apache.org/) so that commits do not corrupt your open IndexSearchers through suddenly disappearing files (Lucene deletes files while they are still open, relying on POSIX delete-on-last-close semantics), and use SimpleFSLockFactory. But SimpleFSLockFactory may hit stale lock file issues after killed JVMs. Also, don't use MMapDirectory for file storage, as this will likely crash your JVM on network problems! (A minimal sketch of this setup follows the quoted thread at the bottom of this mail.)

Some background: the original and recommended lock system works correctly with killed VMs, as the existence of the lock file has nothing to do with the "state" of being locked. The lock file is just a placeholder so there is an actual file instance to do the locking on.

There is no solution for mixed NFS and non-NFS directories. So either get rid of them, or add your own logic to choose the right lock and deletion policy depending on the file system. You may use Java 7+'s Path/Files API to get all mount points (see the second sketch at the bottom).

Memory mapping is risky with NFS, as a no-longer-reachable file may suddenly be unmapped from process space, and the next access will segfault your JVM.

The snapshot deletion policy keeps the last commits available on disk, so the POSIX "delete-on-last-close" behaviour is not required. But you have to take care to delete snapshots once you have closed all readers.

On Wed, Sep 5, 2018 at 6:59 AM Shawn Heisey <apa...@elyograg.org> wrote:
>
> On 9/5/2018 6:55 AM, Imran Rajjad wrote:
> > I am using Solr Cloud 6.4.1. After a hard restart the solr nodes are
> > constantly showing to be in DOWN state and would not go into recovery. I
> > have also deleted the write.lock files from all the replica folders, but
> > the problem would not go away. The error displayed at the web console is: no
> > locks available
> >
> > My replica folders reside on an NFS mount. I am using RHEL 6/CentOS 6.8. Has
> > anyone ever faced this issue?
>
> Lucene-based software (including Solr) does NOT work well on NFS. NFS
> does not provide all the locking functionality that Lucene tries to use
> by default.
>
> You're probably going to need to change the lock factory, and might even
> need to completely disable locking. If you do disable locking, you have
> to be VERY careful to never allow more than one core or more than one
> Solr instance to try and open a core directory. Doing so will likely
> corrupt the index.
>
> I strongly recommend NOT using NFS storage for Solr. In addition to
> locking problems, it also tends to be extremely slow. Use a local filesystem.
>
> Thanks,
> Shawn
>
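For what it's worth, here is a minimal sketch of the second option above against the plain Lucene Java API (the index path and analyzer are made up for illustration, not from the thread): NIOFSDirectory instead of MMapDirectory, SimpleFSLockFactory instead of the native lock factory, and a SnapshotDeletionPolicy so the files backing an open reader are never deleted out from under it.

```java
// Sketch only: NFS-friendly Lucene setup as Uwe describes it.
// Assumptions: Lucene 6.x (matching Solr 6.4.1); /mnt/nfs/solr/index
// is an illustrative path.
import java.nio.file.Path;
import java.nio.file.Paths;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexCommit;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.KeepOnlyLastCommitDeletionPolicy;
import org.apache.lucene.index.SnapshotDeletionPolicy;
import org.apache.lucene.store.NIOFSDirectory;
import org.apache.lucene.store.SimpleFSLockFactory;

public class NfsFriendlyIndex {
    public static void main(String[] args) throws Exception {
        Path indexPath = Paths.get("/mnt/nfs/solr/index"); // illustrative

        // NIOFSDirectory avoids mmap, so a network hiccup can't SIGSEGV the
        // JVM. SimpleFSLockFactory uses plain lock files instead of native
        // locks, which NFS does not implement reliably.
        NIOFSDirectory dir =
            new NIOFSDirectory(indexPath, SimpleFSLockFactory.INSTANCE);

        // Snapshotted commits are never deleted by the writer, which stands
        // in for the POSIX delete-on-last-close semantics NFS lacks.
        SnapshotDeletionPolicy snapshotter =
            new SnapshotDeletionPolicy(new KeepOnlyLastCommitDeletionPolicy());

        IndexWriterConfig cfg = new IndexWriterConfig(new StandardAnalyzer())
            .setIndexDeletionPolicy(snapshotter);

        try (IndexWriter writer = new IndexWriter(dir, cfg)) {
            // ... index documents ...
            writer.commit();

            // Snapshot before opening a long-lived reader; release it once
            // all readers on that commit are closed, or old files pile up.
            IndexCommit snapshot = snapshotter.snapshot();
            try {
                // e.g. DirectoryReader.open(snapshot), then search ...
            } finally {
                snapshotter.release(snapshot);
                writer.deleteUnusedFiles(); // reclaim the released files
            }
        }
    }
}
```

If you are running Solr rather than raw Lucene, the lock factory half of this corresponds to setting <lockType>simple</lockType> under <indexConfig> in solrconfig.xml; the deletion policy side is not something Solr exposes the same way.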
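And here is a sketch of the mount-point check Uwe mentions, using only the Java 7+ Path/Files API. The "nfs" substring match is my assumption -- file store type names vary by OS, so verify what your platform actually reports:

```java
// Sketch only: decide lock/deletion policy per file system by asking
// which file store a directory lives on.
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class MountCheck {
    static boolean looksLikeNfs(Path p) throws IOException {
        FileStore store = Files.getFileStore(p);
        String type = store.type(); // e.g. "nfs", "ext4", "xfs" on Linux
        return type != null && type.toLowerCase().contains("nfs");
    }

    public static void main(String[] args) throws Exception {
        Path index = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println(index.toAbsolutePath() + " -> "
            + (looksLikeNfs(index)
                ? "use SimpleFSLockFactory + snapshot deletion policy"
                : "default NativeFSLockFactory should be fine"));
    }
}
```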