On 2/5/2016 8:11 AM, Joseph Obernberger wrote:
> Thank you for the reply Scott - we have the commit settings as:
> <autoCommit>
>   <maxTime>60000</maxTime>
>   <openSearcher>false</openSearcher>
> </autoCommit>
> <autoSoftCommit>
>   <maxTime>15000</maxTime>
> </autoSoftCommit>
>
> Is that 50% disk space rule across the entire HDFS cluster or on an
> individual spindle?
That autoSoftCommit maxTime is pretty small. Frequent commits can be a source of problems if the actual commits take anywhere near (or longer than) the maxTime value to complete. If your commits are taking significantly less than 15 seconds to complete, then it probably isn't anything to worry about. There's a sketch of a more conservative config at the end of this message.

The rule with disk space and Solr/Lucene is that you must have enough free disk space for your largest index to triple in size temporarily, and it's actually recommended to have three times the disk space of *all* your indexes, not just the largest. For example, if your indexes total 100GB, the safe amount of free disk space is 300GB. Most of the time the largest merge you'll see will double the disk usage, but in some unusual edge cases, it can triple.

I have no idea how disk space works with HDFS when individual data nodes become full. Someone else will have to tackle that question, and it might need to be answered by the Hadoop project rather than here.

With autoCommit at 60 seconds, your transaction logs should remain small and there shouldn't be very many of them, so I really have no idea what might be happening with those. Do you have this same autoCommit/autoSoftCommit config on every Solr collection?

Erick's note about AlreadyBeingCreatedException may be relevant. Are you possibly sharing a data directory between two or more Solr cores? This can't normally be done, and even if you configure the locking mechanism to allow it (see the second sketch below), it's NOT recommended, especially with SolrCloud. In SolrCloud, all replicas will write to the index. If two replicas try to write to the same index, then that index will become corrupted and unusable.
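If commits do turn out to be slow, lengthening the soft commit interval is the usual fix. This is just a sketch -- the 60-second value below is an illustration, not a recommendation for your specific workload:

  <autoCommit>
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <autoSoftCommit>
    <!-- Longer interval: commits overlap far less often, at the cost
         of slower document visibility. -->
    <maxTime>60000</maxTime>
  </autoSoftCommit>

The tradeoff is visibility latency: with maxTime at 60000, a newly indexed document can take up to a minute to appear in search results, so pick the largest interval your users can tolerate.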
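For reference, the locking mechanism I mentioned is the lockType setting in the <indexConfig> section of solrconfig.xml. I'm showing the stock config here; with HdfsDirectoryFactory the value would normally be "hdfs" rather than "native":

  <indexConfig>
    <!-- "native" (the usual default) uses OS-level locks to keep more
         than one IndexWriter out of an index directory. Setting this
         to "none" removes that protection entirely, which is how two
         cores end up corrupting a shared index. -->
    <lockType>${solr.lock.type:native}</lockType>
  </indexConfig>

Thanks,
Shawn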