On 2/5/2016 8:11 AM, Joseph Obernberger wrote:
> Thank you for the reply Scott - we have the commit settings as:
> <autoCommit>
>       <maxTime>60000</maxTime>
>       <openSearcher>false</openSearcher>
> </autoCommit>
> <autoSoftCommit>
>         <maxTime>15000</maxTime>
> </autoSoftCommit>
>
> Is that 50% disk space rule across the entire HDFS cluster or on an
> individual spindle?

That autoSoftCommit maxTime is pretty small.  Frequent commits can be a
source of problems if the actual commits take anywhere near as long as
(or longer than) the maxTime value to complete.  If your commits finish
in significantly less than 15 seconds, then it probably isn't anything
to worry about.
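
If they do turn out to be slow, the usual remedy is to raise the
interval.  Purely as an illustrative sketch (the 60000 here is an
arbitrary example value, not a tuned recommendation for your setup):

<autoSoftCommit>
        <maxTime>60000</maxTime>
</autoSoftCommit>

New documents would take up to a minute to become visible to searches,
but each soft commit would do its work far less often.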

The rule with disk space and Solr/Lucene is that you must have enough
free disk space for your largest index to temporarily triple in size,
and the actual recommendation is three times the disk space of *all*
your indexes, not just the largest.  Most of the time the largest merge
you'll see will only double the index's disk usage, but in some unusual
edge cases, it can triple.
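
To put rough numbers on that (purely illustrative): if the indexes on
one machine total 200GB, the recommendation works out to about 600GB of
capacity for that data, leaving 400GB free so that even a worst-case
merge can complete.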

I have no idea how disk space works with HDFS when individual data nodes
become full.  Someone else will have to tackle that question, and it
might need to be answered by the Hadoop project rather than here.

With autoCommit at 60 seconds, your transaction logs should remain small
and there shouldn't be very many of them, so I really have no idea what
might be happening with those.  Do you have this same
autoCommit/autoSoftCommit config on every Solr collection?

Erick's note about AlreadyBeingCreatedException may be relevant.  Are
you possibly sharing a data directory between two or more Solr cores?
This can't normally be done, and even if you configure the locking
mechanism to allow it, it's NOT recommended, especially with SolrCloud. 
In SolrCloud, all replicas will write to the index.  If two replicas try
to write to the same index, then that index will become corrupted and
unusable.
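
For reference, the locking mechanism I'm talking about is the lockType
setting in the indexConfig section of solrconfig.xml.  The stock
configuration looks something like this (exact defaults vary by
version, and with HdfsDirectoryFactory the type would normally be
hdfs):

<indexConfig>
  <lockType>${solr.lock.type:native}</lockType>
</indexConfig>

Setting it to none removes the protection entirely, which is how two
cores can end up writing to the same index and corrupting it as
described above.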

Thanks,
Shawn
