Hello all,

The default directory implementation in Solr 4 is the NRTCachingDirectory
(in the example solrconfig.xml file , see below).

The Javadoc for NRTCachingDirectoy (
http://lucene.apache.org/core/4_3_1/core/org/apache/lucene/store/NRTCachingDirectory.html?is-external=true)
 says:

 "This class is likely only useful in a near real-time context, where
indexing rate is lowish but reopen rate is highish, resulting in many tiny
files being written..."

It seems like we have exactly the opposite use case, so we would like
advice on what directory implementation to use instead.

We are doing offline batch indexing, so no searches are being done.  So we
don't need NRT.  We also have a high indexing rate as we are trying to
index 3 billion pages as quickly as possible.

I am not clear what determines the reopen rate.   Is it only related to
searching or is it involved in indexing as well?

 Does the NRTCachingDirectory have any benefit for indexing under the use
case noted above?

I'm guessing we should just use the solrStandardDirectoryFactory instead.
 Is this correct?

Tom

-------------------------------





<!-- The DirectoryFactory to use for indexes.

       solr.StandardDirectoryFactory is filesystem
       based and tries to pick the best implementation for the current
       JVM and platform.  solr.NRTCachingDirectoryFactory, the default,
       wraps solr.StandardDirectoryFactory and caches small files in memory
       for better NRT performance.

       One can force a particular implementation via
solr.MMapDirectoryFactory,
       solr.NIOFSDirectoryFactory, or solr.SimpleFSDirectoryFactory.

       solr.RAMDirectoryFactory is memory based, not
       persistent, and doesn't work with replication.
    -->
  <directoryFactory name="DirectoryFactory"

class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>

Reply via email to