Hello all,

The default directory implementation in Solr 4 is the NRTCachingDirectory (as set in the example solrconfig.xml file; see below).
The Javadoc for NRTCachingDirectory (http://lucene.apache.org/core/4_3_1/core/org/apache/lucene/store/NRTCachingDirectory.html?is-external=true) says:

"This class is likely only useful in a near real-time context, where indexing rate is lowish but reopen rate is highish, resulting in many tiny files being written..."

It seems we have exactly the opposite use case, so we would like advice on which directory implementation to use instead. We are doing offline batch indexing, so no searches are being run and we don't need NRT. We also have a high indexing rate, since we are trying to index 3 billion pages as quickly as possible.

I am not clear on what determines the reopen rate. Is it related only to searching, or is it involved in indexing as well? Does the NRTCachingDirectory offer any benefit for indexing in the use case described above?

I'm guessing we should just use solr.StandardDirectoryFactory instead. Is this correct?

Tom

-------------------------------

<!-- The DirectoryFactory to use for indexes.
     solr.StandardDirectoryFactory is filesystem based and tries to pick the best implementation for the current JVM and platform. solr.NRTCachingDirectoryFactory, the default, wraps solr.StandardDirectoryFactory and caches small files in memory for better NRT performance. One can force a particular implementation via solr.MMapDirectoryFactory, solr.NIOFSDirectoryFactory, or solr.SimpleFSDirectoryFactory.
     solr.RAMDirectoryFactory is memory based, not persistent, and doesn't work with replication. -->
<directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>
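For reference, switching away from the NRT caching wrapper would presumably be a one-line change to the directoryFactory element quoted above. This is a minimal sketch, assuming we only swap the default class to solr.StandardDirectoryFactory and leave the rest of solrconfig.xml unchanged; it has not been tested on our setup:

```xml
<!-- Sketch (untested): for offline batch indexing with no NRT searching,
     default to the plain filesystem-based factory instead of
     solr.NRTCachingDirectoryFactory. The solr.directoryFactory system
     property, if set, still overrides this default. -->
<directoryFactory name="DirectoryFactory"
                  class="${solr.directoryFactory:solr.StandardDirectoryFactory}"/>
```

Because the class attribute uses ${solr.directoryFactory:...} property substitution (system property name, then a default after the colon), the same switch can probably be made without editing the file at all, by starting Solr with -Dsolr.directoryFactory=solr.StandardDirectoryFactory.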