On Jul 10, 2013, at 9:16am, Shawn Heisey <s...@elyograg.org> wrote:

> On 7/10/2013 9:59 AM, Tom Burton-West wrote:
>> The Javadoc for NRTCachingDirectory (
>> http://lucene.apache.org/core/4_3_1/core/org/apache/lucene/store/NRTCachingDirectory.html?is-external=true)
>> says:
>>
>> "This class is likely only useful in a near real-time context, where
>> indexing rate is lowish but reopen rate is highish, resulting in many tiny
>> files being written..."
>>
>> It seems like we have exactly the opposite use case, so we would like
>> advice on what directory implementation to use instead.
>>
>> We are doing offline batch indexing, so no searches are being done. So we
>> don't need NRT. We also have a high indexing rate as we are trying to
>> index 3 billion pages as quickly as possible.
>>
>> I am not clear what determines the reopen rate. Is it only related to
>> searching or is it involved in indexing as well?
>>
>> Does the NRTCachingDirectory have any benefit for indexing under the use
>> case noted above?
>>
>> I'm guessing we should just use the solrStandardDirectoryFactory instead.
>> Is this correct?
>
> The NRT directory object in Solr uses the MMap implementation as its default
> delegate.

The code I see seems to be using an FSDirectory, or is there another layer of wrapping going on here?

    return new NRTCachingDirectory(FSDirectory.open(new File(path)), maxMergeSizeMB, maxCachedMB);

> I would use MMapDirectoryFactory (the default for most of the 3.x releases)
> for testing whether you can get any improvement from moving away from the
> default. The advantages of memory mapping are not something you'd want to
> give up.

Tom - did you ever get any useful results from testing here? I'm also curious about the impact of various xxxDirectoryFactory implementations for batch indexing.

Thanks,

-- Ken

--------------------------
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr