Ok, thanks Shawn! That makes sense. We'll be experimenting with it.
Best,
Eric

On Wed, Oct 7, 2015 at 5:54 PM, Shawn Heisey <apa...@elyograg.org> wrote:
> On 10/7/2015 12:00 PM, Eric Torti wrote:
>> Can we read "high reopen rate" as "frequent soft commits"? (In our
>> case, hard commits do not open a searcher. But soft commits do).
>>
>> Considering it does mean "frequent soft commits", I'd say that it
>> doesn't fit our setup because we have an index rate of about 10
>> updates/s and we perform a soft commit every 15 min. So our scenario
>> is not near real time in that sense. In light of this, do you think
>> using NRTCachingDirectory is still convenient?
>
> The NRT factory achieves high speed in NRT situations by flushing very
> small updates to RAM instead of the disk. As more updates come in,
> older index segments sitting in RAM will eventually be flushed to disk,
> so a sustained flood of updates doesn't really achieve a speed increase,
> but a short burst of updates will be searchable *very* quickly.
>
> NRTCachingDirectoryFactory was chosen for Solr examples (and I think
> it's the Solr default) because it has no real performance downsides, but
> has a strong possibility to be noticeably faster than the standard
> factory in NRT situations.
>
> The only problem with it is that small index segments from recent
> updates might only exist in RAM, and not get flushed to disk, so they
> would be lost if Solr dies or is killed suddenly. This is part of why
> the updateLog feature exists -- when Solr is started, the transaction
> logs will be replayed, inserting/replacing (at a minimum) all documents
> indexed since the last hard commit. When the replay is finished, you
> will not lose data. This does require a defined uniqueKey to operate
> correctly.
>
> Thanks,
> Shawn
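
For reference, here is a rough sketch of the solrconfig.xml settings that the setup discussed above would correspond to. The 15-minute soft commit and the hard commits that do not open a searcher come from the thread. The 60-second hard commit interval is only an assumed placeholder, and the updateLog directory property is the one used in the stock example configs, so adjust both to your installation.

```xml
<!-- solrconfig.xml fragment (illustrative values, not a recommendation) -->

<!-- Small, recent segments are cached in RAM before being flushed to disk -->
<directoryFactory name="DirectoryFactory"
                  class="solr.NRTCachingDirectoryFactory"/>

<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Transaction log, replayed on startup to recover documents indexed
       since the last hard commit; requires a uniqueKey in the schema -->
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>

  <!-- Hard commit: makes data durable but does not open a new searcher.
       The 60 s interval is an assumption, not a value from the thread. -->
  <autoCommit>
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>

  <!-- Soft commit every 15 minutes (900000 ms): opens a new searcher -->
  <autoSoftCommit>
    <maxTime>900000</maxTime>
  </autoSoftCommit>
</updateHandler>
```

With soft commits this infrequent, the RAM caching is unlikely to make a visible difference, but per Shawn's explanation it should not hurt either, as long as the updateLog stays enabled.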