On 10/7/2015 12:00 PM, Eric Torti wrote:
> Can we read "high reopen rate" as "frequent soft commits"? (In our
> case, hard commits do not open a searcher. But soft commits do).
>
> Considering it does mean "frequent soft commits", I'd say that it
> doesn't fit our setup because we have an index rate of about 10
> updates/s and we perform a soft commit every 15 minutes. So our scenario
> is not near real time in that sense. In light of this, do you think
> using NRTCachingDirectory is still convenient?

The NRT factory achieves high speed in NRT situations by flushing very
small updates to RAM instead of the disk.  As more updates come in,
older index segments sitting in RAM will eventually be flushed to disk,
so a sustained flood of updates doesn't really achieve a speed increase,
but a short burst of updates will be searchable *very* quickly.

NRTCachingDirectoryFactory was chosen for Solr examples (and I think
it's the Solr default) because it has no real performance downsides and
can be noticeably faster than the standard factory in NRT situations.
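
For reference, the factory is selected by the directoryFactory element
in solrconfig.xml.  A minimal sketch, modeled on what I recall of the
stock example configs (the solr.directoryFactory system property and
its fallback value are assumptions about your setup, so check your own
file):

```xml
<!-- Falls back to the NRT caching factory unless the
     solr.directoryFactory system property names another class
     (assumed layout; your solrconfig.xml may differ). -->
<directoryFactory name="DirectoryFactory"
                  class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>
```

If you wanted to compare against the standard factory, swapping the
class for solr.StandardDirectoryFactory on a test system would be one
way to do it.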

The only problem with it is that small index segments from recent
updates might only exist in RAM, and not get flushed to disk, so they
would be lost if Solr dies or is killed suddenly.  This is part of why
the updateLog feature exists -- when Solr is started, the transaction
logs will be replayed, inserting/replacing (at a minimum) all documents
indexed since the last hard commit.  Once the replay is finished, no
data has been lost.  This does require a defined uniqueKey to operate
correctly.
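
As a rough sketch of how those pieces fit together in solrconfig.xml:
the 15-minute soft commit interval below comes from your description,
while the 15-second hard commit interval and the solr.ulog.dir
property are placeholders I'm assuming, not recommendations for your
index.

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Transaction log, replayed at startup to recover documents
       indexed since the last hard commit. -->
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>

  <!-- Hard commit: flushes segments to disk and starts a new
       transaction log, but does not open a new searcher.
       15000 ms is an assumed placeholder value. -->
  <autoCommit>
    <maxTime>15000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>

  <!-- Soft commit: makes changes visible to searches.
       900000 ms = 15 minutes, matching the setup described. -->
  <autoSoftCommit>
    <maxTime>900000</maxTime>
  </autoSoftCommit>
</updateHandler>
```

The uniqueKey itself is declared in the schema rather than in
solrconfig.xml.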

Thanks,
Shawn
