Shawn, thank you very much for that explanation.  It helps a lot.

Cheers, Ryan

On Wed, May 20, 2015 at 5:07 PM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 5/20/2015 5:57 PM, Ryan Cutter wrote:
> > GC is operating the way I think it should but I am lacking memory.  I am
> > just surprised because indexing is performing fine (documents going in)
> > but deletions are really bad (documents coming out).
> >
> > Is it possible these deletes are hitting many segments, each of which I
> > assume must be re-built?  And if there isn't much slack memory lying
> > around to begin with, there's a bunch of contention/swap?
>
> A deleteByQuery must first query the entire index to determine which IDs
> to delete.  That's going to hit every segment.  In the case of
> SolrCloud, it will also hit at least one replica of every single shard
> in the collection.
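>
> For illustration, a deleteByQuery from SolrJ looks roughly like this
> (a minimal sketch; the URL, collection name, query string, and ID
> below are made-up placeholders, not from this thread):
>
>     import org.apache.solr.client.solrj.SolrClient;
>     import org.apache.solr.client.solrj.impl.HttpSolrClient;
>
>     public class DeleteExample {
>         public static void main(String[] args) throws Exception {
>             // Placeholder URL and collection name.
>             SolrClient client =
>                 new HttpSolrClient("http://localhost:8983/solr/collection1");
>
>             // deleteByQuery: Solr must first run this query against
>             // every segment (and at least one replica of every shard
>             // in SolrCloud) to find the matching IDs.
>             client.deleteByQuery("category:obsolete");
>
>             // deleteById skips that query phase entirely, because
>             // the unique keys are already known.
>             client.deleteById("doc-42");
>
>             client.commit();
>             client.close();
>         }
>     }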
>
> If the data required to satisfy the query is not already sitting in the
> OS disk cache, then the actual disk must be read.  When RAM is extremely
> tight, any disk operation will evict relevant data from the OS disk
> cache, so the next time that data is needed, it must be read off the
> disk again.  Disks are SLOW.  What I am describing is not swap, but the
> performance impact is similar to swapping.
>
> The actual delete operation (once the IDs are known) doesn't touch any
> segments ... it writes Lucene document identifiers to a .del file, and
> that file is consulted on all queries.  Any deleted documents found in
> the query results are removed.
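>
> You can see this from the Lucene side with something like the
> following (a minimal sketch; the index path is a placeholder).
> Deleted documents still sit in their segments until a merge removes
> them; the .del markers just hide them from query results:
>
>     import java.nio.file.Paths;
>     import org.apache.lucene.index.DirectoryReader;
>     import org.apache.lucene.store.FSDirectory;
>
>     public class DeletedDocsCheck {
>         public static void main(String[] args) throws Exception {
>             // Placeholder path to a Lucene/Solr index directory.
>             try (DirectoryReader reader = DirectoryReader.open(
>                     FSDirectory.open(Paths.get("/path/to/index")))) {
>                 // numDocs() excludes deleted docs; numDeletedDocs()
>                 // counts the documents marked as deleted.
>                 System.out.println("live docs:    " + reader.numDocs());
>                 System.out.println("deleted docs: " + reader.numDeletedDocs());
>             }
>         }
>     }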
>
> Thanks,
> Shawn
>
>
