On 5/20/2015 5:57 PM, Ryan Cutter wrote:
> GC is operating the way I think it should but I am lacking memory. I am
> just surprised because indexing is performing fine (documents going in) but
> deletions are really bad (documents coming out).
>
> Is it possible these deletes are hitting many segments, each of which I
> assume must be re-built? And if there isn't much slack memory laying
> around to begin with, there's a bunch of contention/swap?
A deleteByQuery must first query the entire index to determine which IDs to delete. That's going to hit every segment. In the case of SolrCloud, it will also hit at least one replica of every shard in the collection. If the data required to satisfy the query is not already sitting in the OS disk cache, then the actual disk must be read. When RAM is extremely tight, any disk operation will evict relevant data from the OS disk cache, so the next time that data is needed, it must be read off the disk again. Disks are SLOW. What I am describing is not swap, but the performance impact is similar to swapping.

The actual delete operation (once the IDs are known) doesn't touch any segments ... it writes Lucene document identifiers to a .del file, and that file is consulted on all queries. Any deleted documents found in the query results are removed.

Thanks,
Shawn
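P.S. A rough sketch of the mechanism, in Python rather than Lucene's actual Java internals (the Segment/Index classes below are a simplified model I made up for illustration, not real Lucene code): the query phase must visit every segment, but the delete itself just records doc IDs in a per-segment deletion set (the .del file analog), and searches filter against that set instead of rewriting segment data.

```python
class Segment:
    """A simplified stand-in for an immutable Lucene segment."""

    def __init__(self, docs):
        self.docs = docs          # doc_id -> stored fields (never rewritten)
        self.deleted = set()      # stands in for the segment's .del file

    def search(self, predicate):
        # The deletion set is consulted on every query, which is how
        # deleted documents are filtered out of results.
        return [doc_id for doc_id, doc in self.docs.items()
                if doc_id not in self.deleted and predicate(doc)]


class Index:
    """A collection of segments; deletes never modify segment data."""

    def __init__(self, segments):
        self.segments = segments

    def delete_by_query(self, predicate):
        # Phase 1: run the query against EVERY segment -- the expensive,
        # I/O-heavy part when the OS disk cache is cold.
        # Phase 2: mark the matching IDs deleted; no segment is rebuilt.
        count = 0
        for seg in self.segments:
            for doc_id in seg.search(predicate):
                seg.deleted.add(doc_id)
                count += 1
        return count

    def search(self, predicate):
        hits = []
        for seg in self.segments:
            hits.extend(seg.search(predicate))
        return hits


idx = Index([
    Segment({1: {"color": "red"}, 2: {"color": "blue"}}),
    Segment({3: {"color": "red"}}),
])
deleted = idx.delete_by_query(lambda d: d["color"] == "red")
print(deleted)                            # 2 documents marked deleted
print(idx.search(lambda d: True))         # [2] -- only the live doc remains
```

Note that after the delete, each segment's docs dictionary is untouched; only the deletion sets changed. That's why the delete itself is cheap once the IDs are known, while the query that finds those IDs is what hammers the disk.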