On 5/20/2015 5:57 PM, Ryan Cutter wrote:
> GC is operating the way I think it should but I am lacking memory. I am
> just surprised because indexing is performing fine (documents going in) but
> deletions are really bad (documents coming out).
>
> Is it possible these deletes are hitting many segments, each of which I
> assume must be re-built? And if there isn't much slack memory laying
> around to begin with, there's a bunch of contention/swap?
A deleteByQuery must first query the entire index to determine which IDs to delete. That's going to hit every segment. In the case of SolrCloud, it will also hit at least one replica of every shard in the collection. If the data required to satisfy the query is not already sitting in the OS disk cache, then the actual disk must be read. When RAM is extremely tight, any disk operation will evict relevant data from the OS disk cache, so the next time that data is needed, it must be read off the disk again. Disks are SLOW. What I am describing is not swap, but the performance impact is similar to swapping.

The actual delete operation (once the IDs are known) doesn't touch any segments ... it writes Lucene document identifiers to a .del file, and that file is consulted on all queries. Any deleted documents found in the query results are removed.

Thanks,
Shawn
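P.S. A rough sketch of the mechanism, in Python rather than Lucene's actual Java internals (the Segment/Index classes below are a simplified model I made up for illustration, not real Lucene code): the query phase must visit every segment, but the delete itself just records doc IDs in a per-segment deletion set (the .del file analog), and searches filter against that set instead of rewriting segment data.

```python
class Segment:
    """A simplified stand-in for an immutable Lucene segment."""

    def __init__(self, docs):
        self.docs = docs          # doc_id -> stored fields (never rewritten)
        self.deleted = set()      # stands in for the segment's .del file

    def search(self, predicate):
        # The deletion set is consulted on every query, which is how
        # deleted documents are filtered out of results.
        return [doc_id for doc_id, doc in self.docs.items()
                if doc_id not in self.deleted and predicate(doc)]


class Index:
    """A collection of segments; deletes never modify segment data."""

    def __init__(self, segments):
        self.segments = segments

    def delete_by_query(self, predicate):
        # Phase 1: run the query against EVERY segment -- the expensive,
        # I/O-heavy part when the OS disk cache is cold.
        # Phase 2: mark the matching IDs deleted; no segment is rebuilt.
        count = 0
        for seg in self.segments:
            for doc_id in seg.search(predicate):
                seg.deleted.add(doc_id)
                count += 1
        return count

    def search(self, predicate):
        hits = []
        for seg in self.segments:
            hits.extend(seg.search(predicate))
        return hits


idx = Index([
    Segment({1: {"color": "red"}, 2: {"color": "blue"}}),
    Segment({3: {"color": "red"}}),
])
deleted = idx.delete_by_query(lambda d: d["color"] == "red")
print(deleted)                            # 2 documents marked deleted
print(idx.search(lambda d: True))         # [2] -- only the live doc remains
```

Note that after the delete, each segment's docs dictionary is untouched; only the deletion sets changed. That's why the delete itself is cheap once the IDs are known, while the query that finds those IDs is what hammers the disk.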