Shawn, thank you very much for that explanation. It helps a lot.

Cheers,
Ryan
On Wed, May 20, 2015 at 5:07 PM, Shawn Heisey <apa...@elyograg.org> wrote:
> On 5/20/2015 5:57 PM, Ryan Cutter wrote:
> > GC is operating the way I think it should but I am lacking memory. I am
> > just surprised because indexing is performing fine (documents going in)
> > but deletions are really bad (documents coming out).
> >
> > Is it possible these deletes are hitting many segments, each of which I
> > assume must be re-built? And if there isn't much slack memory laying
> > around to begin with, there's a bunch of contention/swap?
>
> A deleteByQuery must first query the entire index to determine which IDs
> to delete. That's going to hit every segment. In the case of
> SolrCloud, it will also hit at least one replica of every single shard
> in the collection.
>
> If the data required to satisfy the query is not already sitting in the
> OS disk cache, then the actual disk must be read. When RAM is extremely
> tight, any disk operation will erase relevant data out of the OS disk
> cache, so the next time it is needed, it must be read off the disk
> again. Disks are SLOW. What I am describing is not swap, but the
> performance impact is similar to swapping.
>
> The actual delete operation (once the IDs are known) doesn't touch any
> segments ... it writes Lucene document identifiers to a .del file, and
> that file is consulted on all queries. Any deleted documents found in
> the query results are removed.
>
> Thanks,
> Shawn
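The two-phase behavior Shawn describes (query every segment to find matching IDs, then mark those IDs in a .del-style side file instead of rewriting any segment) can be sketched with a toy model. This is illustrative Python, not actual Solr/Lucene code; the class names and data layout are invented for the sketch:

```python
# Toy model of Lucene-style delete-by-query. Illustrative only: real
# segments are on-disk inverted indexes, not Python dicts.

class Segment:
    def __init__(self, docs):
        self.docs = docs          # doc_id -> fields; immutable once written
        self.deleted = set()      # stand-in for the .del file: IDs only

    def search(self, predicate):
        # Every document must be checked; deleted docs are filtered out
        # at query time, mirroring how the .del file is consulted.
        return [doc_id for doc_id, doc in self.docs.items()
                if doc_id not in self.deleted and predicate(doc)]

class Index:
    def __init__(self, segments):
        self.segments = segments

    def delete_by_query(self, predicate):
        # Phase 1: run the query against EVERY segment to collect IDs.
        # Phase 2: record the IDs as deleted; segment data is never rebuilt.
        count = 0
        for seg in self.segments:
            for doc_id in seg.search(predicate):
                seg.deleted.add(doc_id)
                count += 1
        return count

    def search(self, predicate):
        return [d for seg in self.segments for d in seg.search(predicate)]

index = Index([
    Segment({1: {"color": "red"}, 2: {"color": "blue"}}),
    Segment({3: {"color": "red"}, 4: {"color": "green"}}),
])
n = index.delete_by_query(lambda d: d["color"] == "red")
print(n)                             # 2 documents marked deleted
print(index.search(lambda d: True))  # [2, 4]
```

Note that the expensive part is the query phase, which cannot skip any segment; the marking phase is cheap precisely because nothing is rewritten.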
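The cache-thrashing effect Shawn mentions (any disk operation evicting hot data when RAM is tight, so the same data gets read from disk again and again) can be shown with a toy LRU "page cache". The capacities and page counts are arbitrary, chosen only to make the contrast visible:

```python
from collections import OrderedDict

# Toy LRU page cache: when capacity is smaller than the working set,
# a repeated sequential scan misses on every single access.

class PageCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = OrderedDict()
        self.disk_reads = 0

    def read(self, page):
        if page in self.cache:
            self.cache.move_to_end(page)    # cache hit: no disk access
            return
        self.disk_reads += 1                # cache miss: slow disk read
        self.cache[page] = True
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict least-recently-used page

working_set = list(range(10))

big = PageCache(capacity=10)   # enough "RAM": only the first pass hits disk
small = PageCache(capacity=5)  # tight "RAM": every access evicts hot data

for _ in range(3):
    for page in working_set:
        big.read(page)
        small.read(page)

print(big.disk_reads)    # 10 (one cold read per page, then all hits)
print(small.disk_reads)  # 30 (every access goes to disk)
```

This is the worst case for LRU, but it captures why performance falls off a cliff rather than degrading gently once the index working set no longer fits in free memory.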