GC is operating the way I think it should, but I am short on memory. I am just surprised because indexing performs fine (documents going in) while deletions perform really badly (documents coming out).
Is it possible these deletes are hitting many segments, each of which I assume must be rebuilt? And if there isn't much slack memory lying around to begin with, could that cause a lot of contention and swapping?

Thanks Shawn!

On Wed, May 20, 2015 at 4:50 PM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 5/20/2015 5:41 PM, Ryan Cutter wrote:
> > I have a collection with 1 billion documents and I want to delete 500 of
> > them. The collection has a dozen shards and a couple replicas. Using Solr
> > 4.4.
> >
> > Sent the delete query via HTTP:
> >
> > http://hostname:8983/solr/my_collection/update?stream.body=<delete><query>source:foo</query></delete>
> >
> > Took a couple minutes and several replicas got knocked into Recovery mode.
> > They eventually came back and the desired docs were deleted but the cluster
> > wasn't thrilled (high load, etc).
> >
> > Is this expected behavior? Is there a better way to delete documents that
> > I'm missing?
>
> That's the correct way to do the delete. Before you'll see the change,
> a commit must happen in one way or another. Hopefully you already knew
> that.
>
> I believe that your setup has some performance issues that are making it
> very slow and knocking out your Solr nodes temporarily.
>
> The most common root problems with SolrCloud and indexes going into
> recovery are: 1) Your heap is enormous but your garbage collection is
> not tuned. 2) You don't have enough RAM, separate from your Java heap,
> for adequate index caching. With a billion documents in your
> collection, you might even be having problems with both.
>
> Here's a wiki page that includes some info on both of these problems,
> plus a few others:
>
> http://wiki.apache.org/solr/SolrPerformanceProblems
>
> Thanks,
> Shawn
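Shawn confirms the delete-by-query itself is correct and that a commit has to happen before the change is visible. As a minimal sketch, assuming Python 3 and only the standard library, the same delete can be sent as a POST body (so the XML does not have to be URL-encoded into `stream.body`) together with Solr's `commit=true` update parameter. The host and collection names are the placeholders from Ryan's URL, and `build_delete_request` is a hypothetical helper; it only builds the request, and actually sending it is left commented out. None of this changes the load and recovery behavior, which Shawn attributes to heap/GC tuning and RAM for index caching.

```python
import urllib.request
from xml.sax.saxutils import escape

# Placeholder host/collection taken from the URL in the thread (hypothetical).
SOLR_UPDATE_URL = "http://hostname:8983/solr/my_collection/update"


def build_delete_request(query: str, commit: bool = True) -> urllib.request.Request:
    """Build (but do not send) a POST that deletes all docs matching `query`.

    The XML delete command travels in the request body instead of the
    stream.body URL parameter. When `commit` is True, commit=true is added
    so Solr commits after processing the delete and the change becomes visible.
    """
    # Escape XML special characters (&, <, >) that may appear in the query.
    body = f"<delete><query>{escape(query)}</query></delete>".encode("utf-8")
    url = SOLR_UPDATE_URL + ("?commit=true" if commit else "")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_delete_request("source:foo")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # Sending requires a reachable Solr node, e.g.:
    # with urllib.request.urlopen(req, timeout=300) as resp:
    #     print(resp.status, resp.read().decode("utf-8"))
```

Leaving `commit=True` as the default mirrors Shawn's point that nothing is visible until a commit happens; with autoCommit/autoSoftCommit configured on the server, passing `commit=False` and letting the server commit on its own schedule is the alternative.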