Do not, repeat NOT, run expungeDeletes after each deleteByQuery; it is a very expensive operation. Perhaps once after the nightly batch, but I doubt that’ll help much anyway.
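To make the "once at the end, if at all" advice concrete, here is a minimal Python sketch of a nightly delete batch. It only builds the request URLs and bodies and does not send anything. The Solr base URL, collection name and delete queries are hypothetical. It assumes Solr's JSON update syntax for delete-by-query (`{"delete": {"query": ...}}`) and the `commit` / `expungeDeletes` request parameters on the `/update` handler. Each deleteByQuery goes out on its own, and expungeDeletes appears at most once, on the final commit, and is off by default.

```python
import json
from urllib.parse import urlencode

# Hypothetical values for illustration only.
SOLR_BASE = "http://localhost:8983/solr"
COLLECTION = "my_collection"


def build_batch_requests(delete_queries, expunge_at_end=False):
    """Return (url, json_body) pairs for a nightly delete batch.

    Each deleteByQuery is a plain delete with no expungeDeletes.
    A single commit is appended after the whole batch; expungeDeletes
    is added to that one commit only when explicitly requested.
    """
    update_url = f"{SOLR_BASE}/{COLLECTION}/update"
    batch = []
    for q in delete_queries:
        # JSON update syntax for delete-by-query.
        body = json.dumps({"delete": {"query": q}})
        batch.append((update_url, body))

    # One commit at the very end of the batch (no request body needed).
    params = {"commit": "true"}
    if expunge_at_end:
        params["expungeDeletes"] = "true"
    batch.append((f"{update_url}?{urlencode(params)}", None))
    return batch
```

If your indexing already relies on autoCommit, the explicit final commit may be unnecessary. Keeping `expunge_at_end=False` as the default reflects the point above that expunging deletes is unlikely to explain a 100x slowdown.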
30% deleted docs is quite normal, and should definitely not change the response time by a factor of 100! So there’s some other issue in your environment. Here are the things I’d check:

1> The schema is exactly the same. It’s vaguely possible that the schema is just a tiny bit different. If that’s the case, you need to delete your entire collection’s data and re-index from scratch. You can index to a new collection and use collection aliasing to do this seamlessly.

2> Your solrconfig is exactly the same, especially the filterCache settings. I call out filterCache because you specifically mention filter queries, but check your other caches too.

3> Check your filterCache usage statistics. If you see drastically different hit ratios in the two environments, you need to pursue that.

4> As always, check your GC performance in the two environments. It’s a low-probability item, but prod may be just different enough that GC is an issue.

5> Take a look at the QTimes recorded in your Solr logs to ensure that the difference isn’t outside of Solr.

While I can’t say what the exact problem is, I’m 99% sure that the number of deleted docs isn’t the culprit.

Best,
Erick

> On May 9, 2020, at 6:22 PM, Ganesh Sethuraman <ganeshmail...@gmail.com> wrote:
>
> Hi Solr Users,
>
> We use SolrCloud 7.2.1 with 2 Solr nodes in AWS. The shard size for these
> collections does not exceed 5G. They have approximately 16 shards
> with 2 replicas. We do deletes (ByQuery) as well as large updates in some of
> these Solr collections. We are seeing slower filter queries (95% > 10secs)
> on these collections in production, whereas with the same collections and
> same queries in our lower environment, with a similar setup and configuration,
> we are seeing much better performance (<100ms). These are NRT indexes, with
> daily batch updates only.
>
> We do see one difference, however: in the lower environment we don't do
> updates or deletes, and the Segment Info for each of the Solr cores shows
> ZERO delete percentage. Could this be the reason for the faster query
> response time in our lower environment? In our production environment, we
> are seeing about 30-32% deletes in each core shard/replica pair.
>
> Does this segment delete % have any correlation with query response time? We
> do delete by query in a loop, along with updates.
> If so, do you suggest trying an Optimize or expungeDeletes at the
> end of every day?
> Do we need to expunge deletes after each deleteByQuery, or do it once at the
> end?
>
> Regards,
> Ganesh
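Erick's step 5 (checking the QTimes recorded in the Solr logs) can be sketched in Python as below. This is a minimal sketch, assuming request log lines contain `path=<handler>` and `QTime=<milliseconds>`, as Solr's request logging typically does. Check your own log format and adjust the patterns if needed. Lines containing `isShard=true` (the per-shard sub-requests of a distributed query) are skipped by default so that only top-level requests are counted. The percentile is a simple nearest-rank calculation.

```python
import math
import re

# Assumed log format: request lines carry "path=<handler>" and
# "QTime=<milliseconds>". Adjust these patterns to match your logs.
QTIME_RE = re.compile(r"\bQTime=(\d+)")
PATH_RE = re.compile(r"\bpath=(\S+)")


def qtimes(lines, path="/select", skip_shard_requests=True):
    """Extract QTime values (ms) for requests to the given handler path."""
    result = []
    for line in lines:
        if skip_shard_requests and "isShard=true" in line:
            # Per-shard sub-request of a distributed query: skip it so
            # only top-level requests are counted.
            continue
        m_path = PATH_RE.search(line)
        m_qtime = QTIME_RE.search(line)
        if m_path and m_qtime and m_path.group(1) == path:
            result.append(int(m_qtime.group(1)))
    return result


def percentile(values, pct):
    """Nearest-rank percentile, e.g. pct=95 for the 95th percentile."""
    if not values:
        raise ValueError("no QTime values found")
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]
```

To use it, open each environment's log file (for example with `open(...)`), pass the file object to `qtimes`, and compare `percentile(..., 95)` between prod and the lower environment. If prod QTimes look close to the lower environment's while clients still see 10+ second responses, the time is being spent somewhere QTime does not measure.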