Hello,

There is a largely undocumented parameter that displays the cache's
contents: set showItems="100" on the filterCache.
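For example, in solrconfig.xml (a sketch only — the class and size values
below are placeholders, keep whatever you already use; showItems is the
only new attribute):

    <filterCache class="solr.FastLRUCache"
                 size="512"
                 initialSize="512"
                 autowarmCount="128"
                 showItems="100"/>

The top entries should then appear in the cache's statistics, e.g. in the
admin UI under Plugins / Stats or via the core's admin/mbeans endpoint
with stats=true.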
Regards,
Markus

-----Original message-----
> From: Erick Erickson <erickerick...@gmail.com>
> Sent: Wednesday 16th January 2019 17:40
> To: solr-user <solr-user@lucene.apache.org>
> Subject: Re: Re: Delayed/waiting requests
>
> I don't know of any tools to inspect the cache. Under the covers,
> these are things like Java's ConcurrentHashMap which don't, for
> instance, carry along information like last access time IIUC.
>
> I usually have to cull the Solr logs and eyeball the fq clauses to see
> if anything jumps out. If you do find any such patterns, you can
> always add {!cache=false} to those clauses so they don't use up cache
> entries...
>
> Best,
> Erick
>
> On Wed, Jan 16, 2019 at 7:53 AM Gael Jourdan-Weil
> <gael.jourdan-w...@kelkoogroup.com> wrote:
> >
> > Ok, I get your point.
> >
> > Do you know if there is a tool to easily view the filterCache content?
> > I know we can see the top entries in the API or the UI, but could we
> > see more?
> >
> > Regards,
> > Gaël
> >
> > ________________________________
> > From: Erick Erickson <erickerick...@gmail.com>
> > Sent: Tuesday, January 15, 2019 19:46:19
> > To: solr-user
> > Subject: Re: Re: Delayed/waiting requests
> >
> > bq. If I get your point, having a big cache might cause more trouble
> > than help if the cache hit ratio is not high enough because the cache
> > is constantly evicting/inserting entries?
> >
> > Pretty much. Although there are nuances.
> >
> > Right now, you have a 12K autowarm count. That means your cache will
> > eventually always contain 12K entries whether or not you ever use the
> > last 11K! I'm simplifying a bit, but it grows like this.
> >
> > Let's say I start Solr. Initially it has no cache entries. Now I start
> > both querying and indexing. For simplicity, say I have 100 _new_ fq
> > clauses come in between each commit. The first commit will autowarm
> > 100. The next will autowarm 200, then 300, and so on. Eventually this
> > will grow to 12K. So your performance will start to vary depending on
> > how long Solr has been running.
> >
> > Worse, it's not clear that you _ever_ re-use those clauses. One example:
> > fq=date_field:[* TO NOW]
> > NOW is really a Unix timestamp. So issuing the same fq 1 millisecond
> > after the first one will not re-use the entry. In the worst case almost
> > all of your autowarming is useless. It neither loads relevant index
> > data into RAM nor is reusable.
> >
> > Even if you use "date math" to round to, say, a minute (e.g.
> > fq=date_field:[* TO NOW/MINUTE]), if you run Solr long enough you'll
> > still fill up with useless fq clauses.
> >
> > Best,
> > Erick
> >
> > On Tue, Jan 15, 2019 at 9:33 AM Gael Jourdan-Weil
> > <gael.jourdan-w...@kelkoogroup.com> wrote:
> > >
> > > @Erick:
> > >
> > > We will try to lower the autowarm and run some tests to compare.
> > > If I get your point, having a big cache might cause more trouble
> > > than help if the cache hit ratio is not high enough because the
> > > cache is constantly evicting/inserting entries?
> > >
> > > @Jeremy:
> > >
> > > Index size: ~20G and ~14M documents
> > > Server memory available: 256G, of which ~30G is used and ~100G is
> > > system cache
> > > Server CPU count: 32, ~10% usage
> > > JVM memory settings: -Xms12G -Xmx12G
> > >
> > > We have 3 servers and 3 clusters of 3 Solr instances.
> > > That is, each server hosts 1 Solr instance for each cluster.
> > > And, indeed, each cluster only has 1 shard with replication factor 3.
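> > >
> > > (Back-of-the-envelope, using the maxDoc/8 estimate Erick gives
> > > further down the thread: ~14M docs / 8 is roughly 1.75MB per
> > > filterCache entry, so a full 12k-entry cache could in the worst case
> > > approach ~21GB, far beyond our 12G heap.)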
> > >
> > > Among all these Solr instances, the pauses are observed on only a
> > > single cluster, but on every server, at different times (sometimes on
> > > all servers at the same time, but that's very rare).
> > > We do observe that the traffic is evenly balanced across the 3
> > > servers, around 30-40 queries per second sent to each server.
> > >
> > > Regards,
> > > Gaël
> > >
> > > ________________________________
> > > From: Branham, Jeremy (Experis) <jb...@allstate.com>
> > > Sent: Tuesday, January 15, 2019 17:59:56
> > > To: solr-user@lucene.apache.org
> > > Subject: Re: Re: Delayed/waiting requests
> > >
> > > Hi Gael –
> > >
> > > Could you share this information?
> > > - Size of the index
> > > - Server memory available
> > > - Server CPU count
> > > - JVM memory settings
> > >
> > > You mentioned a cloud configuration of 3 replicas.
> > > Does that mean you have 1 shard with a replication factor of 3?
> > > Do the pauses occur on all 3 servers?
> > > Is the traffic evenly balanced across those servers?
> > >
> > > Jeremy Branham
> > > jb...@allstate.com
> > >
> > > On 1/15/19, 9:50 AM, "Erick Erickson" <erickerick...@gmail.com> wrote:
> > >
> > > Well, it was a nice theory anyway.
> > >
> > > "Other collections with the same settings"
> > > doesn't really mean much unless those other collections are very
> > > similar, especially in terms of numbers of docs.
> > >
> > > You should only see a new searcher opening when you do a
> > > hard-commit-with-openSearcher-true or a soft commit.
> > >
> > > So what happens when you just try lowering the autowarm
> > > count? I'm assuming you're free to test in some non-prod
> > > system.
> > >
> > > Focusing on the hit ratio is something of a red herring. Remember
> > > that each entry in your filterCache is roughly maxDoc/8 bytes plus a
> > > little overhead; the increase in GC pressure has to be balanced
> > > against getting the hits from the cache.
> > >
> > > Now, all that said, if there's no correlation, then you need to put
> > > a profiler on the system when you see this kind of thing and
> > > find out where the hotspots are; otherwise it's guesswork and
> > > I'm out of ideas.
> > >
> > > Best,
> > > Erick
> > >
> > > On Tue, Jan 15, 2019 at 12:06 AM Gael Jourdan-Weil
> > > <gael.jourdan-w...@kelkoogroup.com> wrote:
> > > >
> > > > Hi Erick,
> > > >
> > > > Thank you for your detailed answer; I understand autowarming better
> > > > now.
> > > >
> > > > We have an autowarming time of ~10s for the filterCache (the
> > > > queryResultCache is not used at all, ratio = 0.02).
> > > > We increased the size of the filterCache from 6k to 12k (with the
> > > > autowarm size set to the same value) to get a better ratio, which is
> > > > still _only_ around 0.85/0.90.
> > > >
> > > > The thing I don't understand is this: I should see "Opening new
> > > > searcher" in the logs every time a new searcher is opened, and thus
> > > > every time autowarming happens, right?
> > > > But I don't see "Opening new searcher" very often, and I don't see
> > > > it correlated with the response time peaks.
> > > >
> > > > Also, I didn't mention it earlier, but we have other SolrCloud
> > > > clusters with similar settings and load (~10s filterCache
> > > > autowarming, 10k entries) and we don't observe the same behavior.
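> > > >
> > > > (For reference, I'm just counting occurrences in the logs with
> > > > something like the following — the log path is whatever your
> > > > install uses:
> > > >     grep -c "Opening new searcher" /var/solr/logs/solr.log )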
> > > >
> > > > Regards,
> > > >
> > > > ________________________________
> > > > From: Erick Erickson <erickerick...@gmail.com>
> > > > Sent: Monday, January 14, 2019 17:44:38
> > > > To: solr-user
> > > > Subject: Re: Delayed/waiting requests
> > > >
> > > > Gael:
> > > >
> > > > bq. Nevertheless, our filterCache is set to autowarm 12k entries
> > > > which is also the maxSize
> > > >
> > > > That is far, far, far too many. Let's assume you actually have 12K
> > > > entries in the filterCache.
> > > > Every time you open a new searcher, 12K queries are executed
> > > > _before_ the searcher accepts any new requests. While being able to
> > > > re-use a filterCache entry is useful, one of the primary purposes is
> > > > to pre-load index data from disk into memory, which can be the thing
> > > > that takes the most time.
> > > >
> > > > The queryResultCache has a similar function. I often find that this
> > > > cache doesn't have a very high hit ratio, but again, executing a
> > > > _few_ of these queries warms the index from disk.
> > > >
> > > > I think of both caches as maps, where the key is the "thing" (the fq
> > > > clause in the case of the filterCache, the whole query in the case
> > > > of the queryResultCache). Autowarming replays the most recently
> > > > executed N of these entries, essentially just as though they were
> > > > submitted by a user.
> > > >
> > > > Hypothesis: You're massively over-warming, and when that kicks in
> > > > you're seeing increased CPU and GC pressure, leading to the
> > > > anomalies you observe. Further, you have so much autowarming going
> > > > on that it's hard to see the associated messages in the log.
> > > >
> > > > Here's what I'd recommend: set your autowarm counts to something on
> > > > the order of 16. If the culprit is just excessive autowarming, I'd
> > > > expect your spikes to be much less severe. It _might_ be that your
> > > > users see some increased (very temporary) variance in response time.
> > > > As you can tell, autowarming configuration is "more art than
> > > > science"; I can't give you any recommendation other than "start
> > > > small and increase until you're happy", unfortunately.
> > > >
> > > > I usually do this with some kind of load tester in a dev lab, of
> > > > course ;).
> > > >
> > > > Finally, if you use the metrics data (see:
> > > > https://lucene.apache.org/solr/guide/7_1/metrics-reporting.html)
> > > > you can see the autowarm times. Don't get too lost in the page to
> > > > start; just hit the http://localhost:8983/solr/admin/metrics
> > > > endpoint and look for "warmupTime", then refine on how to get _only_
> > > > the warmup stats ;).
> > > >
> > > > Best,
> > > > Erick
> > > >
> > > > On Mon, Jan 14, 2019 at 5:08 AM Gael Jourdan-Weil
> > > > <gael.jourdan-w...@kelkoogroup.com> wrote:
> > > > >
> > > > > I had a look at the GC logs this morning, but I'm not sure how to
> > > > > interpret them.
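> > > > >
> > > > > (For context, these are the GC logs Solr writes by default; with
> > > > > the Java 8 flags our version ships with, that's along the lines
> > > > > of -Xloggc:<log dir>/solr_gc.log -XX:+PrintGCDetails
> > > > > -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime, though
> > > > > our exact flags may differ.)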
> > > > >
> > > > > Over a period of 54 min, there are:
> > > > > - Number of pauses: 2739
> > > > > - Accumulated pauses: 93s => that is 2.86% of the time
> > > > > - Average pause duration: 0.03s
> > > > > - Average pause interval: 1.18s
> > > > > - Accumulated full GCs: 0
> > > > >
> > > > > I'm not sure whether this is a lot or not. What do you think?
> > > > >
> > > > > Looking more closely at the GC logs with GCViewer, I notice that
> > > > > the high response time peaks happen at the same times as GC pauses
> > > > > that take about twice as long as average (around 0.06s).
> > > > >
> > > > > Also, we are indeed indexing at the same time, but we have
> > > > > autowarming set.
> > > > > I don't see any searcher opened at the times we experience
> > > > > slowness.
> > > > > Nevertheless, our filterCache is set to autowarm 12k entries,
> > > > > which is also the maxSize.
> > > > > Could this have any downside?
> > > > >
> > > > > Thanks,
> > > > > Gaël
> > > > >
> > > > > ________________________________
> > > > > From: Erick Erickson <erickerick...@gmail.com>
> > > > > Sent: Friday, January 11, 2019 17:21
> > > > > To: solr-user
> > > > > Subject: Re: Delayed/waiting requests
> > > > >
> > > > > Jimi's comment points to one of the most common culprits.
> > > > >
> > > > > Autowarming is another. Are you indexing at the same
> > > > > time? If so, it could well be that you aren't autowarming and
> > > > > the spikes are caused by using a new IndexSearcher
> > > > > that has to read much of the index off disk when commits
> > > > > happen. The "smoking gun" here would be if the spikes
> > > > > correlate with your commits (soft or hard-with-openSearcher-true).
> > > > >
> > > > > Best,
> > > > > Erick
> > > > >
> > > > > On Fri, Jan 11, 2019 at 1:23 AM Gael Jourdan-Weil
> > > > > <gael.jourdan-w...@kelkoogroup.com> wrote:
> > > > > >
> > > > > > Interesting indeed; we did not see anything with VisualVM, but
> > > > > > having a look at the GC logs could give us more info, especially
> > > > > > on the pauses.
> > > > > > I will collect data over the weekend and look at it.
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > ________________________________
> > > > > > From: Hullegård, Jimi <jimi.hulleg...@svensktnaringsliv.se>
> > > > > > Sent: Friday, January 11, 2019 03:46:02
> > > > > > To: solr-user@lucene.apache.org
> > > > > > Subject: Re: Delayed/waiting requests
> > > > > >
> > > > > > Could be caused by garbage collection in the JVM.
> > > > > >
> > > > > > https://wiki.apache.org/solr/SolrPerformanceProblems
> > > > > >
> > > > > > Go down to the section called “GC pause problems”
> > > > > >
> > > > > > /Jimi
> > > > > >
> > > > > > Sent from my iPhone
> > > > > >
> > > > > > On 11 Jan 2019, at 05:05, Gael Jourdan-Weil
> > > > > > <gael.jourdan-w...@kelkoogroup.com> wrote:
> > > > > >
> > > > > > Hello,
> > > > > >
> > > > > > We are experiencing some performance issues on a simple
> > > > > > SolrCloud cluster of 3 replicas (1 core), but what we found
> > > > > > during our analysis seems a bit odd, so we thought the community
> > > > > > might have relevant ideas on this.
> > > > > >
> > > > > > Load: between 30 and 40 queries per second, constant over the
> > > > > > period analyzed.
> > > > > >
> > > > > > Symptoms: high response times over short periods, but quite
> > > > > > frequently. We are talking about request response times going
> > > > > > from 50ms to 5000ms or even worse for less than 5 seconds, and
> > > > > > then going back to normal.
> > > > > >
> > > > > > What we found out: just before the response times increase,
> > > > > > requests seem to be delayed. That is, for 2-3 seconds requests
> > > > > > pile up and no responses are sent; then all requests are
> > > > > > resolved and the responses are all returned to the clients at
> > > > > > the same time. Very much as if a lock were being held somewhere.
> > > > > > But we found no "lock" time at either the JVM or system level.
> > > > > >
> > > > > > Can someone think of anything in the way Solr works that could
> > > > > > explain this? Or ideas to track down the root cause?
> > > > > >
> > > > > > Solr version is 7.2.1.
> > > > > >
> > > > > > Thanks for reading,
> > > > > >
> > > > > > Gaël Jourdan-Weil