Hello,

There is a largely undocumented parameter that displays the cache's
contents: set showItems="100" on the filterCache.
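For example, in solrconfig.xml (a sketch only — the class and size values
below are placeholders, keep whatever you already use; showItems is the
only new attribute):

    <filterCache class="solr.FastLRUCache"
                 size="512"
                 initialSize="512"
                 autowarmCount="128"
                 showItems="100"/>

The top entries should then appear in the cache's statistics, e.g. in the
admin UI under Plugins / Stats or via the core's admin/mbeans endpoint
with stats=true.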
Regards,
Markus

-----Original message-----
> From: Erick Erickson <erickerick...@gmail.com>
> Sent: Wednesday 16th January 2019 17:40
> To: solr-user <solr-user@lucene.apache.org>
> Subject: Re: Re: Delayed/waiting requests
>
> I don't know of any tools to inspect the cache. Under the covers,
> these are things like Java's ConcurrentHashMap which don't, for
> instance, carry along information like last access time IIUC.
>
> I usually have to cull the Solr logs and eyeball the fq clauses to see
> if anything jumps out. If you do find any such patterns, you can
> always add {!cache=false} to those clauses so they don't use up cache
> entries...
>
> Best,
> Erick
>
> On Wed, Jan 16, 2019 at 7:53 AM Gael Jourdan-Weil
> <gael.jourdan-w...@kelkoogroup.com> wrote:
> >
> > Ok, I get your point.
> >
> > Do you know if there is a tool to easily view the filterCache content?
> > I know we can see the top entries in the API or the UI, but could we
> > see more?
> >
> > Regards,
> > Gaël
> >
> > ________________________________
> > From: Erick Erickson <erickerick...@gmail.com>
> > Sent: Tuesday, January 15, 2019 19:46:19
> > To: solr-user
> > Subject: Re: Re: Delayed/waiting requests
> >
> > bq. If I get your point, having a big cache might cause more trouble
> > than help if the cache hit ratio is not high enough because the cache
> > is constantly evicting/inserting entries?
> >
> > Pretty much. Although there are nuances.
> >
> > Right now, you have a 12K autowarm count. That means your cache will
> > eventually always contain 12K entries whether or not you ever use the
> > last 11K! I'm simplifying a bit, but it grows like this.
> >
> > Let's say I start Solr. Initially it has no cache entries. Now I start
> > both querying and indexing. For simplicity, say I have 100 _new_ fq
> > clauses come in between each commit. The first commit will autowarm
> > 100. The next will autowarm 200, then 300, and so on. Eventually this
> > will grow to 12K. So your performance will start to vary depending on
> > how long Solr has been running.
> >
> > Worse, it's not clear that you _ever_ re-use those clauses. One example:
> > fq=date_field:[* TO NOW]
> > NOW is really a Unix timestamp. So issuing the same fq 1 millisecond
> > after the first one will not re-use the entry. In the worst case almost
> > all of your autowarming is useless. It neither loads relevant index
> > data into RAM nor is reusable.
> >
> > Even if you use "date math" to round to, say, a minute (e.g.
> > fq=date_field:[* TO NOW/MINUTE]), if you run Solr long enough you'll
> > still fill up with useless fq clauses.
> >
> > Best,
> > Erick
> >
> > On Tue, Jan 15, 2019 at 9:33 AM Gael Jourdan-Weil
> > <gael.jourdan-w...@kelkoogroup.com> wrote:
> > >
> > > @Erick:
> > >
> > > We will try to lower the autowarm and run some tests to compare.
> > > If I get your point, having a big cache might cause more trouble
> > > than help if the cache hit ratio is not high enough because the
> > > cache is constantly evicting/inserting entries?
> > >
> > > @Jeremy:
> > >
> > > Index size: ~20G and ~14M documents
> > > Server memory available: 256G, of which ~30G is used and ~100G is
> > > system cache
> > > Server CPU count: 32, ~10% usage
> > > JVM memory settings: -Xms12G -Xmx12G
> > >
> > > We have 3 servers and 3 clusters of 3 Solr instances.
> > > That is, each server hosts 1 Solr instance for each cluster.
> > > And, indeed, each cluster only has 1 shard with replication factor 3.
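> > >
> > > (Back-of-the-envelope, using the maxDoc/8 estimate Erick gives
> > > further down the thread: ~14M docs / 8 is roughly 1.75MB per
> > > filterCache entry, so a full 12k-entry cache could in the worst case
> > > approach ~21GB, far beyond our 12G heap.)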
> > >
> > > Among all these Solr instances, the pauses are observed on only a
> > > single cluster, but on every server, at different times (sometimes on
> > > all servers at the same time, but that's very rare).
> > > We do observe that the traffic is evenly balanced across the 3
> > > servers, around 30-40 queries per second sent to each server.
> > >
> > > Regards,
> > > Gaël
> > >
> > > ________________________________
> > > From: Branham, Jeremy (Experis) <jb...@allstate.com>
> > > Sent: Tuesday, January 15, 2019 17:59:56
> > > To: solr-user@lucene.apache.org
> > > Subject: Re: Re: Delayed/waiting requests
> > >
> > > Hi Gael –
> > >
> > > Could you share this information?
> > > - Size of the index
> > > - Server memory available
> > > - Server CPU count
> > > - JVM memory settings
> > >
> > > You mentioned a cloud configuration of 3 replicas.
> > > Does that mean you have 1 shard with a replication factor of 3?
> > > Do the pauses occur on all 3 servers?
> > > Is the traffic evenly balanced across those servers?
> > >
> > > Jeremy Branham
> > > jb...@allstate.com
> > >
> > > On 1/15/19, 9:50 AM, "Erick Erickson" <erickerick...@gmail.com> wrote:
> > >
> > > Well, it was a nice theory anyway.
> > >
> > > "Other collections with the same settings"
> > > doesn't really mean much unless those other collections are very
> > > similar, especially in terms of numbers of docs.
> > >
> > > You should only see a new searcher opening when you do a
> > > hard-commit-with-openSearcher-true or a soft commit.
> > >
> > > So what happens when you just try lowering the autowarm
> > > count? I'm assuming you're free to test in some non-prod
> > > system.
> > >
> > > Focusing on the hit ratio is something of a red herring. Remember
> > > that each entry in your filterCache is roughly maxDoc/8 bytes plus a
> > > little overhead; the increase in GC pressure has to be balanced
> > > against getting the hits from the cache.
> > >
> > > Now, all that said, if there's no correlation, then you need to put
> > > a profiler on the system when you see this kind of thing and
> > > find out where the hotspots are; otherwise it's guesswork and
> > > I'm out of ideas.
> > >
> > > Best,
> > > Erick
> > >
> > > On Tue, Jan 15, 2019 at 12:06 AM Gael Jourdan-Weil
> > > <gael.jourdan-w...@kelkoogroup.com> wrote:
> > > >
> > > > Hi Erick,
> > > >
> > > > Thank you for your detailed answer; I understand autowarming better
> > > > now.
> > > >
> > > > We have an autowarming time of ~10s for the filterCache (the
> > > > queryResultCache is not used at all, ratio = 0.02).
> > > > We increased the size of the filterCache from 6k to 12k (with the
> > > > autowarm size set to the same value) to get a better ratio, which is
> > > > still _only_ around 0.85/0.90.
> > > >
> > > > The thing I don't understand is this: I should see "Opening new
> > > > searcher" in the logs every time a new searcher is opened, and thus
> > > > every time autowarming happens, right?
> > > > But I don't see "Opening new searcher" very often, and I don't see
> > > > it correlated with the response time peaks.
> > > >
> > > > Also, I didn't mention it earlier, but we have other SolrCloud
> > > > clusters with similar settings and load (~10s filterCache
> > > > autowarming, 10k entries) and we don't observe the same behavior.
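> > > >
> > > > (For reference, I'm just counting occurrences in the logs with
> > > > something like the following — the log path is whatever your
> > > > install uses:
> > > >     grep -c "Opening new searcher" /var/solr/logs/solr.log )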
> > > >
> > > > Regards,
> > > >
> > > > ________________________________
> > > > From: Erick Erickson <erickerick...@gmail.com>
> > > > Sent: Monday, January 14, 2019 17:44:38
> > > > To: solr-user
> > > > Subject: Re: Delayed/waiting requests
> > > >
> > > > Gael:
> > > >
> > > > bq. Nevertheless, our filterCache is set to autowarm 12k entries
> > > > which is also the maxSize
> > > >
> > > > That is far, far, far too many. Let's assume you actually have 12K
> > > > entries in the filterCache.
> > > > Every time you open a new searcher, 12K queries are executed
> > > > _before_ the searcher accepts any new requests. While being able to
> > > > re-use a filterCache entry is useful, one of the primary purposes is
> > > > to pre-load index data from disk into memory, which can be the thing
> > > > that takes the most time.
> > > >
> > > > The queryResultCache has a similar function. I often find that this
> > > > cache doesn't have a very high hit ratio, but again, executing a
> > > > _few_ of these queries warms the index from disk.
> > > >
> > > > I think of both caches as maps, where the key is the "thing" (the fq
> > > > clause in the case of the filterCache, the whole query in the case
> > > > of the queryResultCache). Autowarming replays the most recently
> > > > executed N of these entries, essentially just as though they were
> > > > submitted by a user.
> > > >
> > > > Hypothesis: You're massively over-warming, and when that kicks in
> > > > you're seeing increased CPU and GC pressure, leading to the
> > > > anomalies you observe. Further, you have so much autowarming going
> > > > on that it's hard to see the associated messages in the log.
> > > >
> > > > Here's what I'd recommend: set your autowarm counts to something on
> > > > the order of 16. If the culprit is just excessive autowarming, I'd
> > > > expect your spikes to be much less severe. It _might_ be that your
> > > > users see some increased (very temporary) variance in response time.
> > > > As you can tell, autowarming configuration is "more art than
> > > > science"; I can't give you any recommendation other than "start
> > > > small and increase until you're happy", unfortunately.
> > > >
> > > > I usually do this with some kind of load tester in a dev lab, of
> > > > course ;).
> > > >
> > > > Finally, if you use the metrics data (see:
> > > > https://lucene.apache.org/solr/guide/7_1/metrics-reporting.html)
> > > > you can see the autowarm times. Don't get too lost in the page to
> > > > start; just hit the http://localhost:8983/solr/admin/metrics
> > > > endpoint and look for "warmupTime", then refine on how to get _only_
> > > > the warmup stats ;).
> > > >
> > > > Best,
> > > > Erick
> > > >
> > > > On Mon, Jan 14, 2019 at 5:08 AM Gael Jourdan-Weil
> > > > <gael.jourdan-w...@kelkoogroup.com> wrote:
> > > > >
> > > > > I had a look at the GC logs this morning, but I'm not sure how to
> > > > > interpret them.
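> > > > >
> > > > > (For context, these are the GC logs Solr writes by default; with
> > > > > the Java 8 flags our version ships with, that's along the lines
> > > > > of -Xloggc:<log dir>/solr_gc.log -XX:+PrintGCDetails
> > > > > -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime, though
> > > > > our exact flags may differ.)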
> > > > >
> > > > > Over a period of 54 min, there are:
> > > > > - Number of pauses: 2739
> > > > > - Accumulated pauses: 93s => that is 2.86% of the time
> > > > > - Average pause duration: 0.03s
> > > > > - Average pause interval: 1.18s
> > > > > - Accumulated full GCs: 0
> > > > >
> > > > > I'm not sure whether this is a lot or not. What do you think?
> > > > >
> > > > > Looking more closely at the GC logs with GCViewer, I notice that
> > > > > the high response time peaks happen at the same times as GC pauses
> > > > > that take about twice as long as average (around 0.06s).
> > > > >
> > > > > Also, we are indeed indexing at the same time, but we have
> > > > > autowarming set.
> > > > > I don't see any searcher opened at the times we experience
> > > > > slowness.
> > > > > Nevertheless, our filterCache is set to autowarm 12k entries,
> > > > > which is also the maxSize.
> > > > > Could this have any downside?
> > > > >
> > > > > Thanks,
> > > > > Gaël
> > > > >
> > > > > ________________________________
> > > > > From: Erick Erickson <erickerick...@gmail.com>
> > > > > Sent: Friday, January 11, 2019 17:21
> > > > > To: solr-user
> > > > > Subject: Re: Delayed/waiting requests
> > > > >
> > > > > Jimi's comment points to one of the most common culprits.
> > > > >
> > > > > Autowarming is another. Are you indexing at the same
> > > > > time? If so, it could well be that you aren't autowarming and
> > > > > the spikes are caused by using a new IndexSearcher
> > > > > that has to read much of the index off disk when commits
> > > > > happen. The "smoking gun" here would be if the spikes
> > > > > correlate with your commits (soft or hard-with-openSearcher-true).
> > > > >
> > > > > Best,
> > > > > Erick
> > > > >
> > > > > On Fri, Jan 11, 2019 at 1:23 AM Gael Jourdan-Weil
> > > > > <gael.jourdan-w...@kelkoogroup.com> wrote:
> > > > > >
> > > > > > Interesting indeed; we did not see anything with VisualVM, but
> > > > > > having a look at the GC logs could give us more info, especially
> > > > > > on the pauses.
> > > > > > I will collect data over the weekend and look at it.
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > ________________________________
> > > > > > From: Hullegård, Jimi <jimi.hulleg...@svensktnaringsliv.se>
> > > > > > Sent: Friday, January 11, 2019 03:46:02
> > > > > > To: solr-user@lucene.apache.org
> > > > > > Subject: Re: Delayed/waiting requests
> > > > > >
> > > > > > Could be caused by garbage collection in the JVM.
> > > > > >
> > > > > > https://wiki.apache.org/solr/SolrPerformanceProblems
> > > > > >
> > > > > > Go down to the section called “GC pause problems”
> > > > > >
> > > > > > /Jimi
> > > > > >
> > > > > > Sent from my iPhone
> > > > > >
> > > > > > On 11 Jan 2019, at 05:05, Gael Jourdan-Weil
> > > > > > <gael.jourdan-w...@kelkoogroup.com> wrote:
> > > > > >
> > > > > > Hello,
> > > > > >
> > > > > > We are experiencing some performance issues on a simple
> > > > > > SolrCloud cluster of 3 replicas (1 core), but what we found
> > > > > > during our analysis seems a bit odd, so we thought the community
> > > > > > might have relevant ideas on this.
> > > > > >
> > > > > > Load: between 30 and 40 queries per second, constant over the
> > > > > > period analyzed.
> > > > > >
> > > > > > Symptoms: high response times over short periods, but quite
> > > > > > frequently. We are talking about request response times going
> > > > > > from 50ms to 5000ms or even worse for less than 5 seconds, and
> > > > > > then going back to normal.
> > > > > >
> > > > > > What we found out: just before the response times increase,
> > > > > > requests seem to be delayed. That is, for 2-3 seconds requests
> > > > > > pile up and no responses are sent; then all requests are
> > > > > > resolved and the responses are all returned to the clients at
> > > > > > the same time. Very much as if a lock were being held somewhere.
> > > > > > But we found no "lock" time at either the JVM or system level.
> > > > > >
> > > > > > Can someone think of anything in the way Solr works that could
> > > > > > explain this? Or ideas to track down the root cause?
> > > > > >
> > > > > > Solr version is 7.2.1.
> > > > > >
> > > > > > Thanks for reading,
> > > > > >
> > > > > > Gaël Jourdan-Weil