On 5/10/2020 4:48 PM, Ganesh Sethuraman wrote:
The additional info is that when we execute the test for longer (20 minutes) we see better response times; however, for a short test (5 minutes), rerunning the test after an hour or so, we see slow response times again. Note that we don't update the collection during the test or between tests. Does this help to identify the issue?
Assuming Solr is the only software that is running, most operating systems would not evict Solr data from the disk cache, so unless you have other software running on the machine, it's a little weird that performance drops back down after waiting an hour. Windows is an example of an OS that *does* proactively evict data from the disk cache, and on that OS, I would not be surprised by such behavior. You haven't mentioned which OS you're running on.
3. We have designed our test to mimic reality, where the filter cache is not hit at all. From Solr, we are seeing ZERO filter cache hits. There is about a 4% query and document cache hit rate in prod, and we are seeing no filter cache hits in either QA or PROD.
If a cache is getting zero hits, you should disable it. There is no reason to waste the memory that the cache uses when it provides no benefit.
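As a sketch of what that looks like in practice, assuming a stock solrconfig.xml: the caches live in the <query> section, and commenting out (or removing) the filterCache element disables it. The class name depends on your Solr version (solr.CaffeineCache in recent 8.x releases, solr.FastLRUCache or solr.LRUCache in older ones); the sizes shown are the shipped defaults, not a recommendation.

```xml
<query>
  <!-- Disabled: this cache was getting zero hits, so it only wastes heap.
  <filterCache class="solr.CaffeineCache"
               size="512"
               initialSize="512"
               autowarmCount="0"/>
  -->

  <queryResultCache class="solr.CaffeineCache"
                    size="512"
                    initialSize="512"
                    autowarmCount="0"/>

  <documentCache class="solr.CaffeineCache"
                 size="512"
                 initialSize="512"
                 autowarmCount="0"/>
</query>
```

Reload the collection (or restart Solr) after the change; the cache stats in the admin UI should then show the filterCache gone.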
Given that, could this be some warm-up related issue with keeping the Solr/Lucene memory-mapped files in RAM? Is there any way to measure which collection is using memory? We do have 350GB of RAM, but we see it full of buffer/cache, and we are not really sure what is actually using this memory.
You would have to ask the OS which files are held in its disk cache, and even if that information is available, it may be very difficult to get at. There is no way for Solr to report this.
Thanks, Shawn