As detailed below: the collection where we are seeing issues has 16 shards with 2 replicas each.
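If it helps to double-check that topology from code, here is a minimal SolrJ sketch (the ZooKeeper address "zk1:2181" and the collection name "my_collection" are placeholders for our setup) that prints each shard and its replicas from the cluster state:

import java.util.Collections;
import java.util.Optional;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.cloud.DocCollection;
import org.apache.solr.common.cloud.Replica;
import org.apache.solr.common.cloud.Slice;

public class ShardLayout {
    public static void main(String[] args) throws Exception {
        // Connect through ZooKeeper so we read the authoritative cluster state.
        try (CloudSolrClient client = new CloudSolrClient.Builder(
                Collections.singletonList("zk1:2181"), Optional.empty()).build()) {
            client.connect();
            DocCollection coll = client.getZkStateReader()
                    .getClusterState().getCollection("my_collection");
            // For the collection in question this should print 16 shards
            // with 2 replicas under each.
            for (Slice shard : coll.getSlices()) {
                System.out.println("shard " + shard.getName());
                for (Replica replica : shard.getReplicas()) {
                    System.out.println("  " + replica.getName()
                            + " (" + replica.getState() + ") on "
                            + replica.getNodeName());
                }
            }
        }
    }
}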
On Sun, May 10, 2020, 9:10 PM matthew sporleder <msporle...@gmail.com> wrote:

> Why so many shards?
>
>> On May 10, 2020, at 9:09 PM, Ganesh Sethuraman <ganeshmail...@gmail.com> wrote:
>>
>> We are using a dedicated host, CentOS on EC2 r5.12xlarge (48 CPU,
>> ~360GB RAM), 2 nodes. Swappiness is set to 1, with a 2TB General
>> Purpose EBS SSD volume. The JVM heap is 18GB, with G1 GC enabled.
>> There are about 92 collections, with an average of 8 shards and 2
>> replicas each. Most updates arrive through daily batch jobs.
>>
>> Solr disk utilization is about ~800GB. Most of the collections serve
>> real-time GET (/get) calls. The issue we are having is with the few
>> collections that serve query use cases; these have 32 replicas (16
>> shards with 2 replicas each). During performance tests, a few calls
>> show high response times. This is noticeable when the test duration is
>> short; response times improve when the test runs longer.
>>
>> Hope this information helps.
>>
>> Regards
>> Ganesh
>>
>>> On Sun, May 10, 2020, 8:14 PM Shawn Heisey <apa...@elyograg.org> wrote:
>>>
>>>> On 5/10/2020 4:48 PM, Ganesh Sethuraman wrote:
>>>> The additional info is that when we execute the test for longer
>>>> (20 minutes) we see better response times, but for a short test
>>>> (5 minutes), rerun after an hour or so, we see slow response times
>>>> again. Note that we don't update the collection during the test or
>>>> in between tests. Does this help to identify the issue?
>>>
>>> Assuming Solr is the only software that is running, most operating
>>> systems would not remove Solr data from the disk cache, so unless you
>>> have other software running on the machine, it's a little weird that
>>> performance drops back down after waiting an hour. Windows is an
>>> example of an OS that *does* proactively change data in the disk
>>> cache, and on that OS, I would not be surprised by such behavior. You
>>> haven't mentioned which OS you're running on.
>>>
>>>> 3. We have designed our test to mimic reality, where the filter
>>>> cache is not hit at all. From Solr, we are seeing ZERO filter cache
>>>> hits. There are about 4% query and document cache hits in prod, and
>>>> we see no filter cache hits in either QA or PROD.
>>>
>>> If you're getting zero cache hits, you should disable the cache that
>>> is getting zero hits. There is no reason to waste the memory that the
>>> cache uses, because there is no benefit.
>>>
>>>> Given that, could this be some warm-up-related issue with keeping
>>>> the Solr/Lucene memory-mapped files in RAM? Is there any way to
>>>> measure which collection is using memory? We do have ~350GB of RAM,
>>>> and we see it filled with buffer cache, but we are not really sure
>>>> what is actually using that memory.
>>>
>>> You would have to ask the OS which files are contained by the OS disk
>>> cache, and even if that information is available, it may be very
>>> difficult to get. There is no way Solr can report this.
>>>
>>> Thanks,
>>> Shawn
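Regarding the warm-up question above: if the slow first queries turn out to be a cold OS page cache rather than Solr itself, one workaround is to fire a set of representative queries after each restart or idle period, before real traffic starts. Below is a minimal SolrJ sketch of that idea (the collection URL and the query strings are placeholders; real warm-up queries should sort, filter, and facet the same way production traffic does):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class WarmUp {
    public static void main(String[] args) throws Exception {
        try (HttpSolrClient client = new HttpSolrClient.Builder(
                "http://localhost:8983/solr/my_collection").build()) {
            // Representative queries; each one pulls the index files it
            // touches into the OS page cache.
            String[] warmers = {"*:*", "field:value1", "field:value2"};
            for (String q : warmers) {
                QueryResponse rsp = client.query(new SolrQuery(q).setRows(10));
                System.out.println(q + " -> QTime " + rsp.getQTime() + " ms, "
                        + rsp.getResults().getNumFound() + " hits");
            }
        }
    }
}

This doesn't answer which collection owns the buffer cache, but comparing QTime on a first pass against a second pass gives a rough signal of how cold the index was.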