On 4/15/2019 7:25 AM, SOLR4189 wrote:
I have a collection with many shards. Each shard is on a separate Solr node
(VM) with a 40GB index, 4 CPUs and an SSD.

When I run performance tests with 50GB RAM per node (10GB for the JVM and
40GB for the index) and with 25GB RAM (10GB for the JVM and 15GB for the
index), I get the same query times (80th, 90th and 95th percentiles). I ran
a long test: 8 hours of production queries and updates.
What does this mean? Is it not necessary to have the whole index in RAM?
Could it be due to the SSD? How can I check this?

Achieving good performance does not necessarily require that you have enough memory to cache the entire index.

The OS disk cache only caches data that is actually accessed. Running thousands of queries is going to access certain parts of the index frequently, but it is unlikely to actually access ALL of the data in the index.

The most important part of the index that will be accessed on every query is the data produced by the schema attribute 'indexed="true"'. That's the actual inverted index. The percentage of the full index that this part consumes will be highly dependent on your schema and the actual contents of the documents that you index -- I cannot give you a percentage. Some setups need half the index cached. Some need a lot more. I've heard of some people having great performance with only ten percent of the index cached, but I suspect that this is not common.
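For a rough idea of how much of your own index is the inverted index, you can add up file sizes by extension in a core's index directory. Below is a minimal Python sketch. It assumes a Lucene 4.x-8.x index using the default codec, where the terms dictionary and postings are stored in the .tim, .tip, .doc, .pos and .pay files. Other codecs and Lucene versions use different extensions, and segments written as compound files (.cfs) hide the breakdown completely. The script measures only on-disk size. It does not tell you how much of those files your queries actually read.

```python
import os
from collections import defaultdict

# Assumption: extensions used by the default Lucene 4.x-8.x postings format.
# Other codecs, newer Lucene versions, or compound (.cfs) segments will differ.
INVERTED_INDEX_EXTS = {".tim", ".tip", ".doc", ".pos", ".pay"}


def sizes_by_extension(index_dir):
    """Sum the on-disk size of the files in an index directory, grouped by extension."""
    totals = defaultdict(int)
    with os.scandir(index_dir) as entries:
        for entry in entries:
            if entry.is_file():
                ext = os.path.splitext(entry.name)[1]  # "" for files like segments_N
                totals[ext] += entry.stat().st_size
    return dict(totals)


def inverted_index_fraction(totals):
    """Fraction of the total index size taken up by the terms/postings files."""
    total = sum(totals.values())
    if total == 0:
        return 0.0
    inverted = sum(size for ext, size in totals.items() if ext in INVERTED_INDEX_EXTS)
    return inverted / total


def report(index_dir):
    """Print a per-extension size breakdown and the inverted index fraction."""
    totals = sizes_by_extension(index_dir)
    for ext, size in sorted(totals.items(), key=lambda kv: -kv[1]):
        print(f"{ext or '(none)':8} {size / 2**30:8.2f} GiB")
    print(f"inverted index fraction: {inverted_index_fraction(totals):.0%}")
```

For example, call `report("/path/to/core/data/index")`, replacing the placeholder path with the location of your core's index directory.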

If you go to this page, click on the "Asking for help on a memory/performance issue" link in the table of contents, and look at the screenshots, you'll see a lot of numbers:

https://wiki.apache.org/solr/SolrPerformanceProblems

An important number for you to check on your systems is labeled "cached Mem" in the Linux/UNIX screenshot, showing about 18GB, and "Cached" in the Windows screenshot, showing about 8GB. This is the actual amount of data in the OS disk cache. If Solr is the only thing on the system, then it should be pretty close to the amount of index data that the system has cached. You'll probably find that on the 50GB system only a fraction of the available memory has actually been used. You may even find that the same is true on the smaller system.
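On Linux you can also read this number directly. The "Cached" line in /proc/meminfo roughly matches what older versions of top show as "cached Mem". Newer versions of top fold it into "buff/cache". The minimal Python sketch below turns that value into an approximate fraction of the index that could be cached. It assumes a Linux host where Solr's index is the main file data being read. "Cached" also counts other file data, such as tmpfs, so treat the result as an upper bound.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> value.

    Fields with a "kB" unit (really KiB) are converted to bytes; unitless
    fields such as HugePages_Total are returned as plain counts.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if not parts:
            continue
        amount = int(parts[0])
        if len(parts) > 1 and parts[1] == "kB":
            amount *= 1024
        values[name.strip()] = amount
    return values


def page_cache_bytes(path="/proc/meminfo"):
    """Bytes currently in the Linux page cache (the "Cached" field). Linux only."""
    with open(path) as f:
        return parse_meminfo(f.read())["Cached"]


def cached_fraction_of_index(cached_bytes, index_bytes):
    """Upper bound on the share of the index that can be in the page cache,
    assuming the index is the only significant file data being read."""
    return min(cached_bytes / index_bytes, 1.0)
```

For example, you could run `cached_fraction_of_index(page_cache_bytes(), 40 * 2**30)` on one of your 40GB shards. If the result stays well under 1.0 on the 50GB machine after a long test, the extra memory is not being filled by the cache.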

The OS disk cache can only contain data that has actually been read. If a part of the index data is never accessed by queries, it will not be in the OS disk cache.
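This may be why your two configurations produce identical percentiles. Once the part of the index that queries touch fits in the cache, additional memory has nothing more to hold. The toy Python illustration below assumes a plain LRU cache; the Linux page cache actually uses a more elaborate active/inactive list scheme. It also assumes a hypothetical workload where queries only ever read a quarter of the index, with each page standing in for roughly 1MB.

```python
import random
from collections import OrderedDict


def simulate(capacity_pages, accesses):
    """Toy LRU page cache. Returns (pages resident at the end, hit ratio)."""
    cache = OrderedDict()
    hits = 0
    for page in accesses:
        if page in cache:
            hits += 1
            cache.move_to_end(page)  # mark as most recently used
        else:
            cache[page] = True  # a miss reads the page from disk into the cache
            if len(cache) > capacity_pages:
                cache.popitem(last=False)  # evict the least recently used page
    return len(cache), hits / len(accesses)


# Hypothetical workload: a 40,000-page index where queries only ever read
# the first 10,000 pages (the "hot" quarter).
INDEX_PAGES = 40_000
HOT_PAGES = 10_000
rng = random.Random(42)
accesses = [rng.randrange(HOT_PAGES) for _ in range(200_000)]

big = simulate(INDEX_PAGES, accesses)  # room for the whole index (like 40GB)
small = simulate(15_000, accesses)     # room for part of it (like 15GB)
print(big, small)
```

Both runs finish with the same pages cached and the same hit ratio. The hit ratio drops only when capacity falls below the hot set, for example with `simulate(5_000, accesses)`.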

Thanks,
Shawn
