Hi,

I found on the wiki (https://wiki.apache.org/solr/SolrPerformanceProblems#RAM) that 
the optimal amount of RAM for Solr is equal to the index size. This is, let's say, 
the ideal case, where everything fits in memory.

We plan a small installation with 2 nodes and 8 shards. The cluster will hold 
100M documents, and we expect each document to take about 5 kB in the index. 
Keeping the index fully in memory would therefore mean those two nodes need 
~500 GB of RAM, i.e. 2x 256 GB machines. And those are really big machines... 
Is this calculation even correct in recent Solr versions?
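
For transparency, here is the back-of-the-envelope arithmetic behind that 
number (the 5 kB per document is our own estimate, not a measurement):

  docs = 100_000_000          # documents in the cluster
  bytes_per_doc = 5 * 1024    # assumed index size per document (our estimate)
  nodes = 2

  total_gb = docs * bytes_per_doc / 1024**3
  print(f"total index: {total_gb:.0f} GB, per node: {total_gb / nodes:.0f} GB")
  # -> total index: 477 GB, per node: 238 GB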

Our problem is somewhat constrained, though: our data are time-based logs, and 
searches are generally restricted to the last 3 months, which will match, let's 
say, 10M documents. How does this affect Solr's memory requirements? Will we 
still need the whole inverted index in memory? Or is there some internal 
optimization that ensures only part of it has to be resident?
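
To make the access pattern concrete, a typical query of ours looks roughly like 
the sketch below (the host, the collection name "logs", and the field name 
"timestamp_dt" are placeholders for our setup):

  import requests

  resp = requests.get(
      "http://localhost:8983/solr/logs/select",
      params={
          "q": "message:error",
          # filter query: restrict to the last 3 months via Solr date math
          "fq": "timestamp_dt:[NOW/DAY-3MONTHS TO NOW]",
          "rows": 10,
      },
  )
  print(resp.json()["response"]["numFound"])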

The questions:

1) Is the 500 GB memory requirement a correct assumption?

2) Will the fact that we have time-based logs, with the majority of accesses 
going only to recent data, help?

3) Are there best practices for reducing the RAM Solr requires?



Thanks in advance!

Pavel


Side note:
We were considering partitioning based on Time Routed Aliases, but 
unfortunately we need to provide disaster recovery over a poor network 
connection, and TRA and Cross Data Center Replication are not compatible (CDCR 
requires a static set of cores, while TRA creates cores dynamically).
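
For completeness, this is roughly how we would have created the TRA via the 
Collections API before ruling it out (alias, field, and config-set names are 
placeholders; the exact parameters should be checked against the TRA 
documentation):

  import requests

  resp = requests.get(
      "http://localhost:8983/solr/admin/collections",
      params={
          "action": "CREATEALIAS",
          "name": "logs",
          "router.name": "time",
          "router.field": "timestamp_dt",
          "router.start": "NOW/MONTH",
          "router.interval": "+1MONTH",  # one new collection per month
          "create-collection.numShards": 8,
          "create-collection.collection.configName": "logs_config",
      },
  )
  print(resp.json())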
