Hi,
The OS does not care much about search vs. retrieval, so the amount of RAM
needed for file caches depends on your index usage patterns. If you are
not retrieving stored fields much and most/all results are only
id+score, then you can assume that you can get by with less RAM than the
actual index size. In that case you might question whether you need stored
fields in the index at all. Also, if your usage pattern is such that only a
small subset of documents is retrieved with stored fields, then you can
also assume the OS will never need to cache the entire .fdt file.
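As a rough sketch of that reasoning, the script below sums the files in an index directory by extension and subtracts the stored-fields data (.fdt) to get a starting figure for the file cache. The function names and the choice to skip only .fdt are my own assumptions, not anything Solr provides. It also assumes segments are not written in compound format (.cfs), where the stored fields would sit inside the compound file.

```python
import os
from collections import defaultdict


def sizes_by_extension(index_dir):
    """Sum file sizes (bytes) in index_dir, grouped by file extension."""
    totals = defaultdict(int)
    for entry in os.scandir(index_dir):
        if entry.is_file():
            # Files such as segments_N have no extension.
            ext = os.path.splitext(entry.name)[1] or "(none)"
            totals[ext] += entry.stat().st_size
    return dict(totals)


def cache_estimate(totals, skip=(".fdt",)):
    """Starting estimate: total index size minus the skipped extensions.

    Skipping .fdt is an assumption that only holds when stored fields
    are rarely retrieved.
    """
    return sum(size for ext, size in totals.items() if ext not in skip)


# Example usage (the path is hypothetical):
#   totals = sizes_by_extension("/var/solr/data/core1/data/index")
#   print(cache_estimate(totals) / 1024 ** 3, "GiB")
```

Treat the result only as a first guess and check it against real IO and memory metrics.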
One thing that you forgot (unless your index is static) is segment
merging: in the worst case the system will have two "copies" of the index,
and having extra memory can help in such cases.
The best approach is to use a monitoring tool and watch IO and memory metrics.
One such tool is Sematext's SPM (http://sematext.com/spm), where you can
see metrics for both the system and Solr.
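For a quick look without a monitoring tool, here is a minimal sketch that reads the Linux /proc/meminfo file and reports how much memory the OS currently uses as page cache. The helper names are my own, and the "Cached" figure covers all cached files on the machine, not only the Solr index.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: value}.

    Most fields are reported in kB; a few counters have no unit.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def page_cache_gib(values):
    """Size of the page cache ("Cached" field, in kB) converted to GiB."""
    return values.get("Cached", 0) / (1024 * 1024)


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the live file (Linux only)."""
    with open(path) as f:
        return parse_meminfo(f.read())


# Example usage on a Linux host:
#   print(round(page_cache_gib(read_meminfo()), 1), "GiB in page cache")
```

Sampling this while running typical queries gives a rough idea of how much cache the OS actually ends up using. A proper tool that also tracks disk reads over time will tell you more.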
Thanks,
Emir
On 15.01.2016 10:43, Gian Maria Ricci - aka Alkampfer wrote:
Hi,
When it is time to calculate how much RAM a Solr instance needs to run
with good performance, I know that it is some form of art, but I’m
looking for a general “formula” to have at least one good starting point.
Apart from the RAM devoted to the Java heap, which depends strongly on how
I configure caches and on the distribution of queries in my system, I’m
particularly interested in the amount of RAM to leave to the operating
system to use as file cache.
Suppose I have an index of 51 GB; clearly, leaving that
amount of RAM to the OS is the best approach, so that all index
files can be cached in memory by the OS and I can achieve maximum
speed.
But if I look at the details of the index, in this particular example I
see that the biggest file has the .fdt extension. It holds the stored
fields for the documents, so it affects retrieval of document data, not the
actual search process. Since this file is 24 GB in size, it is almost
half of the space of the index.
My question is: is it safe to assume that a good starting point
for the amount of RAM to leave to the OS is the size of the index
minus the size of the .fdt file, because it has less importance in
the search process?
Are there any particular settings at the OS level (CentOS Linux) to get the
maximum benefit from the OS file cache? (The documentation at
https://cwiki.apache.org/confluence/display/solr/Taking+Solr+to+Production#TakingSolrtoProduction-MemoryandGCSettings
does not have any information related to OS configuration.) Elasticsearch
(https://www.elastic.co/guide/en/elasticsearch/reference/1.4/setup-configuration.html)
generally has some suggestions such as using mlockall, disabling swap,
etc.; I wonder if there are similar suggestions for Solr.
Many thanks for all the great help you are giving me in this mailing
list.
--
Gian Maria Ricci
Cell: +39 320 0136949
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/