Re: Speculation on Memory needed to efficently run a Solr Instance.

Emir Arnautovic Fri, 15 Jan 2016 02:06:32 -0800

Hi,

The OS does not care much about search vs. retrieval, so the amount of RAM needed for file caches depends on your index usage patterns. If you are not retrieving stored fields much and most or all results are only id+score, then it can be assumed that you can go with less RAM than the actual index size. In that case you can question whether you need stored fields in the index at all. Also, if your index/usage pattern is such that only a small subset of documents is retrieved with stored fields, then it can also be assumed that the OS will never need to cache the entire fdt file.

One thing that you forgot (unless your index is static) is segment merging: in the worst case the system will have two "copies" of the index, and having extra memory can help in such cases.

The best approach is to use some tool and monitor IO and memory metrics. One such tool is Sematext's SPM (http://sematext.com/spm), where you can see metrics for both the system and Solr.
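To make the reasoning above concrete, here is a minimal Python sketch of the two bounds discussed in this thread. It assumes that stored fields are rarely read (so the .fdt size can be subtracted for the low end) and that a worst-case merge briefly doubles the index (the high end). The function names are hypothetical, and the numbers are only a starting point to check against real monitoring, not a sizing rule.

```python
import os
from collections import defaultdict


def sizes_by_extension(index_dir):
    """Sum file sizes (bytes) in index_dir, grouped by file extension."""
    totals = defaultdict(int)
    for entry in os.scandir(index_dir):
        if entry.is_file():
            ext = os.path.splitext(entry.name)[1]
            totals[ext] += entry.stat().st_size
    return dict(totals)


def file_cache_range(index_bytes, stored_bytes):
    """Return rough (low, high) bounds for OS file cache, in bytes.

    low:  index size minus stored fields (.fdt); ASSUMES stored
          fields are rarely retrieved.
    high: twice the index size; worst case while segments merge.
    """
    low = index_bytes - stored_bytes
    high = 2 * index_bytes
    return low, high


GB = 1024 ** 3
# Figures from the original question: 51 GB index, 24 GB .fdt file.
low, high = file_cache_range(51 * GB, 24 * GB)
print(f"{low / GB:.0f} GB to {high / GB:.0f} GB")  # prints "27 GB to 102 GB"
```

For a real index, sizes_by_extension("/path/to/index") gives the per-extension totals to feed into file_cache_range; the path here is a placeholder.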


Thanks,
Emir

On 15.01.2016 10:43, Gian Maria Ricci - aka Alkampfer wrote:

Hi,
When it is time to calculate how much RAM a Solr instance needs to run with good performance, I know that it is some form of art, but I’m looking for a general “formula” to have at least one good starting point.
Apart from the RAM devoted to the Java heap, which strongly depends on how I configure caches and on the distribution of queries in my system, I’m particularly interested in the amount of RAM to leave to the operating system to use as file cache.
Suppose I have an index of 51 GB in size. Clearly, having that amount of RAM devoted to the OS is the best approach, so that all index files can be cached in memory by the OS and I can achieve maximum speed.
But if I look at the details of the index, in this particular example I see that the biggest file has the .fdt extension. It holds the stored fields for the documents, so it affects retrieval of document data, not the real search process. Since this file is 24 GB in size, it is almost half of the space of the index.
My question is: is it safe to assume that a good starting point for the amount of RAM to leave to the OS is the size of the index minus the size of the .fdt file, because it has less importance in the search process?
Are there any particular settings at the OS level (CentOS Linux) to get the maximum benefit from the OS file cache? (The documentation at https://cwiki.apache.org/confluence/display/solr/Taking+Solr+to+Production#TakingSolrtoProduction-MemoryandGCSettings does not have any information related to OS configuration.) Elasticsearch (https://www.elastic.co/guide/en/elasticsearch/reference/1.4/setup-configuration.html) generally has some suggestions, such as using mlockall, disabling swap, etc. I wonder if there are similar suggestions for Solr.
Many thanks for all the great help you are giving me on this mailing list.
--
Gian Maria Ricci
Cell: +39 320 0136949


--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/
