Re: Speculation on Memory needed to efficiently run a Solr Instance.

Jack Krupansky Fri, 15 Jan 2016 08:31:23 -0800

Personally, I'll continue to recommend that the ideal goal is to fully
cache the entire Lucene index in system memory, and to do a proof of
concept implementation to validate actual performance for your actual data.
You can do a POC with a small fraction of your full data, like 15% or even
10%, and then it's fairly safe to simply multiply those numbers to get the
RAM needed for the full 100% of your data (or even 120% to allow for modest
growth).
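The scale-up described above is plain linear extrapolation. A minimal sketch, assuming memory grows linearly with data volume (the assumption stated above); the function name `estimate_full_ram` is hypothetical and not part of Solr:

```python
def estimate_full_ram(poc_ram_gb: float, poc_fraction: float,
                      growth_factor: float = 1.0) -> float:
    """Linearly extrapolate RAM measured on a proof-of-concept subset.

    poc_ram_gb:    RAM measured for the POC (e.g. its index size to cache)
    poc_fraction:  fraction of the full data loaded in the POC, in (0, 1]
    growth_factor: headroom multiplier, e.g. 1.2 to allow ~20% growth
    """
    if not 0 < poc_fraction <= 1:
        raise ValueError("poc_fraction must be in (0, 1]")
    # Scale the measured number up to 100% of the data, then add headroom.
    return poc_ram_gb / poc_fraction * growth_factor
```

For example, a 10% POC that needs about 5.1 GB suggests roughly 51 GB for the full data set, or roughly 61 GB with `growth_factor=1.2`.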


Be careful to distinguish search from query: sure, only a subset of
the data is needed to find the matching documents, but then the stored data
must be fetched to return the query results (search/lookup vs. query
results). If the stored values are not also cached, you will increase the
latency of your overall query (returning results) even if the
search/match/lookup was reasonably fast.
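To see how an index divides between stored fields and the structures used for matching, you can total file sizes by extension in the index directory. A rough sketch, assuming stored fields live in `.fdt` (data) and `.fdx` (index) files, as in Lucene versions of this era, and treating everything else as search-side, which is a simplification. Segments packed into compound `.cfs` files are not broken down. The helper names are hypothetical:

```python
import os
from collections import defaultdict

# Assumption: Lucene stored-field files; all other files are counted as
# "search-side" (term dictionary, postings, norms, doc values, etc.).
STORED_FIELD_EXTS = {".fdt", ".fdx"}


def index_size_by_extension(index_dir: str) -> dict:
    """Return total bytes per file extension in a Lucene index directory."""
    totals = defaultdict(int)
    for name in os.listdir(index_dir):
        path = os.path.join(index_dir, name)
        if os.path.isfile(path):
            # Files without an extension (e.g. segments_N) map to "".
            ext = os.path.splitext(name)[1].lower()
            totals[ext] += os.path.getsize(path)
    return dict(totals)


def split_stored_vs_search(totals: dict) -> tuple:
    """Return (stored_field_bytes, other_bytes) from per-extension totals."""
    stored = sum(size for ext, size in totals.items()
                 if ext in STORED_FIELD_EXTS)
    other = sum(totals.values()) - stored
    return stored, other
```

For the index in the quoted question (51 GB, of which 24 GB is `.fdt`), the "other" share would come to roughly 27 GB. Per the point above, though, caching only that share leaves stored-field reads going to disk when results are returned.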

So, the model is to prototype with a measured subset of your data, see how
the latency and system memory usage work out, and then scale that number up
for total memory requirement.

Again, to be clear: if you really do need the best/minimal overall query
latency, your best bet is to have sufficient system memory to fully cache
the entire index. If you don't actually need minimal latency, then of
course you can trade some latency for a smaller RAM footprint.
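Whether a machine can fully cache the index comes down to simple arithmetic: whatever RAM is not taken by the JVM heap (and anything else on the box) is what the OS can use as file cache. A back-of-the-envelope sketch; the `other_reserved_gb` allowance for the OS and other processes is a hypothetical parameter you would have to estimate yourself:

```python
def page_cache_headroom_gb(total_ram_gb: float, jvm_heap_gb: float,
                           index_size_gb: float,
                           other_reserved_gb: float = 0.0) -> float:
    """Estimate RAM left over after caching the whole index.

    A result >= 0 suggests the full index can fit in the OS file cache;
    a negative result is roughly how many GB of the index would not fit.
    """
    # RAM the OS could devote to caching index files.
    available_for_cache = total_ram_gb - jvm_heap_gb - other_reserved_gb
    return available_for_cache - index_size_gb
```

For example, a 64 GB machine with an 8 GB heap and 2 GB reserved leaves about 3 GB of headroom for a 51 GB index, while a 32 GB machine with the same heap would fall about 27 GB short.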



-- Jack Krupansky

On Fri, Jan 15, 2016 at 4:43 AM, Gian Maria Ricci - aka Alkampfer <
alkamp...@nablasoft.com> wrote:

> Hi,
>
>
>
> When it is time to calculate how much RAM a Solr instance needs to run
> with good performance, I know that it is something of an art, but I’m
> looking for a general “formula” to have at least one good starting point.
>
>
>
> Apart from the RAM devoted to the Java heap, which strongly depends on how
> I configure caches and on the distribution of queries in my system, I’m
> particularly interested in the amount of RAM to leave to the operating
> system for the file cache.
>
>
>
> Suppose I have an index of 51 GB. Clearly, having that amount of RAM
> available to the OS is the best approach, so that all index files can be
> cached in memory by the OS and I can achieve maximum speed.
>
>
>
> But if I look at the details of the index, in this particular example I
> see that the largest file has the .fdt extension; it holds the stored
> fields for the documents, so it affects retrieval of document data, not the
> actual search process. At 24 GB, this file accounts for almost half of the
> index.
>
>
>
> My question is: is it safe to assume that a good starting point for the
> amount of RAM to leave to the OS is the size of the index minus the size
> of the .fdt file, since that file matters less to the search process?
>
>
>
> Are there any particular settings at the OS level (CentOS Linux) to get the
> maximum benefit from the OS file cache? (The documentation at
> https://cwiki.apache.org/confluence/display/solr/Taking+Solr+to+Production#TakingSolrtoProduction-MemoryandGCSettings
> does not have any information related to OS configuration.) Elasticsearch (
> https://www.elastic.co/guide/en/elasticsearch/reference/1.4/setup-configuration.html)
> gives some suggestions, such as using mlockall and disabling swap, and I
> wonder if there are similar suggestions for Solr.
>
>
>
> Many thanks for all the great help you are giving me on this mailing list.
>
>
>
> --
> Gian Maria Ricci
> Cell: +39 320 0136949
>
