And to make matters worse, much worse (actually, better)... See: https://issues.apache.org/jira/browse/SOLR-8220
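In schema terms, the kind of field that work targets looks roughly like the sketch below. This is illustrative only: the field names are made up, and the attribute name assumes the patch lands roughly as proposed (`useDocValuesAsStored`):

```xml
<!-- Sketch only: a string field whose values could be returned from
     docValues rather than from the stored (.fdt) data. -->
<field name="category" type="string" indexed="true"
       stored="false" docValues="true" useDocValuesAsStored="true"/>

<!-- Not an option for analyzed text: such fields still need stored="true". -->
<field name="body" type="text_general" indexed="true" stored="true"/>
```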
That ticket (and there will be related ones) is about returning data from
docValues fields rather than from the stored data in some situations, which
means it will soon (I hope) be entirely possible to not have an .fdt file at
all. There are some caveats to that approach, but it can completely bypass
the read-from-disk, decompress, return-the-data process. Do note, however,
that analyzed text can't be docValues, so this will be suitable only for
string, numeric and similar fields.

Best,
Erick

On Fri, Jan 15, 2016 at 2:56 AM, Gian Maria Ricci - aka Alkampfer
<alkamp...@nablasoft.com> wrote:
> Thanks a lot, I'll have a look at Sematext SPM.
>
> Actually the index is not static, but the number of new documents will be
> small and they will probably be indexed during the night, so I'm not
> expecting too many problems from merging. We can index new documents
> during the night and then optimize the index (during the night there are
> no searches).
>
> --
> Gian Maria Ricci
> Cell: +39 320 0136949
>
> -----Original Message-----
> From: Emir Arnautovic [mailto:emir.arnauto...@sematext.com]
> Sent: Friday, January 15, 2016 11:06
> To: solr-user@lucene.apache.org
> Subject: Re: Speculation on Memory needed to efficiently run a Solr
> Instance.
>
> Hi,
> The OS does not care much about search vs. retrieval, so the amount of
> RAM needed for file caches depends on your index usage patterns. If you
> are not retrieving stored fields much and most/all results are only
> id+score, then you can assume you will get by with less RAM than the
> actual index size. In that case you might question whether you need
> stored fields in the index at all. Also, if your index/usage pattern is
> such that only a small subset of documents is retrieved with stored
> fields, then you can assume the OS will never need to cache the entire
> .fdt file.
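That heuristic (index size minus the stored-field file) can be sanity-checked from the shell. A minimal sketch: it builds a synthetic index directory so the numbers are reproducible, but in practice you would point `INDEX_DIR` at your core's `data/index` directory instead of the `mktemp` call.

```shell
#!/bin/sh
# Back-of-the-envelope page-cache budget: total index size minus the
# stored-fields (.fdt) file. The file names below are illustrative.
INDEX_DIR=$(mktemp -d)
dd if=/dev/zero of="$INDEX_DIR/_0.fdt" bs=1048576 count=24 2>/dev/null  # stored fields
dd if=/dev/zero of="$INDEX_DIR/_0.tim" bs=1048576 count=20 2>/dev/null  # terms dictionary
dd if=/dev/zero of="$INDEX_DIR/_0.doc" bs=1048576 count=7  2>/dev/null  # postings

total=0
for f in "$INDEX_DIR"/*; do
  total=$((total + $(stat -c %s "$f")))   # apparent size in bytes (GNU stat)
done
fdt=$(stat -c %s "$INDEX_DIR/_0.fdt")

echo "index total : $((total / 1048576)) MB"
echo "cache budget: $(( (total - fdt) / 1048576 )) MB"  # index minus .fdt
rm -rf "$INDEX_DIR"
```

With the thread's numbers scaled down to MB (51 total, 24 of it .fdt), this prints a 27 MB cache budget.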
> One thing that you forgot (unless your index is static) is segment
> merging: in the worst case the system will have two "copies" of the
> index, and having extra memory helps in such cases.
> The best approach is to use some tool and monitor IO and memory metrics.
> One such tool is Sematext's SPM (http://sematext.com/spm), where you can
> see metrics for both the system and Solr.
>
> Thanks,
> Emir
>
> On 15.01.2016 10:43, Gian Maria Ricci - aka Alkampfer wrote:
>> Hi,
>>
>> When it is time to calculate how much RAM a Solr instance needs to run
>> with good performance, I know that it is some form of art, but I'm
>> looking for a general "formula" to have at least one good starting
>> point.
>>
>> Apart from the RAM devoted to the Java heap, which depends strongly on
>> how I configure caches and on the distribution of queries in my system,
>> I'm particularly interested in the amount of RAM to leave to the
>> operating system for the file cache.
>>
>> Suppose I have an index of 51 GB. Clearly, having that amount of RAM
>> devoted to the OS is the best approach, so all index files can be
>> cached in memory by the OS and I can achieve maximum speed.
>>
>> But if I look at the details of the index, in this particular example I
>> see that the biggest file has the .fdt extension; it holds the stored
>> fields for the documents, so it affects retrieval of document data, not
>> the real search process. Since this file is 24 GB, it is almost half of
>> the index.
>>
>> My question is: could it be safe to assume that a good starting point
>> for the amount of RAM to leave to the OS is the size of the index minus
>> the size of the .fdt file, because the latter matters less in the
>> search process?
>>
>> Are there any particular settings at the OS level (CentOS Linux) to get
>> maximum benefit from the OS file cache?
>> (The documentation at
>> https://cwiki.apache.org/confluence/display/solr/Taking+Solr+to+Production#TakingSolrtoProduction-MemoryandGCSettings
>> does not have any information related to OS configuration.)
>> Elasticsearch
>> (https://www.elastic.co/guide/en/elasticsearch/reference/1.4/setup-configuration.html)
>> has some suggestions, such as using mlockall, disabling swap, etc.; I
>> wonder if there are similar suggestions for Solr.
>>
>> Many thanks for all the great help you are giving me on this mailing
>> list.
>>
>> --
>> Gian Maria Ricci
>> Cell: +39 320 0136949
>
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/
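On the CentOS question above, the usual knobs are the same ones the linked Elasticsearch page recommends, since both servers lean on the OS page cache. A hedged sketch: the script only reads the current state; the commented lines show how the settings are typically applied, and `-XX:+AlwaysPreTouch` is a generic HotSpot flag rather than a Solr-specific switch.

```shell
#!/bin/sh
# Read-only look at the two settings that matter most for page-cache
# behaviour on a Linux search box.
echo "vm.swappiness : $(cat /proc/sys/vm/swappiness)"  # low (e.g. 1) keeps the JVM resident
awk '/SwapTotal/ {print "swap total KB :", $2}' /proc/meminfo

# To apply (as root):
#   sysctl -w vm.swappiness=1   # or persist it in /etc/sysctl.conf
#   swapoff -a                  # dedicated search boxes often run swapless
# Solr has no mlockall option; a rough JVM-side analogue is pre-touching the
# heap at startup, e.g. adding -XX:+AlwaysPreTouch to SOLR_OPTS in solr.in.sh.
```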