John Nielsen [j...@mcb.dk] wrote:
> I managed to get this done. The facet queries now facet on a multivalue 
> field as opposed to the dynamic field names.

> Unfortunately it doesn't seem to have made much difference, if any at all.

I am sorry to hear that.

> documents = ~1.400.000
> references = 11.200.000 (we facet on two multivalue fields with 4 values 
> each on average, so 1.400.000 * 2 * 4 = 11.200.000)
> unique values = 1.132.344 (total number of variant options across all 
> clients. This is what we facet on)

> 1.400.000 * log2(11.200.000) + 1.400.000 * log2(1132344) = ~14MB per field 
> (we have 4 fields)?

> I must be calculating this wrong.

No, that sounds about right. In reality you need to multiply by 3 or 4, so 
let's round to 50MB/field: 1.4M documents with 2 fields at ~5.6M 
references/field is not very much and should not take a lot of memory. In 
comparison, we facet on 12M documents with 166M references and do some other 
stuff (in Lucene with a different faceting implementation, but at this level it 
is equivalent to Solr's in terms of memory). Our heap is 3GB.
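For concreteness, the back-of-the-envelope estimate above can be reproduced in a few lines. This is only a sketch of the formula quoted earlier plus the 3-4x real-world overhead factor; the exact constant depends on implementation details, and the point is the order of magnitude, not the precise number:

```python
import math

docs = 1_400_000       # total documents
refs = 11_200_000      # references across the two fields (2 fields * 4 values/doc)
unique = 1_132_344     # unique facet values

# Rough estimate in bits: doc->reference and doc->value index structures.
bits = docs * math.log2(refs) + docs * math.log2(unique)
raw_mb = bits / 8 / 1_000_000
real_mb = raw_mb * 4   # rule of thumb: multiply by 3-4 for real-world overhead

print(f"raw estimate ~{raw_mb:.0f} MB/field, with overhead ~{real_mb:.0f} MB/field")
```

Either way the result lands in the tens of megabytes per field, nowhere near the gigabyte range.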

I am surprised about the lack of "UnInverted" in your logs, as it is logged at 
INFO level. The numbers should also be available from the admin interface under 
collection/Plugin / Stats/CACHE/fieldValueCache. I am guessing you got your 
numbers from there, and that the list only contains the few facets you mentioned 
previously? It might be wise to sanity check by summing the memSizes, though; 
they ought to add up to far below 1GB.
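If the UnInverted lines do show up in the log, summing their memSize values is a quick way to do that sanity check. A minimal sketch; the sample lines below are illustrative stand-ins, not real output from this installation, and the exact format of Solr's UnInverted INFO line may differ between versions:

```python
import re

def total_uninverted_mb(log_text: str) -> float:
    """Sum all memSize=<bytes> occurrences found in UnInverted log lines."""
    sizes = [int(m) for m in re.findall(r"memSize=(\d+)", log_text)]
    return sum(sizes) / 1_000_000

# Illustrative excerpt (field names and sizes are made up for the example).
excerpt = """
UnInverted multi-valued field {field=variant_options,memSize=14680064,nTerms=1132344}
UnInverted multi-valued field {field=item_options,memSize=13631488,nTerms=987654}
"""
print(f"total faceting cache: {total_uninverted_mb(excerpt):.1f} MB")
```

If the sum comes out anywhere near the heap size, the individual entries will show which field is responsible.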

From your description, your index is small and your faceting requirements 
modest. An SSD-equipped laptop should be adequate as a server. So we are back to 
"math does not check out".


You stated that you were unable to make a 4GB JVM OOM when you performed only 
faceting (I guesstimate that it will also run fine with just ½GB, or at least 
with 1GB, based on the numbers above), and you have observed that the field 
cache eats the memory. This indicates that the old caches are somehow not 
freed when the index is updated. That is strange, as Solr should take care of 
that automatically.

Guessing wildly: Do you issue small updates at a high frequency, with frequent 
commits? If you pause the indexing, does memory use fall back to the single-GB 
level (you probably need to trigger a full GC to check that)? If that is the 
case, it might be a warmup problem, with old warmups still running when new 
commits are triggered.
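If overlapping warmups turn out to be the cause, the relevant knobs live in solrconfig.xml. A hedged sketch; the values are placeholders to illustrate the settings, not tuned recommendations:

```xml
<!-- solrconfig.xml: cap concurrent warming searchers so overlapping
     commits fail fast instead of piling up warmed caches -->
<maxWarmingSearchers>2</maxWarmingSearchers>

<updateHandler class="solr.DirectUpdateHandler2">
  <!-- batch frequent small updates into fewer, periodic commits -->
  <autoCommit>
    <maxTime>60000</maxTime>        <!-- ms -->
    <openSearcher>false</openSearcher>
  </autoCommit>
</updateHandler>
```

For the full-GC check itself, `jcmd <pid> GC.run` on a recent JVM (or the "Perform GC" button in jconsole) will force a collection so you can see whether memory actually drops.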

Regards,
Toke Eskildsen, State and University Library, Denmark
