John Nielsen [j...@mcb.dk] wrote:
> I managed to get this done. The facet queries now facet on a multivalue
> field as opposed to the dynamic field names.
> Unfortunately it doesn't seem to have made much difference, if any at all.

I am sorry to hear that.

> documents = ~1,400,000
> references = 11,200,000 (we facet on two multivalue fields with 4 values
> each on average, so 1,400,000 * 2 * 4 = 11,200,000)
> unique values = 1,132,344 (total number of variant options across all
> clients; this is what we facet on)
>
> 1,400,000 * log2(11,200,000) + 1,400,000 * log2(1,132,344) = ~14MB per
> field (we have 4 fields)?
>
> I must be calculating this wrong.

No, that sounds about right. In reality you need to multiply by 3 or 4, so let's round to 50MB/field: 1.4M documents with 2 fields of 5.6M references each is not very much and should not take a lot of memory. In comparison, we facet on 12M documents with 166M references and do some other stuff (in Lucene, with a different faceting implementation, but at this level it is equivalent to Solr's in terms of memory). Our heap is 3GB.

I am surprised about the lack of "UnInverted" entries in your logs, as they are logged at INFO level. The information should also be available from the admin interface under collection/Plugin / Stats/CACHE/fieldValueCache. But I am guessing you got your numbers from there, and that the list only contains the few facets you mentioned previously? It might be wise to sanity check by summing the memSizes, though; they ought to add up to far below 1GB.

From your description, your index is small and your faceting requirements modest. An SSD-equipped laptop should be adequate as a server. So we are back to "the math does not check out".

You stated that you were unable to make a 4GB JVM OOM when you only performed faceting (I guesstimate that it would also run fine with just ½GB, or at least with 1GB, based on the numbers above), and you have observed that the field cache eats the memory. This indicates that the old caches are somehow not freed when the index is updated. That is strange, as Solr should take care of that automatically.
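The back-of-the-envelope estimate above can be checked mechanically. The sketch below uses my reading of the quoted formula (docs × log2(references) + docs × log2(unique values), in bits, per field); the thread arrives at ~14MB and then ~50MB after a 3-4x real-world factor, while this literal reading gives a somewhat smaller base figure, so the exact constants are uncertain. The point either way is that the result is tens of MB per field, nowhere near 1GB:

```python
from math import log2

def uninverted_estimate_mb(docs, refs, unique_values):
    """Rough per-field memory estimate for an UnInverted-style faceting
    structure: enough bits per document to address the references and
    the unique value ids.  The constants in Solr's actual implementation
    differ (hence the 3-4x factor mentioned in the mail); treat this as
    an order-of-magnitude check only, not Solr's real accounting."""
    bits = docs * log2(refs) + docs * log2(unique_values)
    return bits / 8 / 1024 / 1024  # bits -> MB

# Numbers from the thread: ~1.4M docs, 11.2M references, ~1.13M unique values.
raw = uninverted_estimate_mb(1_400_000, 11_200_000, 1_132_344)
print(f"raw estimate: {raw:.1f} MB/field")
print(f"with a 3-4x real-world factor: {raw * 3:.0f}-{raw * 4:.0f} MB/field")
```

Even taking the most pessimistic reading and four fields, the total stays two orders of magnitude below the heap sizes being discussed, which is why the observed memory growth points at stale caches rather than the faceting structures themselves.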
Guessing wildly: do you issue a high frequency of small updates with frequent commits? If you pause the indexing, does memory use fall back to the single-GB level? (You probably need to trigger a full GC to check that.) If that is the case, it might be a warmup problem, with old warmups still running when new commits are triggered.

Regards,
Toke Eskildsen, State and University Library, Denmark
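If overlapping warmups are indeed the problem, Solr exposes a standard knob for it in solrconfig.xml. A minimal sketch, with illustrative values (the element names are real Solr settings, but tune the numbers to your own commit rate):

```xml
<!-- Cap how many searchers may warm concurrently.  When commits arrive
     faster than warmup completes, Solr refuses the extra commit instead
     of stacking up warming searchers (each holding its own caches). -->
<maxWarmingSearchers>2</maxWarmingSearchers>

<!-- Keep autowarming modest so a warmup can finish before the next
     commit; large autowarmCount values prolong the window in which
     old and new caches coexist in the heap. -->
<filterCache class="solr.FastLRUCache"
             size="512"
             initialSize="512"
             autowarmCount="128"/>
```

Reducing the commit frequency (or batching updates) attacks the same problem from the other side: fewer commits means fewer concurrent searcher generations and fewer duplicated cache copies alive at once.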