I did a search. I have no occurrence of "UnInverted" in the Solr logs.

> Another explanation for the large amount of memory presents itself if
> you use a single index: If each of your clients facets on at least one
> field specific to the client ("client123_persons" or something like
> that), then your memory usage goes through the roof.

This is exactly how we facet right now! I will definitely rewrite the
relevant parts of our product to test this out before moving further
down the docValues path. I will let you know as soon as I know one way
or the other.
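If I understand the suggestion correctly, the query side would end up
looking something like this (a minimal SolrJ sketch of the shared-field
approach; the "persons" and "client_id" field names, the client id and
the core URL are placeholders for illustration, not our actual schema):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.FacetField;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class SharedFacetSketch {
        public static void main(String[] args) throws Exception {
            // Placeholder core URL.
            HttpSolrServer server =
                    new HttpSolrServer("http://localhost:8983/solr/collection1");

            SolrQuery query = new SolrQuery("*:*");
            // One facet field shared by all clients instead of
            // client123_persons, client124_persons, ...
            query.setFacet(true);
            query.addFacetField("persons");
            // Restrict the search to a single client's documents; the
            // facet counts are computed on the filtered result set, so
            // values and counts should match what a per-client field
            // would have returned.
            query.addFilterQuery("client_id:123");

            QueryResponse response = server.query(query);
            for (FacetField.Count count :
                    response.getFacetField("persons").getValues()) {
                System.out.println(count.getName() + ": " + count.getCount());
            }
        }
    }

The point being that the FieldCache entry for "persons" is then built
once for the whole index instead of once per client field.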
On Mon, Apr 15, 2013 at 1:38 PM, Toke Eskildsen <t...@statsbiblioteket.dk> wrote:

> On Mon, 2013-04-15 at 10:25 +0200, John Nielsen wrote:
>
> > The FieldCache is the big culprit. We do a huge amount of faceting,
> > so it seems right.
>
> Yes, you wrote that earlier. The mystery is that the math does not
> check out with the description you have given us.
>
> > Unfortunately I am super swamped at work, so I have precious little
> > time to work on this, which is what explains my silence.
>
> No problem, we've all been there.
>
> > [Band aid: More memory]
>
> > The extra memory helped a lot, but it still OOMs with about 180
> > clients using it.
>
> You stated earlier that you have a "solr cluster" and that your
> total(?) index size was 35GB, with each "register" being between
> "15k" and "30k". I am using the quotes to signify that it is unclear
> what you mean. Is your cluster multiple machines (I'm guessing no),
> multiple Solrs, cores, shards or maybe just a single instance
> prepared for later distribution? Is a register a core, a shard or
> simply a logical part (one client's data) of the index?
>
> If each client has their own core or shard, that would mean that each
> client uses more than 25GB/180 ~= 142MB of heap to access 35GB/180
> ~= 200MB of index. That sounds quite high and you would need a very
> heavy facet to reach that.
>
> If you could grep "UnInverted" from the Solr log file and paste the
> entries here, that would help to clarify things.
>
> Another explanation for the large amount of memory presents itself if
> you use a single index: If each of your clients facets on at least
> one field specific to the client ("client123_persons" or something
> like that), then your memory usage goes through the roof.
>
> Assuming an index with 10M documents, each with 5 references to a
> modest 10K unique values in a facet field, the simplified formula
>
>   #documents*log2(#references) + #references*log2(#unique_values) bits
>
> tells us that this takes at least 110MB with field cache based
> faceting.
>
> 180 clients @ 110MB ~= 20GB. As that is a theoretical low, we can at
> least double that. This fits neatly with your new heap of 64GB.
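(Sanity check on my side: I plugged your example numbers into a quick
throwaway program, and the formula does come out to roughly 110MB per
client, or about 19GB for 180 clients. This is just a scratch
calculation, nothing from our codebase:)

    public class FacetMemoryEstimate {

        static double log2(double x) {
            return Math.log(x) / Math.log(2);
        }

        public static void main(String[] args) {
            double docs = 10e6;          // 10M documents
            double refs = 5 * docs;      // 5 facet references per document
            double uniqueValues = 10e3;  // 10K unique values in the field

            // #documents*log2(#references) + #references*log2(#unique_values)
            double bits = docs * log2(refs) + refs * log2(uniqueValues);
            double mb = bits / 8 / 1024 / 1024;

            // Prints: 110 MB per client, 19.3 GB for 180 clients
            System.out.printf("%.0f MB per client, %.1f GB for 180 clients%n",
                    mb, mb * 180 / 1024);
        }
    }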
> If my guessing is correct, you can solve your memory problems very
> easily by sharing _all_ the facet fields between your clients. This
> should bring your memory usage down to a few GB.
>
> You are probably already restricting their searches to their own data
> by filtering, so this should not influence the returned facet values
> and counts, as compared to separate fields.
>
> This is very similar to the thread "Facets with 5000 facet fields" BTW.
>
> > Today I finally managed to set up a test core so I can begin to
> > play around with docValues.
>
> If you are using a single index with the
> individual-facet-fields-for-each-client approach, DocValues will also
> have scaling issues, as the amount of values (of which the majority
> will be null) will be
>
>   #clients * #documents * #facet_fields
>
> This means that adding a new client will be progressively more
> expensive.
>
> On the other hand, if you use a lot of small shards, DocValues should
> work for you.
>
> Regards,
> Toke Eskildsen


--
Med venlig hilsen / Best regards

*John Nielsen*
Programmer

*MCB A/S*
Enghaven 15
DK-7500 Holstebro

Kundeservice: +45 9610 2824
p...@mcb.dk
www.mcb.dk