I did a search and found no occurrence of "UnInverted" in the Solr logs.

> Another explanation for the large amount of memory presents itself if
> you use a single index: If each of your clients facets on at least one
> field specific to the client ("client123_persons" or something like
> that), then your memory usage goes through the roof.

This is exactly how we facet right now! I will definitely rewrite the
relevant parts of our product to test this out before moving further down
the docValues path.
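
Concretely, what I plan to test is something like the sketch below (the
field and core names are made up for illustration, not our real schema):

  import requests

  SOLR = "http://localhost:8983/solr/mycore/select"

  def facet_for_client(client_id):
      # Old approach: facet.field=client<id>_persons, one facet field
      # per client. New approach: one shared facet field for everyone,
      # with client isolation handled by a filter query instead.
      params = {
          "q": "*:*",
          "fq": "client_id:%d" % client_id,  # hypothetical client id field
          "facet": "true",
          "facet.field": "persons",          # shared by all clients
          "rows": 0,
          "wt": "json",
      }
      return requests.get(SOLR, params=params).json()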

I will let you know as soon as I know one way or the other.


On Mon, Apr 15, 2013 at 1:38 PM, Toke Eskildsen <t...@statsbiblioteket.dk> wrote:

> On Mon, 2013-04-15 at 10:25 +0200, John Nielsen wrote:
>
> > The FieldCache is the big culprit. We do a huge amount of faceting so
> > it seems right.
>
> Yes, you wrote that earlier. The mystery is that the math does not check
> out with the description you have given us.
>
> > Unfortunately I am super swamped at work so I have precious little
> > time to work on this, which is what explains my silence.
>
> No problem, we've all been there.
>
> [Band aid: More memory]
>
> > The extra memory helped a lot, but it still OOMs with about 180 clients
> > using it.
>
> You stated earlier that you have a "solr cluster" and that your total(?)
> index size was 35GB, with each "register" being between "15k" and "30k".
> I am using the quotes to signify that it is unclear what you mean. Is
> your cluster multiple machines (I'm guessing no), multiple Solr
> instances, cores, shards or maybe just a single instance prepared for
> later distribution? Is a register a core, a shard or simply a logical
> part (one client's data) of the index?
>
> If each client has their own core or shard, that would mean that each
> client uses more than 25GB/180 ~= 142MB of heap to access 35GB/180
> ~= 200MB of index. That sounds quite high, and you would need very
> heavy faceting to reach that.
>
> If you could grep "UnInverted" from the Solr log file and paste the
> entries here, that would help to clarify things.
>
>
> Another explanation for the large amount of memory presents itself if
> you use a single index: If each of your clients facets on at least one
> field specific to the client ("client123_persons" or something like
> that), then your memory usage goes through the roof.
>
> Assuming an index with 10M documents, each with 5 references to a modest
> 10K unique values in a facet field, the simplified formula
>   #documents*log2(#references) + #references*log2(#unique_values) bits
> tells us that this takes at least 110MB with field-cache-based faceting.
>
> 180 clients @ 110MB ~= 20GB. As that is a theoretical low, we can at
> least double that. This fits neatly with your new heap of 64GB.
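>
> As a quick sanity check, the same estimate in Python (using the example
> figures assumed above):
>
>   from math import log2
>
>   docs = 10_000_000         # documents in the index
>   refs = docs * 5           # 5 facet references per document
>   unique = 10_000           # unique values in the facet field
>
>   bits = docs * log2(refs) + refs * log2(unique)
>   print("per client:  %.0f MB" % (bits / 8 / 2**20))        # ~110 MB
>   print("180 clients: %.1f GB" % (bits * 180 / 8 / 2**30))  # ~19 GB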
>
>
> If my guessing is correct, you can solve your memory problems very
> easily by sharing _all_ the facet fields between your clients.
> This should bring your memory usage down to a few GB.
>
> You are probably already restricting their searches to their own data by
> filtering, so this should not influence the returned facet values and
> counts, as compared to separate fields.
>
> This is very similar to the thread "Facets with 5000 facet fields" BTW.
>
> > Today I finally managed to set up a test core so I can begin to play
> > around with docValues.
>
> If you are using a single index with individual facet fields for each
> client, DocValues will also have scaling issues, as the number of values
> (of which the majority will be null) will be
>   #clients*#documents*#facet_fields
> This means that adding a new client will be progressively more
> expensive.
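>
> To put rough numbers on that growth (example figures assumed: 10M
> documents, one client-specific facet field per client):
>
>   docs = 10_000_000       # documents in the shared index
>   fields_per_client = 1   # client-specific facet fields per client
>
>   def value_slots(clients):
>       # every client-specific field carries a (mostly null) value
>       # for every document in the index
>       return clients * docs * fields_per_client
>
>   print(value_slots(180))                     # 1.8 billion value slots
>   print(value_slots(181) - value_slots(180))  # +10M slots for one more client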
>
> On the other hand, if you use a lot of small shards, DocValues should
> work for you.
>
> Regards,
> Toke Eskildsen
>


-- 
Med venlig hilsen / Best regards

*John Nielsen*
Programmer



*MCB A/S*
Enghaven 15
DK-7500 Holstebro

Kundeservice: +45 9610 2824
p...@mcb.dk
www.mcb.dk
