adfel70 <adfe...@gmail.com> wrote:
> Hi Toke, Thank you for the detailed explanation, that's exactly what I was
> looking for, except that this algorithm fits a single index only. Could you
> please elaborate on what adjustments are needed for a distributed index?

Vanilla Solr requests the top-X terms from each shard, with over-provisioning. I do
not remember the exact formula (and I think it is adjustable in Solr 5), but it is
something like X*1.5+10. Yes, that means that correctness is not guaranteed for
distributed faceting. It would be possible to make some sort of streaming
faceting implementation, but in the pathological case all shards must
deliver all their terms to derive the correct top-X.
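To make the over-provisioning and its limits concrete, here is a minimal Python sketch. It assumes the "X*1.5+10" recollection above (the constants and all function names are hypothetical, not Solr identifiers, and rounding down is an assumption) and shows how a term that is never in any shard's top list can still be the true overall winner.

```python
from collections import Counter

# Hypothetical constants based on the "X*1.5+10" recollection; not Solr names.
OVERREQUEST_RATIO = 1.5
OVERREQUEST_COUNT = 10

def shard_limit(limit, ratio=OVERREQUEST_RATIO, count=OVERREQUEST_COUNT):
    # Number of terms requested from each shard; rounding down is an assumption.
    return int(limit * ratio + count)

def shard_top(counts, n):
    # A shard only reports its own n highest-counted terms (ties broken by term).
    return dict(sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n])

def naive_merge(shards, limit, ratio=OVERREQUEST_RATIO, count=OVERREQUEST_COUNT):
    # Sum whatever each shard reported and keep the top `limit` terms.
    n = shard_limit(limit, ratio, count)
    merged = Counter()
    for counts in shards:
        merged.update(shard_top(counts, n))
    return sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]
```

With `shards = [{"a": 5, "c": 4}, {"b": 5, "c": 4}]`, asking for top-1 without over-provisioning (`ratio=1, count=0`) returns `[("a", 5)]`, although `c` has 8 in total; with the default padding both shards report `c` and the merge returns `[("c", 8)]`. Padding only makes such misses less likely, it does not rule them out.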

The results from the shards are merged and the top-X terms are fine-counted
where needed. If we have 3 shards and ask for the top-1, they might answer
shard1: [foo(3), zoo(1)]
shard2: [foo(1), zoo(1)]
shard3: [bar(2), aar(2)]
(remember the over-provisioning). We derive that foo is the top-1 term, but
since shard3 did not provide a count for foo, we need to ask shard3 for the
count of that specific term to get the correct overall count.
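The merge-then-refine step in the example can be sketched in a few lines of Python. This is an illustration under assumptions, not Solr's code: `plan_refinement` and `apply_refinement` are hypothetical helpers, each shard response is a plain `{term: count}` dict, and the sketch does not re-rank terms after refinement.

```python
from collections import Counter

def plan_refinement(shard_responses, limit):
    """Merge per-shard facet responses, pick the top `limit` terms and list,
    for each of them, the shards that did not report a count."""
    merged = Counter()
    for counts in shard_responses.values():
        merged.update(counts)
    ranked = sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]
    top = [term for term, _ in ranked]
    missing = {
        term: [shard for shard, counts in shard_responses.items() if term not in counts]
        for term in top
    }
    # Only terms with at least one silent shard need fine-counting.
    return top, {term: shards for term, shards in missing.items() if shards}

def apply_refinement(shard_responses, top, fine_counts):
    # fine_counts: {shard: {term: exact_count}} from the extra per-term requests.
    return {
        term: sum(
            counts.get(term, fine_counts.get(shard, {}).get(term, 0))
            for shard, counts in shard_responses.items()
        )
        for term in top
    }
```

Feeding it the three shard answers above with `limit=1` yields `(["foo"], {"foo": ["shard3"]})`; if shard3 then reports `foo(1)`, `apply_refinement` gives `{"foo": 5}`.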

The fine-counting is performed differently from standard faceting: it is
basically 'original_query AND facet_field:fine_count_term'. That is quite fast
for a few terms, but if tens or hundreds of terms need to be resolved on a
non-trivial index, the fine-counting phase can take longer than the initial
faceting phase.
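A rough Python illustration of why the cost grows with the number of terms: each term to resolve becomes its own 'original_query AND facet_field:fine_count_term' lookup. The helpers `refine_query` and `fine_count` are hypothetical; the in-memory list of documents and the `matches_query` predicate only stand in for a real shard index, and the quoting in `refine_query` is a simplified assumption.

```python
def refine_query(original_query, field, term):
    # Hypothetical: build 'original_query AND facet_field:fine_count_term',
    # quoting the term with simplified escaping of backslashes and quotes.
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    return f'({original_query}) AND {field}:"{escaped}"'

def fine_count(docs, matches_query, field, terms):
    # One full pass per term, mimicking one extra query per term on the shard.
    counts = {}
    for term in terms:
        counts[term] = sum(
            1 for doc in docs if matches_query(doc) and term in doc.get(field, ())
        )
    return counts
```

For a shard3 holding `[{"tags": ["bar", "foo"]}, {"tags": ["bar", "aar"]}, {"tags": ["aar"]}]` (consistent with its `bar(2), aar(2)` answer), `fine_count(docs, lambda d: True, "tags", ["foo"])` returns `{"foo": 1}`. The loop structure is the point: hundreds of terms means hundreds of passes.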

- Toke Eskildsen
(sorry for the delayed answer - my email reader hid your response)
