Reply:Reply:Re: How distributed queries works?

suonayi2006 Tue, 22 Jan 2013 10:04:38 -0800

Can anyone help clear up my confusion? Many thanks.


At 2013-01-22 23:20:12, SuoNayi <suonayi2...@163.com> wrote:
>Hi Erick, thanks for your detailed explanation.
>
>The collecting shard combines the 24060 ID/score pairs into a master list, but
>how does it then choose the right 20 docs? What conditions does that depend on?
>I assume the collecting shard sorts these docs by score and the 20 docs with
>the highest scores are chosen. If some docs have the same score, how is their
>order decided?
>If it depends on the order in which results are collected, then the same query
>may show different docs on the same page each time it runs, mightn't it?
>Furthermore, if I want my query to sort by another field rather than the
>default score, what will the other nodes send to the collecting shard?
>
>
>
>
>Thanks,
>SuoNayi
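
On the tie question above: one common way to get a repeatable order for
equal-score documents is to add the unique key as a secondary sort, for
example sort=score desc,id asc (assuming the unique key field is named id).
The Python sketch below only illustrates that idea; it is not Solr's merge
code, and the name merge_sorted and the (sort_value, doc_id) tuple layout are
assumptions made for the example. It also shows why, when sorting by another
field, the collecting node needs that field's value for every candidate ID
and not just the ID.

```python
import heapq
from itertools import islice

def merge_sorted(shard_results, rows):
    """Merge per-shard lists of (sort_value, doc_id) pairs.

    Each list is assumed to be sorted ascending by (sort_value, doc_id).
    Comparing whole tuples makes doc_id the tie-breaker, so documents
    with equal sort values come out in the same order no matter which
    shard's list is passed first.
    """
    merged = heapq.merge(*shard_results)
    return [doc_id for _, doc_id in islice(merged, rows)]

# Two toy "shards"; a1 and b1 share the same sort value.
shard_a = [(10, "a1"), (20, "a2")]
shard_b = [(10, "b1"), (30, "b2")]

first = merge_sorted([shard_a, shard_b], rows=3)
second = merge_sorted([shard_b, shard_a], rows=3)  # shards listed in the other order
print(first, second)
```

Without the doc_id component in the comparison, the relative order of a1 and
b1 would depend on how the merge happens to treat equal keys.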
>
>
>
>
>
>At 2013-01-22 19:57:32,"Erick Erickson" <erickerick...@gmail.com> wrote:
>>bq: does Solr need to load all the docs into RAM to calculate score and order
>>
>>You're very close. The query (and this is just like 3.x) is sent to
>>each shard. Let's say your page size is 20 (the &rows=20)
>>
>>Each node will need to keep a list of 8020 entries, (400 * 20) + 20,
>>really just the ID and score of each document, and send that list
>>back to the collecting shard. At that point, the collecting
>>shard combines the 24060 ID/score pairs into a master list, picks
>>the right 20 docs (8000 - 8020 in the combined list), and then asks
>>each shard for the portion of those 20 that is resident on it.
>>
>>"Deep paging" over a sharded situation is pretty expensive; Solr is
>>optimized for returning the top N docs where N is usually pretty
>>small...
>>
>>One minor nit. Solr doesn't load docs into RAM to calculate score,
>>it just peruses the indexed="true" data to calculate score. All that stays
>>in RAM is the doc ID and score _until_ the document contents are
>>assembled, i.e. the raw data is only assembled for &rows docs and then
>>only at the very end...
>>
>>Best
>>Erick
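
The scatter/gather step described above can be sketched in Python. This is a
minimal illustration, not Solr's actual code: the function names shard_top and
merge_page, and the in-memory lists standing in for shards, are assumptions
made for the example. Each shard returns only its best (start + rows) ID/score
pairs, and the collecting node merges those lists and slices out the
requested page.

```python
import heapq
from itertools import islice

def shard_top(docs, start, rows):
    """Hypothetical shard-side step: return this shard's best
    (start + rows) entries as (doc_id, score) pairs, best first."""
    return heapq.nlargest(start + rows, docs, key=lambda d: d[1])

def merge_page(shard_results, start, rows):
    """Hypothetical collector-side step: merge the per-shard lists
    (each already sorted by descending score) and keep only the
    entries at positions start .. start + rows - 1."""
    merged = heapq.merge(*shard_results, key=lambda d: d[1], reverse=True)
    return list(islice(merged, start, start + rows))

# Three toy "shards" with distinct scores.
shards = [
    [("a1", 0.9), ("a2", 0.5), ("a3", 0.1)],
    [("b1", 0.8), ("b2", 0.4)],
    [("c1", 0.7), ("c2", 0.6), ("c3", 0.2)],
]
start, rows = 2, 2
per_shard = [shard_top(s, start, rows) for s in shards]
page = merge_page(per_shard, start, rows)
print(page)
```

The collector never holds more than (number of shards) * (start + rows)
pairs, which with the figures in this thread is 3 * 8020 = 24060. Only after
the page is chosen would the stored fields for those IDs be requested from
the shards that own them.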
>>
>>On Tue, Jan 22, 2013 at 2:47 AM, SuoNayi <suonayi2...@163.com> wrote:
>>> Dear list,
>>> I want to understand the internal mechanism of distributed queries in
>>> SolrCloud.
>>> AFAIK, distributed queries were supported before SolrCloud existed: users
>>> can specify shard URLs in the query parameters. In that case we can
>>> distribute data by time interval. Is that called horizontal scalability
>>> based on history?
>>> SolrCloud now goes further, because it can discover the other shards (Solr
>>> instances) via ZooKeeper and distribute data based on hash & mod of the
>>> unique key of the doc.
>>> In both cases the Solr instance that receives the request needs to scatter
>>> the query across the shards and gather the results at the end. This process
>>> looks like Map-Reduce.
>>> But what happens during scattering and gathering? I have read the wiki but
>>> found no further details. I really hope someone can clear this up for me
>>> and share some links.
>>>
>>>
>>> Suppose there are 3 shards and 0 replicas in my SolrCloud, and each shard
>>> has 150 million docs. My client queries with q=*:* and outputs the results
>>> page by page. When the page number is very large, say the 400th page, does
>>> Solr need to load all the docs into RAM to calculate score and order?
>>>
>>>
>>> Sorry for the newbie question, and thanks for your time.
>>>
>>>
>>>
>>>
>>> Thanks
>>> SuoNayi
>>>
>>>
>>>
>>>
