Can anyone help me out of the confusion? Many thanks.

At 2013-01-22 23:20:12, SuoNayi <suonayi2...@163.com> wrote:
>Hi Erick, thanks for your detailed explanation.
>
>The collecting shard combines the 24060 ID/score pairs into a master list, but
>how does it then choose the right 20 docs? What conditions does that depend on?
>I assume the collecting shard sorts these docs by score and the top 20 docs
>with the highest scores are chosen. If docs have the same score, how is their
>order decided? If it depends on the collecting order, then the same query may
>see different docs on the same page each time, may it not?
>Furthermore, if I want my query to sort by another field rather than the
>default score, what content will the other nodes send to the collecting shard?
>
>Thanks,
>SuoNayi
>
>At 2013-01-22 19:57:32, "Erick Erickson" <erickerick...@gmail.com> wrote:
>>bq: does Solr need load all the docs into RAM to calculate score and order
>>
>>You're very close. The query (and this is just like 3.x) is sent to
>>each shard. Let's say your page size is 20 (the &rows=20).
>>
>>Each node will need to keep a list of 8020 documents (400 * 20) + 20,
>>really the ID and score, collect all these and send just the ID and
>>score back to the collecting shard. At that point, the collecting
>>shard combines the 24060 ID/score pairs into a master list and picks
>>the right 20 (8000 - 8020 in the combined list) docs and then asks
>>each shard for the portion of that 20 that were resident on them.
>>
>>"Deep paging" over a sharded situation is pretty expensive. Solr is
>>optimized for returning the top N docs where N is usually pretty
>>small...
>>
>>One minor nit. Solr doesn't load docs into RAM to calculate score, it
>>just peruses the indexed="true" data to calculate score. All that stays
>>in RAM is the doc ID and score _until_ the document contents are
>>assembled, i.e. the raw data is only assembled for &rows docs and then
>>only at the very end...
>>
>>Best
>>Erick
>>
>>On Tue, Jan 22, 2013 at 2:47 AM, SuoNayi <suonayi2...@163.com> wrote:
>>> Dear list,
>>> I want to know the internal mechanism for the distributed queries of
>>> SolrCloud.
>>> AFAIK, distributed query was supported before the presence of SolrCloud:
>>> users can specify shard URLs in the query parameters. We can distribute
>>> data by time interval in this case. Is that called horizontal scalability
>>> based on history?
>>> Now SolrCloud goes further, because it can discover the other shards (Solr
>>> instances) via ZooKeeper and distribute data based on Hash & Mod of the
>>> unique key of the doc.
>>> In both cases the requested Solr instance needs to scatter queries across
>>> the shards and gather the results at the end. This process seems like
>>> Map-Reduce.
>>> But what happens when scattering and gathering? I have read the wiki but
>>> no more details are available. I really hope someone can make this clear
>>> to me and give some links.
>>>
>>> Supposing there are 3 shards and 0 replicas in my Solr cloud, and each
>>> shard has 150 million docs. My client queries by q=*:* and outputs the
>>> results page by page. When the page number is very large, say the 400th
>>> page, does Solr need to load all the docs into RAM to calculate score and
>>> order?
>>>
>>> Sorry for the newbie question and thanks for your time.
>>>
>>> Thanks
>>> SuoNayi
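
For anyone reading along, the scatter/gather step Erick describes can be sketched roughly as below. This is only a toy model under stated assumptions, not Solr's actual code: `shard_top` and `merge_page` are hypothetical names, each shard is simulated as a list of (score, doc ID) pairs, and ties in score are broken by doc ID purely to make the example deterministic (that tie-break rule is an assumption for illustration, not a statement of what Solr does).

```python
import heapq
import itertools
import random


def sort_key(pair):
    """Order by descending score, then ascending doc ID (hypothetical tie-break)."""
    score, doc_id = pair
    return (-score, doc_id)


def shard_top(docs, n):
    """Simulate one shard: return its n best (score, doc_id) pairs, best first."""
    return heapq.nsmallest(n, docs, key=sort_key)


def merge_page(shard_lists, start, rows):
    """Simulate the collecting shard: merge the sorted per-shard lists and
    keep only the window [start, start + rows) of the combined list."""
    merged = heapq.merge(*shard_lists, key=sort_key)
    return list(itertools.islice(merged, start, start + rows))


# Toy data: 3 shards with 10,000 docs each (far smaller than 150 million).
random.seed(0)
start, rows = 8000, 20
shards = [
    [(random.random(), f"shard{s}-doc{i}") for i in range(10000)]
    for s in range(3)
]

# Each shard must return start + rows pairs, because in the worst case
# the whole requested page lives on a single shard.
per_shard = [shard_top(docs, start + rows) for docs in shards]
page = merge_page(per_shard, start, rows)

print(len(per_shard[0]))               # pairs sent by one shard
print(sum(len(p) for p in per_shard))  # pairs merged by the collecting shard
print(len(page))                       # docs actually fetched in full
```

With start=8000 and rows=20 this reproduces the figures in the thread: 8020 pairs per shard and 24060 pairs in the master list, of which only 20 are kept. It also shows why a fixed tie-break matters in a sketch like this: if equal scores were ordered by arrival order instead, the same page could contain different docs from one request to the next.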