bq: does Solr need load all the docs into RAM to calculate score and order

You're very close. The query (and this is just like 3.x) is sent to each shard. Let's say your page size is 20 (the &rows=20).
Each node will need to keep a list of 8020 documents, (400 * 20) + 20, really just the ID and score of each, and send only those ID/score pairs back to the collecting shard. At that point, the collecting shard combines the 24060 ID/score pairs (3 shards * 8020) into a master list, picks the right 20 (8000 - 8020 in the combined list), and then asks each shard for the portion of those 20 that is resident on it.

"Deep paging" over a sharded situation is pretty expensive. Solr is optimized for returning the top N docs where N is usually pretty small...

One minor nit: Solr doesn't load docs into RAM to calculate score, it just peruses the indexed="true" data to calculate score. All that stays in RAM is the doc ID and score _until_ the document contents are assembled, i.e. the raw data is only assembled for &rows docs, and then only at the very end...

Best
Erick

On Tue, Jan 22, 2013 at 2:47 AM, SuoNayi <suonayi2...@163.com> wrote:

> Dear list,
> I want to know the internal mechanism for the distributed queries of SolrCloud.
> AFAIK, distributed query is supported before the presence of SolrCloud, users can
> specify shard urls in the query parameters. We can distribute data by time interval
> in this case. It's called horizontal scalability based on history?
> Now SolrCloud do further more because it can discover the other shards (Solr instance)
> via ZooKeeper and distribute data based on Hash & Mod of the unique key of the doc.
> For both cases the requested Solr instance need do scatter queries across the shards
> and gather the result at last. This process seems like Map-Reduce.
> But what happens when scattering and gathering? I have read the WIKI but no more
> details available. I really hope someone can make me clear and give some links.
>
> Supposing there are 3 shards and 0 replica in my Solr cloud, each shard have 150
> millions docs. My client query by q=*:* and outputs the results page by page. When
> the page number is very large, saying 400th page, does Solr need load all the docs into
> RAM to calculate score and order?
>
> Sorry for newbie question and thanks for your time.
>
> Thanks
> SuoNayi
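
For anyone who wants to see the arithmetic in the reply worked through, here is a rough sketch of the scatter/gather step in plain Python. It is not Solr code: the names shard_top_n and distributed_page are made up for illustration, each "shard" is just a list of (score, doc_id) pairs with random scores, and each shard holds 10,000 docs instead of 150 million. It uses start=8000 and rows=20 so the counts line up with the 8020 and 24060 figures above.

```python
import heapq
import random


def shard_top_n(shard_docs, n):
    # A shard returns only (score, doc_id) pairs for its n best hits,
    # never the stored document contents.
    return heapq.nlargest(n, shard_docs)


def distributed_page(shards, start, rows):
    # Every shard must supply its top (start + rows) candidates, because
    # the requested page could in principle come entirely from one shard.
    n = start + rows
    pairs = []
    for shard_docs in shards:  # scatter
        pairs.extend(shard_top_n(shard_docs, n))
    # Gather: merge all candidates and keep the requested window.
    merged = heapq.nlargest(n, pairs)
    page = merged[start:start + rows]
    return page, len(pairs)


# Hypothetical data: 3 shards, 10,000 docs each, random scores.
rng = random.Random(0)
shards = [
    [(rng.random(), f"shard{s}-doc{i}") for i in range(10_000)]
    for s in range(3)
]

page, pairs_sent = distributed_page(shards, start=8000, rows=20)
print(pairs_sent)  # 3 * 8020 ID/score pairs reach the collecting node
print(len(page))   # only these docs need their contents fetched
```

The point of the sketch is that the work grows with start + rows on every shard, not with rows alone, which is why going deep into the result set gets expensive while the first few pages stay cheap.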