Dear list, I want to understand the internal mechanism behind distributed queries in SolrCloud. AFAIK, distributed search was supported before SolrCloud existed: users could specify the shard URLs in the query parameters. In that case we can distribute data by time interval. Is this what's called horizontal scalability based on history? SolrCloud goes further, because it can discover the other shards (Solr instances) via ZooKeeper and distribute data based on a hash & mod of each document's unique key. In both cases the Solr instance that receives the request has to scatter the query across the shards and finally gather the results. This process seems similar to MapReduce. But what happens during scattering and gathering? I have read the wiki but found no further details. I really hope someone can clarify this for me and point me to some links.
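To make the question concrete, here is a minimal sketch of the two steps described above, assuming a toy in-memory model: each document is routed to a shard by hashing its unique key and taking the result modulo the shard count, each shard answers with only its own best `start + rows` hits as (score, id) pairs, and the coordinator merges those sorted lists and cuts out the requested page. The use of md5 and all function names are hypothetical choices for illustration, not Solr's code. SolrCloud's router is documented as assigning hash ranges to shards rather than using a plain modulo, so treat this only as a model of the idea in the question.

```python
import hashlib
import heapq
from itertools import islice
from typing import Dict, List, Tuple

NUM_SHARDS = 3  # hypothetical cluster size


def shard_for(doc_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Route a document by hashing its unique key, then taking the remainder."""
    # md5 is used only because it is deterministic across runs
    # (Python's built-in hash() for str is randomized per process).
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_shards


def index(docs: Dict[str, float], num_shards: int = NUM_SHARDS) -> List[Dict[str, float]]:
    """Distribute {doc_id: score} pairs across the shards."""
    shards: List[Dict[str, float]] = [dict() for _ in range(num_shards)]
    for doc_id, score in docs.items():
        shards[shard_for(doc_id, num_shards)][doc_id] = score
    return shards


def shard_top_n(shard: Dict[str, float], n: int) -> List[Tuple[float, str]]:
    """Scatter phase: a shard returns only its own best n hits, best first."""
    return heapq.nlargest(n, ((score, doc_id) for doc_id, score in shard.items()))


def distributed_search(shards: List[Dict[str, float]], start: int, rows: int) -> List[str]:
    """Gather phase: merge the per-shard lists and keep one page of ids."""
    n = start + rows  # every shard must return enough hits to cover the page
    per_shard = [shard_top_n(s, n) for s in shards]
    merged = heapq.merge(*per_shard, reverse=True)  # inputs are sorted descending
    return [doc_id for _, doc_id in islice(merged, start, start + rows)]
```

Because each shard's list is already sorted, the coordinator only merges; it never re-scores documents. Solr's real distributed search is generally described as doing this in two phases (first collect ids and sort values, then fetch stored fields for the winning page only); the sketch covers just the first.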
Suppose there are 3 shards and no replicas in my SolrCloud, and each shard has 150 million docs. My client queries with q=*:* and fetches the results page by page. When the page number is very large, say the 400th page, does Solr need to load all the docs into RAM to calculate scores and sort them?

Sorry for the newbie question, and thanks for your time.

Thanks,
SuoNayi
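A rough sketch of why a deep page is expensive, under assumptions the question does not state: a page size of rows=10, 1-based page numbers, and a coordinator that asks every shard for its best `start + rows` hits. In this model a shard never holds all 150 million documents in memory; it streams over every match and keeps a bounded min-heap of `start + rows` entries. With q=*:* every hit gets the same score, so the order comes down to the tie-break; the sketch assumes lower internal doc ids win ties. The function names are hypothetical.

```python
import heapq
from typing import Iterable, List, Tuple


def shard_top_n_streaming(hits: Iterable[Tuple[float, int]], n: int) -> List[Tuple[float, int]]:
    """Keep the best n (score, docid) hits while visiting every match once.

    Memory is O(n), independent of how many documents the shard holds,
    but the scan still touches every matching document.
    """
    heap: List[Tuple[float, int]] = []  # min-heap of (score, -docid); root = weakest kept hit
    for score, docid in hits:
        key = (score, -docid)  # negated docid: on equal scores, lower docid ranks higher
        if len(heap) < n:
            heapq.heappush(heap, key)
        elif key > heap[0]:
            heapq.heapreplace(heap, key)
    return [(score, -neg_docid) for score, neg_docid in sorted(heap, reverse=True)]


def deep_page_cost(page: int, rows: int, num_shards: int) -> Tuple[int, int]:
    """Hits each shard must return, and hits the coordinator must merge."""
    start = (page - 1) * rows  # assumes 1-based page numbers
    per_shard = start + rows
    return per_shard, per_shard * num_shards
```

For the 400th page with rows=10 and 3 shards, `deep_page_cost(400, 10, 3)` gives 4000 hits per shard and 12000 hits merged at the coordinator to return just 10 documents, so the cost grows with the page number even though memory per shard stays bounded by `start + rows`.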