How do distributed queries work?

SuoNayi Mon, 21 Jan 2013 23:48:01 -0800

Dear list,
I would like to understand the internal mechanism behind distributed queries in
SolrCloud.
AFAIK, distributed queries were supported before SolrCloud existed: users can
specify shard URLs in the query parameters. In that case we can distribute data
by time interval. Is this what is called horizontal scalability based on
history?
SolrCloud now goes further: it can discover the other shards (Solr instances)
via ZooKeeper and distribute data based on hash and mod of the doc's unique
key.
In both cases the Solr instance that receives the request needs to scatter the
query across the shards and gather the results at the end. This process looks
like Map-Reduce.
But what exactly happens during scattering and gathering? I have read the wiki,
but it gives no further details. I would really appreciate it if someone could
explain this and point me to some links.
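
To make the scatter and gather steps concrete, here is a toy Python model of
the flow described above. It is only a sketch, not Solr's code: the shards are
in-memory lists, zlib.crc32 stands in for whatever hash function SolrCloud
really uses, and the names shard_for, index, query_shard and distributed_query
are made up for illustration. It also assumes each shard returns only its own
top (start + rows) hits and that the receiving instance merges them.

```python
import heapq
import zlib

NUM_SHARDS = 3  # hypothetical cluster size


def sort_key(hit):
    # Order by score, then by id so ties are broken deterministically.
    return (hit[1], hit[0])


def shard_for(doc_id, num_shards=NUM_SHARDS):
    """Pick a shard by hashing the unique key and taking the modulus."""
    # crc32 is a stand-in hash, chosen only because it is in the stdlib.
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def index(docs, num_shards=NUM_SHARDS):
    """Distribute (doc_id, score) pairs across in-memory 'shards'."""
    shards = [[] for _ in range(num_shards)]
    for doc_id, score in docs:
        shards[shard_for(doc_id, num_shards)].append((doc_id, score))
    return shards


def query_shard(shard, limit):
    """Scatter step: a shard returns only its own top `limit` hits."""
    return heapq.nlargest(limit, shard, key=sort_key)


def distributed_query(shards, start, rows):
    """Gather step: merge the per-shard lists, then slice out the page."""
    limit = start + rows
    partial = [query_shard(shard, limit) for shard in shards]
    all_hits = (hit for hits in partial for hit in hits)
    merged = heapq.nlargest(limit, all_hits, key=sort_key)
    return merged[start:start + rows]


docs = [("doc%d" % i, i) for i in range(100)]  # score == i, purely synthetic
shards = index(docs)
page = distributed_query(shards, start=10, rows=5)
```

The point of the model is that the global top (start + rows) hits are always
contained in the union of the per-shard top (start + rows) lists, so the
merging instance never needs every doc from every shard.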



Suppose there are 3 shards and no replicas in my SolrCloud cluster, and each
shard has 150 million docs. My client queries with q=*:* and outputs the
results page by page. When the page number is very large, say the 400th page,
does Solr need to load all the docs into RAM to calculate scores and order
them?
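
For a sense of scale, here is a back-of-the-envelope calculation for this
scenario. It rests on two assumptions that are not stated above: a page size of
10 rows, and a merge model in which every shard is asked for its top
(start + rows) entries. The function name deep_page_cost is hypothetical.

```python
def deep_page_cost(page, rows, num_shards):
    """Estimate how many entries are requested and merged for one page."""
    start = (page - 1) * rows      # offset of the first row on the page
    per_shard = start + rows       # entries each shard would have to return
    return {
        "start": start,
        "per_shard": per_shard,
        "merged_at_coordinator": per_shard * num_shards,
    }


# 400th page, assumed 10 rows per page, 3 shards as in the setup above.
cost = deep_page_cost(page=400, rows=10, num_shards=3)
print(cost)
```

Under these assumptions the merge involves thousands of entries rather than
450 million docs, although the cost still grows with the page number.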


Sorry for the newbie question, and thanks for your time.




Thanks
SuoNayi
