Re: How distributed queries works?

Erick Erickson Tue, 22 Jan 2013 03:58:13 -0800

bq: does Solr need load all the docs into RAM to calculate score and order

You're very close. The query (and this is just like 3.x) is sent to
each shard. Let's say your page size is 20 (&rows=20) and you want the
page that starts at offset 8000 (400 * 20).

Each shard needs to keep a list of its top 8020 entries, (400 * 20) + 20.
These are really just the ID and score. Each shard sends only those
ID/score pairs back to the collecting shard. At that point, the
collecting shard combines the 24060 ID/score pairs (3 shards * 8020)
into a master list, picks the right 20 (8000 - 8020 in the combined
list), and then asks each shard for the portion of those 20 that
reside on it.
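
Here is a rough Python sketch of the two phases described above. It is
not Solr's actual code, just a toy model: the shard contents, the
random scores, and the function names (shard_top_n, distributed_page)
are all made up for illustration, and start=8000 is assumed so that
the numbers match the ones in this mail.

```python
import heapq
import random

def shard_top_n(shard_docs, n):
    """Phase 1 on one shard: the n best (score, doc_id) pairs, best first."""
    return heapq.nlargest(n, ((score, doc_id) for doc_id, score in shard_docs.items()))

def distributed_page(shards, start, rows):
    """Merge the per-shard top lists and pick one page of results."""
    n = start + rows  # every shard has to supply this many ID/score pairs
    merged = []
    for shard_id, docs in enumerate(shards):
        for score, doc_id in shard_top_n(docs, n):
            merged.append((score, doc_id, shard_id))
    # Master list: score descending, doc ID as an arbitrary tie-breaker.
    merged.sort(key=lambda t: (-t[0], t[1]))
    page = merged[start:start + rows]
    # Phase 2: group the winners by owning shard, so each shard is only
    # asked for the documents that actually reside on it.
    by_shard = {}
    for _score, doc_id, shard_id in page:
        by_shard.setdefault(shard_id, []).append(doc_id)
    return page, by_shard, len(merged)

# Toy data: 3 shards with 10,000 docs each (hypothetical IDs and scores).
rng = random.Random(42)
shards = [
    {f"s{s}-d{i}": rng.random() for i in range(10_000)}
    for s in range(3)
]

page, by_shard, total = distributed_page(shards, start=8000, rows=20)
print(total)      # 24060 ID/score pairs reached the collecting shard
print(len(page))  # 20 docs make up the final page
```

Only the docs listed in by_shard would have their stored fields
fetched; everything else never leaves the ID/score stage.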

"Deep paging" in a sharded setup is pretty expensive. Solr is
optimized for returning the top N docs, where N is usually pretty
small...
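
To put rough numbers on that, here is a small back-of-the-envelope
helper. The function name pairs_collected is made up, and it assumes
every shard has at least start + rows matching docs, as in the
example above.

```python
def pairs_collected(num_shards, start, rows):
    """Return (pairs kept per shard, pairs merged on the collecting shard)."""
    per_shard = start + rows
    return per_shard, num_shards * per_shard

# The cost grows linearly with the page offset, not with the page size.
for start in (0, 8000, 1_000_000):
    per_shard, merged = pairs_collected(num_shards=3, start=start, rows=20)
    print(f"start={start}: {per_shard} per shard, {merged} merged")
```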

One minor nit: Solr doesn't load docs into RAM to calculate the score,
it just peruses the indexed="true" data. All that stays in RAM is the
doc ID and score _until_ the document contents are assembled, i.e. the
raw data is only assembled for the &rows docs, and then only at the
very end...

Best
Erick

On Tue, Jan 22, 2013 at 2:47 AM, SuoNayi <suonayi2...@163.com> wrote:
> Dear list,
> I want to know the internal mechanism of distributed queries in SolrCloud.
> AFAIK, distributed queries were supported before SolrCloud existed: users can
> specify shard URLs in the query parameters. In that case we can distribute
> data by time interval. Is that called horizontal scalability based on history?
> Now SolrCloud goes further, because it can discover the other shards (Solr
> instances) via ZooKeeper and distribute data based on hash & mod of the
> unique key of the doc.
> In both cases the requested Solr instance needs to scatter queries across the
> shards and gather the results at the end. This process seems like Map-Reduce.
> But what happens when scattering and gathering? I have read the wiki, but no
> more details are available. I really hope someone can make this clear to me
> and give some links.
>
>
> Suppose there are 3 shards and 0 replicas in my Solr cloud, and each shard has
> 150 million docs. My client queries with q=*:* and outputs the results page by
> page. When the page number is very large, say the 400th page, does Solr need
> to load all the docs into RAM to calculate score and order?
>
>
> Sorry for the newbie question and thanks for your time.
>
>
>
>
> Thanks
> SuoNayi
>
>
>
>
