Hi all,

I've recently been working with the distributed search capabilities of Solr
to build a web portal; everything is working fine, but it is now time for me
to describe my work from a more "theoretical" point of view.

I've been trying to get a rough understanding of the distributed search
mechanism, first by browsing the code, which is too complex for me, then by
reading the JIRA comments accompanying the commits, where I found this:

***************
The search request processing on the set of shards is performed as follows:

STEP 1: The query is built, terms are extracted. Global numDocs and docFreqs
are calculated by requesting all the shards and adding up numDocs and
docFreqs from each shard.

STEP 2: (FirstQueryPhase) All shards are queried. Global numDocs and
docFreqs are passed as request parameters. All document fields are NOT
requested, only document uniqFields and sort fields are requested.
MoreLikeThis and Highlighting information are NOT requested.

Etc...
***************
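
To make sure I read STEP 1 correctly, here is a minimal sketch of how I
understand it. This is not Solr's code: the function name and the shape of
the per-shard statistics are made up for illustration.

```python
from collections import Counter


def global_stats(shard_stats):
    """Sum numDocs and per-term docFreqs over all shards (hypothetical helper)."""
    total_docs = 0
    total_df = Counter()
    for stats in shard_stats:
        total_docs += stats["numDocs"]
        # Counter.update adds counts term by term instead of overwriting them.
        total_df.update(stats["docFreqs"])
    return total_docs, dict(total_df)


# Made-up statistics for two shards.
shards = [
    {"numDocs": 100, "docFreqs": {"blah": 7, "title": 40}},
    {"numDocs": 250, "docFreqs": {"blah": 3}},
]
num_docs, doc_freqs = global_stats(shards)
```

If I understand the comment, these global values are what would then be
passed to every shard as request parameters in STEP 2.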

This is exactly the kind of description I need, but I wonder whether the one
quoted above is still valid (it was apparently written quite some time
before the final commit).
Assuming it is, what then is the difference between the STEPS mentioned and
the STAGES introduced later (STAGE_START, STAGE_PARSE_QUERY, etc.)?

How is the ranking of the documents in the merged set of responses
calculated (especially when sorting on a field)?
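
To be concrete about what I imagine, here is a sketch of one plausible way
to merge. It assumes each shard returns its hits already sorted on the sort
field, and the names (merge_top, the "price" field) are hypothetical. I
don't know whether Solr actually proceeds this way.

```python
import heapq
from itertools import islice


def merge_top(shard_results, rows, key):
    """Merge per-shard lists that are each already sorted by `key`,
    and keep only the first `rows` documents of the merged order."""
    merged = heapq.merge(*shard_results, key=key)
    return list(islice(merged, rows))


# Made-up hits, each shard sorted ascending on the "price" field.
shard1 = [{"id": "a", "price": 1}, {"id": "c", "price": 5}]
shard2 = [{"id": "b", "price": 2}, {"id": "d", "price": 9}]
top3 = merge_top([shard1, shard2], rows=3, key=lambda d: d["price"])
```

Is the real merge something along these lines, with only the unique keys and
sort fields being compared?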

Finally, is the order of the parameters in the query significant in a
distributed search? That is, is there a difference between:
   - http://server1:port1/solr1/?q=title:blah&shards=server1:port1/solr1,server1:port1/solr2
and
   - http://server1:port1/solr1/?shards=server1:port1/solr1,server1:port1/solr2&q=title:blah
?
(This last question relates to the distributed deadlock topic on the wiki.
My understanding is that in the first example the "title:blah" query is sent
as a top-level query to solr1 and as a "shard query" to both solr1 and solr2
(deadlock risk), while in the second example "title:blah" is not sent to
solr1 as a top-level query. Am I right?)
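
For what it's worth, a generic query-string parser sees no difference
between the two URLs, as the small check below shows. It only uses Python's
standard library and says nothing about how Solr itself reads the
parameters, which is precisely my question.

```python
from urllib.parse import parse_qs


def params(url):
    """Return the query parameters of `url` as a dict of name -> list of values."""
    query = url.split("?", 1)[1]
    return parse_qs(query)


url_a = ("http://server1:port1/solr1/"
         "?q=title:blah&shards=server1:port1/solr1,server1:port1/solr2")
url_b = ("http://server1:port1/solr1/"
         "?shards=server1:port1/solr1,server1:port1/solr2&q=title:blah")

# Dict comparison ignores the order in which the keys were inserted.
same = params(url_a) == params(url_b)
```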

That's a lot of questions, and perhaps too long a post: sorry.

Thanks a lot if you have the courage to answer,

-- 
Grégoire Neuville
