Hi all,

I've recently been using Solr's distributed search capabilities to build a web portal. Everything is working fine, but now I need to describe my work from a "theoretical" point of view.
I first tried to get a rough picture of the distributed search mechanism by browsing the code, but it was too complex for me. I then read the JIRA comments accompanying the commits, where I found this:

***************
The search request processing on the set of shards is performed as follows:
STEP 1: The query is built, terms are extracted. Global numDocs and docFreqs are calculated by requesting all the shards and adding up numDocs and docFreqs from each shard.
STEP 2: (FirstQueryPhase) All shards are queried. Global numDocs and docFreqs are passed as request parameters. All document fields are NOT requested, only document uniqFields and sort fields are requested. MoreLikeThis and Highlighting information are NOT requested.
Etc...
***************

This is exactly the kind of description I need, but I wonder whether it is still valid, since it was apparently written quite some time before the final commit.

Assuming it is, what is the difference between the STEPs it mentions and the STAGEs introduced later (STAGE_START, STAGE_PARSE_QUERY, etc.)?

How is the ranking of documents in the merged set of responses calculated, especially when sorting on a field?

Finally, is the order of the query parameters significant in a distributed search? That is, is there a difference between:
- http://server1:port1/solr1/?q=title:blah&shards=server1:port1/solr1,server1:port1/solr2
- http://server1:port1/solr1/?shards=server1:port1/solr1,server1:port1/solr2&q=title:blah

This last question relates to the distributed deadlock topic on the wiki. My understanding is that in the first example the "title:blah" query is sent as a top-level query to solr1 and as a "shard query" to both solr1 and solr2 (hence the deadlock risk), while in the second example "title:blah" is not sent to solr1 as a top-level query. Am I right?

That's a lot of questions, and perhaps too long a post; sorry.
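To check my reading of STEP 1 in the quoted JIRA text, here is a minimal Python sketch of what I understand it to describe: each shard reports its numDocs and per-term docFreqs, and the coordinator simply adds them up. The `ShardStats` class, `global_stats`, the "title:blah" key and the shard numbers are all hypothetical; this is not Solr's code. The `classic_idf` helper assumes Lucene's classic TF-IDF idf formula, and is only there to show why summed statistics matter: computed locally, the same term gets a different idf on each shard, so raw scores from different shards would not be directly comparable.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ShardStats:
    """Hypothetical per-shard statistics: document count and term -> docFreq."""
    num_docs: int
    doc_freqs: Dict[str, int]


def global_stats(shards: List[ShardStats]) -> Tuple[int, Dict[str, int]]:
    """Add up numDocs and each term's docFreq over all shards (my reading of STEP 1)."""
    total_docs = sum(s.num_docs for s in shards)
    doc_freqs: Dict[str, int] = {}
    for s in shards:
        for term, df in s.doc_freqs.items():
            doc_freqs[term] = doc_freqs.get(term, 0) + df
    return total_docs, doc_freqs


def classic_idf(num_docs: int, doc_freq: int) -> float:
    """Assumption: Lucene's classic idf, 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Hypothetical numbers for two shards holding the term "title:blah".
shards = [
    ShardStats(num_docs=1000, doc_freqs={"title:blah": 10}),
    ShardStats(num_docs=500, doc_freqs={"title:blah": 40}),
]
n, dfs = global_stats(shards)
print(n, dfs)  # 1500 {'title:blah': 50}

# Locally computed idf differs from shard to shard...
for s in shards:
    print(round(classic_idf(s.num_docs, s.doc_freqs["title:blah"]), 3))
# ...whereas the global idf is the same value passed to every shard.
print(round(classic_idf(n, dfs["title:blah"]), 3))
```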
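For the ranking question, my working assumption (not confirmed by the JIRA text) is that each shard returns its own top documents already ordered on the sort criterion (score, or a field value), and that, as STEP 2 says, only the unique key and sort values come back in that first phase. If so, building the merged ranking would amount to a k-way merge on the sort value, sketched below in Python. With global docFreqs from STEP 1, scores from different shards should presumably be comparable, which is what would make merging on score meaningful. The `(uniqueKey, sort value)` hit shape and the name `merge_shard_hits` are hypothetical.

```python
import heapq
from itertools import islice
from typing import List, Tuple

# Hypothetical shape of one shard hit: (uniqueKey, sort value).
Hit = Tuple[str, float]


def merge_shard_hits(shard_hits: List[List[Hit]], rows: int,
                     descending: bool = True) -> List[Hit]:
    """K-way merge of per-shard hit lists, each already sorted on the sort
    value in the requested direction, keeping only the global top `rows`."""
    merged = heapq.merge(*shard_hits, key=lambda hit: hit[1], reverse=descending)
    return list(islice(merged, rows))


shard1 = [("a", 9.0), ("b", 4.0), ("c", 1.0)]
shard2 = [("d", 7.0), ("e", 5.0)]
print(merge_shard_hits([shard1, shard2], rows=3))
# [('a', 9.0), ('d', 7.0), ('e', 5.0)]
```

For an ascending sort on a field (a price, say), one would pass `descending=False`, and each shard's list would then have to be sorted ascending. This sketch does not attempt any tie-breaking between equal sort values.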
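On parameter order: at the HTTP level, my two example URLs differ only in the order of `q` and `shards`, and once parsed they yield the same parameter map, as this small Python check (standard library only) shows. It says nothing about how Solr itself dispatches top-level and shard requests, which is the part I'm unsure about. `request_params` is just an illustrative helper name.

```python
from urllib.parse import parse_qs, urlsplit


def request_params(url: str) -> dict:
    """Return the query-string parameters of a URL as {name: [values]}."""
    return parse_qs(urlsplit(url).query)


first = "http://server1:port1/solr1/?q=title:blah&shards=server1:port1/solr1,server1:port1/solr2"
second = "http://server1:port1/solr1/?shards=server1:port1/solr1,server1:port1/solr2&q=title:blah"

print(request_params(first) == request_params(second))  # True
```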
Thanks a lot if you find the courage to answer,

--
Grégoire Neuville