Thanks Joe. This is all very interesting. So though it helps us scale, sharding doesn't come cheap.
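
For the archive, here is a rough sketch of the two-pass flow as I understand it from Joe's description. It is a toy in-memory model, not Solr code: the Shard class and the search_ids, fetch, and distributed_search names are all made up for illustration, and the "first id received wins" rule is only my reading of the wiki note on duplicate doc IDs.

```python
# Toy in-memory model of a two-pass distributed query; NOT Solr code.
# Shard, search_ids, fetch and distributed_search are hypothetical names.

class Shard:
    def __init__(self, docs):
        # docs maps unique id -> {"text": ..., "score": ...}
        self.docs = dict(docs)

    def search_ids(self, term):
        # Pass 1: return only ids plus the sort key (here, a stored score).
        return [(doc_id, d["score"]) for doc_id, d in self.docs.items()
                if term in d["text"]]

    def fetch(self, doc_ids):
        # Pass 2: return the stored fields for the ids the base instance chose.
        return {i: self.docs[i] for i in doc_ids if i in self.docs}


def distributed_search(shards, term, rows, between_passes=None):
    # Pass 1: collect ids and sort info from every shard.
    owner, hits = {}, []
    for shard in shards:
        for doc_id, score in shard.search_ids(term):
            if doc_id in owner:
                continue  # duplicate id: keep the first one received (assumption)
            owner[doc_id] = shard
            hits.append((doc_id, score))

    # Merge: sort globally, then keep only the rows actually needed.
    hits.sort(key=lambda h: h[1], reverse=True)
    top = [doc_id for doc_id, _ in hits[:rows]]

    if between_passes:
        between_passes()  # simulate an index update between the two passes

    # Pass 2: ask each owning shard for the rest of the document's fields.
    results = []
    for doc_id in top:
        fetched = owner[doc_id].fetch([doc_id])
        if doc_id in fetched:
            results.append((doc_id, fetched[doc_id]))
    return results


shard_a = Shard({10: {"text": "solr shards", "score": 2.0},
                 11: {"text": "lucene", "score": 1.0}})
shard_b = Shard({10: {"text": "solr duplicate", "score": 5.0},
                 12: {"text": "solr cores", "score": 3.0}})

def update_doc_12():
    # Doc 12 stops matching "solr" after pass 1 has already selected it.
    shard_b.docs[12] = {"text": "nutch only", "score": 3.0}

results = distributed_search([shard_a, shard_b], "solr", rows=2,
                             between_passes=update_doc_12)
for doc_id, fields in results:
    print(doc_id, fields["text"])
```

In this model doc 12 is still returned even though it no longer contains "solr" by the time pass 2 runs, which seems to be the situation the wiki's second note describes. The copy of doc 10 on the second shard is dropped only because the first shard answered first.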

On Mon, Jun 28, 2010 at 9:50 AM, Joe Calderon <[email protected]> wrote:
> there is a first pass query to retrieve all matching document ids from
> every shard along with relevant sorting information, the document ids
> are then sorted and limited to the amount needed, then a second query
> is sent for the rest of the documents metadata.
>
> On Sun, Jun 27, 2010 at 7:32 PM, Babak Farhang <[email protected]> wrote:
>> Otis,
>>
>> Belated thanks for your reply.
>>
>>>> 2. "The index could change between stages, e.g. a document that
>>>> matched a query and was subsequently changed may no longer match
>>>> but will still be retrieved."
>>
>>> 2. This describes the situation where, for instance, a document
>>> with ID=10 is updated between the 2 calls to the Solr
>>> instance/shard where that doc ID=10 lives.
>>
>> Can you explain why this happens? (I.e. does each query to the sharded
>> index somehow involve 2 calls to each shard instance from the base
>> instance?)
>>
>> -Babak
>>
>> On Thu, Jun 24, 2010 at 10:14 PM, Otis Gospodnetic
>> <[email protected]> wrote:
>>> Hi Babak,
>>>
>>> 1. Yes, you are reading that correctly.
>>>
>>> 2. This describes the situation where, for instance, a document with ID=10
>>> is updated between the 2 calls to the Solr instance/shard where that doc
>>> ID=10 lives.
>>>
>>> 3. Yup, orthogonal. You can have a master with multiple cores for sharded
>>> and non-sharded indices and you can have a slave with cores that hold
>>> complete indices or just their shards.
>>>
>>> Otis
>>> ----
>>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
>>> Lucene ecosystem search :: http://search-lucene.com/
>>>
>>> ----- Original Message ----
>>>> From: Babak Farhang <[email protected]>
>>>> To: [email protected]
>>>> Sent: Thu, June 24, 2010 6:32:54 PM
>>>> Subject: questions about Solr shards
>>>>
>>>> Hi everyone,
>>>>
>>>> There are a couple of notes on the limitations of this approach at
>>>> http://wiki.apache.org/solr/DistributedSearch which I'm having
>>>> trouble understanding.
>>>>
>>>> 1. "When duplicate doc IDs are received, Solr chooses the first doc
>>>> and discards subsequent ones"
>>>>
>>>> "Received" here is from the perspective of the base Solr instance at
>>>> query time, right? I.e. if you inadvertently indexed 2 versions of
>>>> the document with the same unique ID but different contents to 2
>>>> shards, then at query time, the "first" document (putting aside for
>>>> the moment what exactly "first" means) would win. Am I reading this
>>>> right?
>>>>
>>>> 2. "The index could change between stages, e.g. a document that
>>>> matched a query and was subsequently changed may no longer match
>>>> but will still be retrieved."
>>>>
>>>> I have no idea what this second statement means.
>>>>
>>>> And one other question about shards:
>>>>
>>>> 3. The examples I've seen documented do not illustrate sharded,
>>>> multicore setups; only sharded monolithic cores. I assume sharding
>>>> works with multicore as well (i.e. the two issues are orthogonal).
>>>> Is this right?
>>>>
>>>> Any help on interpreting the above would be much appreciated.
>>>>
>>>> Thank you,
>>>> -Babak
