Re: questions about Solr shards

Babak Farhang Fri, 02 Jul 2010 01:07:07 -0700

Thanks Joe. This is all very interesting. So though it helps us scale,
sharding doesn't come cheap.


On Mon, Jun 28, 2010 at 9:50 AM, Joe Calderon <[email protected]> wrote:
> There is a first-pass query to retrieve all matching document IDs from
> every shard, along with the relevant sorting information. The document IDs
> are then sorted and limited to the number needed, and a second query
> is sent for the remaining metadata of those documents.
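
To make the two passes concrete, here is a rough sketch of the flow described
above, using plain in-memory Python objects. Everything in it (Shard,
query_ids, fetch, merge_ids, fetch_docs, the score threshold) is a made-up
stand-in for illustration and not Solr's actual API or internals. Keeping the
first duplicate ID in shard iteration order is likewise only an assumption
about what "first" means.

```python
from dataclasses import dataclass, field


@dataclass
class Shard:
    """Hypothetical in-memory stand-in for one Solr shard."""
    name: str
    docs: dict = field(default_factory=dict)  # doc id -> {"score", "title"}

    def query_ids(self, min_score):
        # Pass 1: return only ids plus the sort key (here, a score).
        return [(doc_id, d["score"], self.name)
                for doc_id, d in self.docs.items() if d["score"] >= min_score]

    def fetch(self, doc_ids):
        # Pass 2: return stored fields by id; the query is NOT re-evaluated.
        return {i: self.docs[i] for i in doc_ids if i in self.docs}


def merge_ids(shards, min_score, rows):
    """Run pass 1 on every shard, then sort and trim to the rows needed."""
    seen, merged = set(), []
    for shard in shards:
        for doc_id, score, owner in shard.query_ids(min_score):
            if doc_id in seen:  # duplicate id: keep the first one received
                continue
            seen.add(doc_id)
            merged.append((doc_id, score, owner))
    merged.sort(key=lambda hit: hit[1], reverse=True)
    return merged[:rows]


def fetch_docs(shards, top):
    """Run pass 2: ask the owning shard for each id that made the cut."""
    by_name = {s.name: s for s in shards}
    results = []
    for doc_id, _score, owner in top:
        results.append((doc_id, by_name[owner].fetch([doc_id]).get(doc_id)))
    return results


shards = [
    Shard("shard1", {10: {"score": 0.9, "title": "a"},
                     11: {"score": 0.2, "title": "b"}}),
    Shard("shard2", {20: {"score": 0.7, "title": "c"},
                     21: {"score": 0.5, "title": "d"}}),
]

top = merge_ids(shards, min_score=0.4, rows=2)

# Doc 10 is updated between the two passes and would no longer match ...
shards[0].docs[10] = {"score": 0.1, "title": "a (edited)"}

# ... but pass 2 fetches by id, so the changed doc is still returned.
results = fetch_docs(shards, top)
```

In this toy run, doc 10 makes the cut in the first pass, is edited so that it
would no longer match, and is still handed back by the second pass because
that pass looks documents up by ID only. That is the "index could change
between stages" caveat in miniature.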
>
> On Sun, Jun 27, 2010 at 7:32 PM, Babak Farhang <[email protected]> wrote:
>> Otis,
>>
>> Belated thanks for your reply.
>>
>>>> 2. "The index could change between stages, e.g. a document that matched a
>>>> query and was subsequently changed may no longer match but will still be
>>>> retrieved."
>>
>>> 2. This describes the situation where, for instance, a
>>> document with ID=10 is updated between the 2 calls
>>> to the Solr instance/shard where that doc ID=10 lives.
>>
>> Can you explain why this happens? (I.e. does each query to the sharded
>> index somehow involve 2 calls to each shard instance from the base
>> instance?)
>>
>> -Babak
>>
>> On Thu, Jun 24, 2010 at 10:14 PM, Otis Gospodnetic
>> <[email protected]> wrote:
>>> Hi Babak,
>>>
>>> 1. Yes, you are reading that correctly.
>>>
>>> 2. This describes the situation where, for instance, a document with ID=10 
>>> is updated between the 2 calls to the Solr instance/shard where that doc 
>>> ID=10 lives.
>>>
>>> 3. Yup, orthogonal.  You can have a master with multiple cores for sharded 
>>> and non-sharded indices and you can have a slave with cores that hold 
>>> complete indices or just their shards.
>>>  Otis
>>> ----
>>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
>>> Lucene ecosystem search :: http://search-lucene.com/
>>>
>>>
>>>
>>> ----- Original Message ----
>>>> From: Babak Farhang <[email protected]>
>>>> To: [email protected]
>>>> Sent: Thu, June 24, 2010 6:32:54 PM
>>>> Subject: questions about Solr shards
>>>>
>>>> Hi everyone,
>>>>
>>>> There are a couple of notes on the limitations of this approach at
>>>> http://wiki.apache.org/solr/DistributedSearch which I'm having trouble
>>>> understanding.
>>>>
>>>> 1. "When duplicate doc IDs are received, Solr chooses the first doc
>>>>    and discards subsequent ones"
>>>>
>>>> "Received" here is from the perspective of the base Solr instance at
>>>> query time, right?  I.e. if you inadvertently indexed 2 versions of
>>>> the document with the same unique ID but different contents to 2
>>>> shards, then at query time, the "first" document (putting aside for
>>>> the moment what exactly "first" means) would win.  Am I reading this
>>>> right?
>>>>
>>>> 2. "The index could change between stages, e.g. a document that matched a
>>>>    query and was subsequently changed may no longer match but will still be
>>>>    retrieved."
>>>>
>>>> I have no idea what this second statement means.
>>>>
>>>> And one other question about shards:
>>>>
>>>> 3. The examples I've seen documented do not illustrate sharded,
>>>> multicore setups; only sharded monolithic cores.  I assume sharding
>>>> works with multicore as well (i.e. the two issues are orthogonal).  Is
>>>> this right?
>>>>
>>>> Any help on interpreting the above would be much appreciated.
>>>>
>>>> Thank you,
>>>> -Babak
>>>
>>
>
