Re: Distributed Searching + unique Ids

Erick Erickson Tue, 14 Aug 2012 16:20:12 -0700

I still don't see the need to have duplicate documents here.
Simply have your indexing process put the data that should be
grouped on a shard on that shard. Let the rest of the objects
be randomly distributed amongst the shards...
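
A minimal sketch of that indexing-side routing, under stated assumptions:
the shard addresses, the "group" field and the GROUP_TO_SHARD mapping are
all hypothetical names, not anything Solr provides. It also swaps "random"
for a stable hash of the id, so that re-indexing a document always lands
on the same shard and each uniqueKey lives on exactly one shard.

```python
import hashlib

# Hypothetical shard addresses; substitute your own.
SHARDS = ["host1:8983/solr", "host2:8983/solr", "host3:8983/solr"]

# Hypothetical mapping from a grouping value to the shard that owns it.
GROUP_TO_SHARD = {"customerA": 0, "customerB": 1}


def shard_for(doc):
    """Return the index of the single shard that should hold this document."""
    group = doc.get("group")
    if group in GROUP_TO_SHARD:
        # Data that must stay together goes to its designated shard.
        return GROUP_TO_SHARD[group]
    # Everything else is spread by a stable hash of the unique id.
    # (md5 is used only for a repeatable hash, not for security.)
    digest = hashlib.md5(doc["id"].encode("utf-8")).hexdigest()
    return int(digest, 16) % len(SHARDS)
```

The indexing process would then post each document to
SHARDS[shard_for(doc)] and nowhere else.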


Now, your front end has to know that some queries only need
to go to one shard and just send them there (non-distributed).
The rest of the queries go to the sharded handler.
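
Roughly, the front-end decision could look like the sketch below. The
host names and the build_request helper are hypothetical, and it assumes
the distributed case is driven by the "shards" request parameter rather
than by a handler preconfigured in solrconfig.xml.

```python
from urllib.parse import urlencode

# Hypothetical shard addresses; substitute your own.
SHARDS = ["host1:8983/solr", "host2:8983/solr", "host3:8983/solr"]


def build_request(query, shard_index=None):
    """Return the URL for a query.

    shard_index given: plain, non-distributed request to that one shard.
    shard_index None:  distributed request fanned out via `shards`.
    """
    params = {"q": query, "wt": "json"}
    if shard_index is None:
        params["shards"] = ",".join(SHARDS)
        host = SHARDS[0]  # any node can coordinate the distributed request
    else:
        host = SHARDS[shard_index]
    return "http://" + host + "/select?" + urlencode(params)
```

The caller passes a shard index only when it knows the query touches data
that was grouped onto that shard at index time.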

Or you're over-thinking the problem and should just do normal
sharding and not worry about it. Let's say you have partitioned
your data amongst the shards as you indicate. Just send _every_
request to all the shards. The shards that don't have any data
that you're interested in would presumably have very-little-to-zero
processing involved assuming something like &fq=shardId gets
tacked on to the query. The complexity probably isn't worth the
minuscule savings; this smells like premature optimization.
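
As a sketch of the "just send it everywhere" variant: every request
carries the full shard list, and a filter query narrows it when the caller
knows which partition it wants. The "shardId" field is assumed to be
something you index on each document yourself, and the helper name and
host names are hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical shard addresses; substitute your own.
SHARDS = ["host1:8983/solr", "host2:8983/solr", "host3:8983/solr"]


def distributed_params(query, shard_id=None):
    """Build the query string for a request that always goes to all shards."""
    params = [("q", query), ("shards", ",".join(SHARDS))]
    if shard_id is not None:
        # Shards holding no documents with this value match nothing
        # on the filter and have very little work to do.
        params.append(("fq", "shardId:" + shard_id))
    return urlencode(params)
```

The front end then needs no routing logic at all; it only decides whether
to add the filter.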

Best
Erick

On Tue, Aug 14, 2012 at 9:20 AM, Eric Khoury <[email protected]> wrote:
>
> Hey Erick, thanks. I was hoping to shard on a very logical boundary for my
> data, where most queries would only care about data on single shards, and
> some queries would go to all shards, but that would only work if certain
> common objects are duplicated across shards. Can you think of another way to
> get this done, other than grouping common objects to yet another shard?
> Thanks again,
> Eric.
>> Date: Tue, 14 Aug 2012 08:15:44 -0600
>> Subject: Re: Distributed Searching + unique Ids
>> From: [email protected]
>> To: [email protected]
>>
>> Don't do this. Many bits of sharding assume that a uniqueKey
>> exists on one and only one shard. Document counts may be
>> off. Faceting may be off.  Etc.
>>
>> Why do you want to duplicate records across shards? What
>> benefit is this providing?
>>
>> This feels like an XY problem...
>>
>> Best
>> Erick
>>
>> On Fri, Aug 10, 2012 at 1:10 PM, Eric Khoury <[email protected]> wrote:
>> >
>> > Hey guys, the spec mentions the following:
>> >
>> >      The unique key field must be unique across all shards. If docs
>> >      with duplicate unique keys are encountered, Solr will make an
>> >      attempt to return valid results, but the behavior may be
>> >      non-deterministic.
>> >
>> >
>> > I'm actually looking to duplicate certain objects across shards, and
>> > hoping to have duplicates removed when querying over all shards. If these
>> > duplicates have the same ids, will that work?  Will this cause chaos with
>> > paging?  I imagine that it might affect faceting as well?
>> > Thanks,
>> > Eric.
>
