I still don't see the need to have duplicate documents here. Simply have your indexing process put the data that should be grouped on a shard on that shard. Let the rest of the objects be randomly distributed amongst the shards...
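A rough sketch of what that index-time routing could look like, in case it helps. Everything named here is an assumption for illustration and not something from your setup: the `group` field, the shard URLs, the `pick_shard` helper, and the choice of MD5 for hashing. The only point is that each id lands on exactly one shard, with grouped documents pinned together and everything else spread by hashing the id.

```python
import hashlib

# Hypothetical shard layout; these URLs are placeholders.
SHARDS = [
    "http://shard0:8983/solr",
    "http://shard1:8983/solr",
    "http://shard2:8983/solr",
]


def pick_shard(doc, num_shards=len(SHARDS)):
    """Return the index of the shard a document should be sent to.

    Documents carrying a (hypothetical) 'group' field are pinned to one
    shard per group. Everything else is spread by hashing the unique id.
    Either way, a given id maps to exactly one shard.
    """
    key = doc.get("group") or doc["id"]
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


if __name__ == "__main__":
    docs = [
        {"id": "1", "group": "customerA"},
        {"id": "2", "group": "customerA"},
        {"id": "3"},
    ]
    for doc in docs:
        print(doc["id"], "->", SHARDS[pick_shard(doc)])
```

The indexing process would then post each document only to the shard this returns, so nothing is ever duplicated across shards.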
Now, your front end has to know that some queries only need to go to one shard and just send them there (non-distributed). The rest of the queries go to the sharded handler.

Or you're over-thinking the problem and should just do normal sharding and not worry about it. Let's say you have partitioned your data amongst the shards as you indicate. Just send _every_ request to all the shards. The shards that don't have any data that you're interested in would presumably have very-little-to-zero processing involved, assuming something like &fq=shardId gets tacked on to the query. The complexity probably isn't worth the minuscule savings; this smells like premature optimization....

Best
Erick

On Tue, Aug 14, 2012 at 9:20 AM, Eric Khoury <ekhour...@hotmail.com> wrote:
>
> Hey Erick, thanks. I was hoping to shard on a very logical boundary for my
> data, where most queries would only care about data on single shards, and
> some queries would go to all shards, but that would only work if certain
> common objects are duplicated across shards. Can you think of another way to
> get this done, other than grouping common objects to yet another shard? Thanks
> again, Eric.
>
> Date: Tue, 14 Aug 2012 08:15:44 -0600
>> Subject: Re: Distributed Searching + unique Ids
>> From: erickerick...@gmail.com
>> To: solr-user@lucene.apache.org
>>
>> Don't do this. Many bits of sharding assume that a uniqueKey
>> exists on one and only one shard. Document counts may be
>> off. Faceting may be off. Etc.
>>
>> Why do you want to duplicate records across shards? What
>> benefit is this providing?
>>
>> This feels like an XY problem...
>>
>> Best
>> Erick
>>
>> On Fri, Aug 10, 2012 at 1:10 PM, Eric Khoury <ekhour...@hotmail.com> wrote:
>> >
>> > hey guys, the spec mentions the following:
>> >
>> > The unique key field must be unique across all shards. If docs with
>> > duplicate unique keys are encountered, Solr will make an attempt to
>> > return valid results, but the behavior may be non-deterministic.
>> >
>> > I'm actually looking to duplicate certain objects across shards, and
>> > hoping to have duplicates removed when querying over all shards. If these
>> > duplicates have the same ids, will that work? Will this cause chaos with
>> > paging? I imagine that it might affect faceting as well? thanks, Eric.