Shawn,

I knew that the shard had to be specified by the indexing process or the document, but I didn't realize that the uniqueness of the document across the collection also had to be handled outside of Solr.

We've used the compositeId router successfully to route documents, but it seemed that the implicit/manual routing might work for this new collection. Apparently not, given that the indexing processes must enforce uniqueness as well as distribution.

Thanks for the help.
Chris

On Wed, Mar 14, 2018 at 11:39 AM Shawn Heisey <elyog...@elyograg.org> wrote:

> On 3/14/2018 9:26 AM, Chris Ulicny wrote:
> > We've been looking at using implicit for one of our collections, and there
> > seems to be some weird behavior that we're not sure whether it was expected
> > or not.
> >
> > Is it recommended to use a uniqueKey for implicit routing? Is the following
> > behavior intended?
> >
> > We have encountered the following issue. Create a collection with two
> > shards (S1,S2), implicit routing, with "id" as uniqueKey, and router.field
> > as "routingfield". If we index
> >
> > {"id":"id1","routingfield":"S1"}
> >
> > It goes into shard S1. Then if we need to reindex the document with a
> > different "routingfield" value:
> >
> > {"id":"id1","routingfield":"S2"}
> >
> > It goes into shard S2. However, when you select the document in a query, it
> > seems that both of those documents exist, but get deduped on return since
> > selecting all documents only ever returns a single document. Adding [shard]
> > to the fl list results in the document coming from S1 some of the time and
> > S2 the rest.
> >
> > Trying to use /get with just the id results in a NullReferenceException.
> > Adding the _route_ parameter in works, but both documents can be retrieved.
>
> This is a common misconception with the implicit router. That name is a
> completely correct summary of what the router does, but it is one of
> those "overloaded" words in the English language that is often not
> completely understood.
>
> A better name for "implicit" would actually be "manual." By using this
> router, you have told Solr not to worry about routing -- that you're
> going to handle it, and that you're going to make sure every document is
> unique across all shards. Then you indexed the same document to two
> shards -- intentionally. Solr isn't going to prevent that -- there's
> nothing it can do to prevent it without making all indexing a LOT slower.
>
> If you want Solr to handle routing for you, then you must use the
> compositeId router. With that router, you do not get to specify which
> shard contains your document, and you cannot add shards after the
> collection is created. Later you can SPLIT shards, but you can't add them.
>
> Thanks,
> Shawn
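
For anyone reading this thread later, the behavior described above can be sketched with a small in-memory model in Python. This is not Solr code and it calls no Solr API: the `ImplicitCollection` class and the `reindex_unique` helper are hypothetical names made up for illustration, and the model only assumes what the thread states, namely that with the implicit router and a router.field the uniqueKey is enforced per shard, so the indexing process has to keep an id unique across shards itself.

```python
class ImplicitCollection:
    """Toy model (not Solr): each shard is a dict keyed by the uniqueKey "id"."""

    def __init__(self, shard_names, router_field="routingfield"):
        self.router_field = router_field
        self.shards = {name: {} for name in shard_names}

    def add(self, doc):
        # The document names its own target shard. An existing document
        # with the same id is overwritten only inside that one shard.
        shard = doc[self.router_field]
        self.shards[shard][doc["id"]] = doc

    def find(self, doc_id):
        # Return every (shard name, document) pair that holds this id.
        return [(name, docs[doc_id])
                for name, docs in self.shards.items() if doc_id in docs]


def reindex_unique(coll, doc):
    """Hypothetical client-side guard: drop the id from all other shards, then add."""
    target = doc[coll.router_field]
    for name, docs in coll.shards.items():
        if name != target:
            docs.pop(doc["id"], None)
    coll.add(doc)


# Naive reindexing with a changed routing value leaves a copy in each shard.
naive = ImplicitCollection(["S1", "S2"])
naive.add({"id": "id1", "routingfield": "S1"})
naive.add({"id": "id1", "routingfield": "S2"})
naive_copies = len(naive.find("id1"))

# With the guard, only the copy in the new target shard remains.
guarded = ImplicitCollection(["S1", "S2"])
reindex_unique(guarded, {"id": "id1", "routingfield": "S1"})
reindex_unique(guarded, {"id": "id1", "routingfield": "S2"})
guarded_shards = [name for name, _ in guarded.find("id1")]
```

Against a real collection, the equivalent of the guard would presumably be a delete routed to the old shard before the new add, which requires the indexing process to know where the document previously lived.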