Thanks Erick,

I think we found the problem. When defining the cores for the two shards we pointed both of them at the same instanceDir, like this:

  <core schema="schema.xml" shard="shard2" instanceDir="1_collection/" name="1_collection" config="solrconfig.xml" collection="1_collection"/>
  <core schema="schema.xml" shard="shard4" instanceDir="1_collection/" name="1_collection" config="solrconfig.xml" collection="1_collection"/>
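If I understand the defaults right, with a shared instanceDir both cores presumably also end up writing to the same data directory, which would match what we saw on disk (one data directory and no per-shard subfolders), something like:

  1_collection/data/index/   (used by both shard2 and shard4)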
Each shard should have its own folder, so the final configuration should be like this:

  <core schema="schema.xml" shard="shard2" instanceDir="1_collection/shard2/" name="1_collection" config="solrconfig.xml" collection="1_collection"/>
  <core schema="schema.xml" shard="shard4" instanceDir="1_collection/shard4/" name="1_collection" config="solrconfig.xml" collection="1_collection"/>
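With separate instanceDirs I would expect each core to get its own data directory, roughly:

  1_collection/shard2/data/index/
  1_collection/shard4/data/index/

and querying each core with distrib=false (as Erick suggested) should then count only that shard's documents, for example (host, port and core name below are just placeholders for our setup):

  http://localhost:8983/solr/<core-name>/select?q=*:*&rows=0&distrib=false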
Can anyone confirm this?

Thanks,
Iker

2013/5/4 Erick Erickson <erickerick...@gmail.com>

> Sounds like you've explicitly routed the same document to two
> different shards. Document replacement only happens locally to a
> shard, so the fact that you have documents with the same ID on two
> different shards is why you're getting duplicate documents.
>
> Best
> Erick
>
> On Fri, May 3, 2013 at 3:44 PM, Iker Mtnz. Apellaniz
> <mitxin...@gmail.com> wrote:
> > We are currently using version 4.2.
> > We have made tests with a single document and it gives us a 2 document
> > count. But if we force it to shard into the first machine, the one with a
> > unique shard, the count gives us 1 document.
> > I've tried using the distrib=false parameter; it gives us no duplicate
> > documents, but the same document appears to be in two different shards.
> >
> > Finally, about the separate directories, we have only one directory for the
> > data in each physical machine and collection, and I don't see any subfolder
> > for the different shards.
> >
> > Is it possible that we have something wrong with the dataDir configuration
> > to use multiple shards in one machine?
> >
> > <dataDir>${solr.data.dir:}</dataDir>
> > <directoryFactory name="DirectoryFactory"
> >                   class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>
> >
> > 2013/5/3 Erick Erickson <erickerick...@gmail.com>
> >
> >> What version of Solr? The custom routing stuff is quite new so
> >> I'm guessing 4x?
> >>
> >> But this shouldn't be happening. The actual index data for the
> >> shards should be in separate directories, they just happen to
> >> be on the same physical machine.
> >>
> >> Try querying each one with &distrib=false to see the counts
> >> from single shards, that may shed some light on this. It vaguely
> >> sounds like you have indexed the same document to both shards
> >> somehow...
> >>
> >> Best
> >> Erick
> >>
> >> On Fri, May 3, 2013 at 5:28 AM, Iker Mtnz. Apellaniz
> >> <mitxin...@gmail.com> wrote:
> >> > Hi,
> >> > We currently have a SolrCloud implementation running 5 shards on 3
> >> > physical machines, so the first machine has shard 1, the second
> >> > machine shards 2 & 4, and the third shards 3 & 5. We noticed that,
> >> > while querying, numFound decreased when we increased the start param.
> >> > After some investigation we found that the documents in shards 2 to 5
> >> > were being counted twice. Querying shard 2 gives back the results
> >> > for shards 2 & 4, and the same happens for shards 3 & 5. Our guess is
> >> > that the physical index for shards 2 & 4 is shared, so the shards don't
> >> > know which part of it belongs to each one.
> >> > The uniqueKey is correctly defined, and we have tried using a shard
> >> > prefix (shard1!docID).
> >> >
> >> > Is there any way to solve this problem when a single physical machine
> >> > hosts several shards?
> >> > Is it a "real" problem or does it just affect facets & numResults?
> >> >
> >> > Thanks
> >> > Iker

--
/** @author imartinez*/
Person me = *new* Developer();
me.setName(*"Iker Mtz de Apellaniz Anzuola"*);
me.setTwit("@mitxino77 <https://twitter.com/mitxino77>");
me.setLocations({"St Cugat, Barcelona", "Kanpezu, Euskadi", "*, World"});
me.setSkills({*SoftwareDeveloper, Curious, AmateurCook*});
*return* me;