Thank you very much, Erick. That was the real problem: we had two cores sharing the same folder and core name. Here is the definitive version of the solr.xml, tested and working correctly:
<core schema="schema.xml" shard="shard2" instanceDir="1_collection/shard2/" name="1_collection_shard2" config="solrconfig.xml" collection="1_collection"/>
<core schema="schema.xml" shard="shard4" instanceDir="1_collection/shard4/" name="1_collection_shard4" config="solrconfig.xml" collection="1_collection"/>

Thanks everybody,
Iker

2013/5/6 Erick Erickson <erickerick...@gmail.com>

> Having multiple cores point to the same index is, except for
> special circumstances where one of the cores is guaranteed to
> be read only, a Bad Thing.
>
> So it sounds like you've found your issue...
>
> Best
> Erick
>
> On Mon, May 6, 2013 at 4:44 AM, Iker Mtnz. Apellaniz
> <mitxin...@gmail.com> wrote:
> > Thanks Erick,
> > I think we found the problem. When defining the cores for both shards, we
> > defined both of them in the same instanceDir, like this:
> > <core schema="schema.xml" shard="shard2" instanceDir="1_collection/"
> > name="1_collection" config="solrconfig.xml" collection="1_collection"/>
> > <core schema="schema.xml" shard="shard4" instanceDir="1_collection/"
> > name="1_collection" config="solrconfig.xml" collection="1_collection"/>
> >
> > Each shard should have its own folder, so the final configuration should
> > be like this:
> > <core schema="schema.xml" shard="shard2" instanceDir="1_collection/shard2/"
> > name="1_collection" config="solrconfig.xml" collection="1_collection"/>
> > <core schema="schema.xml" shard="shard4" instanceDir="1_collection/shard4/"
> > name="1_collection" config="solrconfig.xml" collection="1_collection"/>
> >
> > Can anyone confirm this?
> >
> > Thanks,
> > Iker
> >
> > 2013/5/4 Erick Erickson <erickerick...@gmail.com>
> >
> >> Sounds like you've explicitly routed the same document to two
> >> different shards. Document replacement only happens locally to a
> >> shard, so the fact that you have documents with the same ID on two
> >> different shards is why you're getting duplicate documents.
> >>
> >> Best
> >> Erick
> >>
> >> On Fri, May 3, 2013 at 3:44 PM, Iker Mtnz. Apellaniz
> >> <mitxin...@gmail.com> wrote:
> >> > We are currently using version 4.2.
> >> > We have run tests with a single document and it gives us a document
> >> > count of 2. But if we force it onto the first machine, the one with a
> >> > single shard, the count gives us 1 document.
> >> > I've tried using the distrib=false parameter; it gives us no duplicate
> >> > documents, but the same document appears to be in two different shards.
> >> >
> >> > Finally, about the separate directories: we have only one directory for
> >> > the data on each physical machine and collection, and I don't see any
> >> > subfolder for the different shards.
> >> >
> >> > Is it possible that we have something wrong with the dataDir
> >> > configuration for using multiple shards on one machine?
> >> >
> >> > <dataDir>${solr.data.dir:}</dataDir>
> >> > <directoryFactory name="DirectoryFactory"
> >> > class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>
> >> >
> >> > 2013/5/3 Erick Erickson <erickerick...@gmail.com>
> >> >
> >> >> What version of Solr? The custom routing stuff is quite new so
> >> >> I'm guessing 4x?
> >> >>
> >> >> But this shouldn't be happening. The actual index data for the
> >> >> shards should be in separate directories, they just happen to
> >> >> be on the same physical machine.
> >> >>
> >> >> Try querying each one with &distrib=false to see the counts
> >> >> from single shards, that may shed some light on this. It vaguely
> >> >> sounds like you have indexed the same document to both shards
> >> >> somehow...
> >> >>
> >> >> Best
> >> >> Erick
> >> >>
> >> >> On Fri, May 3, 2013 at 5:28 AM, Iker Mtnz. Apellaniz
> >> >> <mitxin...@gmail.com> wrote:
> >> >> > Hi,
> >> >> > We currently have a SolrCloud implementation running 5 shards on 3
> >> >> > physical machines: the first machine has shard 1, the second
> >> >> > machine shards 2 & 4, and the third shards 3 & 5. We noticed that,
> >> >> > while querying, numFoundDocs decreased when we increased the start
> >> >> > param.
> >> >> > After some investigation we found that the documents in shards 2 to 5
> >> >> > were being counted twice. Querying shard 2 gives you back the
> >> >> > results for shards 2 & 4, and the same thing happens for shards 3 & 5.
> >> >> > Our guess is that the physical index for shards 2 & 4 is shared, so
> >> >> > the shards don't know which part of it belongs to each one.
> >> >> > The uniqueKey is correctly defined, and we have tried using a shard
> >> >> > prefix (shard1!docID).
> >> >> >
> >> >> > Is there any way to solve this problem when a single physical
> >> >> > machine hosts several shards?
> >> >> > Is it a "real" problem, or does it just affect facets & numResults?
> >> >> >
> >> >> > Thanks
> >> >> > Iker
> >> >> >
> >> >> > --
> >> >> > /** @author imartinez*/
> >> >> > Person me = *new* Developer();
> >> >> > me.setName(*"Iker Mtz de Apellaniz Anzuola"*);
> >> >> > me.setTwit("@mitxino77 <https://twitter.com/mitxino77>");
> >> >> > me.setLocations({"St Cugat, Barcelona", "Kanpezu, Euskadi", "*, World"]});
> >> >> > me.setSkills({*SoftwareDeveloper, Curious, AmateurCook*});
> >> >> > me.setWebs({*urbasaabentura.com, ikertxef.com*});
> >> >> > *return* me;
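
For anyone who runs into the same symptom, the misconfiguration discussed in this thread (two <core> entries sharing one instanceDir and one name) can be caught before starting Solr with a small check like the one below. This is only a sketch, not part of Solr: the function name find_shared_core_settings and the BROKEN/FIXED sample strings are made up for illustration, and the <solr><cores> wrapper around the <core> elements is assumed, since the thread only shows the <core> lines themselves.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def find_shared_core_settings(solr_xml_text, attrs=("name", "instanceDir")):
    """Return {attr: {value: [shards]}} for values used by more than one <core>."""
    root = ET.fromstring(solr_xml_text)
    seen = {attr: defaultdict(list) for attr in attrs}
    for core in root.iter("core"):
        for attr in attrs:
            value = core.get(attr)
            if value is None:
                continue
            # Treat "1_collection/" and "1_collection" as the same folder.
            key = value.rstrip("/") if attr == "instanceDir" else value
            seen[attr][key].append(core.get("shard", "?"))
    # Keep only the values that more than one core points at.
    return {
        attr: {value: shards for value, shards in values.items() if len(shards) > 1}
        for attr, values in seen.items()
    }


# Hypothetical wrapper around the <core> lines quoted in the thread.
BROKEN = """<solr><cores>
  <core schema="schema.xml" shard="shard2" instanceDir="1_collection/"
        name="1_collection" config="solrconfig.xml" collection="1_collection"/>
  <core schema="schema.xml" shard="shard4" instanceDir="1_collection/"
        name="1_collection" config="solrconfig.xml" collection="1_collection"/>
</cores></solr>"""

FIXED = """<solr><cores>
  <core schema="schema.xml" shard="shard2" instanceDir="1_collection/shard2/"
        name="1_collection_shard2" config="solrconfig.xml" collection="1_collection"/>
  <core schema="schema.xml" shard="shard4" instanceDir="1_collection/shard4/"
        name="1_collection_shard4" config="solrconfig.xml" collection="1_collection"/>
</cores></solr>"""

if __name__ == "__main__":
    for label, text in (("broken", BROKEN), ("fixed", FIXED)):
        print(label, find_shared_core_settings(text))
```

With the first configuration the check reports shard2 and shard4 under both the shared name and the shared instanceDir; with the definitive configuration both result maps come back empty.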