Re: Duplicated Documents Across shards

2013-05-06 Thread Shawn Heisey
> Oops... you're right, and before I started writing that response I had the thought that these should be "shardDir", but even that is confused. I think "replicaDir" or "collectionReplica" or "shardReplicaDir" or... "collectionShardReplicaDir" - the latter is wordy, but is explicit. I'd r

Re: Duplicated Documents Across shards

2013-05-06 Thread Jack Krupansky
rm. Even a "single core" Solr is using a "collection" (that happens to be single-core and single-shard and single-replica.) To wit, the stock Solr example, which is not SolrCloud, is named "collection1". -- Jack Krupansky -Original Message- From: Shawn

Re: Duplicated Documents Across shards

2013-05-06 Thread Shawn Heisey
On 5/6/2013 7:44 AM, Jack Krupansky wrote: > I think if we had a more comprehensible term for a "collection configuration directory", a lot of the confusion would go away. I mean, what the heck is an "instance" anyway? How does "instanceDir" relate to an "instance" of the Solr "server"? Sure,

Re: Duplicated Documents Across shards

2013-05-06 Thread Jack Krupansky
think it's the same for all cores in a Solr "instance". We should reconsider the name of that term. My choice: collectionDir. -- Jack Krupansky -Original Message- From: Erick Erickson Sent: Monday, May 06, 2013 7:39 AM To: solr-user@lucene.apache.org Subject: Re: D

Re: Duplicated Documents Across shards

2013-05-06 Thread Iker Mtnz. Apellaniz
Thank you very much, Erick. That was the real problem: we had two cores sharing the same folder and core_name. Here is the definitive version of the solr.xml. Tested and working correctly. Thanks everybody, Iker 2013/5/6 Erick Erickson > Having multiple cores point to the same index is, exc

Re: Duplicated Documents Across shards

2013-05-06 Thread Erick Erickson
Having multiple cores point to the same index is, except for special circumstances where one of the cores is guaranteed to be read only, a Bad Thing. So it sounds like you've found your issue... Best Erick On Mon, May 6, 2013 at 4:44 AM, Iker Mtnz. Apellaniz wrote: > Thanks Erick, > I think w

Re: Duplicated Documents Across shards

2013-05-06 Thread Iker Mtnz. Apellaniz
Thanks Erick, I think we found the problem. When defining the cores for both shards we define both of them in the same instanceDir, like this: Each shard should have its own folder, so the final configuration should be like this: Can anyone confirm this? Thanks, Iker 2013/5/4 Erick E
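
The misconfiguration described above (two cores defined with the same instanceDir) can be checked mechanically. The following is a minimal sketch, assuming the legacy Solr 4.x solr.xml layout with `<core>` elements carrying `name` and `instanceDir` attributes. The XML in the example is hypothetical and is not the poster's configuration, which is not preserved in this archive; the helper name `cores_sharing_instance_dir` is likewise an invention for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical solr.xml for illustration only; core names and directories
# are assumptions, not the configuration from the thread.
SOLR_XML = """<solr persistent="true">
  <cores adminPath="/admin/cores">
    <core name="shard2" instanceDir="shards" />
    <core name="shard4" instanceDir="shards" />
    <core name="shard1" instanceDir="shard1" />
  </cores>
</solr>"""


def cores_sharing_instance_dir(xml_text):
    """Return {instanceDir: [core names]} for every directory used by more than one core."""
    root = ET.fromstring(xml_text)
    by_dir = defaultdict(list)
    for core in root.iter("core"):
        by_dir[core.get("instanceDir")].append(core.get("name"))
    # Keep only directories claimed by two or more cores.
    return {d: names for d, names in by_dir.items() if len(names) > 1}


conflicts = cores_sharing_instance_dir(SOLR_XML)
for directory, names in conflicts.items():
    print(f"instanceDir {directory!r} is shared by cores: {', '.join(names)}")
```

Cores that share an instanceDir also share the default data directory under it unless each sets its own, which is how two shards can end up pointing at the same index.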

Re: Duplicated Documents Across shards

2013-05-04 Thread Erick Erickson
Sounds like you've explicitly routed the same document to two different shards. Document replacement only happens locally to a shard, so the fact that you have documents with the same ID on two different shards is why you're getting duplicate documents. Best Erick On Fri, May 3, 2013 at 3:44 PM,
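
The point about replacement being local to a shard can be illustrated with a toy model. This is a sketch only: plain Python dictionaries stand in for shards, and none of it is Solr code. The names `add_document` and `distributed_count` are made up for the illustration.

```python
from typing import Dict, List

# Toy model: each shard is an independent map from document id to document.
Shard = Dict[str, dict]


def add_document(shard: Shard, doc: dict) -> None:
    """Add or replace a document; replacement is keyed on 'id' within this shard only."""
    shard[doc["id"]] = doc


def distributed_count(shards: List[Shard]) -> int:
    """Sum the per-shard counts without any cross-shard de-duplication."""
    return sum(len(shard) for shard in shards)


shard_a: Shard = {}
shard_b: Shard = {}
doc = {"id": "doc-1", "title": "example"}

add_document(shard_a, doc)
add_document(shard_a, doc)  # same shard: the earlier copy is replaced
add_document(shard_b, doc)  # different shard: nothing to replace, a second copy exists

per_shard = [len(shard_a), len(shard_b)]
total = distributed_count([shard_a, shard_b])
print(per_shard, total)
```

Each shard holds exactly one copy, yet the summed count is two, which matches the single-document test reported elsewhere in the thread.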

Re: Duplicated Documents Across shards

2013-05-03 Thread Iker Mtnz. Apellaniz
We are currently using version 4.2. We have made tests with a single document and it gives us a 2 document count. But if we force it to shard onto the first machine, the one with a single shard, the count gives us 1 document. I've tried using the distrib=false parameter; it gives us no duplicate documents,

Re: Duplicated Documents Across shards

2013-05-03 Thread Erick Erickson
What version of Solr? The custom routing stuff is quite new so I'm guessing 4x? But this shouldn't be happening. The actual index data for the shards should be in separate directories, they just happen to be on the same physical machine. Try querying each one with &distrib=false to see the counts
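
The per-shard check suggested above can be scripted by building one non-distributed query per core. A minimal sketch follows; the host names and core names are placeholders (assumptions, not values from the thread), and the helper `shard_count_url` is hypothetical. It only builds the URLs and does not contact any server.

```python
from urllib.parse import urlencode


def shard_count_url(core_base_url: str) -> str:
    """Build a match-all query against one core, with distributed search turned off."""
    params = {
        "q": "*:*",          # match every document
        "rows": 0,           # only the count (numFound) is of interest
        "distrib": "false",  # answer from this core alone, no fan-out to other shards
        "wt": "json",
    }
    return f"{core_base_url.rstrip('/')}/select?{urlencode(params)}"


# Placeholder core URLs; substitute the real hosts and core names.
cores = [
    "http://host1:8983/solr/core1",
    "http://host2:8983/solr/core2",
]
urls = [shard_count_url(core) for core in cores]
for url in urls:
    print(url)
```

Comparing the sum of the per-core numFound values with the count from a normal distributed query shows whether documents are being counted on more than one shard.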

Duplicated Documents Across shards

2013-05-03 Thread Iker Mtnz. Apellaniz
Hi, We currently have a SolrCloud implementation running 5 shards on 3 physical machines, so the first machine has shard 1, the second machine shards 2 & 4, and the third shards 3 & 5. We noticed that while querying, numFoundDocs decreased when we increased the start param. After