Re: Solr 7.0.1 Duplicate document appearing in search results

Erick Erickson Wed, 15 May 2019 10:53:57 -0700


> On May 14, 2019, at 7:46 PM, Adam Walz <a...@adamwalz.net> wrote:
> 
> but do
> use an external map reduce process to reindex


Here’s where I’d look then. Not knowing any details of your process, this may be 
totally wrong, of course…

If any step performs a MERGEINDEXES operation, _and_ the same <uniqueKey> 
somehow got indexed into more than one of the sub-indexes being merged, then no 
deduplication happens and you will end up with multiple docs with the same 
<uniqueKey>. I strongly suspect that that, or something similar, is what’s 
happening. That’s how MapReduceIndexerTool operated: N sub-indexes were produced 
totally independently, and then a MERGEINDEXES operation ran on a per-shard 
basis.
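To make the mechanism concrete, here’s a toy Python model. It is not Solr or 
Lucene code, and every function name in it is made up for illustration. The 
assumption it encodes is the one described above: the normal Solr update path 
replaces any existing doc with the same <uniqueKey>, while a merge of 
independently built sub-indexes just concatenates their documents.

```python
from collections import Counter

# Toy model only: an "index" is a list of (doc_id, body) tuples.

def solr_style_add(index, doc_id, body):
    """Hypothetical stand-in for Solr's update path: drop any doc with the
    same uniqueKey, then add the new one."""
    index[:] = [d for d in index if d[0] != doc_id]
    index.append((doc_id, body))

def merge_indexes(*sub_indexes):
    """Hypothetical stand-in for a low-level index merge: documents are
    concatenated and no uniqueKey check is made."""
    merged = []
    for sub in sub_indexes:
        merged.extend(sub)
    return merged

def duplicate_keys(index):
    """Return the uniqueKey values that appear more than once."""
    counts = Counter(doc_id for doc_id, _ in index)
    return sorted(k for k, n in counts.items() if n > 1)

# Two sub-indexes built independently, both containing "doc-42".
shard_a, shard_b = [], []
solr_style_add(shard_a, "doc-42", "old version")
solr_style_add(shard_a, "doc-1", "x")
solr_style_add(shard_b, "doc-42", "new version")

merged = merge_indexes(shard_a, shard_b)
print(duplicate_keys(merged))  # ['doc-42']

# Sending the same docs through the update path keeps one copy per key.
single = []
for doc_id, body in shard_a + shard_b:
    solr_style_add(single, doc_id, body)
print(duplicate_keys(single))  # []
```

The point of the contrast is that each sub-index is internally clean, so 
nothing looks wrong until the merge puts both copies of "doc-42" side by side.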

Or something unexpected, like there somehow being no <uniqueKey> defined in the 
schema.

I have never heard of Solr failing to remove the old document when a new one with 
the same ID is indexed, unless something like the above was the problem.

One bit of background: Lucene has no notion of <uniqueKey>. That is purely a 
Solr construct, and it is up to Solr to enforce it. So anything that bypasses Solr 
could produce this…

FWIW,
Erick
