On 8/2/2016 7:50 AM, Bernd Fehling wrote:
> Only assumption so far, DIH is sending the records as "update" (and
> not pure "add") to the indexer which will generate delete files during
> merge. If the number of segments is high it will take quite long to
> merge and check all records of all segments.

It's not DIH that's handling the requests as "update", it's Solr. If you index a document with the same value in the uniqueKey field as a document that already exists in the index, Solr will delete the old one before it adds the new one. This applies to ANY indexing, not just DIH. This is how Solr is designed to work -- that's the entire point of having a uniqueKey.

I'm not familiar with how a large number of deletes affects merging. I would not expect it to have much of a performance impact, and it might in fact make merging faster, because I'd think that deleted docs would be skipped.

Turning overwrite off when you are indexing would mean that Solr's uniqueKey guarantee is lost. You can end up with duplicate documents in the Lucene index, and because merging can completely change internal identifiers, there may be no built-in way for Solr or Lucene to automatically determine which ones are old or new.

I didn't know about LUCENE-6161. That looks like a nasty bug.

Thanks,
Shawn
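
P.S. To make the overwrite behavior described above concrete, here is a toy Python model of it. This is only a sketch and is not Solr or Lucene code: ToyIndex, add, and live are made-up names, and a real index works on segments and internal document IDs rather than a Python list. It only shows the idea that adding with overwrite marks the older document with the same uniqueKey value as deleted, while adding without overwrite leaves duplicates behind.

```python
class ToyIndex:
    """Toy stand-in for an index with a uniqueKey field (hypothetical, not Solr)."""

    def __init__(self, unique_key="id"):
        self.unique_key = unique_key
        self.docs = []        # every document ever written, in arrival order
        self.deleted = set()  # positions of documents marked as deleted

    def add(self, doc, overwrite=True):
        if overwrite:
            # Mark any live document with the same key as deleted first.
            key = doc[self.unique_key]
            for pos, old in enumerate(self.docs):
                if pos not in self.deleted and old[self.unique_key] == key:
                    self.deleted.add(pos)
        # The new document is always appended; nothing is rewritten in place.
        self.docs.append(doc)

    def live(self):
        # Documents that a search would still see.
        return [d for pos, d in enumerate(self.docs) if pos not in self.deleted]


with_overwrite = ToyIndex()
with_overwrite.add({"id": "1", "title": "old"})
with_overwrite.add({"id": "1", "title": "new"})

without_overwrite = ToyIndex()
without_overwrite.add({"id": "1", "title": "old"}, overwrite=False)
without_overwrite.add({"id": "1", "title": "new"}, overwrite=False)

print(len(with_overwrite.live()))     # one live document, the newer one
print(len(without_overwrite.live()))  # two live documents sharing id "1"
```

In the second case nothing in the toy index records which of the two copies is the current one, which is the problem with turning overwrite off.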