First, your assumption is correct. It would be A Bad Thing if two identical UUIDs were generated....
Is this SolrCloud? If so, then the deduplication idea won't work. The
problem is that the uuid is used for routing, so there is a decent
(1 - 1/numShards) chance that the two "identical" docs would land on
different shards, and deduplication at the hash level is local to the
replica.

But why not make the hash of the doc's content the "id" field? Your ETL
process would generate the hash and stuff it into the "id" field. Then in
both SolrCloud and stand-alone it would "just work".

Best,
Erick

On Mon, Jun 4, 2018 at 11:33 AM, Aman Tandon <amantandon...@gmail.com> wrote:

> Hi,
>
> Suppose id field is the UUID linked field in the configuration and if this
> is missing in the document coming to index then it will generate a UUID and
> set it in id field. However if id field is present with some value then it
> shouldn't.
>
> Kindly refer
> http://lucene.apache.org/solr/5_5_0/solr-core/org/apache/solr/update/processor/UUIDUpdateProcessorFactory.html
>
>
> On Mon, Jun 4, 2018, 23:52 S G <sg.online.em...@gmail.com> wrote:
>
>> Hi,
>>
>> Is it correct to assume that UUIDUpdateProcessorFactory will produce 2
>> documents even if the same document is indexed twice without the "id" field?
>>
>> And to avoid such a thing, we can use the technique mentioned in
>> https://wiki.apache.org/solr/Deduplication ?
>>
>> Thanks
>> SG
>>
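
For the suggestion above about making a hash of the doc's content the "id" field, here is a minimal sketch of what the ETL side might look like. Nothing here comes from Solr itself: the helper names (content_id, with_content_id), the choice of SHA-256, and the use of sorted-key JSON as the canonical form are all assumptions for illustration, and it assumes the field values are JSON-serializable.

```python
import hashlib
import json


def content_id(doc, id_field="id"):
    """Return a SHA-256 hex digest of the doc's content (hypothetical helper).

    Any existing id is ignored so that it cannot influence the hash.
    """
    content = {k: v for k, v in doc.items() if k != id_field}
    # Sorted keys and fixed separators give the same bytes for the same
    # content, regardless of the order in which the fields were added.
    canonical = json.dumps(
        content, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def with_content_id(doc, id_field="id"):
    """Return a copy of doc with id_field set to the content hash."""
    stamped = dict(doc)
    stamped[id_field] = content_id(doc, id_field)
    return stamped


if __name__ == "__main__":
    first = with_content_id({"title": "Solr", "body": "hello"})
    second = with_content_id({"body": "hello", "title": "Solr"})
    # Same content, different field order: the ids match.
    print(first["id"] == second["id"])
```

Because the id is then derived purely from the content, indexing the same doc twice sends the same uniqueKey both times, so the second copy replaces the first instead of creating a duplicate, and both copies route to the same shard.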