We do not want to generate the "id" ourselves and hence were looking for something that would generate the "id" automatically.
The UUIDUpdateProcessorFactory documentation says nothing about whether the automatic "id" generation checks if the received document is the same as an existing document.

That means if I send {"color":"red", "size":"L"} once, UUIDUpdateProcessorFactory will generate an "id" X, and if I send the same document {"color":"red", "size":"L"} again, UUIDUpdateProcessorFactory will not know that it's the same document and will generate an "id" Y.

That way I will end up with two documents:
{"id": X, "color":"red", "size":"L"}
{"id": Y, "color":"red", "size":"L"}

That situation can only be avoided if I use the https://wiki.apache.org/solr/Deduplication technique of generating an "id" based on the signature of some other fields. That will avoid duplication and auto-generate the "id" field too.

Is that a correct understanding?

Thanks
SG

On Mon, Jun 4, 2018 at 8:44 PM Erick Erickson <erickerick...@gmail.com> wrote:

> First, your assumption is correct. It would be A Bad Thing if two
> identical UUIDs were generated....
>
> Is this SolrCloud? If so, then the deduplication idea won't work. The
> problem is that the uuid is used for routing and there is a decent (1
> - 1/numShards) chance that the two "identical" docs would land on
> different shards, deduplication at the hash level is local to the
> replica.
>
> But why not make the hash of the doc's content the "id" field? Your
> ETL process would generate the hash and stuff it into the "id" field.
> Then in both SolrCloud or stand-alone it would "just work".
>
> Best,
> Erick
>
> On Mon, Jun 4, 2018 at 11:33 AM, Aman Tandon <amantandon...@gmail.com>
> wrote:
> > Hi,
> >
> > Suppose id field is the UUID linked field in the configuration and if this
> > is missing in the document coming to index then it will generate a UUID and
> > set it in id field. However if id field is present with some value then it
> > shouldn't.
> >
> > Kindly refer
> > http://lucene.apache.org/solr/5_5_0/solr-core/org/apache/solr/update/processor/UUIDUpdateProcessorFactory.html
> >
> > On Mon, Jun 4, 2018, 23:52 S G <sg.online.em...@gmail.com> wrote:
> >
> >> Hi,
> >>
> >> Is it correct to assume that UUIDUpdateProcessorFactory will produce 2
> >> documents even if the same document is indexed twice without the "id" field?
> >>
> >> And to avoid such a thing, we can use the technique mentioned in
> >> https://wiki.apache.org/solr/Deduplication ?
> >>
> >> Thanks
> >> SG
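
For reference, a minimal sketch of the suggestion quoted above (have the ETL process hash the document's content and put the hash into the "id" field before sending it to Solr). Everything in it is an assumption made for illustration: the helper names content_id and with_content_id are hypothetical, and SHA-256 over a key-sorted JSON serialization is just one reasonable way to get a stable signature, not something Solr prescribes.

```python
import hashlib
import json


def content_id(doc, fields=None):
    """Return a deterministic hex digest derived from the document's content.

    Hypothetical helper. If `fields` is None, every field except "id" is
    part of the signature; otherwise only the named fields are used.
    """
    if fields is None:
        fields = [key for key in doc if key != "id"]
    subset = {key: doc[key] for key in fields if key in doc}
    # Sorting keys and fixing separators makes the serialization independent
    # of the order in which the fields were supplied.
    canonical = json.dumps(
        subset, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def with_content_id(doc, fields=None):
    """Return a copy of `doc` whose "id" is the content hash (input untouched)."""
    out = dict(doc)
    out["id"] = content_id(doc, fields)
    return out


if __name__ == "__main__":
    first = with_content_id({"color": "red", "size": "L"})
    second = with_content_id({"size": "L", "color": "red"})
    # Both copies carry the same "id", so indexing the second one would
    # overwrite the first instead of creating a duplicate.
    print(first["id"] == second["id"])
```

Because the "id" is computed before the document reaches Solr, identical documents get the same "id" and therefore route to the same shard, which is why this approach should behave the same in SolrCloud and stand-alone.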