On 6/9/2018 1:15 AM, S G wrote:
That means if I send {"color":"red", "size":"L"} once, UUIDUpdateProcessorFactory will generate an "id" X, and if I send the same document {"color":"red", "size":"L"} again, UUIDUpdateProcessorFactory will not know that it's the same document and will generate an "id" Y.

That way I will end up with two documents:
{"id": X, "color":"red", "size":"L"}
{"id": Y, "color":"red", "size":"L"}

Correct, that's exactly what will happen.  That update processor's name makes it sound like it can be used to completely cover situations where the source data doesn't already have a unique key.  But all it does is randomly generate a unique ID.  It won't EVER assign the same ID twice, even if the document is absolutely identical to one that was indexed before.
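
To make that concrete, here is a rough Python sketch of the behavior (this is not Solr code; the assign_random_id helper and the dict standing in for an index are made up for illustration).  It attaches a random UUID to each incoming document, which is the effect described above:

```python
import uuid


def assign_random_id(doc):
    """Return a copy of doc with a freshly generated random UUID as its id.

    Hypothetical helper that only mimics the effect described above;
    it is not how Solr implements UUIDUpdateProcessorFactory.
    """
    return {"id": str(uuid.uuid4()), **doc}


doc = {"color": "red", "size": "L"}

# Send the same document twice.
first = assign_random_id(doc)
second = assign_random_id(doc)

# A plain dict keyed on "id" stands in for an index with a uniqueKey.
index = {d["id"]: d for d in (first, second)}

print(first["id"] == second["id"])  # the two ids are generated independently
print(len(index))                   # number of documents kept in the "index"
```

Because each ID is generated without looking at the document's content, both copies end up in the index.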

And that situation can only be avoided if I use the https://wiki.apache.org/solr/Deduplication technique of generating an "id" based on the signature of some other fields. That will avoid duplication and auto-generate the "id" field too.

Is that a correct understanding?

The deduplication support generates a signature from the contents of the named fields.  I haven't used this functionality, but I believe that if you write the signature to the field designated uniqueKey in the Solr schema, it would do everything you're hoping for.  The first complete example on that page you referenced sets signatureField to "id", which is typically the uniqueKey in Solr's example schemas.
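
As a rough sketch of the idea only (Python, not Solr code): derive the ID from a hash of the named fields, so identical content always maps to the same ID and a second copy overwrites the first.  The signature_id and add helpers are hypothetical, and MD5 over field names and values is an assumption chosen for illustration; Solr's own signature classes may compute the value differently.

```python
import hashlib


def signature_id(doc, fields):
    """Return a hex digest computed from the named fields of doc.

    Illustration only: the exact bytes that Solr's signature classes
    hash are not reproduced here.
    """
    h = hashlib.md5()
    for name in fields:
        # Separators keep ("ab", "c") from colliding with ("a", "bc").
        h.update(name.encode("utf-8"))
        h.update(b"\x00")
        h.update(str(doc.get(name, "")).encode("utf-8"))
        h.update(b"\x00")
    return h.hexdigest()


def add(index, doc, fields=("color", "size")):
    """Store doc under its signature, overwriting any earlier copy."""
    doc_id = signature_id(doc, fields)
    index[doc_id] = {"id": doc_id, **doc}
    return doc_id


index = {}  # a dict keyed on "id" stands in for the uniqueKey behavior
x = add(index, {"color": "red", "size": "L"})
y = add(index, {"color": "red", "size": "L"})  # same content, same id
z = add(index, {"color": "red", "size": "M"})  # different content

print(x == y)      # identical documents share one id
print(len(index))  # distinct documents remaining in the "index"
```

The choice of fields matters: any field left out of the signature is ignored when deciding whether two documents are "the same".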

Thanks,
Shawn
