We do not want to generate the "id" ourselves and hence were looking for
something that would generate the "id" automatically.

The UUIDUpdateProcessorFactory documentation says nothing about whether the
automatic "id" generation checks if the received document is the same as an
existing document.

That means if I send {"color":"red", "size":"L"} once,
UUIDUpdateProcessorFactory will generate an "id" X, and if I send the same
document {"color":"red", "size":"L"} again, UUIDUpdateProcessorFactory will
not know that it's the same document and will generate an "id" Y.

That way I will end up with two documents:
{"id": X, "color":"red", "size":"L"}
{"id": Y, "color":"red", "size":"L"}
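
To illustrate the concern, here is a rough sketch in Python. It is not Solr's
actual code: assign_random_id is a hypothetical stand-in for what the processor
appears to do when the "id" field is missing, assuming the UUID is random and
not derived from the document's content.

```python
import uuid


def assign_random_id(doc):
    """Hypothetical stand-in: add a random UUID as "id" only if the doc has none."""
    if "id" in doc:
        # An "id" supplied by the client is left untouched.
        return dict(doc)
    # A random UUID carries no information about the document's content.
    return {"id": str(uuid.uuid4()), **doc}


first = assign_random_id({"color": "red", "size": "L"})
second = assign_random_id({"color": "red", "size": "L"})

# Same content, but the two ids differ, so both documents would be stored.
print(first["id"] != second["id"])
```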

And that situation can only be avoided if I use the
https://wiki.apache.org/solr/Deduplication technique of generating an "id"
based on the signature of some other fields. That will avoid duplication
and auto-generate the "id" field too.
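
A minimal sketch of the signature idea, computed on the client side before
sending the document. The helper names (content_id, with_content_id), the
choice of SHA-256, and the JSON canonicalization are all assumptions made for
illustration; the signature classes that Solr's deduplication actually uses may
differ.

```python
import hashlib
import json


def content_id(doc, fields):
    """Hypothetical helper: derive a deterministic id from the chosen fields."""
    # Canonical JSON (sorted keys, fixed separators) so equal content
    # always serializes to the same bytes, regardless of key order.
    payload = json.dumps(
        {name: doc.get(name) for name in fields},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def with_content_id(doc, fields=("color", "size")):
    """Return a copy of doc with "id" set to the hash of the signature fields."""
    return {**doc, "id": content_id(doc, fields)}


a = with_content_id({"color": "red", "size": "L"})
b = with_content_id({"size": "L", "color": "red"})
c = with_content_id({"color": "blue", "size": "L"})

# Identical content yields the same id, so a re-send overwrites instead of
# creating a second document; different content yields a different id.
print(a["id"] == b["id"], a["id"] != c["id"])
```

Because the "id" depends only on the content, sending the same document twice
would produce the same "id" both times.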

Is that a correct understanding?

Thanks
SG


On Mon, Jun 4, 2018 at 8:44 PM Erick Erickson <erickerick...@gmail.com>
wrote:

> First, your assumption is correct. It would be A Bad Thing if two
> identical UUIDs were generated....
>
> Is this SolrCloud? If so, then the deduplication idea won't work. The
> problem is that the uuid is used for routing and there is a decent (1
> - 1/numShards) chance that the two "identical" docs would land on
> different shards; deduplication at the hash level is local to the
> replica.
>
> But why not make the hash of the doc's content the "id" field? Your
> ETL process would generate the hash and stuff it into the "id" field.
> Then in either SolrCloud or stand-alone it would "just work".
>
> Best,
> Erick
>
> On Mon, Jun 4, 2018 at 11:33 AM, Aman Tandon <amantandon...@gmail.com>
> wrote:
> > Hi,
> >
> > Suppose id field is the UUID linked field in the configuration and if
> > this is missing in the document coming to index then it will generate
> > a UUID and set it in id field. However if id field is present with some
> > value then it shouldn't.
> >
> > Kindly refer
> >
> > http://lucene.apache.org/solr/5_5_0/solr-core/org/apache/solr/update/processor/UUIDUpdateProcessorFactory.html
> >
> >
> > On Mon, Jun 4, 2018, 23:52 S G <sg.online.em...@gmail.com> wrote:
> >
> >> Hi,
> >>
> >> Is it correct to assume that UUIDUpdateProcessorFactory will produce 2
> >> documents even if the same document is indexed twice without the "id"
> >> field?
> >>
> >> And to avoid such a thing, we can use the technique mentioned in
> >> https://wiki.apache.org/solr/Deduplication ?
> >>
> >> Thanks
> >> SG
> >>
>
