Hello,

I am looking for ideas/suggestions to solve a use case I am
working on.

We have a couple of fields in the schema along with id, business_email and
personal_email. We need to return records based on whether their business and
personal emails are unique.

A record is unique if neither its business nor its personal email appears in
any other record.
A record is non-unique if its business or personal email appears in any other
record; all records sharing that email are then non-unique.
E.g. considering the documents below:
- for unique records, only id=1 should be returned (since john.doe is
not present in any other record's personal or business email)
- for non-unique records, id=2,3 should be returned (since isabel.dora is
present in multiple records; it doesn't matter whether it is in the business
or personal email)

Documents
===
{id:1,business_email_s:john....@abc.com,personal_email_s:john....@abc.com}
{id:2,business_email_s:isabel.d...@abc.com}
{id:3,personal_email_s:isabel.d...@abc.com}
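
To make the criteria concrete, here is a minimal Python sketch of how a pre-processing pass might classify the records before ingestion. It is only an illustration under assumptions: the function name split_unique and the example.com addresses are hypothetical stand-ins (not the real data above), emails are compared case-insensitively, and a record carrying the same email in both fields is counted once.

```python
from collections import Counter

# Field names taken from the sample documents above.
EMAIL_FIELDS = ("business_email_s", "personal_email_s")


def record_emails(doc):
    """Distinct emails of one record, lowercased (assumption: case-insensitive match)."""
    return {doc[f].lower() for f in EMAIL_FIELDS if doc.get(f)}


def split_unique(docs):
    """Return (unique_ids, non_unique_ids) for a list of record dicts."""
    # Count in how many records each email appears, regardless of field.
    record_count = Counter()
    for doc in docs:
        record_count.update(record_emails(doc))

    unique_ids, non_unique_ids = [], []
    for doc in docs:
        # A record is non-unique if any of its emails occurs in another record.
        shared = any(record_count[e] > 1 for e in record_emails(doc))
        (non_unique_ids if shared else unique_ids).append(doc["id"])
    return unique_ids, non_unique_ids


# Hypothetical placeholder data mirroring the shape of the three documents.
docs = [
    {"id": 1, "business_email_s": "a@example.com", "personal_email_s": "a@example.com"},
    {"id": 2, "business_email_s": "b@example.com"},
    {"id": 3, "personal_email_s": "b@example.com"},
]
print(split_unique(docs))  # ([1], [2, 3])
```

If something like this ran at ingestion time, the result could be stored as a flag on each document so that the query becomes a simple filter, at the cost of re-flagging older records whenever a new record introduces a repeated email.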

I am able to solve this using a Streaming Expression query, but I am not sure
whether performance will become a bottleneck, as the streaming expression is
quite big. So I am looking for different ideas, like using de-dupe or handling
this during ingestion/pre-processing, without impacting performance much.

Thanks,
Susheel
