Susheel, just a guess, but carrot2.org might be useful, though it might be
overkill. Cheers -- Rick

On August 30, 2017 7:40:08 AM MDT, Susheel Kumar <susheel2...@gmail.com> wrote:
>Hello,
>
>I am looking for ideas/suggestions to solve the use case I am
>working on.
>
>We have a couple of fields in the schema along with id, business_email,
>and personal_email. We need to return all records based on unique
>business and personal emails.
>
>A record is unique if neither its business nor its personal email
>appears in any other record.
>A record is non-unique if either its business or personal email
>appears in another record; all records sharing that email are then
>non-unique.
>E.g., considering the documents below:
>- for unique records, only id=1 should be returned (since john.doe is
>not present in any other record's personal or business email)
>- for non-unique records, id=2,3 should be returned (since isabel.dora
>is present in multiple records; it doesn't matter whether it is in the
>business or personal email)
>
>Documents
>===
>{id:1,business_email_s:john....@abc.com,personal_email_s:john....@abc.com}
>{id:2,business_email_s:isabel.d...@abc.com}
>{id:3,personal_email_s:isabel.d...@abc.com}
>
>I am able to solve this using a Streaming Expression query, but I am
>not sure whether performance will become a bottleneck, as the
>streaming expression is quite big. So I am looking for different ideas,
>such as using de-dupe or handling it during ingestion/pre-processing,
>without impacting performance much.
>
>Thanks,
>Susheel
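
For the ingestion/pre-processing idea mentioned in the quoted message, here is a minimal sketch (an illustration only, not a Solr feature) of how the unique/non-unique rule could be computed before indexing. The field names business_email_s and personal_email_s come from the sample documents; the helper names, the lowercasing/trimming of addresses, and the example.com addresses are assumptions made for the example, since the real addresses are elided in the original.

```python
from collections import Counter

# Field names taken from the sample documents in the quoted message.
EMAIL_FIELDS = ("business_email_s", "personal_email_s")


def record_emails(record):
    """Return the set of distinct emails in one record.

    Lowercasing and trimming are assumptions; drop them if addresses
    must be compared exactly as stored.
    """
    return {
        record[field].strip().lower()
        for field in EMAIL_FIELDS
        if record.get(field)
    }


def split_unique(records):
    """Split records into (unique_ids, non_unique_ids)."""
    # Count how many records mention each email. A record holding the same
    # address in both fields contributes only once, because it is a set.
    seen_in = Counter()
    for record in records:
        seen_in.update(record_emails(record))

    unique_ids, non_unique_ids = [], []
    for record in records:
        # Non-unique as soon as any of its emails appears in another record.
        shared = any(seen_in[email] > 1 for email in record_emails(record))
        (non_unique_ids if shared else unique_ids).append(record["id"])
    return unique_ids, non_unique_ids


# Hypothetical placeholder addresses standing in for the elided ones.
docs = [
    {"id": 1, "business_email_s": "john@example.com",
     "personal_email_s": "john@example.com"},
    {"id": 2, "business_email_s": "isabel@example.com"},
    {"id": 3, "personal_email_s": "isabel@example.com"},
]

unique_ids, non_unique_ids = split_unique(docs)
print(unique_ids, non_unique_ids)  # prints: [1] [2, 3]
```

The resulting flag could be stored on each document at index time so that queries only filter on it, at the cost of recomputing the flag for affected records whenever a new record introduces an already-seen address. Note that in this sketch a record with no email at all is classified as unique.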

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 
