Re: solr equivalent of "select distinct"

Mark juszczec Sun, 11 Sep 2011 09:39:31 -0700

Erick

Thanks very much for the reply.


I wrote my first message late Friday after work and, in trying to simplify
the problem description, got something wrong. Hopefully this restatement is
better:

My PK is FLD1, FLD2 and FLD3 concatenated together.

In some cases two records have the same FLD1 and FLD2 values, so FLD3 is the
only part of the PK that differs.
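Here is a minimal sketch of that key scheme in plain Java. The pk helper is hypothetical and not part of Solr. It shows that two records can share FLD1 and FLD2 and still get distinct keys because FLD3 differs.

```java
// Hypothetical helper mirroring the key scheme: PK = FLD1 + FLD2 + FLD3.
class CompositeKey {
    static String pk(String fld1, String fld2, String fld3) {
        return fld1 + fld2 + fld3;
    }

    public static void main(String[] args) {
        // Same FLD1 and FLD2, different FLD3: the keys are still distinct.
        System.out.println(pk("A", "B", "0")); // prints AB0
        System.out.println(pk("A", "B", "1")); // prints AB1
    }
}
```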

Here's an example:

PK   FLD1  FLD2  FLD3  FLD4  FLD5
AB0  A     B     0     x     y
AB1  A     B     1     x     y
CD0  C     D     0     a     b
CD1  C     D     1     e     f

I want to write a query using only the terms FLD1 and FLD2 and get back ONLY
these rows (FLD1 FLD2 FLD4 FLD5):

A B x y
C D a b
C D e f

Since FLD4 and FLD5 are the same for PK=AB0 and AB1, I only want one
occurrence of those records.

Since FLD4 and FLD5 are different for PK=CD0 and CD1, I want BOTH
occurrences of those records.
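The rule in the last two paragraphs amounts to SELECT DISTINCT FLD1, FLD2, FLD4, FLD5. As a rough illustration only, here is that rule applied client-side in Java to the example rows after they have been fetched. Row and Projection are hypothetical types, not SolrJ classes, and this does not make Solr itself return distinct rows.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Client-side "SELECT DISTINCT FLD1, FLD2, FLD4, FLD5" over rows already
// returned by Solr. Row is a hypothetical stand-in for a Solr document.
class DistinctRows {
    record Row(String pk, String fld1, String fld2, String fld3, String fld4, String fld5) {}

    // The columns that decide whether two rows count as "the same";
    // PK and FLD3 are deliberately left out.
    record Projection(String fld1, String fld2, String fld4, String fld5) {}

    static List<Projection> distinct(List<Row> rows) {
        // LinkedHashSet drops duplicate projections but keeps first-seen order.
        Set<Projection> seen = new LinkedHashSet<>();
        for (Row r : rows) {
            seen.add(new Projection(r.fld1(), r.fld2(), r.fld4(), r.fld5()));
        }
        return new ArrayList<>(seen);
    }

    // The example data from the table above.
    static List<Row> sample() {
        return List.of(
            new Row("AB0", "A", "B", "0", "x", "y"),
            new Row("AB1", "A", "B", "1", "x", "y"),
            new Row("CD0", "C", "D", "0", "a", "b"),
            new Row("CD1", "C", "D", "1", "e", "f"));
    }

    public static void main(String[] args) {
        for (Projection p : distinct(sample())) {
            System.out.println(p.fld1() + " " + p.fld2() + " " + p.fld4() + " " + p.fld5());
        }
    }
}
```

Running main prints A B x y, C D a b and C D e f, one per line, which matches the desired output. Records need Java 16 or later.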

I'm hoping I can use wildcards to get FLD4 and FLD5. If not, I can list them explicitly with fl=.

I'm using edismax.

We are also creating the query string on the fly. I suspect using SolrJ and
plugging the values into a bean would be easier, or do I have that wrong?
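When the query string is built by hand, each parameter value must be URL-encoded. The spaces, colons and commas in a value like "FLD1:A AND FLD2:B" are not safe in a URL as typed. A client library such as SolrJ handles this encoding for you. The sketch below only shows what the encoding involves, using the JDK alone. QueryStringBuilder and forKeys are hypothetical names. The sketch writes the operator as uppercase AND, the standard Lucene spelling; whether edismax also treats lowercase "and" as an operator depends on its lowercaseOperators setting. Values containing Lucene special characters would also need query-syntax escaping, which this sketch does not do.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: builds a Solr request query string from name/value
// pairs, URL-encoding names and values (requires Java 10+ for this overload).
class QueryStringBuilder {
    static String build(Map<String, String> params) {
        StringJoiner joined = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            joined.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return joined.toString();
    }

    // Plugs run-time values for FLD1 and FLD2 into the request.
    static String forKeys(String fld1, String fld2) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("q", "FLD1:" + fld1 + " AND FLD2:" + fld2);
        params.put("defType", "edismax");
        params.put("rows", "500");
        params.put("fl", "FLD1,FLD2,FLD4,FLD5");
        return build(params);
    }

    public static void main(String[] args) {
        System.out.println(forKeys("A", "B"));
    }
}
```

For example, forKeys("A", "B") returns q=FLD1%3AA+AND+FLD2%3AB&defType=edismax&rows=500&fl=FLD1%2CFLD2%2CFLD4%2CFLD5. A Map cannot hold a repeated parameter such as a second group.field, so a list of pairs would be needed for that case.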

I hope the tables of example data display properly.

Mark

On Sun, Sep 11, 2011 at 12:06 PM, Erick Erickson <erickerick...@gmail.com> wrote:

> This smells like an XY problem. Can you back up and give a higher-level
> reason *why* you want this behavior?
>
> Given your problem description, it seems you are getting the
> correct behavior no matter how you define the problem. You're essentially
> saying that you have two records with identical beginnings of your PK,
> so why is it incorrect to give you both records?
>
> But, anyway, if you're searching on FLD1 and FLD2, then by definition
> you're going to get both records back or the search would be failing!
>
> Best
> Erick
>
> On Fri, Sep 9, 2011 at 8:08 PM, Mark juszczec <mark.juszc...@gmail.com> wrote:
> > Hello everyone
> >
> > Let's say each record in my index contains fields named PK, FLD1, FLD2,
> > FLD3 ... FLD100
> >
> > PK is my Solr primary key. I create it by concatenating
> > FLD1+FLD2+FLD3, and that combination is guaranteed to be unique.
> >
> > Let's say 2 of these records have FLD1 = A and FLD2 = B. I am unsure
> > about the remaining fields.
> >
> > Right now, if I do a query specifying FLD1 = A and FLD2 = B then I get both
> > records. I only want 1.
> >
> > Research says I should use faceting.  But this:
> >
> > q=FLD1:A and FLD2:B&rows=500&defType=edismax&fl=FLD1,FLD2&
> > facet=true&facet.field=FLD1&facet.field=FLD2
> >
> > gives me 2 records.
> >
> > In fact, it gives me the same results as:
> >
> > q=FLD1:A and FLD2:B&rows=500&defType=edismax&fl=FLD1,FLD2
> >
> > I'm wrong somewhere, but I'm unsure where.
> >
> > Is faceting the right way to go or should I be using grouping?
> >
> > Curiously, when I use grouping like this:
> >
> > q=FLD1:A and FLD2:B&rows=500&defType=edismax&indent=true&fl=FLD1,FLD2
> > &group=true&group.field=FLD1&group.field=FLD2
> >
> > I get 2 records as well.
> >
> > Has anyone dealt with mimicking "select distinct" in Solr?
> >
> > Any advice would be much appreciated.
> >
> > Mark
> >
>
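Both attempts in the original message returning two records fits how these features work. Faceting adds per-value counts next to the result list without removing documents from it. When group.field is given more than once, Solr builds a separate grouping for each field, and each group returns group.limit documents (1 by default). The sketch below simulates both features in plain Java for the two matching records. The class and method names are hypothetical, and this is not how Solr implements either feature.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation (not Solr code) of faceting and grouping applied
// to the two documents matched by FLD1:A AND FLD2:B.
class FacetGroupSketch {
    static List<Map<String, String>> matches() {
        return List.of(
            Map.of("PK", "AB0", "FLD1", "A", "FLD2", "B"),
            Map.of("PK", "AB1", "FLD1", "A", "FLD2", "B"));
    }

    // Faceting: a count per field value, reported alongside the results.
    // The document list itself is not reduced.
    static Map<String, Integer> facetCounts(List<Map<String, String>> docs, String field) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map<String, String> doc : docs) {
            counts.merge(doc.get(field), 1, Integer::sum);
        }
        return counts;
    }

    // Grouping on ONE field, keeping at most `limit` PKs per group
    // (the first ones seen), like group.field with group.limit.
    static Map<String, List<String>> groupBy(List<Map<String, String>> docs, String field, int limit) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (Map<String, String> doc : docs) {
            List<String> members = groups.computeIfAbsent(doc.get(field), k -> new ArrayList<>());
            if (members.size() < limit) {
                members.add(doc.get("PK"));
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = matches();
        System.out.println("documents returned: " + docs.size());
        System.out.println("facet FLD1: " + facetCounts(docs, "FLD1"));

        // Two group.field parameters give two independent groupings,
        // each contributing one document here.
        int shown = 0;
        for (String field : List.of("FLD1", "FLD2")) {
            Map<String, List<String>> groups = groupBy(docs, field, 1);
            System.out.println("grouped by " + field + ": " + groups);
            for (List<String> members : groups.values()) {
                shown += members.size();
            }
        }
        System.out.println("documents shown across groupings: " + shown);
    }
}
```

In real Solr, the document that heads each group depends on the sort order (relevance by default). It need not be AB0 as it is in this simulation.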
