Re: Duplication of Output

Erick Erickson Wed, 31 Aug 2011 15:07:26 -0700

The first question I'd ask is "Why are there duplicates
in your index in the first place?" If you're denormalizing,
that would account for it. Mostly, I'm just asking to be
sure that you expect duplicate product IDs. If you make
productid your <uniqueKey> in schema.xml, there'll only be
one document per product ID: adding a document whose key
already exists replaces the old one.

You'll have to re-index if you make this change, though.
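To make the <uniqueKey> behaviour concrete, here is a toy Python model of it. It is an illustration only, not Solr code, and index_documents is a hypothetical helper. Documents are stored under their productid, so a later document with the same key overwrites the earlier one instead of adding a duplicate. In Solr itself the declaration is the schema.xml line <uniqueKey>productid</uniqueKey>.

```python
from typing import Dict, Iterable, List


def index_documents(docs: Iterable[Dict[str, object]],
                    key_field: str = "productid") -> List[Dict[str, object]]:
    """Toy model of indexing with a uniqueKey (hypothetical helper, not Solr code).

    Each document is stored under its key, so a later document with the
    same key overwrites the earlier one instead of adding a duplicate.
    """
    index: Dict[object, Dict[str, object]] = {}
    for doc in docs:
        index[doc[key_field]] = doc  # same key -> replace, never append
    # dicts keep first-insertion order, so keys come back in first-seen order
    return list(index.values())


if __name__ == "__main__":
    # Ten denormalized rows, mirroring the query results quoted below
    rows = [{"productid": 1011630553}] * 7 + [{"productid": 1013033708}] * 3
    print([d["productid"] for d in index_documents(rows)])
    # → [1011630553, 1013033708]
```

The model keeps the last version of each document. That is why re-indexing is required: documents already in the index were added without a key to collapse on.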

But grouping/field collapsing would indeed apply to this
problem: grouping on productid gives back one entry per product.
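The grouping route could look like the minimal Python sketch below. It assumes your Solr version supports the result-grouping parameters group, group.field and group.main. With group.main=true, Solr returns a flat document list built from the top document of each group. The helper names grouped_select_url and collapse are hypothetical. collapse() does not call Solr; it only models locally what the grouped response should contain for the IDs quoted below.

```python
from typing import Iterable, List
from urllib.parse import urlencode


def grouped_select_url(base: str, query: str, field: str) -> str:
    """Build a select URL asking Solr to group hits on `field`
    (hypothetical helper; assumes result grouping is available)."""
    params = {
        "q": query,
        "fl": field,
        "group": "true",       # turn on result grouping
        "group.field": field,  # one group per distinct productid
        "group.main": "true",  # flat list: top doc of each group
    }
    return base + "?" + urlencode(params)


def collapse(ids: Iterable[int]) -> List[int]:
    """Local model of the grouped result: the first (highest-ranked)
    hit for each distinct value, kept in rank order."""
    seen = set()
    out: List[int] = []
    for value in ids:
        if value not in seen:
            seen.add(value)
            out.append(value)
    return out


if __name__ == "__main__":
    print(grouped_select_url("/solr/select/", "wrt54g2", "productid"))
    # → /solr/select/?q=wrt54g2&fl=productid&group=true&group.field=productid&group.main=true
    print(collapse([1011630553] * 7 + [1013033708] * 3))
    # → [1011630553, 1013033708]
```

Unlike a uniqueKey, grouping leaves the index untouched and collapses duplicates only at query time, so no re-index is needed.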

Deduplication isn't applicable here, since you know exactly what
the duplicates are. Deduplication is more for "fuzzy" removal
of near-duplicates.

Hope this helps
Erick

On Wed, Aug 31, 2011 at 12:01 AM, Aaron Bains <aaronba...@gmail.com> wrote:
> Hello,
>
> What is the best way to remove duplicate values from the output? I am
> using the following query:
>
> /solr/select/?q=wrt54g2&version=2.2&start=0&rows=10&indent=on&fl=productid
>
> And I get the following results:
>
> <doc>
> <int name="productid">1011630553</int>
> </doc>
> <doc>
> <int name="productid">1011630553</int>
> </doc>
> <doc>
> <int name="productid">1011630553</int>
> </doc>
> <doc>
> <int name="productid">1011630553</int>
> </doc>
> <doc>
> <int name="productid">1011630553</int>
> </doc>
> <doc>
> <int name="productid">1011630553</int>
> </doc>
> <doc>
> <int name="productid">1011630553</int>
> </doc>
> <doc>
> <int name="productid">1013033708</int>
> </doc>
> <doc>
> <int name="productid">1013033708</int>
> </doc>
> <doc>
> <int name="productid">1013033708</int>
> </doc>
>
>
> But I don't want those results because there are duplicates. I am looking
> for results like below:
>
> <doc>
> <int name="productid">1011630553</int>
> </doc>
> <doc>
> <int name="productid">1013033708</int>
> </doc>
>
> I know about deduplication and field collapsing, but I am not sure whether
> they apply in this situation. Thanks for your help!
>
