> The first question I'd ask is "why are there duplicates
> in your index in the first place?". If you're denormalizing,
> that would account for it. Mostly, I'm just asking to be
> sure that you expect duplicate product IDs. If you make
> your productid a <uniqueKey>, there'll only be one of each....
>
> You'll have to re-index if you make this change though.
>
> But grouping/field collapsing would, indeed, apply to this
> problem.
>
> deduplication isn't applicable, since you know exactly what
> duplicates are. deduplication is more for "fuzzy" removal
> of near-duplicates..

That's only true if you use Nutch's TextProfileSignature; MD5 and Lookup3
are meant for exact matching. I don't know whether Lookup3Signature works on
non-string/text values, but I see no reason it should not. It might be an
improvement to allow deduplication that skips creating a signature field and
dedups directly on the non-string values instead.

>
> Hope this helps
> Erick
>
> On Wed, Aug 31, 2011 at 12:01 AM, Aaron Bains <aaronba...@gmail.com> wrote:
> > Hello,
> >
> > What is the best way to remove duplicate values on output. I am using the
> > following query:
> >
> > /solr/select/?q=wrt54g2&version=2.2&start=0&rows=10&indent=on&*fl=productid*
> >
> > And I get the following results:
> >
> > <doc>
> > <int name="productid">1011630553</int>
> > </doc>
> > <doc>
> > <int name="productid">1011630553</int>
> > </doc>
> > <doc><int name="productid">1011630553</int>
> > </doc>
> > <doc><int name="productid">1011630553</int>
> > </doc>
> > <doc><int name="productid">1011630553</int>
> > </doc>
> > <doc><int name="productid">1011630553</int>
> > </doc>
> > <doc><int name="productid">1011630553</int>
> > </doc>
> > <doc><int name="productid">1013033708</int>
> > </doc>
> > <doc><int name="productid">1013033708</int>
> > </doc>
> > <doc><int name="productid">1013033708</int>
> > </doc>
> >
> >
> > But I don't want those results because there are duplicates. I am looking
> > for results like below:
> >
> > <doc>
> > <int name="productid">1011630553</int>
> > </doc>
> > <doc>
> > <int name="productid">1013033708</int>
> > </doc>
> >
> > I know there is deduplication and field collapsing but I am not sure if
> > they are applicable in this situation. Thanks for your help!
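
For reference, the grouping/field collapsing approach mentioned in this thread amounts to adding grouping parameters to the original query. The following is only a minimal sketch, assuming a Solr release that supports result grouping: `group`, `group.field`, and `group.main` are Solr grouping parameters, but their availability depends on the version in use. The host, port, and the helper name `build_grouped_query` are hypothetical and not taken from the thread.

```python
from urllib.parse import urlencode


def build_grouped_query(base_url, query, group_field, rows=10):
    """Build a Solr select URL that collapses results on one field.

    Hypothetical helper for illustration; it only assembles the URL
    and does not contact a Solr server.
    """
    params = {
        "q": query,
        "fl": group_field,           # return only the grouped field
        "start": 0,
        "rows": rows,
        "indent": "on",
        "group": "true",             # turn on result grouping
        "group.field": group_field,  # one group per distinct value
        "group.main": "true",        # assumed supported: flat <doc> list
    }
    return base_url + "?" + urlencode(params)


# Assumed local Solr address; replace with the real host and path.
url = build_grouped_query(
    "http://localhost:8983/solr/select", "wrt54g2", "productid"
)
print(url)
```

With the default group limit of one document per group, each distinct productid should appear once, which matches the output asked for in the original question. Making productid the <uniqueKey>, as suggested earlier in the thread, avoids the duplicates at index time instead.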