I'm using the signature field grouping method described here:
https://cwiki.apache.org/confluence/display/solr/De-Duplication

You can configure the de-duplication update processor to compute the
signature from "title" (its "fields" parameter) and store it in the
signatureField, for example "signature". Then, at query time, instead of
using &group=true&group.field=title, you can use
&group=true&group.field=signature
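
As a rough sketch of the query side only: the snippet below builds the
grouping parameters in Python. The field name "signature", the sample
query, and the helper name grouped_query are assumptions for illustration,
not something taken from the wiki page, and the field must already be
populated at index time.

```python
from urllib.parse import urlencode


def grouped_query(q, group_field="signature", rows=10):
    """Build a Solr query string that groups results on one field.

    group_field is assumed to be the field holding the de-duplication
    signature; adjust it to whatever your schema actually uses.
    """
    params = {
        "q": q,
        "group": "true",             # turn result grouping on
        "group.field": group_field,  # one group per distinct value
        "rows": rows,                # number of groups to return
    }
    return urlencode(params)


if __name__ == "__main__":
    # Append the result to your collection's /select URL.
    print(grouped_query("title:video"))
```

Passing group_field="title" instead gives the original title-based
grouping, which makes it easy to compare the two.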

Regards,
Edwin


On 4 November 2015 at 16:40, Jan Høydahl <jan....@cominvent.com> wrote:

> I second Toke’s recommendation to ensure you have a pure string-version of
> your title.
> For pure de-duplication you could also consider the lighter-weight
> CollapseComponent
>
> Instead of &group=true&group.field=title, use
> &fq={!collapse field=title_string}
>
> See
> https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results
> for more
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>
> > 3. nov. 2015 kl. 12.37 skrev Toke Eskildsen <t...@statsbiblioteket.dk>:
> >
> > On Tue, 2015-11-03 at 14:53 +0530, vishal raut wrote:
> >> I have indexed various videos in solr which I have in my database. I
> >> want to search for those video titles, but there can be duplicate video
> >> titles as well (If the video is same but source is different, this will
> >> have separate entry in solr). To remove those duplicate titles while
> >> searching, I am using solr group on title.
> >
> > And you get "Too many values for UnInvertedField faceting on field."
> >
> > There is a fairly low (16M per segment or something like that) limit to
> > the amount of unique values that can be uninverted. DocValues has a much
> > higher limit (2 billion I think. At least it works with 600M+ for us).
> >
> > Add your titles to a StrField with docValues, then group on that.
> >
> > - Toke Eskildsen, State and University Library, Denmark
> >
> >
>
>
