OK - I see that this can be done with Field Collapsing/Grouping. I also
see the mentions in the Wiki for avoiding duplicates using a 16-byte hash.
So, question withdrawn...
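For readers who want to see what the grouping approach could look like, here is a minimal sketch. It assumes the hash is indexed in a field named content_hash (a hypothetical name; the thread only calls it "content-hash") and that the response uses Solr's default grouped JSON format. The helper names are made up for illustration.

```python
from urllib.parse import urlencode


def grouped_query(q, hash_field="content_hash"):
    """Build a query string that collapses results sharing the same hash."""
    params = [
        ("q", q),
        ("wt", "json"),
        ("group", "true"),            # turn on result grouping
        ("group.field", hash_field),  # one group per distinct hash value
        ("group.limit", "1"),         # return a single document per group
        ("group.ngroups", "true"),    # also report the number of groups
    ]
    return urlencode(params)


def first_docs(response, hash_field="content_hash"):
    """Return (document, hidden_duplicates) pairs from a grouped response."""
    groups = response["grouped"][hash_field]["groups"]
    return [
        (g["doclist"]["docs"][0], g["doclist"]["numFound"] - 1)
        for g in groups
    ]
```

Each group's numFound counts every matching document with that hash, so numFound - 1 is one way to see how many copies were collapsed away.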
On Thu, Aug 22, 2013 at 10:21 PM, Dan Davis wrote:
> Suppose I have two documents with different id, and there is anothe
Suppose I have two documents with different ids, and there is another field,
for instance "content-hash", which is something like a 16-byte hash of the
content.
Can Solr be configured to return just one copy and drop the other if both
are relevant?
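As an illustration of the kind of 16-byte content hash described here, the sketch below uses MD5, whose digest happens to be 16 bytes. The thread does not say which hash is used, so MD5 is an assumption, and so is the whitespace normalization step.

```python
import hashlib


def content_hash(text):
    """Return a 16-byte digest of the content (MD5 is an assumed choice)."""
    # Assumption: collapse runs of whitespace so trivially reformatted
    # copies of the same content produce the same hash.
    normalized = " ".join(text.split())
    return hashlib.md5(normalized.encode("utf-8")).digest()
```

Two documents with different ids but the same content would then carry the same value in the hash field.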
If Solr does drop one result, do you get any indi
Subject: removing duplicates
hello,
We have documents that are duplicates, i.e. the ID is different but the rest
of the fields are the same. Is there a query that can remove the duplicates
and leave just one copy of each document in Solr? There is one numeric field
that we can key off to find duplicates
Sent: Wednesday, August 21, 2013 2:34 PM
To: solr-user@lucene.apache.org
Subject: Re: removing duplicates
Thanks Aloke and Robert. Can you please give me code/query snippets?
(newbie here)
On Wed, Aug 21, 2013 at 2:31 PM, Aloke Ghoshal wrote:
> Hi,
>
> Facet by on
Hi,
This will help you identify the duplicates:
q=*:*&fl=id&facet=true&facet.mincount=2&rows=0&facet.field=
To actually remove them from Solr, you will have to do something like what
Robert suggested: write an application that uses the results to build a
delete-by-id query (
http://wiki.apache.org/sol
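A minimal sketch of the "build a delete by id query" step follows. It assumes the application has already collected, for each duplicated key value, the ids of the documents that share it (for example with one follow-up query per facet value). The function name and the rule of keeping the lowest id are illustrative choices, not something specified in the thread.

```python
import xml.etree.ElementTree as ET


def delete_by_id_xml(ids_by_key):
    """Build a Solr XML delete message that keeps one document per key.

    ids_by_key maps each duplicated key value to the ids sharing it.
    """
    root = ET.Element("delete")
    for key in sorted(ids_by_key):
        ids = sorted(ids_by_key[key])
        # Assumption: keep the first id in sorted order, delete the rest.
        for doc_id in ids[1:]:
            ET.SubElement(root, "id").text = doc_id
    return ET.tostring(root, encoding="unicode")
```

The resulting message would be posted to the update handler and followed by a commit; it is worth reviewing the list of ids before sending it, since the deletion cannot be undone.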
Thanks Aloke and Robert. Can you please give me code/query snippets?
(newbie here)
On Wed, Aug 21, 2013 at 2:31 PM, Aloke Ghoshal wrote:
> Hi,
>
> Facet by one of the duplicate fields (probably by the numeric field that
> you mentioned) and set facet.mincount=2.
>
> Regards,
> Aloke
>
>
> On Th
21, 2013 2:15 PM
To: solr-user@lucene.apache.org
Subject: removing duplicates
hello,
We have documents that are duplicates, i.e. the ID is different but the rest of
the fields are the same. Is there a query that can remove the duplicates and
leave just one copy of each document in Solr? There is one numeric
Hi,
Facet by one of the duplicate fields (probably by the numeric field that
you mentioned) and set facet.mincount=2.
Regards,
Aloke
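The faceting suggestion above can be sketched as follows. The numeric field is never named in the thread, so the field name is a parameter here, and the helper names are hypothetical. The parsing assumes Solr's default JSON layout for facet_fields, a flat list that alternates value and count.

```python
from urllib.parse import urlencode


def duplicate_facet_query(field):
    """Build a query string that lists values occurring at least twice."""
    params = [
        ("q", "*:*"),
        ("rows", "0"),            # only the facet counts are needed
        ("wt", "json"),
        ("facet", "true"),
        ("facet.field", field),
        ("facet.mincount", "2"),  # hide values that occur only once
    ]
    return urlencode(params)


def duplicated_values(response, field):
    """Turn the flat [value, count, value, count, ...] list into a dict."""
    flat = response["facet_counts"]["facet_fields"][field]
    return dict(zip(flat[0::2], flat[1::2]))
```

Note that Solr caps the number of facet values returned by default, so an index with many duplicated values may need facet.limit raised (or set to -1) to see them all.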
On Thu, Aug 22, 2013 at 2:44 AM, Ali, Saqib wrote:
> hello,
>
> We have documents that are duplicates, i.e. the ID is different but the rest
> of the fields are the sam
hello,
We have documents that are duplicates, i.e. the ID is different but the rest
of the fields are the same. Is there a query that can remove the duplicates
and leave just one copy of each document in Solr? There is one numeric field
that we can key off to find duplicates.
Please advise.
Thanks
> I know that I can use the
> SignatureUpdateProcessorFactory to remove duplicates, but I
> would like to keep the duplicates in the index and remove them
> conditionally at query time.
>
> Is there any easy way I could accomplish this?
The closest thing would be to group documents by the signature field.
http://wiki
I know that I can use the SignatureUpdateProcessorFactory to remove
duplicates, but I would like to keep the duplicates in the index and remove
them conditionally at query time.
Is there any easy way I could accomplish this?
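One way the "conditionally at query time" part might look is sketched below: grouping parameters are added only when the caller asks for de-duplication. It assumes the signature is stored in a field named signature (a hypothetical name) and that duplicates were left in the index at update time, for instance by configuring the processor not to overwrite them.

```python
def search_params(q, dedupe, signature_field="signature"):
    """Return Solr request parameters, collapsing duplicates on demand."""
    params = {"q": q, "wt": "json"}
    if dedupe:
        params.update({
            "group": "true",
            "group.field": signature_field,  # one group per signature
            "group.main": "true",            # flatten groups into a plain result list
        })
    return params
```

With dedupe=False the query behaves as usual and returns every copy; with dedupe=True each signature contributes a single document, since group.limit defaults to 1.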