Good to note!
But... any "search" will not detect dupe IDs for uncommitted documents.
-- Jack Krupansky
-----Original Message-----
From: Mikhail Khludnev
Sent: Wednesday, July 31, 2013 6:11 AM
To: solr-user
Subject: Re: How might one search for dupe IDs other than faceting on the ID field?
fwiw,
this code won't capture uncommitted duplicates.
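
To illustrate the point, here is a minimal sketch in plain Java. It is a toy model, not Solr code: the class UncommittedDupeDemo and its methods are hypothetical names, and the only assumption it encodes is that a searcher sees committed documents only. A check-then-add handler that consults the searcher never sees pending documents, so a repeated ID passes the check until a commit happens. What the underlying add then does with that repeated ID (overwrite or keep both) is outside this sketch.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of an index whose searcher sees only committed IDs. Not Solr code. */
public class UncommittedDupeDemo {
    private final Set<String> committed = new HashSet<>();   // visible to the "searcher"
    private final List<String> pending = new ArrayList<>();  // added but not yet committed

    /** Mimics a check-then-add handler: rejects the add only if a search finds the ID. */
    public boolean addIfAbsent(String id) {
        if (committed.contains(id)) {
            return false; // the search found the ID, so the add is skipped
        }
        pending.add(id);  // pending IDs are never consulted by the check
        return true;
    }

    /** Makes pending adds visible; returns how many of them repeated an ID already taken. */
    public int commit() {
        int missedByCheck = 0;
        for (String id : pending) {
            if (!committed.add(id)) {
                missedByCheck++;
            }
        }
        pending.clear();
        return missedByCheck;
    }

    public static void main(String[] args) {
        UncommittedDupeDemo index = new UncommittedDupeDemo();
        System.out.println(index.addIfAbsent("doc1")); // true
        System.out.println(index.addIfAbsent("doc1")); // true: the check cannot see the pending doc
        System.out.println(index.commit());            // 1: one repeated ID got past the check
        System.out.println(index.addIfAbsent("doc1")); // false: now visible to the search
    }
}
```

Once the commit has happened the same check works as intended, which is why the gap only shows up when the same ID arrives more than once between commits.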
On Wed, Jul 31, 2013 at 9:41 AM, Dotan Cohen <dotanco...@gmail.com> wrote:
On Tue, Jul 30, 2013 at 11:14 PM, Jack Krupansky
<j...@basetechnology.com> wrote:
> The Solr SignatureUpdateProcessorFactory is designed to facilitate dedupe...
> any particular reason you did not use it?
>
> See:
> http://wiki.apache.org/solr/Deduplication
>
> and
>
> https://cwiki.apache.org/confluence/display/solr/De-Duplication
>
Actually, the guy who made the changes (a coworker) did in fact write
an alternative UpdateHandler. I've just noticed that there are a bunch
of dupes right now, though.
import java.io.IOException;

import org.apache.lucene.index.Term;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.solr.core.SolrCore;
import org.apache.solr.search.SolrIndexSearcher;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.DirectUpdateHandler2;
import org.apache.solr.util.RefCounted;

public class DiscoAPIUpdateHandler extends DirectUpdateHandler2 {

    public DiscoAPIUpdateHandler(SolrCore core) {
        super(core);
    }

    @Override
    public int addDoc(AddUpdateCommand cmd) throws IOException {
        // If overwrite is set to false, use the default DirectUpdateHandler2
        // behavior. This is done for debugging, to insert duplicates into Solr.
        if (!cmd.overwrite) return super.addDoc(cmd);

        // Look up the command's update term if it has one,
        // otherwise the document's indexed id.
        Term updateTerm = (cmd.updateTerm != null)
                ? cmd.updateTerm
                : new Term("id", cmd.getIndexedId());
        Query query = new TermQuery(updateTerm);

        // When using ref-counted objects you have to decrement the ref count
        // when you're done; the finally block does so even if the search throws.
        RefCounted<SolrIndexSearcher> indexSearcher =
                this.core.getNewestSearcher(false);
        try {
            // Make an internal Lucene query to check whether that id already exists.
            TopDocs docs = indexSearcher.get().search(query, 2);
            if (docs.totalHits > 0) {
                // Already indexed: don't add the new document.
                return 0;
            }
        } finally {
            // The index searcher is no longer needed.
            indexSearcher.decref();
        }

        // If we get here, it's a new document.
        return super.addDoc(cmd);
    }
}
> And I give a bunch of examples in my book.
>
I anticipate the book with esteem!
--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
--
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics
<http://www.griddynamics.com>
<mkhlud...@griddynamics.com>