On Tue, Jul 30, 2013 at 11:14 PM, Jack Krupansky <j...@basetechnology.com> wrote:
> The Solr SignatureUpdateProcessorFactory is designed to facilitate dedupe...
> any particular reason you did not use it?
>
> See:
> http://wiki.apache.org/solr/Deduplication
>
> and
>
> https://cwiki.apache.org/confluence/display/solr/De-Duplication
>
Actually, the guy who made the changes (a coworker) did in fact write an
alternative UpdateHandler. I've just noticed that there are a bunch of
dupes right now, though.

public class DiscoAPIUpdateHandler extends DirectUpdateHandler2 {

    public DiscoAPIUpdateHandler(SolrCore core) {
        super(core);
    }

    @Override
    public int addDoc(AddUpdateCommand cmd) throws IOException {
        // if overwrite is set to false we fall back to DirectUpdateHandler2;
        // this is done for debugging, to insert duplicates into Solr
        if (!cmd.overwrite) return super.addDoc(cmd);

        // when using ref-counted objects you have to decrement the ref count
        // when you're done
        RefCounted<SolrIndexSearcher> indexSearcher = this.core.getNewestSearcher(false);

        // the idea: make an internal Lucene query and check whether that id
        // already exists
        Term updateTerm = null;
        if (cmd.updateTerm != null) {
            updateTerm = cmd.updateTerm;
        } else {
            updateTerm = new Term("id", cmd.getIndexedId());
        }

        Query query = new TermQuery(updateTerm);
        TopDocs docs = indexSearcher.get().search(query, 2);

        if (docs.totalHits > 0) {
            // index searcher is no longer needed
            indexSearcher.decref();
            // don't add the new document
            return 0;
        }

        // index searcher is no longer needed
        indexSearcher.decref();
        // if I'm here then it's a new document
        return super.addDoc(cmd);
    }
}

> And I give a bunch of examples in my book.
>

I anticipate the book with esteem!

--
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com