Good to note!

But... no "search" will detect dupe IDs for uncommitted documents.
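To make the uncommitted-document gap concrete, here is a toy model in plain Java. It is not Solr's API: the hypothetical UncommittedDupes class stands in for an index where a "searcher" sees only committed IDs, while new adds sit in a pending buffer until commit(). A check-then-add that consults only the searcher lets a second copy of an ID through as long as the first copy hasn't been committed.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (hypothetical, not Solr's API): searches see only committed IDs,
// while newly added IDs wait in a pending buffer until commit().
public class UncommittedDupes {
    private final List<String> committed = new ArrayList<>();
    private final List<String> pending = new ArrayList<>();

    // Mirrors a "search before add" check: only committed IDs are visible.
    boolean existsInSearcher(String id) {
        return committed.contains(id);
    }

    // Adds the ID unless the committed view already contains it.
    boolean addIfAbsent(String id) {
        if (existsInSearcher(id)) {
            return false;
        }
        pending.add(id);
        return true;
    }

    // Makes pending IDs visible; duplicates among them are all kept.
    void commit() {
        committed.addAll(pending);
        pending.clear();
    }

    List<String> committedIds() {
        return committed;
    }

    public static void main(String[] args) {
        UncommittedDupes index = new UncommittedDupes();
        System.out.println(index.addIfAbsent("doc1")); // true
        System.out.println(index.addIfAbsent("doc1")); // true: first doc1 is not committed yet
        index.commit();
        System.out.println(index.addIfAbsent("doc1")); // false: doc1 is now visible
        System.out.println(index.committedIds());      // [doc1, doc1]
    }
}
```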

-- Jack Krupansky

-----Original Message----- From: Mikhail Khludnev
Sent: Wednesday, July 31, 2013 6:11 AM
To: solr-user
Subject: Re: How might one search for dupe IDs other than faceting on the ID field?

FWIW, this code won't catch a duplicate of a document that has not been committed yet.


On Wed, Jul 31, 2013 at 9:41 AM, Dotan Cohen <dotanco...@gmail.com> wrote:

On Tue, Jul 30, 2013 at 11:14 PM, Jack Krupansky
<j...@basetechnology.com> wrote:
> The Solr SignatureUpdateProcessorFactory is designed to facilitate dedupe...
> any particular reason you did not use it?
>
> See:
> http://wiki.apache.org/solr/Deduplication
>
> and
>
> https://cwiki.apache.org/confluence/display/solr/De-Duplication
>
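For reference, the idea behind SignatureUpdateProcessorFactory is to compute a signature (hash) over a configured set of fields and use it to detect duplicates; in solrconfig.xml it is driven by parameters such as signatureField, fields, signatureClass and overwriteDupes. The plain-Java sketch below only illustrates that idea and is not Solr's implementation: the SignatureDedupe class and its field-joining scheme are hypothetical, and it uses MD5 from java.security.MessageDigest. Keying documents on the signature means a later document with the same configured field values replaces the earlier one, roughly what overwriteDupes=true aims for.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of signature-based dedupe (not Solr's code).
public class SignatureDedupe {
    private final List<String> signatureFields;
    // Documents keyed by their content signature.
    private final Map<String, Map<String, String>> bySignature = new LinkedHashMap<>();

    SignatureDedupe(List<String> signatureFields) {
        this.signatureFields = signatureFields;
    }

    static String md5Hex(String s) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Hash only the configured fields; the name prefix and '\0' separator keep
    // ("ab", "c") and ("a", "bc") from producing the same input string.
    String signature(Map<String, String> doc) {
        StringBuilder sb = new StringBuilder();
        for (String field : signatureFields) {
            sb.append(field).append('=').append(doc.getOrDefault(field, "")).append('\0');
        }
        return md5Hex(sb.toString());
    }

    // Same signature => the new document replaces the earlier one.
    void add(Map<String, String> doc) {
        bySignature.put(signature(doc), doc);
    }

    int size() {
        return bySignature.size();
    }

    public static void main(String[] args) {
        SignatureDedupe idx = new SignatureDedupe(List.of("title", "body"));
        idx.add(Map.of("id", "1", "title", "Hello", "body", "World"));
        idx.add(Map.of("id", "2", "title", "Hello", "body", "World")); // same content, new id
        idx.add(Map.of("id", "3", "title", "Hello", "body", "There"));
        System.out.println(idx.size()); // 2
    }
}
```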

Actually, the guy who made the changes (a coworker) did in fact write
an alternative UpdateHandler. I've just noticed that there are a bunch
of dupes right now, though.

import java.io.IOException;

import org.apache.lucene.index.Term;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.solr.core.SolrCore;
import org.apache.solr.search.SolrIndexSearcher;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.DirectUpdateHandler2;
import org.apache.solr.util.RefCounted;

public class DiscoAPIUpdateHandler extends DirectUpdateHandler2 {

    public DiscoAPIUpdateHandler(SolrCore core) {
        super(core);
    }

    @Override
    public int addDoc(AddUpdateCommand cmd) throws IOException {

        // If overwrite is false, fall back to the stock DirectUpdateHandler2
        // behavior. This is used for debugging, to deliberately insert
        // duplicates into Solr.
        if (!cmd.overwrite) return super.addDoc(cmd);

        // The idea: run an internal Lucene query to check whether this ID
        // already exists. This only sees documents visible to the newest
        // searcher, so uncommitted documents are not checked.
        Term updateTerm = (cmd.updateTerm != null)
                ? cmd.updateTerm
                : new Term("id", cmd.getIndexedId());
        Query query = new TermQuery(updateTerm);

        // RefCounted objects must be released with decref() when you're done;
        // the finally block covers both the early return and exceptions.
        RefCounted<SolrIndexSearcher> indexSearcher = core.getNewestSearcher(false);
        if (indexSearcher != null) {
            try {
                TopDocs docs = indexSearcher.get().search(query, 2);
                if (docs.totalHits > 0) {
                    // The ID already exists: don't add the new document.
                    return 0;
                }
            } finally {
                indexSearcher.decref();
            }
        }

        // If we get here, it's a new document.
        return super.addDoc(cmd);
    }
}
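As for the question in the subject line: faceting on the ID field with facet.mincount=2 (for example q=*:*&rows=0&facet=true&facet.field=id&facet.mincount=2&facet.limit=-1) lists every ID that occurs more than once. The sketch below does the same counting in plain Java over a list of IDs, assuming they have already been exported from the index (e.g. with fl=id); the DuplicateIds class is hypothetical. Like any search, the export only sees committed documents.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical plain-Java analogue of faceting on "id" with facet.mincount=2:
// count every ID, then keep only those seen at least twice.
public class DuplicateIds {
    static Map<String, Integer> duplicates(List<String> ids) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String id : ids) {
            counts.merge(id, 1, Integer::sum);
        }
        // Drop IDs below the "mincount" of 2.
        counts.values().removeIf(c -> c < 2);
        return counts;
    }

    public static void main(String[] args) {
        List<String> ids = List.of("a", "b", "a", "c", "b", "a");
        System.out.println(duplicates(ids)); // {a=3, b=2}
    }
}
```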


> And I give a bunch of examples in my book.
>

I anticipate the book with esteem!

--
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com




--
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
<mkhlud...@griddynamics.com>
