RE: avoid overwrite in DataImportHandler

Young, Cody Thu, 08 Dec 2011 12:22:33 -0800

I believe all you need to do is add clean=false to the query string of your full-import request (i.e. command=full-import&clean=false).

If you have a unique key set up as your ID in Solr, it should then update
the existing documents instead of deleting and re-indexing everything.


Cody
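A minimal sketch of building the request Cody describes, in Python. The host, port, and `/solr/dataimport` handler path are assumptions; use whatever path your DataImportHandler is registered under in solrconfig.xml. `command`, `clean`, and `commit` are DIH request parameters; the helper name `full_import_url` is hypothetical.

```python
from urllib.parse import urlencode

# Assumed base URL: adjust host, port, and handler path to your own setup.
DIH_URL = "http://localhost:8983/solr/dataimport"


def full_import_url(base: str, clean: bool = False) -> str:
    """Build a DataImportHandler full-import URL (hypothetical helper).

    clean=false asks DIH not to wipe the existing index before importing;
    commit=true commits the imported documents when the import finishes.
    """
    params = {
        "command": "full-import",
        "clean": "true" if clean else "false",
        "commit": "true",
    }
    return f"{base}?{urlencode(params)}"


print(full_import_url(DIH_URL))
# http://localhost:8983/solr/dataimport?command=full-import&clean=false&commit=true
```

The resulting URL can be opened in a browser or sent with any HTTP client; with clean=false, documents whose uniqueKey already exists are replaced and all others are left in place.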

-----Original Message-----
From: P Williams [mailto:williams.tricia.l...@gmail.com] 
Sent: Thursday, December 08, 2011 11:11 AM
To: solr-user@lucene.apache.org
Subject: Re: avoid overwrite in DataImportHandler

Ah.  Thanks Erick.

I see now that my question is different from sabman's.

Is there a way to use the DataImportHandler's "full-import" command so
that it does not delete the existing material before it begins?

Thanks,
Tricia

On Thu, Dec 8, 2011 at 6:35 AM, Erick Erickson
<erickerick...@gmail.com> wrote:

> This is all controlled by Solr via the <uniqueKey> field in your
> schema. Just remove that entry.
>
> But then it's all up to you to handle the fact that there will be 
> multiple documents with the same ID all returned as a result of 
> querying. And it won't matter what program adds data, *nothing* will 
> be overwritten, DIH has no part in that decision.
>
> Deduplication is about defining some fields in your record and 
> avoiding adding another document if the contents are "close", where 
> close is a slippery concept. I don't think it's related to your
> problem at all.
>
> Best
> Erick
>
> On Wed, Dec 7, 2011 at 3:27 PM, P Williams 
> <williams.tricia.l...@gmail.com> wrote:
> > Hi,
> >
> > I've wondered the same thing myself.  I feel like the "clean"
> > parameter has something to do with it, but it doesn't work as I'd
> > expect either. Thanks in advance to anyone who can answer this question.
> >
> > *clean* : (default 'true'). Tells whether to clean up the index
> > before the indexing is started.
> >
> > Tricia
> >
> > On Wed, Dec 7, 2011 at 12:49 PM, sabman <sab...@gmail.com> wrote:
> >
> >> I have a unique ID defined for the documents I am indexing. I want
> >> to avoid overwriting the documents that have already been indexed.
> >> I am using XPathEntityProcessor and TikaEntityProcessor to process
> >> the documents.
> >>
> >> The DataImportHandler does not seem to have the option to set
> >> overwrite=false. I have read suggestions on other forums to use
> >> deduplication instead, but I don't see how it is related to my problem.
> >>
> >> Any help on this (or an explanation of how deduplication would apply
> >> to my problem) would be great. Thanks!
> >>
> >> --
> >> View this message in context:
> >> http://lucene.472066.n3.nabble.com/avoid-overwrite-in-DataImportHandler-tp3568435p3568435.html
> >> Sent from the Solr - User mailing list archive at Nabble.com.
> >>
>
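To make the three behaviours Erick describes concrete, here is a toy in-memory model in Python. It is not how Solr is implemented: a dict keyed by the uniqueKey value stands in for an index with a <uniqueKey>, a plain list stands in for one without it, and a dict keyed by a SHA-1 hash of chosen fields stands in for signature-based deduplication. All function names are hypothetical, and the exact hash only catches identical field values; "close" matching would need a fuzzy signature instead.

```python
import hashlib
from typing import Dict, List, Sequence


def add_with_unique_key(index: Dict[str, dict], docs: List[dict], key: str = "id") -> None:
    """uniqueKey present: a new doc replaces any stored doc with the same key."""
    for doc in docs:
        index[doc[key]] = doc


def add_without_unique_key(index: List[dict], docs: List[dict]) -> None:
    """No uniqueKey: every doc is appended, so docs with the same ID pile up."""
    index.extend(docs)


def add_with_signature(index: Dict[str, dict], docs: List[dict], fields: Sequence[str]) -> None:
    """Toy dedup: hash the chosen fields; skip a doc whose hash is already stored."""
    for doc in docs:
        raw = "\x1f".join(str(doc.get(f, "")) for f in fields)
        sig = hashlib.sha1(raw.encode("utf-8")).hexdigest()
        index.setdefault(sig, doc)  # keep the first doc seen for each signature


docs_a = [{"id": "1", "body": "hello"}]
docs_b = [{"id": "1", "body": "hello, edited"}, {"id": "2", "body": "hello"}]

keyed: Dict[str, dict] = {}
unkeyed: List[dict] = []
deduped: Dict[str, dict] = {}
for batch in (docs_a, docs_b):
    add_with_unique_key(keyed, batch)
    add_without_unique_key(unkeyed, batch)
    add_with_signature(deduped, batch, fields=["body"])

print(len(keyed), keyed["1"]["body"])  # 2 hello, edited
print(len(unkeyed))                    # 3
print(len(deduped))                    # 2
```

Note that deduplication here keys on content rather than ID, so doc "2" is dropped because its body matches doc "1", which is why it does not answer the question of keeping already-indexed documents with the same ID.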
