Whether documents get overwritten is controlled by Solr via the <uniqueKey> field in your schema. Just remove that entry.
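For reference, the entry in schema.xml usually looks something like this (the field
name "id" is only an example, yours may differ):

    <!-- schema.xml: the uniqueKey declaration points at one of your fields -->
    <field name="id" type="string" indexed="true" stored="true" required="true"/>

    <!-- Remove (or comment out) this line and Solr will stop overwriting an
         existing document when another one arrives with the same id -->
    <uniqueKey>id</uniqueKey>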
But then it's all up to you to handle the fact that there will be
multiple documents with the same ID, all returned when you query. And
it won't matter what program adds the data: *nothing* will be
overwritten; DIH has no part in that decision.

Deduplication is about defining some fields in your record and avoiding
adding another document if the contents are "close", where "close" is a
slippery concept. I don't think it's related to your problem at all.
(For reference, a sketch of what a deduplication setup looks like
follows the quoted messages below.)

Best
Erick

On Wed, Dec 7, 2011 at 3:27 PM, P Williams
<williams.tricia.l...@gmail.com> wrote:
> Hi,
>
> I've wondered the same thing myself. I feel like the "clean" parameter has
> something to do with it, but it doesn't work as I'd expect either. Thanks
> in advance to anyone who can answer this question.
>
> *clean* : (default 'true'). Tells whether to clean up the index before the
> indexing is started.
>
> Tricia
>
> On Wed, Dec 7, 2011 at 12:49 PM, sabman <sab...@gmail.com> wrote:
>
>> I have a unique ID defined for the documents I am indexing. I want to
>> avoid overwriting documents that have already been indexed. I am using
>> XPathEntityProcessor and TikaEntityProcessor to process the documents.
>>
>> The DataImportHandler does not seem to have an option to set
>> overwrite=false. Some other forums I have read suggest using
>> deduplication instead, but I don't see how it is related to my problem.
>>
>> Any help on this (or an explanation of how deduplication would apply to
>> my problem) would be great. Thanks!
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/avoid-overwrite-in-DataImportHandler-tp3568435p3568435.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
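For completeness, since deduplication came up: a typical setup hooks
SignatureUpdateProcessorFactory into an update processor chain in
solrconfig.xml. This is only a sketch; the chain name, the field list,
and the signature field are illustrative, and the signatureField must
also be declared in your schema:

    <!-- solrconfig.xml: minimal deduplication chain (names are examples) -->
    <updateRequestProcessorChain name="dedupe">
      <processor class="solr.processor.SignatureUpdateProcessorFactory">
        <bool name="enabled">true</bool>
        <str name="signatureField">signature</str>
        <bool name="overwriteDupes">true</bool>
        <!-- documents whose combined values of these fields produce the same
             signature are treated as duplicates -->
        <str name="fields">name,features,cat</str>
        <str name="signatureClass">solr.processor.Lookup3Signature</str>
      </processor>
      <processor class="solr.LogUpdateProcessorFactory" />
      <processor class="solr.RunUpdateProcessorFactory" />
    </updateRequestProcessorChain>

The chain also has to be referenced from the request handler that
receives the updates. A signature is computed from the listed fields,
and with overwriteDupes=true a new document replaces an earlier one
with the same signature, which is why this addresses "close enough"
duplicates rather than the unique-ID overwrite behavior asked about here.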