You can simplify things a bit by indexing a "batch number" that is guaranteed to be different between two runs for the same keyField. In fact, I'd make sure it was unique amongst all my runs. The simplest choice is a timestamp (assuming you don't start two batches within the same millisecond!). So it looks like this:

get a new timestamp
Add it to _every_ doc in my current run.
issue delete-by-query like 'q=keyField:A AND timestamp:[* TO timestamp}'
commit

As Shawn says, you have to very carefully control the commits. And also note that the curly brace at the end is NOT a typo; it excludes the endpoint.

Best,
Erick

On Fri, Mar 27, 2015 at 7:01 AM, Russell Taylor <russell.tay...@interactivedata.com> wrote:
> Yes that works and now I have a better understanding of the soft and hard
> commits to boot.
>
> Thanks again Shawn.
>
>
> Russ.
>
> -----Original Message-----
> From: Shawn Heisey [mailto:apa...@elyograg.org]
> Sent: 27 March 2015 13:22
> To: solr-user@lucene.apache.org
> Subject: Re: Replacing a group of documents (Delete/Insert) without a query
> on the index ever showing an empty list (Docs)
>
> On 3/27/2015 7:07 AM, Russell Taylor wrote:
>> Hi Shawn, thanks for the quick reply.
>>
>> I've looked at both methods and I think that they won't work for a number of
>> reasons:
>>
>> 1)
>> uniqueKey:
>> I could use the uniqueKey and overwrite the original document but I
>> need to remove the documents which are not on my new input list and the
>> issue with the uniqueKey method is I don't know what to delete.
>>
>> Documents on the index:
>> "docs": [
>> {
>> "id":"1"
>> "keyField":"A"
>> },{
>> "id":"2"
>> "keyField":"A"
>> },{
>> "id":"3"
>> "keyField":"B"
>> }
>> ]
>> New Documents to go on index
>> "docs": [
>> {
>> "id":"1"
>> "keyField":"A"
>> },{
>> "id":"3"
>> "keyField":"B"
>> }
>> ]
>> I would never know that id:2 should be deleted. (on some new document lists
>> the delete list could be in the millions).
>>
>> 2)
>> openSearcher:
>> My openSearcher is set to false and I've also commented out autoSoftCommit
>> so I don't get a partial list being returned on a query.
>> <!--
>> <autoSoftCommit>
>> <maxTime>${solr.autoSoftCommit.maxTime:1000}</maxTime>
>> </autoSoftCommit>
>> -->
>>
>>
>> So is there another way to keep the original set of documents until the new
>> set has been added to the index?
>
> If you are 100% in control of when commits with openSearcher=true are sent,
> which it sounds like you probably are, then you can do anything you want from
> the start of indexing until commit time, and the user will never see any of
> it, until the commit happens. That allows the following relatively simple
> paradigm:
>
> 1) Delete LOTS of stuff, or perhaps everything in the index with a
> deleteByQuery of *:* (for all documents).
>
> 2) Index everything you need to index.
>
> 3) Commit.
>
> Thanks,
> Shawn
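
For reference, the batch-timestamp steps at the top of this message could be sketched roughly as follows. This is only an illustration, not tested against a live Solr: the helper names are made up, the field name "timestamp" is assumed to be a numeric (long) field in your schema, the keyField value is assumed to need no query escaping, and the three JSON bodies are assumed to be POSTed in order to your collection's /update handler by whatever HTTP client you use.

```python
import json
import time


def new_batch_timestamp():
    """Milliseconds since the epoch; assumed unique per batch run."""
    return int(time.time() * 1000)


def stamp_docs(docs, batch_ts, field="timestamp"):
    """Return copies of the docs with the batch timestamp added to every one."""
    return [{**doc, field: batch_ts} for doc in docs]


def delete_older_query(key_value, batch_ts, key_field="keyField", field="timestamp"):
    """Match docs for this key from earlier batches only.

    The closing '}' makes the upper bound exclusive, so the docs that were
    just indexed with batch_ts are not deleted.
    """
    return f"{key_field}:{key_value} AND {field}:[* TO {batch_ts}}}"


def build_update_bodies(docs, key_value, batch_ts=None):
    """Build the JSON bodies for the run, in order: add, delete-by-query, commit."""
    if batch_ts is None:
        batch_ts = new_batch_timestamp()
    return [
        json.dumps(stamp_docs(docs, batch_ts)),
        json.dumps({"delete": {"query": delete_older_query(key_value, batch_ts)}}),
        json.dumps({"commit": {}}),
    ]


if __name__ == "__main__":
    new_docs = [{"id": "1", "keyField": "A"}]
    for body in build_update_bodies(new_docs, "A"):
        print(body)
```

Nothing here sends a commit until the very end, which matches the point about controlling commits: the adds and the delete only become visible together when the final commit opens a new searcher.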